WO2024229018A2 - Engineered compact CRISPR-Cas12f system - Google Patents

Engineered compact CRISPR-Cas12f system

Info

Publication number
WO2024229018A2
Authority
WO
WIPO (PCT)
Prior art keywords
ascas12f
engineered
protein
sgrna
aspects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2024/027039
Other languages
English (en)
Other versions
WO2024229018A3 (fr
Inventor
Weixin Tang
Siyuan ZOU
Tong Wu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Chicago
Original Assignee
University of Chicago
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Chicago
Publication of WO2024229018A2
Publication of WO2024229018A3
Anticipated expiration
Ceased (current legal status)

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 - Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 - Hydrolases (3)
    • C12N9/16 - Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 - Ribonucleases [RNase]; Deoxyribonucleases [DNase]
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/11 - DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 - Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 - Structure or type of the nucleic acid
    • C12N2310/10 - Type of nucleic acid
    • C12N2310/20 - Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • CRISPR-Cas clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins
  • Adeno-associated viruses are the leading candidates for in vivo delivery of gene-editing agents, owing to their long application history in the clinic, general lack of pathogenicity and immunogenicity, and programmable tissue tropism.
  • AAV vectors have a maximum packaging capacity of approximately 4.7 kb, a capacity insufficient to accommodate SpCas9 (1,368 amino acids) or AsCas12a (1,307 amino acids) and their essential auxiliary components.
  • Compact CRISPR-Cas systems offer versatile treatment options for genetic disorders, in part because their reduced size makes them amenable to in vivo viral-vector-based delivery; however, their application is often limited by relatively modest to weak gene-editing activity. There is a need in the art for new compact CRISPR-Cas systems with robust editing activity levels, suitable for in vivo vector-mediated use.
  • engineered compact Cas12f enzymes, such as engineered AsCas12f enzymes, which are engineered RNA-guided polynucleotide (e.g., DNA) endonucleases.
  • engineered compact Cas12f enzymes have approximately 11.3-fold more potent editing activity relative to the parent enzyme, e.g., AsCas12f.
  • engineered compact Cas12f enzymes display higher DNA cleavage activity than wild-type AsCas12f in vitro and/or in vivo. Furthermore, in some aspects, engineered compact Cas12f enzymes are about a third of the size of the widely used SpCas9 protein. In some aspects, enzymes disclosed herein, such as engineered AsCas12f enzymes, function broadly in mammalian cells (e.g., human cells), and can deliver high levels of insertions and deletions (e.g., approximately 70%) at user-specified genomic loci.
  • sgRNA structure-guided single guide RNA
  • sgRNA engineering provides improved molecules, such as sgRNA-v2, a compact guide RNA that is about 33% shorter than the full-length wild type sgRNA.
  • engineered sgRNA molecules have on par activity and/or increased activity when compared to full-length wild type sgRNA.
  • engineered compact CRISPR/Cas systems e.g., engineered AsCas12f systems
  • engineered compact sgRNA enable robust and faithful gene editing in mammalian cells.
  • technologies described herein include but are not limited to nucleic acids, polynucleotides, peptides, proteins, enzymes, protein-oligonucleotide complexes (including enzyme-oligonucleotide complexes), vectors, compositions, methods of use, methods of manufacturing, and/or kits.
  • kits which may be in a suitable container, that can be used to achieve the described methods.
  • a kit may include one or more buffers, such as buffers for nucleic acids or for reactions involving nucleic acids.
  • Other enzymes may be included in kits in addition to engineered Cas12f enzymes.
  • engineered Cas12f polypeptide comprising, comprising at least, or comprising at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more amino acid substitutions relative to a wild-type (WT) Cas12f polypeptide, wherein the substitution increases affinity of the engineered Cas12f protein to nucleic acids relative to WT Cas12f.
  • the substitutions comprise introduction of one or more basic residues at a nucleic acid interfacing site.
  • the Cas12f polypeptide is AsCas12f
  • AsCas12f comprises, comprises at least, or comprises at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 amino acid substitutions relative to wild-type (WT) AsCas12f polypeptide represented by SEQ ID NO: 1, and the substitution(s) increase affinity of the engineered AsCas12f protein to nucleic acids relative to WT AsCas12f.
  • the one or more or 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 amino acid substitutions comprise a substitution at amino acid 196, 199, 276, 281, 327, 328, 364, or any combination thereof.
  • the substitution is or is not D196K, N199K, G276R, D281K, T327K, N328G, D364K, D364R, or any combination thereof, relative to WT AsCas12f.
  • the substitution is or is not D196K relative to WT AsCas12f.
  • the substitution is or is not N199K relative to WT AsCas12f.
  • the substitution is or is not G276R relative to WT AsCas12f. In some aspects, the substitution is or is not D281K relative to WT AsCas12f. In some aspects, the substitution is or is not T327K relative to WT AsCas12f. In some aspects, the substitution is or is not N328G relative to WT AsCas12f. In some aspects, the substitution is or is not D364K relative to WT AsCas12f. In some aspects, the substitution is or is not D364R relative to WT AsCas12f. In some aspects, the substitutions are or are not D196K, N199K, and N328G relative to WT AsCas12f.
  • AsCas12f comprises, comprises at least, or comprises at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 amino acid substitutions relative to wild-type (WT) AsCas12f polypeptide represented by SEQ ID NO: 1
  • the engineered Cas12f protein comprises a primary amino acid sequence with at least, at most or exactly 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NOs: 12, 14, or 16 (or any range derivable therein).
  • engineered AsCas12f protein comprising primary amino acid sequence with at least, at most or exactly 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 12 (or any range derivable therein).
  • the substitutions are or are not D196K, N199K, N328G, and D364R relative to WT AsCas12f.
  • engineered AsCas12f protein comprising primary amino acid sequence with at least, at most, or exactly 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NO: 14 (or any range derivable therein).
  • the substitutions are or are not D196K, N199K, G276R, N328G, and D364R relative to WT AsCas12f.
  • engineered AsCas12f protein comprising primary amino acid sequence with at least, at most, or exactly 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to enhanced AsCas12f (enCas12f), represented by SEQ ID NO: 16 (or any range derivable therein).
  • engineered AsCas12f proteins that have increased gene-editing efficacy relative to WT AsCas12f protein.
  • the engineered AsCas12f protein has greater than or equal to about a 4.5 fold improvement in gene-editing efficacy relative to WT AsCas12f protein. In some aspects, the engineered AsCas12f protein has greater than or equal to about a 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or greater than 15 fold improvement in gene-editing efficacy relative to WT AsCas12f protein. In some aspects, the engineered AsCas12f protein has a reduced level of sequence context dependence when compared to CasMINI and/or UnCas12f. In some aspects, the engineered AsCas12f protein does not comprise a substitution at E44, D51, and/or Y52 relative to WT AsCas12f protein.
  • the engineered AsCas12f protein does not comprise substitutions E44A, D51A, and/or Y52A relative to WT AsCas12f protein. In some aspects, the engineered AsCas12f protein does not comprise a substitution at W17, H72, Y76, S92, R101, R121, R298, Y343, and/or Y351 relative to WT AsCas12f protein. In some aspects, the engineered AsCas12f protein does not comprise substitutions W17A, H72A, Y76A, S92A, R101A, R121A, R298A, Y343A, and/or Y351A relative to WT AsCas12f protein.
  • the engineered AsCas12f protein includes or excludes a D225A substitution relative to WT AsCas12f protein. In some aspects, the engineered AsCas12f protein is or is not fused to one or more heterologous proteins, domains, and/or polypeptides. In some aspects, the engineered AsCas12f protein includes or excludes a fused heterologous protein, domain, and/or polypeptide that provides transcriptional activation activity. In some aspects, the fused heterologous protein, domain, and/or polypeptide provides transcriptional inhibition activity. In some aspects, the fused heterologous protein, domain, and/or polypeptide provides base editing activity.
  • the fused heterologous protein, domain, and/or polypeptide provides endonuclease activity.
  • methods of making engineered Cas12f proteins comprising expression of a polynucleotide encoding the engineered AsCas12f protein in a host cell, and purification of the engineered Cas12f protein.
  • oligonucleotides encoding an engineered AsCas12f structure-guided single guide RNA (sgRNA) comprising an oligonucleotide sequence that is truncated relative to WT AsCas12f sgRNA as represented by SEQ ID NOs: 17 or 18.
  • the spacer-distal region of stem 5 of the engineered AsCas12f sgRNA is truncated and/or removed (or absent) relative to WT AsCas12f sgRNA.
  • nucleotides U(-47) to U(-15) of the engineered AsCas12f sgRNA are removed relative to WT AsCas12f sgRNA.
  • the spacer-proximal region of stem 5 of the engineered AsCas12f sgRNA is not truncated by greater than or equal to 3 nucleotides relative to WT AsCas12f sgRNA. In some aspects of the provided oligonucleotide, stem 5 of the engineered AsCas12f sgRNA is removed relative to WT AsCas12f sgRNA.
  • the engineered AsCas12f sgRNA comprises an oligonucleotide sequence with at least, at most, or exactly 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to SEQ ID NOs: 19 or 20 (or any range derivable therein).
  • stem 2 (SEQ ID NO: 22) of the engineered AsCas12f sgRNA is not mutated relative to WT AsCas12f sgRNA.
  • stem 3 (SEQ ID NO: 23) and/or stem 4 (SEQ ID NO: 24) of the engineered AsCas12f sgRNA are truncated and/or removed (or absent) relative to WT AsCas12f sgRNA.
  • the oligonucleotide sequence is less than 135, 134, 133, 132, 131, 130, 129, 128, 127, 126, 125, 124, 123, 122, 121, or 120 nucleotides (or any range derivable therein) in length. In some aspects, the oligonucleotide sequence is 122 nucleotides.
  • the oligonucleotide sequence encodes an engineered AsCas12f sgRNA that is at least, at most, or exactly 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% identical to sgRNA-v2 (SEQ ID NO: 19 or 20) (or any range derivable therein).
  • the oligonucleotide sequence encodes an engineered AsCas12f sgRNA that is sgRNA-v2 (SEQ ID NO: 19 or 20).
  • oligonucleotides provided herein are comprised within a polynucleotide that comprises a U6 promoter operably coupled to the oligonucleotide sequence encoding the engineered AsCas12f sgRNA, and wherein the engineered AsCas12f sgRNA has increased expression levels relative to full-length WT AsCas12f sgRNA (SEQ ID NO: 17 or 18) operably coupled to a U6 promoter.
  • the increased expression is about a fourfold increase.
  • the oligonucleotide sequence encodes an engineered AsCas12f sgRNA that, when complexed with an AsCas12f protein, has comparable and/or higher indel generation levels when compared to full-length WT AsCas12f sgRNA (SEQ ID NO: 17 or 18) complexed with an AsCas12f protein.
  • SEQ ID NO: 17 or 18 full-length WT AsCas12f sgRNA
  • polynucleotides encoding proteins, polypeptides, and/or oligonucleotides described herein.
  • provided herein are polynucleotides comprising a coding sequence encoding an engineered Cas12f protein as described herein.
  • polynucleotides comprising an oligonucleotide sequence encoding an engineered AsCas12f sgRNA as described herein.
  • a polynucleotide encodes an engineered AsCas12f protein, wherein the engineered AsCas12f includes or excludes substitutions D196K, N199K, G276R, D281K, T327K, N328G, D364K, D364R, or any combination thereof, relative to WT AsCas12f (SEQ ID NO: 1).
  • a polynucleotide comprises an oligonucleotide sequence encoding an engineered AsCas12f sgRNA that is at least, at most, or exactly 90% identical to sgRNA-v2 (SEQ ID NOs: 19 or 20).
  • a polynucleotide comprises an oligonucleotide sequence encoding an engineered AsCas12f sgRNA that is at least, at most, or exactly 90% identical to sgRNA-v2 (SEQ ID NOs: 19 or 20), and a sequence encoding an engineered AsCas12f protein, wherein the engineered AsCas12f includes or excludes substitutions D196K, N199K, G276R, D281K, T327K, N328G, D364K, D364R, or any combination thereof, relative to WT AsCas12f (SEQ ID NO: 1).
  • vectors and/or constructs comprising polynucleotides described herein.
  • viral vectors comprising a polynucleotide encoding an engineered AsCas12f and/or an engineered AsCas12f sgRNA.
  • a viral vector comprises a polynucleotide comprising an oligonucleotide sequence encoding an engineered AsCas12f sgRNA that is at least 90% identical to sgRNA-v2 (SEQ ID NO: 19 or 20), and a sequence encoding an engineered AsCas12f protein, wherein the engineered AsCas12f comprises substitutions D196K, N199K, G276R, D281K, T327K, N328G, D364K, D364R, or any combination thereof, relative to WT AsCas12f (SEQ ID NO: 1).
  • the polynucleotide is under the transcriptional control of a promoter.
  • the promoter is or is not a U6 promoter and/or a CMV promoter.
  • the vector is a cosmid, plasmid, and/or viral construct.
  • the vector is a plasmid comprised in a liposome.
  • the vector is a viral construct.
  • the viral construct is or is not a lentivirus, retrovirus, adenovirus, or adeno-associated virus (AAV) viral construct.
  • the viral construct is an AAV construct.
  • compositions comprising polypeptides, polynucleotides, oligonucleotides, vectors, and/or viral constructs described herein.
  • cells comprising polypeptides, polynucleotides, oligonucleotides, vectors, and/or viral constructs described herein.
  • kits comprising polypeptides, polynucleotides, oligonucleotides, vectors, and/or viral constructs described herein.
  • a subject has, is expected to have, is at risk for, and/or is diagnosed with a disease.
  • the disease is cancer, an autoimmune disorder, an infection, and/or a genetic disorder.
  • crystallized AsCas12f-sgRNA-DNA complexes as represented in FIGs.3B-3C, FIGs.13A-13B, FIGs.14A-14C, FIG.15A, FIGs.
  • Aspect 1 is an engineered Cas12f polypeptide comprising one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more) amino acid substitutions relative to a wild-type (WT) Cas12f polypeptide, wherein the substitution increases affinity of the engineered Cas12f protein to nucleic acids relative to WT Cas12f.
  • WT wild-type
  • Aspect 2 is the engineered Cas12f protein of aspect 1, wherein the substitutions comprise introduction of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more) basic residues at a nucleic acid interfacing site.
  • Aspect 3 is the engineered Cas12f polypeptide of aspect 1 or 2, wherein the Cas12f polypeptide is AsCas12f, the one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more) amino acid substitutions are relative to wild-type (WT) AsCas12f polypeptide represented by SEQ ID NO: 1, and wherein the substitution increases affinity of the engineered AsCas12f protein to nucleic acids relative to WT AsCas12f.
  • Aspect 4 is the engineered AsCas12f protein of aspect 3, wherein the one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more) amino acid substitutions comprise a substitution at amino acid 196, 199, 276, 281, 327, 328, 364, or any combination thereof.
  • Aspect 5 is the engineered AsCas12f protein of aspect 3 or 4, wherein the substitutions are D196K, N199K, G276R, D281K, T327K, N328G, D364K, D364R, or any combination thereof, relative to WT AsCas12f.
  • Aspect 6 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitution D196K relative to WT AsCas12f.
  • Aspect 7 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitution N199K relative to WT AsCas12f.
  • Aspect 8 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitution G276R relative to WT AsCas12f.
  • Aspect 9 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitution D281K relative to WT AsCas12f.
  • Aspect 10 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitution T327K relative to WT AsCas12f.
  • Aspect 11 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitution N328G relative to WT AsCas12f.
  • Aspect 12 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitution D364K relative to WT AsCas12f.
  • Aspect 13 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitution D364R relative to WT AsCas12f.
  • Aspect 14 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitutions D196K, N199K, and N328G relative to WT AsCas12f.
  • Aspect 15 is the engineered AsCas12f protein of any one of aspects 3 to 14, wherein the engineered AsCas12f protein comprises a primary amino acid sequence with at least 80% sequence identity to SEQ ID NO: 12.
  • Aspect 16 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitutions D196K, N199K, N328G, and D364R relative to WT AsCas12f.
  • Aspect 17 is the engineered AsCas12f protein of any one of aspects 3 to 16, wherein the engineered AsCas12f protein comprises a primary amino acid sequence with at least about 80% sequence identity to SEQ ID NO: 14.
  • Aspect 18 is the engineered AsCas12f protein of any one of aspects 3 to 5, comprising substitutions D196K, N199K, G276R, N328G, and D364R relative to WT AsCas12f.
  • Aspect 19 is the engineered AsCas12f protein of any one of aspects 3 to 18, wherein the engineered AsCas12f protein comprises a primary amino acid sequence with at least 80% sequence identity to enhanced AsCas12f (enCas12f), represented by SEQ ID NO: 16.
  • Aspect 20 is the engineered AsCas12f protein of any one of aspects 3 to 19, wherein the engineered AsCas12f protein has increased gene-editing efficacy relative to WT AsCas12f protein.
  • Aspect 21 is the engineered AsCas12f protein of any one of aspects 3 to 20, wherein the engineered AsCas12f protein has greater than or equal to about a 4.5 fold improvement in gene-editing efficacy relative to WT AsCas12f protein.
  • Aspect 22 is the engineered AsCas12f protein of any one of aspects 3 to 21, wherein the engineered AsCas12f protein has greater than or equal to about an 11 fold improvement in gene-editing efficacy relative to WT AsCas12f protein.
  • Aspect 23 is the engineered AsCas12f protein of any one of aspects 3 to 22, wherein the engineered AsCas12f protein has a reduced level of sequence context dependence when compared to CasMINI and/or UnCas12f.
  • Aspect 24 is the engineered AsCas12f protein of any one of aspects 3 to 23, wherein the engineered AsCas12f protein does not comprise a substitution at E44, D51, and/or Y52 relative to WT AsCas12f protein.
  • Aspect 25 is the engineered AsCas12f protein of aspect 24, wherein the engineered AsCas12f protein does not comprise substitutions E44A, D51A, and Y52A relative to WT AsCas12f protein.
  • Aspect 26 is the engineered AsCas12f protein of any one of aspects 3 to 25, wherein the engineered AsCas12f protein does not comprise a substitution at W17, H72, Y76, S92, R101, R121, R298, Y343, and/or Y351 relative to WT AsCas12f protein.
  • Aspect 27 is the engineered AsCas12f protein of aspect 26, wherein the engineered AsCas12f protein does not comprise substitutions W17A, H72A, Y76A, S92A, R101A, R121A, R298A, Y343A, and Y351A relative to WT AsCas12f protein.
  • Aspect 28 is the engineered AsCas12f protein of any one of aspects 3 to 27, wherein the engineered AsCas12f protein comprises a D225A substitution relative to WT AsCas12f protein.
  • Aspect 29 is the engineered AsCas12f protein of any one of aspects 3 to 28, wherein the engineered AsCas12f protein is fused to one or more heterologous proteins, domains, and/or polypeptides.
  • Aspect 30 is the engineered AsCas12f protein of aspect 29, wherein the AsCas12f protein is fused to heterologous protein, domain, and/or polypeptide that provides transcriptional activation activity.
  • Aspect 31 is the engineered AsCas12f protein of aspect 29, wherein the AsCas12f protein is fused to heterologous protein, domain, and/or polypeptide that provides transcriptional inhibition activity.
  • Aspect 32 is the engineered AsCas12f protein of aspect 30, wherein the AsCas12f protein is fused to heterologous protein, domain, and/or polypeptide that provides base editing activity.
  • Aspect 33 is the method of making the engineered Cas12f protein of any one of aspects 3 to 32, the method comprising, expression of a polynucleotide encoding the engineered AsCas12f protein in a host cell, and purification of the engineered Cas12f protein.
  • Aspect 34 is an oligonucleotide encoding an engineered AsCas12f structure-guided single guide RNA (sgRNA) comprising an oligonucleotide sequence that is truncated relative to WT AsCas12f sgRNA as represented by SEQ ID NOs: 17 or 18.
  • Aspect 35 is the oligonucleotide of aspect 34, wherein the spacer-distal region of stem 5 of the engineered AsCas12f sgRNA is truncated and/or removed relative to WT AsCas12f sgRNA.
  • Aspect 36 is the oligonucleotide of aspect 34, wherein nucleotides U(-47) to U(-15) of the engineered AsCas12f sgRNA are removed relative to WT AsCas12f sgRNA.
  • Aspect 37 is the oligonucleotide of aspect 34, wherein the spacer-proximal region of stem 5 of the engineered AsCas12f sgRNA is not truncated by greater than or equal to 3 nucleotides relative to WT AsCas12f sgRNA.
  • Aspect 38 is the oligonucleotide of aspect 34, wherein stem 5 of the engineered AsCas12f sgRNA is removed relative to WT AsCas12f sgRNA.
  • Aspect 39 is the oligonucleotide of aspect 38, wherein the engineered AsCas12f sgRNA comprises an oligonucleotide sequence with at least 80% sequence identity to SEQ ID NOs: 19 or 20.
  • Aspect 40 is the oligonucleotide of any one of aspects 34 to 39, wherein stem 2 (SEQ ID NO: 22) of the engineered AsCas12f sgRNA is not mutated relative to WT AsCas12f sgRNA.
  • Aspect 41 is the oligonucleotide of any one of aspects 34 to 40, wherein stem 3 (SEQ ID NO: 23) and/or stem 4 (SEQ ID NO: 24) of the engineered AsCas12f sgRNA are truncated and/or removed relative to WT AsCas12f sgRNA.
  • Aspect 42 is the oligonucleotide of any one of aspects 34 to 41, wherein the oligonucleotide sequence is 130 nucleotides or fewer in length.
  • Aspect 43 is the oligonucleotide of any one of aspects 34 to 42, wherein the oligonucleotide sequence is 122 nucleotides.
  • Aspect 44 is the oligonucleotide of any one of aspects 34 to 43, wherein the oligonucleotide sequence encodes an engineered AsCas12f sgRNA that is at least 80% identical to sgRNA-v2 (SEQ ID NO: 19 or 20).
  • Aspect 45 is the oligonucleotide of any one of aspects 34 to 44, wherein the oligonucleotide sequence encodes an engineered AsCas12f sgRNA that is at least 90% identical to sgRNA-v2 (SEQ ID NO: 19 or 20).
  • Aspect 46 is the oligonucleotide of any one of aspects 34 to 45, wherein the oligonucleotide sequence encodes an engineered AsCas12f sgRNA that is sgRNA-v2 (SEQ ID NO: 19 or 20).
  • Aspect 47 is the oligonucleotide of any one of aspects 34 to 46, wherein the oligonucleotide comprises a U6 promoter operably coupled to the oligonucleotide sequence encoding the engineered AsCas12f sgRNA, and wherein the engineered AsCas12f sgRNA has increased expression levels relative to full-length WT AsCas12f sgRNA (SEQ ID NO: 17 or 18) operably coupled to a U6 promoter.
  • Aspect 48 is the oligonucleotide of aspect 47, wherein the increased expression is about a fourfold increase.
  • Aspect 49 is the oligonucleotide of any one of aspects 34 to 48, wherein the oligonucleotide sequence encodes an engineered AsCas12f sgRNA that, when complexed with an AsCas12f protein, has comparable and/or higher indel generation levels when compared to full-length WT AsCas12f sgRNA (SEQ ID NO: 17 or 18) complexed with an AsCas12f protein.
  • Aspect 49.1 is an engineered AsCas12f sgRNA encoded by the oligonucleotide according to any one of aspects 34-49.
  • Aspect 50 is a polynucleotide comprising a coding sequence encoding an engineered Cas12f protein according to any one of aspects 1 to 33.
  • Aspect 51 is a polynucleotide comprising an oligonucleotide sequence encoding an engineered AsCas12f sgRNA according to any one of aspects 34 to 49.
  • Aspect 52 is a polynucleotide encoding an engineered AsCas12f protein, wherein the engineered AsCas12f comprises substitutions D196K, N199K, G276R, D281K, T327K, N328G, D364K, D364R, or any combination thereof, relative to WT AsCas12f (SEQ ID NO: 1).
  • Aspect 53 is a polynucleotide comprising an oligonucleotide sequence encoding an engineered AsCas12f sgRNA that is at least 90% identical to sgRNA-v2 (SEQ ID NOs: 19 or 20).
  • Aspect 54 is a polynucleotide comprising an oligonucleotide sequence encoding an engineered AsCas12f sgRNA that is at least 90% identical to sgRNA-v2 (SEQ ID NOs: 19 or 20), and a sequence encoding an engineered AsCas12f protein, wherein the engineered AsCas12f comprises substitutions D196K, N199K, G276R, D281K, T327K, N328G, D364K, D364R, or any combination thereof, relative to WT AsCas12f (SEQ ID NO: 1).
  • Aspect 55 is a viral construct comprising a polynucleotide according to any one of aspects 50 to 54.
  • Aspect 56 is a vector comprising a polynucleotide encoding an engineered AsCas12f and/or an engineered AsCas12f sgRNA.
  • Aspect 57 is the vector of aspect 56, wherein the polynucleotide comprises an oligonucleotide sequence encoding an engineered AsCas12f sgRNA that is at least 90% identical to sgRNA-v2 (SEQ ID NO: 19 or 20), and a sequence encoding an engineered AsCas12f protein, wherein the engineered AsCas12f comprises substitutions D196K, N199K, G276R, D281K, T327K, N328G, D364K, D364R, or any combination thereof, relative to WT AsCas12f (SEQ ID NO: 1).
  • Aspect 58 is the vector of aspect 56 or 57, wherein the polynucleotide is under the transcriptional control of a promoter.
  • Aspect 59 is the vector of aspect 58, wherein the promoter is a U6 promoter and/or a CMV promoter.
  • Aspect 60 is the vector of any one of aspects 56 to 59, wherein the vector is a cosmid, plasmid, and/or viral construct.
  • Aspect 61 is the vector of aspect 60, wherein the vector is a plasmid comprised in a liposome.
  • Aspect 62 is the vector of aspect 60, wherein the vector is a viral construct.
  • Aspect 63 is the viral construct of aspect 62, wherein the viral construct is a lentivirus, retrovirus, adenovirus, or adeno-associated virus (AAV) viral construct.
  • Aspect 64 is the viral construct of aspect 63, wherein the viral construct is an AAV construct.
  • Aspect 65 is a composition comprising the polypeptide, polynucleotide, oligonucleotide, vector, and/or viral construct of any one of aspects 1 to 64.
  • Aspect 66 is a cell comprising the polypeptide, polynucleotide, oligonucleotide, vector, and/or viral construct of any one of aspects 1 to 64.
  • Aspect 67 is a kit comprising the polypeptide, polynucleotide, oligonucleotide, vector, and/or viral construct of any one of aspects 1 to 64.
  • Aspect 68 is a method of treating a subject comprising administration of the polypeptide, polynucleotide, oligonucleotide, vector, viral construct, composition, and/or cell of any one of aspects 1 to 66.
  • Aspect 69 is the method of aspect 68, wherein the subject has, is expected to have, and/or is diagnosed with a disease.
  • Aspect 70 is the method of aspect 69, wherein the disease is cancer, an autoimmune disorder, an infection, and/or a genetic disorder.
  • Aspect 71 is a method of editing a polynucleotide comprising contacting the polynucleotide with the engineered polypeptide or oligonucleotide or sgRNA encoded therein according to any one of aspects 1-32, or 34-49.1.
  • Aspect 72 is the method of aspect 71, wherein the method comprises contacting the polynucleotide with the engineered polypeptide according to any one of aspects 1-32, and the method comprises contacting the polynucleotide with the engineered oligonucleotide or sgRNA encoded therein according to any one of aspects 34-49.1.
  • Aspect 73 is the method of aspect 71 or 72, wherein the editing comprises a single stranded or double stranded cleavage of the polynucleotide, or base-editing of the polynucleotide.
  • Aspect 74 is the method of any one of aspects 71-73, wherein the editing occurs in a cell.
  • Aspect 75 is the use of the polypeptide, polynucleotide, oligonucleotide, vector, viral construct, kit, composition, and/or cell of any one of aspects 1-67.
  • Aspect 76 is the use of aspect 75, wherein the use is medicinal and/or for biomedical research.
  • any one or more of the aspects disclosed herein may be specifically excluded from any one or more of another aspect disclosed herein.
  • the use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.”
  • any aspect discussed herein can be implemented with respect to any method or composition of the invention, and vice versa.
  • compositions and kits of the invention can be used to achieve methods of the invention.
  • sequence as used herein in reference to a polynucleotide refers to the nucleotide sequence such as “A” for adenosine, “G” for guanine, “C” for cytosine, “T” for thymine, “U” for uracil, and “N” for “A”/“C”/“U”/“T”/“G”.
  • the terms “or” and “and/or” are utilized to describe multiple components in combination or exclusive of one another.
  • x, y, and/or z can refer to “x” alone, “y” alone, “z” alone, “x, y, and z,” “(x and y) or z,” “x or (y and z),” or “x or y or z.” It is specifically contemplated that x, y, or z may be specifically excluded from an aspect.
  • the term “comprising,” which is synonymous with “including,” “containing,” or “characterized by,” is inclusive or open-ended and does not exclude additional, unrecited elements or method steps. The phrase “consisting of” excludes any element, step, or ingredient not specified.
  • wild-type refers to the endogenous version of a molecule that occurs naturally in an organism.
  • wild-type versions of a protein or polypeptide are employed; however, in many aspects of the disclosure, a modified protein or polypeptide is employed to generate an immune response.
  • a “modified protein” or “modified polypeptide” or “engineered protein” or “engineered polypeptide” or a “variant” refers to a protein or polypeptide whose chemical structure, particularly its amino acid sequence, is altered with respect to the wild-type protein or polypeptide.
  • a modified/variant protein or polypeptide has at least one modified activity or function (recognizing that proteins or polypeptides may have multiple activities or functions). It is specifically contemplated that a modified/variant protein or polypeptide may be altered with respect to one activity or function yet retain a wild-type activity or function in other respects, such as immunogenicity.
  • Where a protein is specifically mentioned herein, it is in general a reference to a native (wild-type) or recombinant (modified) protein or, optionally, a protein in which any signal sequence has been removed.
  • the protein may be isolated directly from the organism to which it is native, produced by recombinant DNA/exogenous expression methods, or produced by solid phase peptide synthesis (SPPS) or other in vitro methods.
  • the term “recombinant” may be used in conjunction with a polypeptide or the name of a specific polypeptide, and this generally refers to a polypeptide produced from a nucleic acid molecule that has been manipulated in vitro or that is a replication product of such a molecule.
  • the size of a protein or polypeptide may comprise, but is not limited to, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
  • polypeptides may be mutated by truncation, rendering them shorter than their corresponding wild-type form; they may also be altered by fusing or conjugating a heterologous protein or polypeptide sequence with a particular function (e.g., for targeting or localization (e.g., site-directed enzymatic purposes beyond solely endonuclease activity), for enhanced immunogenicity, for purification purposes, etc.).
  • amino acids may be substituted for other amino acids in a protein or polypeptide sequence with or without appreciable loss of interactive binding capacity with structures such as, for example, antigen-binding regions of antibodies or binding sites on substrate molecules. Since it is the interactive capacity and nature of a protein that defines that protein’s functional activity, certain amino acid substitutions can be made in a protein sequence and in its corresponding DNA coding sequence, and nevertheless produce a protein with similar or desirable properties. It is thus contemplated by the inventors that various changes may be made in the DNA sequences of genes which encode proteins without appreciable loss of their biological utility or activity.
  • amino acid sequence variants of the disclosure can be substitutional, insertional, or deletion variants.
  • a variation in a polypeptide of the disclosure may affect, affect at least, or affect at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more non-contiguous or contiguous amino acids of the protein or polypeptide, as compared to wild-type (or any range derivable therein).
  • a variant can comprise an amino acid sequence that is at least or at most 50%, 60%, 70%, 80%, or 90%, including all values and ranges there between, identical to any sequence provided or referenced herein.
  • a variant can include or exclude 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more substitute amino acids (or any range derivable therein).
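As a non-limiting illustration of the percent-identity thresholds recited above, the following minimal sketch computes identity over two sequences that are assumed to have already been aligned by a separate tool. The function names, the use of "-" as the gap character, and the choice to count every aligned column in the denominator are assumptions made for this example; alignment programs may define identity differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity across the columns of two pre-aligned sequences.

    Assumptions (not from the source text): both strings have equal length,
    '-' marks a gap, and columns where both sequences are gaps are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep every column except those that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, minimum_percent: float) -> bool:
    """True if the pair is at least `minimum_percent` identical."""
    return percent_identity(aligned_a, aligned_b) >= minimum_percent
```

For example, two aligned 10-residue sequences differing at one position would be 90% identical under this definition and would satisfy an "at least 90%" threshold.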
  • amino acid and nucleic acid sequences may include additional residues, such as additional N- or C-terminal amino acids, or 5′ or 3′ sequences, respectively, and yet still be essentially identical as set forth in one of the sequences disclosed herein, so long as the sequence meets the criteria set forth above, including the maintenance of biological protein activity where protein expression is concerned.
  • the addition of terminal sequences particularly applies to nucleic acid sequences that may, for example, include various non-coding sequences flanking either of the 5′ or 3′ portions of the coding region.
  • Deletion variants typically lack one or more residues of the native or wild type protein. Individual residues can be deleted or a number of contiguous amino acids can be deleted. A stop codon may be introduced (by substitution or insertion) into an encoding nucleic acid sequence to generate a truncated protein.
  • Insertional mutants typically involve the addition of amino acid residues at a non-terminal point in the polypeptide. This may include the insertion of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more) amino acid residues.
  • Terminal additions may also be generated and can include fusion proteins which are multimers or concatemers of one or more peptides or polypeptides described or referenced herein.
  • Substitutional variants typically contain the exchange of one amino acid for another at one or more sites within the protein or polypeptide, and may be designed to modulate one or more properties of the polypeptide, with or without the loss of other functions or properties. Substitutions may be conservative, that is, one amino acid is replaced with one of similar chemical properties. “Conservative amino acid substitutions” may involve exchange of a member of one amino acid class with another member of the same class.
  • Conservative substitutions are well known in the art and include, for example, the changes of: alanine to serine; arginine to lysine; asparagine to glutamine or histidine; aspartate to glutamate; cysteine to serine; glutamine to asparagine; glutamate to aspartate; glycine to proline; histidine to asparagine or glutamine; isoleucine to leucine or valine; leucine to valine or isoleucine; lysine to arginine; methionine to leucine or isoleucine; phenylalanine to tyrosine, leucine or methionine; serine to threonine; threonine to serine; tryptophan to tyrosine; tyrosine to tryptophan or phenylalanine; and valine to isoleucine or leucine.
  • amino acid substitutions may encompass non-naturally occurring amino acid residues, which are typically incorporated by chemical peptide synthesis rather than by synthesis in biological systems. These include peptidomimetics or other reversed or inverted forms of amino acid moieties.
  • substitutions may be “non-conservative”, such that a function or activity of the polypeptide is affected. Non-conservative changes typically involve substituting an amino acid residue with one that is chemically dissimilar, such as a polar or charged amino acid for a nonpolar or uncharged amino acid, and vice versa. Non-conservative substitutions may involve the exchange of a member of one of the amino acid classes for a member from another class.
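As a non-limiting illustration of the substitution classes discussed above, the following minimal sketch parses a substitution label of the form used in this document (e.g., D196K) and checks whether the exchange appears in the exemplary list of conservative changes given above. The table is transcribed from that list using one-letter amino acid codes; the function names, and the label K10R used below, are hypothetical and serve only as an example. An exchange absent from the table is simply "not listed"; the sketch does not attempt a general definition of conservativeness.

```python
import re

# Exemplary conservative changes transcribed from the list above
# (original residue -> acceptable replacements), in one-letter codes.
CONSERVATIVE = {
    "A": {"S"}, "R": {"K"}, "N": {"Q", "H"}, "D": {"E"}, "C": {"S"},
    "Q": {"N"}, "E": {"D"}, "G": {"P"}, "H": {"N", "Q"}, "I": {"L", "V"},
    "L": {"V", "I"}, "K": {"R"}, "M": {"L", "I"}, "F": {"Y", "L", "M"},
    "S": {"T"}, "T": {"S"}, "W": {"Y"}, "Y": {"W", "F"}, "V": {"I", "L"},
}


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split a label such as 'D196K' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def is_listed_conservative(label: str) -> bool:
    """True if the exchange appears in the exemplary conservative list."""
    original, _position, replacement = parse_substitution(label)
    return replacement in CONSERVATIVE.get(original, set())


if __name__ == "__main__":
    for label in ["D196K", "N199K", "G276R", "K10R"]:  # K10R is hypothetical
        print(label, parse_substitution(label), is_listed_conservative(label))
```

Under this table, substitutions such as D196K, N199K, and G276R are not among the listed conservative changes, consistent with their replacing acidic or uncharged residues with basic ones.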
  • One skilled in the art can determine suitable variants of polypeptides as set forth herein using well-known techniques. Utilizing the information provided herein, one skilled in the art may identify suitable areas of a molecule (e.g., a Cas12f enzyme and/or associated sgRNA) that may be changed without destroying activity by targeting regions not believed to be important for one or more specific activities. Utilizing the information provided herein, the skilled artisan will also be able to identify amino acid residues and/or nucleic acid residues and portions of the molecules that are conserved among similar proteins, polypeptides, and/or polynucleotides.
  • hydropathy index of amino acids may be considered.
  • the hydropathy profile of a protein is calculated by assigning each amino acid a numerical value (“hydropathy index”) and then repetitively averaging these values along the peptide chain. Each amino acid has been assigned a value based on its hydrophobicity and charge characteristics.
  • the importance of the hydropathy amino acid index in conferring interactive biologic function on a protein is generally understood in the art (Kyte et al., J.
  • hydrophilicity values have been assigned to these amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1); glutamate (+3.0 ± 1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (−0.4); proline (−0.5 ± 1); alanine (−0.5); histidine (−0.5); cysteine (−1.0); methionine (−1.3); valine (−1.5); leucine (−1.8); isoleucine (−1.8); tyrosine (−2.3); phenylalanine (−2.5); and tryptophan (−3.4).
  • the substitution of amino acids whose hydrophilicity values are within ±2 are included; in other aspects, those which are within ±1 are included; and in still other aspects, those within ±0.5 are included.
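As a non-limiting illustration of the hydrophilicity windows described above, the following minimal sketch stores the hydrophilicity value listed for each residue and tests whether two residues fall within a chosen window (2, 1, or 0.5). It is a sketch under stated assumptions: the negative signs are taken to follow the standard hydrophilicity scale that the list appears to reproduce, only the central value is used where the list gives a ± 1 range, and the function names and small floating-point tolerance are introduced for this example.

```python
# Hydrophilicity values per residue (one-letter codes), central values only.
# Assumption: signs follow the standard scale the list above appears to use.
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_difference(original: str, replacement: str) -> float:
    """Absolute difference between the hydrophilicity values of two residues."""
    return abs(HYDROPHILICITY[original] - HYDROPHILICITY[replacement])


def within_window(original: str, replacement: str, window: float = 2.0) -> bool:
    """True if the two residues differ by no more than `window` units.

    A tiny tolerance absorbs floating-point rounding at the window edge.
    """
    return hydrophilicity_difference(original, replacement) <= window + 1e-9
```

Note that this criterion is distinct from the conservative-substitution classes: aspartate and lysine carry the same listed hydrophilicity value, so a D-to-K exchange falls within even the 0.5 window despite reversing the side-chain charge.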
  • amino acid substitutions are made that: (1) reduce susceptibility to proteolysis, (2) reduce susceptibility to oxidation, (3) alter binding affinity for forming protein-protein and/or protein-polynucleotide complexes, (4) alter ligand or antigen binding affinities, and/or (5) confer or modify other physicochemical or functional properties on such polypeptides.
  • single or multiple amino acid substitutions may be made in the naturally occurring sequence.
  • substitutions can be made in that portion of the enzyme that lies outside the domain(s) forming intermolecular contacts.
  • nucleic acid sequences can exist in a variety of instances such as: isolated segments and recombinant vectors of incorporated sequences or recombinant polynucleotides encoding an enzyme, or a fragment, derivative, or variant thereof, polynucleotides sufficient for use as hybridization probes, PCR primers or sequencing primers for identifying, analyzing, mutating or amplifying a polynucleotide encoding a polypeptide, anti-sense nucleic acids for inhibiting expression of a polynucleotide, ancillary components of CRISPR/Cas systems, functional oligonucleotides, donor constructs, rescue constructs, and complementary sequences of the foregoing described herein.
  • Nucleic acids encoding fusion proteins that include the proteins/polypeptides described herein are also contemplated.
  • the nucleic acids can be single-stranded or double-stranded and can comprise RNA and/or DNA nucleotides and artificial variants thereof (e.g., peptide nucleic acids).
  • polynucleotide refers to a nucleic acid molecule that either is recombinant or has been isolated from total genomic nucleic acid.
  • polynucleotide includes oligonucleotides (e.g., nucleic acids typically 200 residues or less, or 100 residues or less in length), recombinant vectors, including, for example, plasmids, cosmids, phage, viruses, and the like.
  • Polynucleotides include, in certain aspects, regulatory sequences, isolated substantially away from their naturally occurring genes or protein encoding sequences. Polynucleotides may be single-stranded (coding or antisense) or double-stranded, and may be RNA, DNA (genomic, cDNA or synthetic), analogs thereof, or a combination thereof.
  • RNA species such as but not limited to, CRISPR/Cas system ancillary components (e.g., sgRNAs).
  • this term encompasses genomic sequences, expression cassettes, cDNA sequences, and smaller engineered nucleic acid segments that express, or may be adapted to express, proteins, polypeptides, domains, peptides, fusion proteins, mutants, and functional RNA species.
  • a nucleic acid encoding all or part of a polypeptide and/or functional RNA species may contain a contiguous nucleic acid sequence encoding all or a portion of such a polypeptide and/or functional RNA species.
  • polypeptide and/or functional RNA species may be encoded by nucleic acids that contain variations, and thus have slightly different nucleic acid sequences, but that nonetheless encode the same or substantially similar protein and/or RNA species.
  • polynucleotide variants having substantial identity to the sequences disclosed herein; those comprising at least 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% or higher sequence identity, including all values and ranges there between, compared to a polynucleotide sequence provided herein using the methods described herein (e.g., BLAST analysis using standard parameters).
  • the isolated polynucleotide will comprise a nucleotide sequence encoding a polypeptide and/or functional RNA species that has at least 90%, preferably 95% and above, identity to an amino acid sequence and/or RNA sequence described herein, over the entire length of the sequence; or a nucleotide sequence complementary to said isolated polynucleotide.
  • the nucleic acid segments regardless of the length of the coding sequence itself, may be combined with other nucleic acid sequences, such as promoters, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding segments, and the like, such that their overall length may vary considerably.
  • the nucleic acids can be any length.
  • nucleic acid fragments of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant nucleic acid protocol.
  • a nucleic acid sequence may encode a polypeptide sequence with additional heterologous coding sequences, for example to allow for purification of the polypeptide, transport, secretion, post-translational modification, or for therapeutic benefits such as targeting or efficacy.
  • a tag or other heterologous polypeptide may be added to the modified polypeptide-encoding sequence, wherein “heterologous” refers to a polypeptide that is not the same as the modified polypeptide.
  • any composition of the invention may be used in any method of the invention, and any method of the invention may be used to produce or to utilize any composition of the invention.
  • Aspects of an aspect set forth in the Examples are also aspects that may be implemented in the context of aspects discussed elsewhere in a different Example or elsewhere in the application, such as in the Summary, Detailed Description, Claims, and Description of the Drawings.
  • a variety of aspects are discussed throughout this application. Any aspect discussed with respect to one aspect applies to other aspects as well and vice versa. Each aspect described herein is understood to be aspects that are applicable to all aspects. It is contemplated that any aspect discussed herein can be implemented with respect to any method or composition, and vice versa.
  • compositions and kits can be used to achieve methods disclosed herein.
  • FIGs. 1A-1G Engineering AsCas12f for increased genome-editing efficiency.
  • FIG.1A Domain organization of AsCas12f1 compared with SpCas9, AsCas12a, DpbCasX, and UnCas12f1. HNH, REC, and RuvC domains are indicated. Protein lengths are drawn to scale. aa: amino acid.
  • FIG. 1B Sequence alignment of AsCas12f1 and its homologous proteins. Representative regions are shown from SEQ ID NOs: 1-10, respectively, with candidates for mutagenesis highlighted in red boxes.
  • FIG. 1C Workflow to determine the cellular activity of AsCas12f and variants thereof.
  • FIGs. 1D and 1E Indel levels at TP53-1 (FIG. 1D) and HEXA (FIG. 1E) loci generated by AsCas12f variants that bear one, two, three, four, or five single-point mutations.
  • A list of mutations included in each exemplary AsCas12f variant is provided in FIG. 9B.
  • FIGs. 2A-2E Genome editing facilitated by engineered AsCas12f systems.
  • FIG.2B Box-and-whisker plot of indel frequencies delivered by AsCas12f and UnCas12f systems shown in (FIG. 2A).
  • FIGs. 2C and 2D Indel frequencies mediated by wild-type AsCas12f (SEQ ID NO: 1; left columns), enAsCas12f (SEQ ID NO: 16; middle columns), and CasMINI-ge4.1 (right columns) in HCT116 (top) and HeLa (bottom) cells.
  • FIG.2D Box-and-whisker plot of indel frequencies delivered by enAsCas12f and AsCas12a.
  • the N-lobe contains a wedge (WED) domain and a recognition (REC) domain.
  • the C-lobe includes a RuvC nuclease domain and a zinc finger (ZF) motif.
  • FIG. 3B Unsharpened cryo-EM map for the AsCas12f-gRNA-DNA complex (contoured at a level of 0.020).
  • FIG. 3C Top: atomic model of the AsCas12f-gRNA-DNA complex. Bottom: zoomed-in views of a number of residues mutated in enAsCas12f (D196K, N199K, G276R) (SEQ ID NO: 16).
  • FIGs.4A-4F Structure-guided engineering of the AsCas12f gRNA.
  • FIG.4A Indel frequencies mediated by enAsCas12f with engineered AsCas12f gRNAs at HEXA and PDCD1 loci. The structures of engineered gRNAs are shown in FIG.19B.
  • FIG.4B Structure of sgRNA-v2 (SEQ ID NO: 20).
  • FIG.4C Time-course in vitro DNA cleavage using full-length sgRNA and sgRNA-v2.
  • the assay was conducted using enAsCas12f at 37 °C. Data points were fitted to one-phase exponential association curves.
  • Associated gel images are provided in FIG.19E.
  • FIG.4D Indel frequencies mediated by the full-length sgRNA (SEQ ID NO: 18; left columns) and sgRNA-v2 (SEQ ID NO: 20; right columns) in complex with enAsCas12f (SEQ ID NO: 16) at denoted genomic loci in HEK293T cells.
  • FIG.4F Relative abundance of full-length sgRNA and sgRNA-v2 targeting HEXA and PDCD1 loci in HEK293T cells. Two independent replicates were carried out in FIG.4A, FIG.4C, and FIG.4D.
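As a non-limiting illustration of the one-phase exponential association curve mentioned for the time-course cleavage data of FIG. 4C, the following minimal sketch evaluates the standard model y(t) = y0 + (plateau − y0)(1 − e^(−kt)) and the corresponding half-time ln(2)/k. The parameter names (`plateau`, `k`, `y0`) are assumptions for this example; no fitted values are given in this text, and none are implied here.

```python
import math


def one_phase_association(t: float, plateau: float, k: float, y0: float = 0.0) -> float:
    """Fraction cleaved at time t under a one-phase exponential association.

    plateau: value approached at long times; k: rate constant (1/time);
    y0: value at t = 0. All parameter names are illustrative assumptions.
    """
    return y0 + (plateau - y0) * (1.0 - math.exp(-k * t))


def half_time(k: float) -> float:
    """Time needed to reach half of the span between y0 and the plateau."""
    if k <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2.0) / k
```

Comparing the fitted rate constants (or half-times) for two guide RNAs under the same conditions is one way such curves can be used to compare cleavage kinetics.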
  • FIGs. 5A-5F Genome-wide specificity of AsCas12f-v4.1 and enAsCas12f.
  • FIG. 5A On-target indel frequencies in GUIDE-seq samples for HEXA, TP53-2, MRPL39, APOB, and PDCD1 using AsCas12f-WT (SEQ ID NO: 1; left columns), AsCas12f-v4.1 (SEQ ID NO: 14; middle columns), and enAsCas12f (D196K, N199K, G276R) (SEQ ID NO: 16; right columns).
  • FIGs.5B-5F Off-target editing sites for wild-type AsCas12f (SEQ ID NO: 1), AsCas12f-v4.1 (SEQ ID NO: 14), and enAsCas12f (SEQ ID NO: 16) with gRNAs targeting HEXA (FIG. 5B), TP53-2 (FIG. 5C), PDCD1 (FIG. 5D), APOB (FIG. 5E), and MRPL39 (FIG.5F) loci (SEQ ID NOs: 39, 47, 43, 35, 41, respectively, with 3 nucleotides of the 5′ PAM included), reported by GUIDE-seq in HEK293T cells. Mismatch positions are highlighted in color.
  • FIG. 6 Phylogenetic analysis. Phylogenetic tree of Cas12f family proteins. Hypothetical proteins are denoted by Genbank protein accession numbers.
  • FIG.7 Full sequence alignment of AsCas12f and other Cas12f family proteins. Alignment was performed using the T-Coffee multiple sequence alignment program (SEQ ID NOs: 1-10).
  • FIG.8 Full sequence alignment of AsCas12f and other Cas12f family proteins. Alignment was performed using the T-Coffee multiple sequence alignment program (SEQ ID NOs: 1-10).
  • FIGs.9A-9B AsCas12f variants with increased gene-editing efficiency.
  • FIG. 9A Indel level at the TP53-2 locus generated by AsCas12f variants that bear one, two, three, four, or five single-point mutations.
  • FIG. 9B List of mutations included in each AsCas12f variant in (FIG.9A) and FIGs.1D-1E.
  • FIGs. 10A-10C Representative deep sequencing data processed by CRISPResso2 showing DNA-editing patterns generated by wild-type AsCas12f and enAsCas12f.
  • FIG. 10A Metaplots showing the positions of insertions, deletions, and substitutions from samples edited by wild-type AsCas12f (left) (SEQ ID NO: 1) and enAsCas12f (right) (SEQ ID NO: 16).
  • FIG. 10B Frequencies of insertions, deletions, and substitutions observed near the PAM sequence from samples edited by wild-type AsCas12f (top) and enAsCas12f (bottom) in a HEXA reference sequence (SEQ ID NO: 57).
  • FIG.10C Raw sequencing reads from samples edited by wild-type AsCas12f (top) and enAsCas12f (bottom). Sequence variants with ⁇ 1% of total reads are shown.
  • the exemplary edited sequences comprising indels and/or substitutions were aligned to the HEXA reference sequence (SEQ ID NO: 57).
  • FIGs. 11A-11F Performance of engineered AsCas12f across a wide range of target sites with different PAM sequences.
  • FIG. 11D Fold changes of indel frequencies generated by engineered AsCas12f variants compared to wild-type AsCas12f (for each gene, enzymes tested are ordered as follows: AsCas12f-WT first columns; AsCas12f-v3.2 second columns; AsCas12f-v4.1 third columns; and enAsCas12f fourth columns).
  • FIG. 11F Fold changes of indel frequencies generated by CasMINI (right columns) compared to wild-type UnCas12f paired with ge4.1 (left columns).
  • FIGs.12A-12B Single-particle cryo-EM analysis of the AsCas12f-gRNA-DNA complex.
  • FIG.12A Data processing workflow. A representative micrograph is shown along with a 50 nm scale bar.
  • FSC Fourier shell correlation
  • FIGs.14A-14C Comparison of the AsCas12f and UnCas12f complexes.
  • FIGs. 15A-15B Dimerization is essential for the activity of AsCas12f.
  • FIG. 15A Zoom-in views of the dimer interfaces.
  • FIG. 15B Indel frequencies generated by wild-type AsCas12f and AsCas12f variants bearing mutations that disrupt the dimer interfaces.
  • FIGs. 16A-16C PAM and sgRNA recognition by AsCas12f.
  • FIGs. 16A-16B Recognition of the non-target polynucleotide strand (FIG. 16A) and the target polynucleotide strand (FIG. 16B). Numberings of sgRNA and DNA are shown in FIG. 17A.
  • FIG.16C Indel frequencies generated by wild-type AsCas12f and AsCas12f variants bearing mutations that disrupt interactions between the protein and DNA (blue, variants Y76A, S92A, R101A, R298A, and Y343A) or sgRNA (yellow, variants W17A, H72A, R121A, and Y351A).
  • FIGs. 17A-17D Comparison of AsCas12f and UnCas12f gRNAs.
  • FIG. 17A Secondary structure scheme of the wild-type AsCas12f sgRNA (SEQ ID NO: 58 with a targeting region against VEGFA-1 (SEQ ID NO: 50); SEQ ID NO: 18 generic) and regions interacting with exemplary VEGFA-1 target DNA (SEQ ID NOs: 54 and 55).
  • FIGs. 17B-17C Structure of the sgRNA and the target dsDNA in AsCas12f-sgRNA-DNA (FIG. 17B) and UnCas12f-sgRNA-DNA (FIG. 17C) complexes.
  • FIG. 17D Superimposition of UnCas12f sgRNA (violet) and AsCas12f sgRNA (red).
  • FIGs.18A-18E AsCas12f-gRNA interactions.
  • FIG.18A W17.1 forms a π-π interaction with G(-67) of the gRNA.
  • FIG.18B H72.2 forms a hydrogen bond with A(-131) at O 2 position.
  • FIG.18C Y76.2 forms a hydrogen bond with the phosphate backbone of A(-129) of the gRNA.
  • FIG.18D R121.1 forms a hydrogen bond with the phosphate backbone of C(-105) of the gRNA.
  • FIG.18E Y351.1 forms a π-π interaction with C(1) of the gRNA.
  • FIGs. 19A-19C Various truncations of stem 5 (as noted in FIG.17A, e.g., colored in the grey and yellow boxes; SEQ ID NOs: 26-31).
  • FIG. 19D Modifications of stem 3 (as noted in FIG. 17A, e.g., colored in the blue box; SEQ ID NOs: 32-34).
  • FIG. 19E Gel electrophoresis monitoring in vitro DNA cleavage over time courses using full-length sgRNA and sgRNA-v2. The assay was conducted using enAsCas12f at 37 °C.
  • FIGs. 20A-20D
  • FIG. 20A Western blot showing the protein levels of Flag-tagged AsCas12f variants in HEK293T cells (enAsCas12f, AsCas12f-v4.1, and AsCas12f-WT); GAPDH was used as the loading control.
  • FIG. 20B SDS-PAGE analysis of wild-type AsCas12f and enAsCas12f proteins used for in vitro DNA cleavage experiments.
  • FIGs. 21A-21D Comparison between enAsCas12f, AsCas12a, and SpCas9.
  • FIG. 21A Indel frequencies mediated by enAsCas12f (left columns) and AsCas12a (right columns) in HEK293T cells were measured at a number of genomic loci.
  • Indel frequencies mediated by enAsCas12f (left columns) and SpCas9 (right columns) in HEK293T cells were measured at a number of genomic loci.
  • SpCas9 sgRNAs for PDCD1, TP53-1, and VEGFA-1 were designed to recognize sites proximal to the PAM sites targeted by AsCas12f sgRNAs.
  • FIG. 21C Schematic construct design for CRISPRa mediated by different Cas proteins (e.g., AsCas12f-WT, enAsCas12f, AsCas12a, and SpCas9) fused to transcriptional activator VP64-p65-Rta (VPR).
  • FIG.21D Gene activation fold change differences mediated by different CRISPRa constructs in HEK293T cells. Transcription activation was measured by the relative RNA level of HBG, IL1RN, and HBB transcripts normalized to GAPDH. Fold changes were normalized to RNA levels in cells transfected with a non-target sgRNA.
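As a non-limiting illustration of the two normalization steps described for FIG. 21D, the following minimal sketch first divides a target transcript level by the GAPDH level from the same sample, and then divides that ratio by the corresponding ratio from cells transfected with a non-target sgRNA. The function and parameter names are hypothetical, the inputs are assumed to be relative RNA levels on a linear scale, and the measurement method that produced them is not specified here.

```python
def normalized_level(target_rna: float, reference_rna: float) -> float:
    """Target transcript level relative to a reference transcript (e.g., GAPDH)."""
    if reference_rna <= 0:
        raise ValueError("reference level must be positive")
    return target_rna / reference_rna


def activation_fold_change(target_sample: float, gapdh_sample: float,
                           target_control: float, gapdh_control: float) -> float:
    """Fold change of a GAPDH-normalized transcript versus a non-target control.

    All four inputs are assumed to be linear-scale relative RNA levels.
    """
    control = normalized_level(target_control, gapdh_control)
    if control <= 0:
        raise ValueError("control level must be positive")
    return normalized_level(target_sample, gapdh_sample) / control
```

A value of 1 under this calculation indicates no change relative to the non-target sgRNA control.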
  • FIG.22A On-target GUIDE-seq tag integration efficiency measured by targeted amplicon sequencing (for each gene, enzymes tested are ordered as follows: AsCas12f-WT left columns; AsCas12f-v4.1 middle columns; and enAsCas12f right columns).
  • FIGs. 22B-22F Off-target editing sites for wild-type AsCas12f, AsCas12f-v4.1, and enAsCas12f with gRNAs targeting HEXA (FIG. 22B), TP53-2 (FIG. 22C), PDCD1 (FIG. 22D), APOB (FIG.
  • CRISPR systems have been functionalized with various effector proteins to enable programmed editing of the genome (see e.g., Komor, A.C., et al., 2016; Gaudelli, N.M., et al., and Anzalone, A.V., et al., 2019), epigenome (as summarized in e.g., Nakamura, M., et al., 2021), transcriptome (as summarized in e.g., Terns, M.P., et al., 2018), and epitranscriptome (see e.g., Liu, X.M., et al., 2019; and Wilson, C., et al., 2020) in a wide range of organisms.
  • CRISPR-Cas systems are broadly distributed in bacteria and archaea with remarkable evolutionary plasticity and functional diversity (see e.g., Koonin, E.V., et al., 2022).
  • Streptococcus pyogenes Cas9 (SpCas9, type II-A) (see e.g., Cong, L., et al., 2013; Mali, P., et al., 2013; and Jinek, M., et al., 2013) and Acidaminococcus sp. BV3L6 Cas12a (AsCas12a, type V-A) (see e.g., Zetsche, B., et al., 2015) have repeatedly been shown to provide potent gene-editing activity and broad tissue compatibility ex vivo (see e.g., Pickar-Oliver, A.
  • Adeno-associated viruses are the leading candidates for in vivo delivery of gene-editing agents (see e.g., Yin H., et al., 2017; and Lino, C.A., et al., 2018), owing to their long application history in the clinic, lack of pathogenicity and immunogenicity, and programmable tissue tropism.
  • AAV vectors have a maximum packaging capacity of about 4.7 kb, a size that is insufficient to accommodate SpCas9 (e.g., about 1,368 amino acids) or AsCas12a (e.g., about 1,307 amino acids) and their essential auxiliary components (e.g., regulatory sequences, sgRNA sequences, homology-based rescue constructs, etc.; see FIG. 1A).
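As a non-limiting illustration of the packaging constraint described above, the following minimal sketch converts a protein length into an approximate coding-sequence length (three nucleotides per amino acid plus one stop codon) and subtracts it from the stated capacity of about 4.7 kb. The protein lengths for SpCas9 and AsCas12a are those stated above; the 400 and 700 amino acid values are the bounds of the range given elsewhere in this document for Cas12f proteins. Treating 4,700 bp as an exact limit, and the function names, are assumptions for this example.

```python
AAV_CAPACITY_BP = 4700  # "about 4.7 kb" per the text; treated as exact here


def coding_length_bp(amino_acids: int, include_stop: bool = True) -> int:
    """Approximate coding-sequence length: 3 nt per residue, plus a stop codon."""
    return 3 * (amino_acids + (1 if include_stop else 0))


def remaining_capacity_bp(amino_acids: int, capacity_bp: int = AAV_CAPACITY_BP) -> int:
    """Space left for promoters, sgRNA cassettes, and other auxiliary elements."""
    return capacity_bp - coding_length_bp(amino_acids)


if __name__ == "__main__":
    proteins = [
        ("SpCas9", 1368),
        ("AsCas12a", 1307),
        ("Cas12f (upper bound of stated range)", 700),
        ("Cas12f (lower bound of stated range)", 400),
    ]
    for name, length in proteins:
        print(f"{name}: {coding_length_bp(length)} bp coding, "
              f"{remaining_capacity_bp(length)} bp remaining")
```

Under these assumptions, the SpCas9 coding sequence alone occupies 4,107 bp, leaving fewer than 600 bp for all auxiliary components, whereas a 400 to 700 amino acid protein leaves roughly 2.6 to 3.5 kb.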
  • the packaging obstacle can be partially addressed using split Cas proteins (see e.g., Wright, A.V., et al., 2015; Zetsche, B., et al., 2015; Nihongaki, Y., et al., 2015; and Chew, W.L., et al., 2016), but these designs often lead to lower efficiency as a cell must be infected by at least two different AAV particles to acquire an intact CRISPR complex. Instead, Cas proteins of comparable nuclease activity but smaller sizes provide a more straightforward solution to the delivery challenge and may further advance clinical applications of gene-editing agents. However, while compact CRISPR-Cas systems offer versatile treatment options for genetic disorders, their application is often limited by modest to low relative gene-editing activity.
  • CasX (Cas12e, type V-E, 986 amino acids)
  • CasΦ (Cas12j, type V-J, 700–800 amino acids)
  • Cas12f (also known as Cas14, type V-F, 400–700 amino acids)
  • the IscB and TnpB family proteins (~400 amino acids) are putative ancestors of Cas9 and Cas12a, and have also been shown to confer RNA-guided nuclease activity (see e.g., Kapitonov Vladimir, V., et al., 2016; Altae-Tran, H., et al., 2021; Kato, K., et al., 2022; Hirano, S., et al., 2022; Karvelis T., et al., 2021; and Schuler, G., et al., 2022).
  • Cas12f proteins are of particular interest given their small sizes (FIG. 1A) and unique dimerization-mediated DNA-targeting mechanism (see e.g., Takeda, S.N., et al., 2021; and Xiao, R., et al., 2021). Initially identified as single-stranded DNA (ssDNA)-specific nucleases (see e.g., Harrington Lucas, B., et al., 2018), Cas12f proteins were later demonstrated capable of cleaving double-stranded DNA (dsDNA) with 5′ T-rich protospacer adjacent motifs (PAMs) (see e.g., Karvelis, T., et al., 2020).
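To illustrate what targeting dsDNA next to a 5′ PAM means in practice, the following sketch scans one strand of a DNA string for a PAM followed by a protospacer. The text states only that the PAMs are 5′ T-rich; the motif `NTTR`, the 20-nt protospacer length, and the function name are assumptions chosen for illustration, and the reverse strand is not searched.

```python
import re

# IUPAC codes needed for the illustrative PAM below (assumed motif, not from the text).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "R": "[AG]"}


def find_protospacers(dna: str, pam: str = "NTTR", spacer_len: int = 20):
    """Return (pam_start, pam_seq, protospacer) for each 5' PAM on the given strand."""
    dna = dna.upper()
    pam_regex = "".join(IUPAC[base] for base in pam)
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=({pam_regex})([ACGT]{{{spacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    example = "GG" + "ATTA" + "C" * 20 + "GG"  # toy sequence, hypothetical
    for start, pam_seq, protospacer in find_protospacers(example):
        print(start, pam_seq, protospacer)
```

A real guide-design workflow would also scan the reverse complement and filter candidates for off-target similarity.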
  • UnCas12f has been further improved by protein and guide RNA (gRNA) engineering (see e.g., Xu, X., et al., 2021; and Kim, D.Y., et al., 2022a).
  • AAV-mediated delivery of UnCas12f and AsCas12f and their gRNAs resulted in successful genome editing in human embryonic kidney (HEK) 293 cells, U2-OS cells, Huh-7 cells, and laboratory mice (see e.g., Wu, Z., et al., 2021; Kim, D.Y., et al., 2022a; and Kim, D.Y., et al., 2022b), highlighting the therapeutic potential of the Cas12f family of enzymes.
  • Cas12f systems have a significant margin for improvement as gene-editing agents.
  • AsCas12f systems were engineered to obtain AsCas12f variants that generate programmed double-stranded breaks (DSBs) about 2- to 11-fold more efficiently than the wild-type AsCas12f protein when targeting the human genome.
  • cryo-EM structure of AsCas12f in complex with the single guide RNA (sgRNA) and the target DNA at a 2.9 Å resolution.
  • cryo-EM structures that significantly advance the field’s mechanistic understanding of type V-F CRISPR systems.
  • engineered truncated sgRNAs.
  • engineered truncated sgRNAs were designed that retained Cas mediated DNA-targeting and cleavage capacity.
  • truncated sgRNAs with 72 nt sequences removed from the 193 nt wild type sgRNA were produced, and said engineered truncated sgRNAs retain ribonucleoprotein mediated DNA-targeting and cleavage activity greater than about, or about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 105%, 110%, 115%, or 120%, or any range derivable therein, of ribonucleoprotein mediated DNA-targeting and cleavage activity of non-engineered control sgRNA.
  • off-target editing can be assessed by genome-wide sequencing methods, exome sequencing methods, common off-target site specific sequencing, site-specific sequencing of relatively homologous off-target sites, and/or other polynucleotide identification methods.
  • off-target editing can be assessed by genome-wide, unbiased identification of double-stranded breaks enabled by sequencing (GUIDE-seq) (see e.g., Tsai, S.Q., et al., 2015).
  • provided herein are technologies for site-specific endonuclease activity (e.g., endo RNA and/or endo DNA nuclease activity).
  • enzymatic endonuclease activity generates single stranded and/or double stranded breaks.
  • presented herein are engineered Cas12f enzymes with increased gene-editing activity relative to non-engineered Cas12f enzymes.
  • engineered enzymes presented herein can have greater than or equal to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 fold, or any range derivable therein, more potent gene-editing activity relative to non-engineered parental Cas12f enzymes.
  • engineered Cas12f enzymes that are less than 90%, 89%, 88%, 87%, 86%, 85%, 84%, 83%, 82%, 81%, 80%, 79%, 78%, 77%, 76%, 75%, 74%, 73%, 72%, 71%, 70%, 69%, 68%, 67%, 66%, 65%, 64%, 63%, 62%, 61%, 60%, 59%, 58%, 57%, 56%, 55%, 54%, 53%, 52%, 51%, 50%, 49%, 48%, 47%, 46%, 45%, 44%, 43%, 42%, 41%, 40%, 39%, 38%, 37%, 36%, 35%, 34%, 33%, 32%, 31%, or 30%, or any range derivable therein, of the size (e.g., primary amino acid sequence length) of the widely used SpCas9 protein.
  • Cas12f enzymes function broadly in mammalian cells, such as human cells.
  • Cas12f enzymes provided herein can facilitate greater than or equal to about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 6
  • engineered Cas12f enzymes provided herein can facilitate site specific endonuclease activity with minimal off-target editing.
  • technologies provided herein provide site-specific endonuclease activity at greater than or equal to about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%,
  • technologies provided herein provide site-specific endonuclease activity at greater than or equal to about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 fold, or any range derivable therein, target polynucleotide population cleavage rates relative to a control enzyme and/or sgRNA species (e.g., a control non-engineered enzyme, a control engineered enzyme, a control non-engineered sgRNA, a control engineered sgRNA, etc.).
  • CRISPR/Cas components
  • The current disclosure provides polynucleotides, proteins, polypeptides, vectors, and methods and compositions comprising any one or more of the aforementioned components.
  • polynucleotides may encode sequences comprising CRISPR/Cas proteins/polypeptides, and/or ancillary RNA components (e.g., CRISPR RNA (crRNA), trans-activating CRISPR RNA (tracrRNA), single guide RNA (sgRNA), etc.).
  • one or more of the CRISPR/Cas proteins/polypeptides and/or ancillary RNA components may be engineered as described herein.
  • polynucleotides, proteins, polypeptides, and/or peptide sequences for wild type or mutant versions of various genes, such as site-specific target genes have been previously disclosed, and may be found in the recognized computerized databases.
  • polynucleotides, proteins, polypeptides, and/or peptide sequences for wild type versions of various effector proteins and/or RNA molecules have been previously disclosed, and may be found in the recognized computerized databases.
  • Two commonly used databases are the National Center for Biotechnology Information’s GenBank and GenPept databases (on the World Wide Web at ncbi.nlm.nih.gov/) and The Universal Protein Resource (UniProt; on the World Wide Web at uniprot.org).
  • in compositions of the disclosure, there is between about 0.001 mg and about 10 mg of total polypeptide, peptide, and/or protein per ml.
  • the concentration of protein in a composition can be about, at least about or at most about 0.001, 0.010, 0.050, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0 mg/ml or more (or any range derivable therein).
  • methods, compositions, polynucleotides, polypeptides, and/or vectors comprising engineered Cas12f enzymes coupled with engineered Cas12f sgRNA oligonucleotides.
  • provided herein are methods, compositions, polynucleotides, and/or vectors comprising engineered Cas12f enzymes coupled with non-engineered Cas12f sgRNA oligonucleotides. In certain aspects, provided herein are methods, compositions, polynucleotides, and/or vectors comprising wild type Cas12f enzymes, coupled with engineered Cas12f sgRNA oligonucleotides.
  • A. Cas12f [0171] In some aspects, provided herein are Cas12f enzymes. In certain aspects, Cas12f enzymes are uncultured archaeon Cas12f (UnCas12f) enzymes.
  • Cas12f enzymes are Acidibacillus sulfuroxidans Cas12f (AsCas12f) enzymes.
  • Cas12f based CRISPR/Cas systems that target polynucleotide species can be utilized in methods described herein.
  • Cas12f enzymes that mediate at least about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 70%, 71%, 72%, 73%
  • methods provided herein comprise contacting a target polynucleotide species with a CRISPR/Cas system 1, 2, 3, 4, or 5 times. In some aspects, methods provided herein comprise contacting a target polynucleotide species with a CRISPR/Cas system for less than or equal to: 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.75, 0.5, or 0.25 hours, or any range derivable therein. [0175] In some aspects, the Cas12f enzymes suitable for use in technologies described herein are active against polynucleotide species including DNA and/or RNA species.
  • the Cas12f enzymes suitable for use in technologies described herein are not active endonucleases (e.g., are catalytically dead), and do not cleave polynucleotide species including DNA and/or RNA. In some aspects, the Cas12f enzymes suitable for use in technologies described herein are active against DNA polynucleotide species. The sequences associated with any GenBank Accession numbers provided herein are incorporated by reference for all purposes. [0176] In some aspects, the Cas12f protein comprises an AsCas12f protein. In some aspects, an AsCas12f protein is an engineered enzyme.
  • an engineered AsCas12f protein comprises one or more substitutions (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more) that increase the affinity of the engineered AsCas12f protein to nucleic acids relative to WT AsCas12f.
  • an engineered AsCas12f protein comprises one or more substitutions that result in the introduction of one or more basic residues at a nucleic acid interfacing site.
  • an engineered AsCas12f protein comprises one or more amino acid substitutions relative to WT AsCas12f (SEQ ID NO: 1).
  • an engineered AsCas12f protein comprises one or more substitutions at amino acids D196, N199, G276, D281, T327, N328, and/or D364 relative to WT AsCas12f (SEQ ID NO: 1).
  • an engineered AsCas12f protein comprises one or more substitutions at amino acids D196, N199, G276, D281, T327, N328, and/or D364 relative to WT AsCas12f (SEQ ID NO: 1), while retaining at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%, or any range derivable therein, sequence identity relative to WT AsCas12f (SEQ ID NO: 1).
  • an engineered AsCas12f protein does not comprise amino acid substitutions at sites associated with dimer complex formation, sgRNA interaction, and/or target polynucleotide engagement interfaces. In some aspects, an engineered AsCas12f protein does not comprise amino acid substitutions at amino acids W17, E44, D51, Y52, H72, K80, S92, K96, R121, and/or Y351 relative to WT AsCas12f (SEQ ID NO: 1).
  • an engineered AsCas12f protein comprises one or more of D196K, N199K, G276R, D281K, T327K, N328G, D364K, and/or D364R substitutions relative to WT AsCas12f (SEQ ID NO: 1). In some aspects, an engineered AsCas12f protein does not comprise one or more of D196K, N199K, G276R, D281K, T327K, N328G, D364K, and/or D364R substitutions relative to WT AsCas12f (SEQ ID NO: 1).
  • an engineered AsCas12f protein comprises a D196K substitution relative to WT AsCas12f. In some aspects, an engineered AsCas12f protein comprises a N199K substitution relative to WT AsCas12f. In some aspects, an engineered AsCas12f protein comprises a G276R substitution relative to WT AsCas12f. In some aspects, an engineered AsCas12f protein comprises a D281K substitution relative to WT AsCas12f. In some aspects, an engineered AsCas12f protein comprises a T327K substitution relative to WT AsCas12f.
  • an engineered AsCas12f protein comprises a N328G substitution relative to WT AsCas12f. In some aspects, an engineered AsCas12f protein comprises a D364K substitution relative to WT AsCas12f. In some aspects, an engineered AsCas12f protein comprises a D364R substitution relative to WT AsCas12f. In some aspects, an engineered AsCas12f protein is a catalytically dead variant, comprising a D225A substitution.
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K and a G276R substitution (e.g., FIG.9B, v2.1).
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K and a D364K substitution (e.g., FIG.9B, v2.2).
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K and a D364R substitution (e.g., FIG.9B, v2.3).
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K and a N328G substitution (e.g., FIG.9B, v2.4).
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K, a G276R, and a D364K substitution (e.g., FIG. 9B, v3.1).
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K, a N199K, and a N328G substitution (e.g., FIG.
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K, a N199K, a N328G, and a D364R substitution (e.g., FIG.9B, v4.1).
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K, a N199K, a G276R, and a D364K substitution (e.g., FIG. 9B, v4.2).
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K, a N199K, a G276R, and a D364R substitution (e.g., FIG. 9B, v4.3).
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K, a N199K, a G276R, a N328G, and a D364K substitution (e.g., FIG.9B, v5.3).
  • an engineered AsCas12f protein amino acid substitution relative to WT AsCas12f comprises or consists of a D196K, a N199K, a G276R, a N328G, and a D364R substitution (e.g., FIG.9B, v5.2).
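The substitution notation used above (wild-type residue, 1-based position, replacement residue) can be applied mechanically to a sequence. The sketch below is a minimal illustration, not the applicants' method: the helper name and the five-residue toy sequence are hypothetical, and the check against the expected wild-type residue is an added safeguard.

```python
import re

# Pattern for substitutions written as <wild-type residue><position><new residue>, e.g. D196K.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply point substitutions (1-based positions) and verify each wild-type residue."""
    residues = list(sequence)
    for substitution in substitutions:
        match = _SUBSTITUTION.match(substitution)
        if match is None:
            raise ValueError(f"Unrecognized substitution: {substitution!r}")
        wild_type, mutant = match.group(1), match.group(3)
        index = int(match.group(2)) - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"Position out of range: {substitution!r}")
        if residues[index] != wild_type:
            raise ValueError(
                f"{substitution!r} expects {wild_type} but found {residues[index]}"
            )
        residues[index] = mutant
    return "".join(residues)


if __name__ == "__main__":
    toy_wild_type = "MDANG"  # hypothetical toy sequence, not SEQ ID NO: 1
    print(apply_substitutions(toy_wild_type, ["D2K", "N4K"]))
```

With the full wild-type sequence of SEQ ID NO: 1 as input, the variant labeled v2.1 in the text would correspond to the list `["D196K", "G276R"]`.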
  • an engineered Cas12f protein comprises or consists of an amino acid sequence or is encoded by a polynucleotide sequence, with about, exactly, or at least about 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%, or any percentage derivable therein, identity to any one of SEQ ID NOs: 1-16.
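The percent-identity thresholds recited above can be illustrated with a simple calculation on two pre-aligned sequences. This is a sketch under one common convention (identical positions divided by alignment length, with gaps counted as mismatches); other conventions use a different denominator, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("Sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("Sequences must not be empty")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy sequences (hypothetical): one mismatch in five positions.
    print(percent_identity("MKVYR", "MKAYR"))
```

By this measure, a 422-residue protein carrying five point substitutions and no insertions or deletions would share 417/422, or roughly 98.8%, identity with its parent.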
  • the size of a protein or polypeptide may comprise, but is not limited to, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
  • polypeptides may be mutated by truncation, rendering them shorter than their corresponding wild-type form; also or alternatively, they might be altered by fusing or conjugating a heterologous protein or polypeptide sequence with a particular function (e.g., for targeting or localization, for enhanced immunogenicity, for purification purposes, etc.).
  • an engineered AsCas12f protein is a catalytically dead variant, comprising a D225A substitution, and is fused and/or conjugated to a heterologous protein and/or polypeptide sequence with base editing, transcriptional activation, and/or transcriptional repression activity.
  • an engineered AsCas12f protein is fused to a heterologous protein at the N terminus, in-between AsCas12f domains, and/or at the C terminus. In some aspects, an engineered AsCas12f protein is fused to a heterologous protein via a linker.
  • an engineered AsCas12f protein is fused to a transcriptional activator protein, domain, and/or polypeptide, for example but not limited to, HSF1, EDLL, CBF1, TAL, VP16, VP48, VP96, VP192, VP64, VP640, p65, Rta, VP64-p65-Rta (VPR), and/or combinations thereof.
  • an engineered AsCas12f protein is fused to a transcriptional inhibitory protein, domain, and/or polypeptide, such as but not limited to transcriptional repression domains from KRAB (e.g., Kox1 domain), SID4X, MXI1, HP1 (e.g., CS domain), Hes1 (e.g., WRPW domain), and/or combinations thereof.
  • an engineered AsCas12f protein is fused to a protein with enzymatic activity, such as but not limited to, TadA, CDA1, ADAR, ADAR1, ADAR2, a deaminase, an adenine base editor, a cytidine deaminase, APOBEC1, first-generation base editor (BE1), BE2, BE3, HF-BE3, BE4-GAM, YE1-BE3, EE-BE3, YE2-BE3, VQR-BE3, VRER-BE3, Sa-BE3, Sa-BE4, SaBE4-Gam, SaKKH-BE3, Cas12a-BE, Target-AID, Target-AID-NG, xBE3, eA3A-BE3, A3A-BE3, BE-PLUS, TAM, CRISPR-X, ABE7.9, ABE7.10, xABE, ABESa, VQR-ABE, VRER-ABE, SaKKH-ABE, Gam, and/or
  • an engineered AsCas12f protein is fused to a polypeptide described in PCT application PCT/US2022/075891 filed on September 2, 2022, and published as WO2023/034959A2 on March 9, 2023, the entirety of which is incorporated herein by reference for the purposes described herein.
  • an engineered AsCas12f protein is fused to a protein, domain, and/or polypeptide that imparts chemical, genetic, and/or physical control for provision of spatiotemporal control of gene editing.
  • the oligonucleotides, polypeptides, proteins, or polynucleotides encoding such polypeptides or proteins of the disclosure may include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 (or any derivable range therein) or more variant amino acids or nucleic acid substitutions or be at least 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%,
  • the nucleic acid encoding the peptide or polypeptide is codon optimized for expression in a mammal.
  • the peptide or polypeptide is not naturally occurring and/or is in a combination of peptides or polypeptides.
  • the polypeptides of the disclosure may include at least, at most, or exactly 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 11
  • the substitution is with an alanine, arginine, asparagine, aspartic acid, cysteine, glutamine, glutamic acid, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, or valine.
  • the polypeptide comprises one or more substitutions at one or more amino acid positions selected from amino acid 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109
  • the protein or polypeptide may comprise amino acids 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113
  • the protein, polypeptide, or nucleic acid may comprise, comprise at least, or comprise at most 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108
  • the polypeptide, protein, or nucleic acid may comprise at least, at most, or exactly 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108,
  • nucleic acid molecule or polypeptide starting at position 1 to 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
  • a Cas12f protein comprises sequence associated with any one of SEQ ID NOs: 1-10.
  • an engineered Cas12f protein comprises a sequence with substitutions relative to any one of SEQ ID NOs: 1-10.
  • an engineered Cas12f protein or nucleic acid encoding the same comprises or consists of a sequence according to any one of SEQ ID NOs: 11-16.
  • SEQ ID NO: 1 – Wild type AsCas12f (AsCas12f1/1-422AA) see e.g., FIG.
  • SEQ ID NO: 12 Amino Acid sequence of AsCas12f-v3.2: MIKVYRYEIVKPLDLDWKEFGTILRQLQQETRFALNKATQLAWEWMGFSSDYKDNHGEYPKS KDILGYTNVHGYAYHTIKTKAYRLNSGNLSQTIKRATDRFKAYQKEILRGDMSIPSYKRDIP LDLIKENISVNRMNHGDYIASLSLLSNPAKQEMNVKRKISVIIIVRGAGKTIMDRILSGEYQ VSASQIIHDKRKKKWYLNISYDFEPQTRVLDLNKIMGIDLGVAVAVYMAFQHTPARYKLEGG EIENFRRQVESRRISMLRQGKYAGGARGGHGRDKRIKPIEQLRDKIANFRDTTNHRYSRYIV DMAIKEGCGTIQMEDLTGIRDIGSRFLQNWTYYDLQQKIIYKAEEAGIKVIKIDPQYTSQRC SECGNID
  • SEQ ID NO: 14 Amino Acid sequence of AsCas12f-v4.1: MIKVYRYEIVKPLDLDWKEFGTILRQLQQETRFALNKATQLAWEWMGFSSDYKDNHGEYPKS KDILGYTNVHGYAYHTIKTKAYRLNSGNLSQTIKRATDRFKAYQKEILRGDMSIPSYKRDIP LDLIKENISVNRMNHGDYIASLSLLSNPAKQEMNVKRKISVIIIVRGAGKTIMDRILSGEYQ VSASQIIHDKRKKKWYLNISYDFEPQTRVLDLNKIMGIDLGVAVAVYMAFQHTPARYKLEGG EIENFRRQVESRRISMLRQGKYAGGARGGHGRDKRIKPIEQLRDKIANFRDTTNHRYSRYIV DMAIKEGCGTIQMEDLTGIRDIGSRFLQNWTYYDLQQKIIYKAEEAGIKVIKIRPQYTSQRC
  • Protein preparation
  • A variety of proteins can be purified using methods known in the art. Protein purification is a series of processes intended to isolate a single type of protein from a complex mixture. Protein purification is vital for the characterization of the function, structure and interactions of the protein of interest.
  • the starting material is usually a biological tissue or a microbial culture.
  • the various steps in the purification process may free the protein from a matrix that confines it, separate the protein and non-protein parts of the mixture, and finally separate the desired protein from all other proteins. Separation of one protein from all others is typically the most laborious aspect of protein purification. Separation steps exploit differences in protein size, physico-chemical properties and binding affinity.
  • [0192] Evaluating purification yield.
  • the most general method to monitor the purification process is by running an SDS-PAGE gel of the different steps. This method only gives a rough measure of the amounts of different proteins in the mixture, and it is not able to distinguish between proteins with similar molecular weight. If the protein has a distinguishing spectroscopic feature or an enzymatic activity, this property can be used to detect and quantify the specific protein, and thus to select the fractions of the separation that contain the protein. If antibodies against the protein are available, then western blotting and ELISA can specifically detect and quantify the amount of desired protein. Some proteins function as receptors and can be detected during purification steps by a ligand binding assay, often using a radioactive ligand.
  • the amount of the specific protein has to be compared to the amount of total protein.
  • the latter can be determined by the Bradford total protein assay or by absorbance of light at 280 nm; however, some reagents used during the purification process may interfere with the quantification.
  • imidazole (commonly used for purification of polyhistidine-tagged recombinant proteins)
  • BCA (bicinchoninic acid)
  • SPR (Surface Plasmon Resonance)
  • SPR can detect binding of label-free molecules on the surface of a chip. If the desired protein is an antibody, binding can be translated directly to the activity of the protein. One can express the active concentration of the protein as the percent of the total protein. SPR can be a powerful method for quickly determining protein activity and overall yield, although it requires a dedicated instrument to perform.
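Comparing the amount of the specific protein with the amount of total protein at each step is commonly summarized as purity, yield, and fold purification. The sketch below shows that bookkeeping; the class, function names, and all numbers are hypothetical placeholders rather than data from this disclosure.

```python
from dataclasses import dataclass


@dataclass
class PurificationStep:
    name: str
    total_protein_mg: float   # e.g., from a Bradford assay or A280 reading
    target_protein_mg: float  # e.g., from an activity assay, ELISA, or SPR


def purity_percent(step: PurificationStep) -> float:
    """Target protein as a percentage of total protein in this fraction."""
    return 100.0 * step.target_protein_mg / step.total_protein_mg


def yield_percent(step: PurificationStep, first: PurificationStep) -> float:
    """Target protein recovered relative to the starting material."""
    return 100.0 * step.target_protein_mg / first.target_protein_mg


def fold_purification(step: PurificationStep, first: PurificationStep) -> float:
    """Enrichment of the target relative to the starting material."""
    return purity_percent(step) / purity_percent(first)


if __name__ == "__main__":
    # Made-up example values for illustration only.
    lysate = PurificationStep("clarified lysate", 1000.0, 20.0)
    eluate = PurificationStep("affinity eluate", 20.0, 16.0)
    print(f"purity {purity_percent(eluate):.0f}%, "
          f"yield {yield_percent(eluate, lysate):.0f}%, "
          f"{fold_purification(eluate, lysate):.0f}-fold purification")
```

Tracking these three figures per step makes it easier to see which step loses material and which contributes most to enrichment.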
  • Methods of protein purification
  • The methods used in protein purification can roughly be divided into analytical and preparative methods. The distinction is not exact, but the deciding factor is the amount of protein that can practically be purified with that method.
  • Analytical methods aim to detect and identify a protein in a mixture, whereas preparative methods aim to produce large quantities of the protein for other purposes, such as structural biology or industrial use.
  • the protein has to be brought into solution by breaking the tissue or cells containing it. There are several methods to achieve this: repeated freezing and thawing, sonication, homogenization by high pressure, filtration (either via cellulose-based depth filters or cross-flow filtration), or permeabilization by organic solvents. The method of choice depends on how fragile the protein is and how sturdy the cells are. After this extraction process soluble proteins will be in the solvent, and can be separated from cell membranes, DNA etc. by centrifugation.
  • the extraction process also extracts proteases, which will start digesting the proteins in the solution. If the protein is sensitive to proteolysis, it is usually desirable to proceed quickly, and keep the extract cooled, to slow down proteolysis.
  • a common first step to isolate proteins is precipitation with ammonium sulfate, (NH4)2SO4. This is performed by adding increasing amounts of ammonium sulfate and collecting the different fractions of precipitated protein.
  • the first proteins to be purified are water-soluble proteins. Purification of integral membrane proteins requires disruption of the cell membrane in order to isolate any one particular protein from others that are in the same membrane compartment.
  • a particular membrane fraction can be isolated first, such as isolating mitochondria from cells before purifying a protein located in a mitochondrial membrane.
  • a detergent such as sodium dodecyl sulfate (SDS) can be used to dissolve cell membranes and keep membrane proteins in solution during purification; however, because SDS causes denaturation, milder detergents such as Triton X-100 or CHAPS can be used to retain the protein's native conformation during complete purification.
  • Centrifugation is a process that uses centrifugal force to separate mixtures of particles of varying masses or densities suspended in a liquid.
  • Non-compacted particles still remaining mostly in the liquid are called the "supernatant" and can be removed from the vessel to separate the supernatant from the pellet.
  • the rate of centrifugation is specified by the acceleration applied to the sample, typically expressed as a multiple of g (standard gravitational acceleration). If samples are centrifuged long enough, the particles in the vessel will reach equilibrium wherein the particles accumulate specifically at a point in the vessel where their buoyant density is balanced with centrifugal force. Such an "equilibrium" centrifugation can allow extensive purification of a given particle.
  • In sucrose gradient centrifugation, a linear concentration gradient of sugar (typically sucrose, glycerol, or a silica-based density gradient medium, like Percoll™) is generated in a tube such that the highest concentration is on the bottom and the lowest on top.
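Expressing centrifugation relative to g amounts to converting rotor speed and radius into relative centrifugal force (RCF). The sketch below derives it from the centripetal acceleration ω²r divided by standard gravity; the function name and the example rotor values are illustrative assumptions.

```python
import math

STANDARD_GRAVITY_M_S2 = 9.80665  # standard gravitational acceleration


def relative_centrifugal_force(rpm: float, radius_cm: float) -> float:
    """Centripetal acceleration at the given radius, as a multiple of g."""
    angular_velocity = 2.0 * math.pi * rpm / 60.0  # rad/s
    radius_m = radius_cm / 100.0
    return angular_velocity ** 2 * radius_m / STANDARD_GRAVITY_M_S2


if __name__ == "__main__":
    # Example values only: 10,000 rpm at a 10 cm rotor radius.
    print(f"{relative_centrifugal_force(10_000, 10.0):.0f} x g")
```

Because RCF scales with the square of the rotor speed, the same rpm setting on rotors of different radius gives different forces, which is why protocols are usually stated in multiples of g.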
  • a protein sample is then layered on top of the gradient and spun at high speeds in an ultracentrifuge. This causes heavy macromolecules to migrate towards the bottom of the tube faster than lighter material. After separating the protein/particles, the gradient is then fractionated and collected.
  • a protein purification protocol contains one or more chromatographic steps. The basic procedure in chromatography is to flow the solution containing the protein through a column packed with various materials.
  • Chromatography can be used to separate proteins in solution or under denaturing conditions by using porous gels. This technique is known as size exclusion chromatography. The principle is that smaller molecules have to traverse a larger volume in a porous matrix. Consequently, proteins of a certain range in size will require a variable volume of eluent (solvent) before being collected at the other end of the column of gel.
  • Ion exchange chromatography separates compounds according to the nature and degree of their ionic charge.
  • the column to be used is selected according to its type and strength of charge.
  • Anion exchange resins have a positive charge and are used to retain and separate negatively charged compounds, while cation exchange resins have a negative charge and are used to separate positively charged molecules.
  • a buffer is pumped through the column to equilibrate the opposing charged ions.
  • Affinity Chromatography is a separation technique based upon molecular conformation, which frequently utilizes application specific resins. These resins have ligands attached to their surfaces which are specific for the compounds to be separated. Most frequently, these ligands function in a fashion similar to that of antibody-antigen interactions.
  • the affinity chromatography comprises maltose-binding protein (MBP).
  • a tRNA-specific adenosine deaminase enzyme is conjugated and/or fused to an MBP protein.
  • a fusion and/or conjugation can be at the N terminus and/or C terminus of a tRNA-specific adenosine deaminase enzyme.
  • a fusion and/or conjugation can modify enzymatic activity and/or improve enzyme solubility.
  • membrane proteins are glycoproteins and can be purified by lectin affinity chromatography.
  • Detergent-solubilized proteins can be allowed to bind to a chromatography resin that has been modified to have a covalently attached lectin. Proteins that do not bind to the lectin are washed away and then specifically bound glycoproteins can be eluted by adding a high concentration of a sugar that competes with the bound glycoproteins at the lectin binding site.
  • Some lectins have high affinity binding to oligosaccharides of glycoproteins that is hard to compete with sugars, and bound glycoproteins need to be released by denaturing the lectin.
  • a common technique involves engineering a sequence of 6 to 8 histidines into the N- or C-terminus of the protein.
  • the polyhistidine binds strongly to divalent metal ions such as nickel and cobalt.
  • the protein can be passed through a column containing immobilized nickel ions, which binds the polyhistidine tag. All untagged proteins pass through the column.
  • the protein can be eluted with imidazole, which competes with the polyhistidine tag for binding to the column, or by a decrease in pH (typically to 4.5), which decreases the affinity of the tag for the resin.
  • Immunoaffinity chromatography uses the specific binding of an antibody to the target protein to selectively purify the protein. The procedure involves immobilizing an antibody to a column material, which then selectively binds the protein, while everything else flows through. The protein can be eluted by changing the pH or the salinity. Because this method does not involve engineering in a tag, it can be used for proteins from natural sources.
  • Another way to tag proteins is to engineer an antigen peptide tag onto the protein, and then purify the protein on a column or by incubating with a loose resin that is coated with an immobilized antibody.
  • This particular procedure is known as immunoprecipitation. Immunoprecipitation is quite capable of generating an extremely specific interaction which usually results in binding only the desired protein.
  • the purified tagged proteins can then easily be separated from the other proteins in solution and later eluted back into clean solution.
  • Tags can be cleaved by use of a protease. This often involves engineering a protease cleavage site between the tag and the protein.
  • High performance liquid chromatography or high pressure liquid chromatography is a form of chromatography applying high pressure to drive the solutes through the column faster. This means that the diffusion is limited and the resolution is improved.
  • the most common form is "reversed phase" HPLC, where the column material is hydrophobic.
  • the proteins are eluted by a gradient of increasing amounts of an organic solvent, such as acetonitrile. The proteins elute according to their hydrophobicity. After purification by HPLC the protein is in a solution that only contains volatile compounds, and can easily be lyophilized. HPLC purification frequently results in denaturation of the purified proteins and is thus not applicable to proteins that do not spontaneously refold.
  • At the end of a protein purification, the protein often has to be concentrated. Different methods exist. If the solution does not contain any soluble component other than the protein in question, the protein can be lyophilized (dried). This is commonly done after an HPLC run. This simply removes all volatile components, leaving the proteins behind.
  • Ultrafiltration concentrates a protein solution using selective permeable membranes. The function of the membrane is to let the water and small molecules pass through while retaining the protein. The solution is forced against the membrane by mechanical pump or gas pressure or centrifugation.
  • Gel electrophoresis is a common laboratory technique that can be used both as preparative and analytical method. The principle of electrophoresis relies on the movement of a charged ion in an electric field.
  • the proteins are denatured in a solution containing a detergent (SDS). In these conditions, the proteins are unfolded and coated with negatively charged detergent molecules. The proteins in SDS-PAGE are separated on the sole basis of their size.
  • the proteins migrate as bands based on size. Each band can be detected using stains such as Coomassie blue dye or silver stain.
  • Preparative methods to purify large amounts of protein require the extraction of the protein from the electrophoretic gel. This extraction may involve excision of the gel containing a band, or eluting the band directly off the gel as it runs off the end of the gel.
  • denaturing condition electrophoresis provides an improved resolution over size exclusion chromatography, but does not scale to large quantity of proteins in a sample as well as the late chromatography columns.
  • Methods of the disclosure may involve purification of proteins by any combination of methods known in the art and/or discussed herein.
  • the protein is purified by a combination of one or more of affinity chromatography, ion exchange chromatography, and gel filtration chromatography.
  • the affinity chromatography is anti-FLAG.
  • the ion exchange chromatography is heparin.
  • CRISPR/Cas system ancillary components such as functional RNA species, for example but not limited to, CRISPR RNA, trans-activating CRISPR RNA (tracrRNA), and/or sgRNA.
  • RNA species that mediate at least about 0%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%,
  • RNA species are sgRNA molecules.
  • sgRNA engineering provides improved molecules, such as sgRNA-v2, a compact guide RNA that is about 33% shorter than the full-length wild type sgRNA.
  • engineered sgRNA are greater than or about 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, or 35%, or any range derivable therein, shorter than WT sgRNA.
  • such engineered sgRNA molecules have on par activity and/or increased activity when compared to full-length wild type sgRNA.
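The percentage-based shortening described above is a simple ratio of lengths. The sketch below shows the arithmetic; the function name and the example lengths (150 nt and 100 nt) are hypothetical placeholders and are not measurements from the disclosure.

```python
def percent_shorter(wt_length: int, engineered_length: int) -> float:
    """Percentage by which an engineered sgRNA is shorter than the WT sgRNA."""
    if wt_length <= 0:
        raise ValueError("wt_length must be positive")
    return 100.0 * (wt_length - engineered_length) / wt_length

if __name__ == "__main__":
    # Hypothetical lengths chosen only to illustrate the calculation.
    print(f"{percent_shorter(150, 100):.1f}% shorter")
```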
  • engineered compact CRISPR/Cas systems e.g., engineered AsCas12f systems
  • engineered compact sgRNA enable robust and faithful gene editing in mammalian cells.
  • non-engineered Cas12f proteins are combined with engineered Cas12f sgRNA.
  • engineered Cas12f proteins are combined with non-engineered Cas12f sgRNA.
  • engineered sgRNA do not comprise a greater than or equal to 3 base pair truncation in the spacer-proximal region of stem 5 (SEQ ID NO: 25).
  • engineered sgRNA comprise truncation of the entire stem 5 (SEQ ID NO: 25). In some aspects, engineered sgRNA do not comprise modifications in stem 2 (SEQ ID NO: 22). In some aspects, engineered sgRNA comprise truncations in stem 3 (SEQ ID NO: 23), stem 4 (SEQ ID NO: 24), and/or stem 5 (SEQ ID NO: 25).
  • engineered sgRNA are greater than about, equal to about, or less than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 nucleotides shorter than WT sgRNA.
  • engineered sgRNA are transcribed at levels greater than about, or equal to about 1, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, or 7.5 fold increase when compared to WT sgRNA.
  • methods provided herein comprise contacting a target polynucleotide species with a CRISPR/Cas system 1, 2, 3, 4, or 5 times.
  • methods provided herein comprise contacting a target polynucleotide species with a CRISPR/Cas system for less than or equal to: 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.75, 0.5, or 0.25 hours, or any range derivable herein.
  • the small guide RNA (sgRNA) for an endonuclease enzyme has the sequence associated with any one of SEQ ID NOs: 17-20.
  • SEQ ID NO: 18 Wild-type (WT) AsCas12f sgRNA: GGGAUUCGUCGGUUCAGCGACGAUAAGCCGAGAAGUGCCAAUAAAACUGUUAAGUGGUUUGG UAACGCUCGGUAAGGUAGCCAAAAGGCUGAAACUCCGUGCACAAAGACCGCACGGACGCUUC ACAUAUAGCUCAUAAACAAGGGUUUGCGAGCUAGCUUGUGGAGUGUGAACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN (SEQ ID NO: 18).
  • SEQ ID NO: 22 Wild-type (WT) AsCas12f sgRNA “Stem 2” (see e.g., FIG.17A): GCCGAGAAGUGCCAAUAAAACUGUUAAGUGGUUUGGUAACGCUCGGU(SEQ ID NO: 22).
  • SEQ ID NO: 23 Wild-type (WT) AsCas12f sgRNA “Stem 3” (see e.g., FIG.17A): UAGCCAAAAGGCUG(SEQ ID NO: 23).
  • SEQ ID NO: 24 Wild-type (WT) AsCas12f sgRNA “Stem 4” (see e.g., FIG.17A): CCGUGCACAAAGACCGCACGG(SEQ ID NO: 24).
  • SEQ ID NO: 25 Wild-type (WT) AsCas12f sgRNA “Stem 5” (see e.g., FIG.17A): UAGCUCAUAAACAAGGGUUUGCGAGCUA (SEQ ID NO: 25).
  • SEQ ID NO: 26 – Engineered AsCas12f sgRNA “Stem 5-1” see e.g., FIG.19A: AAACUCCGUGCACAAAGACCGCACGGACGCUUCACAUAUAGCUUGUGGAGUGUGAACNNNNN NNNNNNNNNNNNNNN (SEQ ID NO: 26).
  • SEQ ID NO: 27 – Engineered AsCas12f sgRNA “Stem 5-2” (see e.g., FIG.19A): AAACUCCGUGCACAAAGACCGCACGGACGCUUCACAAAGGUGUGGAGUGUGAACNNNNNN NNNNNNNN (SEQ ID NO: 27).
  • SEQ ID NO: 28 Engineered AsCas12f sgRNA “Stem 5-4 ⁇ 1bp” (see e.g., FIG.19B): AAACUCCGUGCACAAAGACCGCACGGACGCUUCAAUAUAGCUCAUAAACAAGGGUUUGCGAG CUAGCUUUGGAGUGUGAACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN (SEQ ID NO: 28).
  • SEQ ID NO: 30 Engineered AsCas12f sgRNA “Stem 5-4 ⁇ 5bp” (see e.g., FIG.19B): AAACUCCGUGCACAAAGACCGCACGGACAUAUAGCUCAUAAACAAGGGUUUGCGAGCUAGCU UGUGAACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN (SEQ ID NO: 30).
  • SEQ ID NO: 31 – Engineered AsCas12f sgRNA “Stem 5-3” see e.g., FIG.19C): AAACUCCGUGCACAAAGACCGCACGGAAAGGUGAACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN (SEQ ID NO: 31).
  • SEQ ID NO: 32 – Engineered AsCas12f sgRNA “Stem 2-1” see e.g., FIG.19D): GGGAUUCGUCGGUUCAGCGACGAUAAGCCGAGAAGUGCUCAAUAAAACUGUUAAGUGGUUUG AGUAACGCUCGGUAAGG (SEQ ID NO: 32).
  • SEQ ID NO: 33 – Engineered AsCas12f sgRNA “Stem 2-2” (see e.g., FIG.19D): GGGAUUCGUCGGUUCAGCGACGAUAAGCCGAGAAGUGCUGCAAUAAAACUGUUAAGUGGUUU GCAGUAACGCUCGGUAAGG (SEQ ID NO: 33).
  • SEQ ID NO: 34 Engineered AsCas12f sgRNA “Stem 2-3” (see e.g., FIG.19D): GGGAUUCGUCGGUUCAGCGACGAUAAGCCGAGCGUUGCCAAUAAAACUGUUAAGUGGUUUGG UAACGCUCGGUAAGG (SEQ ID NO: 34).
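In several of the sequences listed above, a run of "N" at the 3' end marks the position of the target-specific spacer. A minimal sketch of substituting a concrete spacer into such a template follows. The function name and the short toy template are illustrative assumptions; the sketch does not require the spacer length to equal the length of the N run, because the N runs in the listing vary in length.

```python
import re

def insert_spacer(template: str, spacer: str) -> str:
    """Replace the trailing run of N in an sgRNA template with a spacer.

    The spacer may be given as DNA or RNA; T is converted to U.
    """
    spacer = spacer.upper().replace("T", "U")
    if not re.fullmatch(r"[ACGU]+", spacer):
        raise ValueError("spacer must contain only A, C, G, U (or T)")
    match = re.search(r"N+$", template)
    if match is None:
        raise ValueError("template has no trailing run of N")
    return template[: match.start()] + spacer

if __name__ == "__main__":
    # Toy template for illustration only, not a sequence from the listing.
    print(insert_spacer("GUGAACNNNN", "acgt"))
```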
  • Assays utilizing provided compositions [0225]
  • assays comprising Cas12f mediated endonuclease activity can be performed on polynucleotides that have reduced secondary structure.
  • polynucleotide secondary structure can be modified by controlling an assay temperature.
  • assays comprising Cas12f mediated endonuclease reactions are performed at a temperature of about 30 °C, about 31 °C, about 32 °C, about 33 °C, about 34 °C, about 35 °C, about 36 °C, about 37 °C, about 38 °C, about 39 °C, about 40 °C, about 41 °C, about 42 °C, about 43 °C, about 44 °C, about 45 °C, about 46 °C, about 47 °C, about 48 °C, about 49 °C, about 50 °C, about 51 °C, about 52 °C, about 53 °C, about 54 °C, or about 55 °C, or any range derivable therein.
  • multiple assays comprising Cas12f mediated endonuclease reactions can be performed in sequential order on one or more target loci.
  • assays comprising Cas12f endonuclease reactions can be performed for about 20 minutes, about 30 minutes, about 40 minutes, about 50 minutes, about 60 minutes, about 70 minutes, about 80 minutes, about 90 minutes, about 100 minutes, about 110 minutes, about 120 minutes, about 130 minutes, about 140 minutes, about 150 minutes, about 160 minutes, about 170 minutes, about 180 minutes, about 190 minutes, about 200 minutes, about 210 minutes, about 220 minutes, about 230 minutes, or about 240 minutes, or any range derivable therein.
  • assays comprising Cas12f mediated endonuclease reactions can be performed for less than about 20 minutes. In certain aspects, assays comprising Cas12f mediated endonuclease reactions can be performed for greater than about 240 minutes. [0226] In certain aspects, assays comprising Cas12f mediated endonuclease activity are performed at a controlled pH. In certain aspects, a controlled pH can be between about 5.5 and about 8.5.
  • assays comprising Cas12f mediated endonuclease activity are performed at a pH of about 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7, 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 7.8, 7.9, or 8.0, or any range derivable therein.
  • assays comprising Cas12f mediated endonuclease activity are performed at a pH that is near neutral.
  • assays comprising Cas12f mediated endonuclease activity are performed at a pH of about or exactly 7.5.
  • endonuclease activity can be performed at a Cas12f enzyme concentration of about 1 µM to about 50 µM, including any range derivable therein.
  • a Cas12f enzyme concentration is less than about 1, or about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 µM.
  • methods and/or assays described herein further comprise sequencing of a polynucleotide sequence.
  • sequencing may be done by any known methods for sequencing of nucleic acids.
  • target nucleic acid molecules are sequenced using any suitable sequencing technique known in the art.
  • target nucleic acid molecules are sequenced by Sanger sequencing.
  • the sequencing is single-molecule sequencing-by-synthesis. Single-molecule sequencing is shown, for example, in U.S. Pat. Nos. 7,169,560, 6,818,395, and 7,282,337, the contents of each of which are incorporated by reference herein in their entirety.
  • sequencing nucleic acids may include Maxam-Gilbert techniques, Sanger type techniques, Sequencing by Synthesis methods (SBS), Sequencing by Hybridization (SBH), Sequencing by Ligation (SBL), Sequencing by Incorporation (SBI) techniques, massively parallel signature sequencing (MPSS), polony sequencing techniques, nanopore, waveguide and other single molecule detection techniques, reversible terminator techniques, or other sequencing techniques now known or that may be developed in the future. [0229]
  • the sequencing is Illumina sequencing.
  • Illumina sequencing is based on the amplification of DNA on a solid surface using fold-back PCR and anchored primers.
  • Genomic DNA is fragmented, and adapters are added to the 5' and 3' ends of the fragments.
  • DNA fragments that are attached to the surface of flow cell channels are extended and bridge amplified.
  • the fragments become double stranded, and the double stranded molecules are denatured.
  • Multiple cycles of the solid-phase amplification followed by denaturation can create several million clusters of approximately 1,000 copies of single-stranded DNA molecules of the same template in each channel of the flow cell.
  • Primers, DNA polymerase and four fluorophore-labeled, reversibly terminating nucleotides are used to perform sequential sequencing. After nucleotide incorporation, a laser is used to excite the fluorophores, and an image is captured and the identity of the first base is recorded.
  • Ion Torrent sequencing can be used (see, e.g., U.S. patent application numbers 2009/0026082, 2009/0127589, 2010/0035252, 2010/0137143, 2010/0188073, 2010/0197507, 2010/0282617, 2010/0300559, 2010/0300895, 2010/0301398, and 2010/0304982, the content of each of which is incorporated by reference herein in its entirety). Oligonucleotide adaptors are ligated to the ends of target nucleic acid molecules.
  • the adaptors serve as primers for amplification and sequencing of the fragments.
  • the fragments can be attached to a surface at a resolution such that the fragments are individually resolvable. Addition of one or more nucleotides releases a proton (H+), and this signal is detected and recorded in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated.
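Because the recorded signal is proportional to the number of nucleotides incorporated, a read can in principle be reconstructed by rounding each normalized flow signal to an integer count. The following is a simplified sketch of that idea only. The flow order "TACG", the function name, and the assumption that signals are already normalized so that one incorporation gives a value near 1.0 are all illustrative and do not describe an actual instrument's base caller.

```python
from typing import Sequence

def call_bases(flow_order: str, signals: Sequence[float]) -> str:
    """Convert per-flow normalized signals into a base sequence.

    Each flow presents one nucleotide (cycling through flow_order).
    A signal near k is read as k consecutive incorporations of that base.
    """
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]
        count = max(0, round(signal))  # negative noise is treated as zero
        bases.append(nucleotide * count)
    return "".join(bases)

if __name__ == "__main__":
    # Hypothetical signals: T x1, A x0, C x2, G x0, T x0, A x1
    print(call_bases("TACG", [1.02, 0.05, 2.1, 0.0, 0.0, 0.96]))
```

Rounding becomes less reliable for long homopolymer runs, where neighboring integer counts produce signals that differ by a small relative amount.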
  • sequencing a target polynucleotide that is an RNA comprises creating a complementary DNA (cDNA) from the target RNA.
  • sequencing the target RNA comprises reverse transcription.
  • sequencing the target RNA comprises contacting the target RNA with an enzyme capable of transcribing DNA using the target RNA as a template (e.g.
  • a cDNA of the target RNA is sequenced.
  • the sequence of the cDNA is determined, and the cDNA sequence is used to determine the sequence of the target RNA.
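Inferring the target RNA sequence from a sequenced cDNA is a reverse-complement operation when the read corresponds to the first-strand cDNA. A minimal sketch follows; the function name is an illustrative assumption, and the sketch assumes the input is first-strand cDNA written 5' to 3' with unambiguous bases only.

```python
# Each DNA base in the cDNA pairs with this RNA base in the template.
_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}

def rna_from_first_strand_cdna(cdna: str) -> str:
    """Return the RNA template (5' to 3') for a first-strand cDNA (5' to 3')."""
    return "".join(_RNA_COMPLEMENT[base] for base in reversed(cdna.upper()))

if __name__ == "__main__":
    print(rna_from_first_strand_cdna("TTGCA"))
```

If the read instead corresponds to the second strand, the RNA sequence is obtained by replacing T with U without reverse complementing.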
  • the target RNA is determined to have a polynucleotide sequence relative to a wild type sequence as a function of engineered Cas12f mediated polynucleotide cleavage.
  • sequencing the target polynucleotides comprises amplification of nucleic acids.
  • Amplification can be done by techniques known in the art, such as PCR, that uses primers, polymerase, deoxynucleoside triphosphates, buffers, and bivalent and monovalent cations in a reaction that generates copies of a target polynucleotide sequence from a single or few copies of the target polynucleotide sequence.
  • the reading of the sequenced target polynucleotide is quantitative and reflective of the proportion of cleavage and/or editing rates in a target polynucleotide.
  • methods described herein comprise analysis of polynucleotide sequences and/or polynucleotide sequencing results.
  • the analysis comprises the removal of adapter sequences, addition and/or reading of barcodes (e.g., random and/or known, added to the 3’ and/or 5’ end of a polynucleotide molecule), mapping to a genome, retainment of only uniquely mapped reads, and/or filtering for high-quality polynucleotide reads in which poorly processed polynucleotide fragments are removed.
  • the analysis comprises estimation of endonuclease cleavage rates, indel creation rates, homologous recombination rates, and/or non-homologous recombination rates for methods comprising enzymes and/or polynucleotides disclosed herein. III.
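One way to estimate an indel creation rate from mapped reads is to count the fraction of aligned reads whose alignment contains an insertion or deletion. The sketch below does this from SAM-style CIGAR strings. It is a simplification under stated assumptions: the function name is hypothetical, every indel anywhere in a read is counted, and no window around the expected cleavage site or sequencing-error correction is applied.

```python
import re
from typing import Iterable

_INDEL_OP = re.compile(r"\d+[ID]")  # an insertion or deletion operation

def indel_rate(cigars: Iterable[str]) -> float:
    """Fraction of aligned reads whose CIGAR string contains I or D."""
    total = 0
    with_indel = 0
    for cigar in cigars:
        if cigar == "*":  # SAM convention for an unavailable alignment
            continue
        total += 1
        if _INDEL_OP.search(cigar):
            with_indel += 1
    return with_indel / total if total else 0.0

if __name__ == "__main__":
    # Hypothetical reads: one unedited, one deletion, one insertion, one unmapped.
    print(indel_rate(["100M", "48M3D49M", "50M2I48M", "*"]))
```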
  • polypeptides and/or oligonucleotides described herein are encoded by a polynucleotide, such as a vector comprising a polynucleotide (e.g., a polynucleotide construct).
  • Vectors comprising polynucleotide constructs according to the present disclosure include all those known in the art, including cosmids, plasmids (e.g., naked or contained in liposomes) and viral constructs (e.g., lentiviral, retroviral, adenoviral, and adeno associated viral constructs) that incorporate a polynucleotide comprising an engineered Cas12f enzyme and/or engineered sgRNA or characteristic portions thereof (e.g., as utilized herein, a “characteristic portion thereof” refers to the portion of said protein required to perform the desired function, e.g., it comprises the ability to target and/or cleave a polynucleotide in a site specific manner).
  • a construct is a plasmid (i.e., a circular DNA molecule that can autonomously replicate inside a cell).
  • a construct can be a cosmid (e.g., pWE or sCos series).
  • a vector also comprises additional auxiliary components suitable for CRISPR/Cas system mediated site-directed mutagenesis, such as but not limited to, donor constructs and/or rescue constructs.
  • a donor construct and/or rescue construct may comprise a wild type and/or consensus coding sequence for a target gene of interest.
  • a target gene of interest is any gene of interest associated with a phenotype, such as but not limited to, a gene associated with a disease and/or disorder.
  • a construct is a viral construct.
  • a viral construct is a lentivirus, retrovirus, adenovirus, or adeno-associated virus construct.
  • a construct is an adeno-associated virus (AAV) construct (see, e.g., Asokan et al., Mol.
  • a viral construct is an adenovirus construct.
  • a viral construct may also be based on or derived from an alphavirus.
  • Alphaviruses include but are not limited to, Sindbis (and VEEV) virus, Aura virus, Babanki virus, Barmah Forest virus, Bebaru virus, Cabassou virus, Chikungunya virus, Eastern equine encephalitis virus, Everglades virus, Fort Morgan virus, Getah virus, Highlands J virus, Kyzylagach virus, Mayaro virus, Me Tri virus, Middelburg virus, Mosso das Pedras virus, Mucambo virus, Ndumu virus, O'nyong-nyong virus, Pixuna virus, Rio Negro virus, Ross River virus, Salmon pancreas disease virus, Semliki Forest virus, Southern elephant seal virus, Tonate virus, Trocara virus, Una virus, Venezuelan equine encephalitis virus, Western equine encephalitis virus, and Whataroa virus.
  • viruses encode nonstructural (e.g., replicon) and structural proteins (e.g., capsid and envelope) that can be translated in the cytoplasm of the host cell.
  • Ross River virus, Sindbis virus, Semliki Forest virus (SFV), and Venezuelan equine encephalitis virus (VEEV) have all been used to develop viral constructs for coding sequence delivery.
  • Pseudotyped viruses may be formed by combining alphaviral envelope glycoproteins and retroviral capsids. Examples of alphaviral constructs can be found in U.S. Publication Nos. 20150050243, 20090305344, and 20060177819; constructs and methods of their making are incorporated herein by reference for the purposes described herein.
  • constructs provided herein can be of different sizes.
  • a construct is a plasmid and can include a total length of up to about 1 kb, up to about 2 kb, up to about 3 kb, up to about 4 kb, up to about 5 kb, up to about 6 kb, up to about 7 kb, up to about 8 kb, up to about 9 kb, up to about 10 kb, up to about 11 kb, up to about 12 kb, up to about 13 kb, up to about 14 kb, or up to about 15 kb.
  • a construct is a plasmid and can have a total length in a range of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 1 kb to about 9 kb, about 1 kb to about 10 kb, about 1 kb to about 11 kb, about 1 kb to about 12 kb, about 1 kb to about 13 kb, about 1 kb to about 14 kb, or about 1 kb to about 15 kb.
  • a construct is a viral construct and can have a total number of nucleotides of up to 10 kb. In some aspects, a viral construct can have a total number of nucleotides in the range of about 4.5 kb to 5 kb, or about 4.7 kb.
  • a viral construct can have a total number of nucleotides in the range of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 1 kb to about 9 kb, about 1 kb to about 10 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 2 kb to about 6 kb, about 2 kb to about 7 kb, about 2 kb to about 8 kb, about 2 kb to about 9 kb, about 2 kb to about 10 kb, about 3 kb to about 4 kb, about 3 kb to about 5 kb, about 3 kb to
  • a construct is a lentivirus construct and can have a total number of nucleotides of up to 8 kb.
  • a lentivirus construct can have a total number of nucleotides of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 2 kb to about 6 kb, about 2 kb to about 7 kb, about 2 kb to about 8 kb, about 3 kb to about 4 kb, about 3 kb to about 4 kb, about 3 kb to about 5 kb, about 2 kb to about 6
  • a construct is an adenovirus construct and can have a total number of nucleotides of up to 8 kb.
  • an adenovirus construct can have a total number of nucleotides in the range of about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 1 kb to about 6 kb, about 1 kb to about 7 kb, about 1 kb to about 8 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 2 kb to about 6 kb, about 2 kb to about 7 kb, about 2 kb to about 8 kb, about 3 kb to about 4 kb, about 3 kb to about 5 k
  • any of the constructs described herein can further include a control sequence, e.g., a control sequence selected from the group of a transcription initiation sequence, a transcription termination sequence, a promoter sequence, an enhancer sequence, an RNA splicing sequence, a polyadenylation (poly(A)) sequence, a Kozak consensus sequence, and/or additional untranslated regions which may house pre- or post-transcriptional regulatory and/or control elements.
  • a promoter can be a native promoter, a constitutive promoter, an inducible promoter, and/or a tissue-specific promoter.
  • control sequences are described herein.
  • AAV particles that comprise a polynucleotide construct encoding an engineered Cas12f and/or engineered sgRNA oligonucleotide, and an AAV capsid.
  • AAV particles can be described as having a serotype, which is a description of the construct strain and the capsid strain.
  • an AAV particle may be described as AAV2, wherein the particle has an AAV2 capsid and a construct that comprises characteristic AAV2 Inverted Terminal Repeats (ITRs).
  • an AAV particle may be described as a pseudotype, wherein the capsid and construct are derived from different AAV strains, for example, AAV2/9 would refer to an AAV particle that comprises a construct utilizing the AAV2 ITRs and an AAV9 capsid. Additional examples of pseudotyped AAV vectors include, but are not limited to, AAV2/1, AAV2/2, AAV2/3, AAV2/4, AAV2/5, AAV2/6, AAV2/7, AAV2/8 and AAV2/9. [0244] In some aspects, AAV particles suitable for use according to the present disclosure may comprise or be derived from any natural or recombinant AAV serotype.
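The "AAVx/y" notation described above can be split mechanically into the ITR (construct) serotype and the capsid serotype. The following is a minimal sketch of such a parser. The function name is a hypothetical helper, and the sketch handles only the numeric forms used in this passage (for example "AAV2" or "AAV2/9"), not named variants.

```python
import re
from typing import Tuple

_DESIGNATION = re.compile(r"AAV(\d+)(?:/(\d+))?")

def parse_aav_designation(name: str) -> Tuple[str, str]:
    """Return (ITR serotype, capsid serotype) for names like 'AAV2/9'.

    A name without a slash, such as 'AAV2', is read as the same
    serotype for both the ITRs and the capsid.
    """
    match = _DESIGNATION.fullmatch(name)
    if match is None:
        raise ValueError(f"unsupported AAV designation: {name!r}")
    itr = match.group(1)
    capsid = match.group(2) or itr
    return f"AAV{itr}", f"AAV{capsid}"

if __name__ == "__main__":
    print(parse_aav_designation("AAV2/9"))
```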
  • an AAV according to the present invention is selected from natural serotypes such as AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, and AAV12; or pseudotypes, chimeras, and variants thereof.
  • the term "chimera", when referring to an AAV vector, or a "chimeric AAV vector", refers to an AAV vector which comprises a capsid containing VP1, VP2 and VP3 proteins from at least two different AAV serotypes; or alternatively, which comprises VP1, VP2 and VP3 proteins, at least one of which comprises at least a portion from another AAV serotype.
  • chimeric AAV vectors include, but are not limited to, AAV-DJ, AAV-DJ/8, AAV2G9, AAV2i8, AAV2i8G9, AAV8G9, and AAV9i1.
  • an AAV serotype and/or pseudotype according to the present invention is selected from the group comprising or consisting of AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV106.1/hu.37, AAV114.3/hu.40, AAV127.2/hu.41, AAV127.5/hu.42, AAV128.1/hu.43, AAV128.3/hu.44, AAV130.4/hu.48, AAV145.1/hu.53, AAV145.5/hu.54, AAV145.6/hu.55, AAV16.12/hu.11, AAV16.3, AAV16.8/hu.10, AAV161.10/hu.60, AAV161.6/hu.61, AAV1-7/rh.48, AAV1-8/rh.49, AAV2i8, AAV2i
  • an AAV is an AAV variant that has been genetically modified, e.g., by substitution, deletion or addition of one or several amino acid residues in one or more capsid proteins.
  • examples of such variants include, but are not limited to, AAV2 with one or more of Y444F, Y500F, Y730F and/or S662V mutations; AAV3 with one or more of Y705F, Y731F and/or T492V mutations; and AAV6 with one or more of S663V and/or T492V mutations.
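Substitutions such as Y444F follow the usual convention of wild-type residue, position, and replacement residue. A minimal sketch of applying such a substitution to a protein sequence, with a check that the expected wild-type residue is present, is shown below. The function name and the toy sequence are illustrative assumptions, and positions are assumed to be 1-based relative to whichever capsid protein sequence is supplied.

```python
import re

_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")

def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a point substitution like 'Y444F' to a protein sequence."""
    match = _SUBSTITUTION.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, replacement = (
        match.group(1), int(match.group(2)), match.group(3))
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}")
    return sequence[: position - 1] + replacement + sequence[position:]

if __name__ == "__main__":
    # Toy four-residue sequence, for illustration only.
    print(apply_substitution("MAYK", "Y3F"))
```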
  • an AAV capsid is modified to comprise at least one surface-bound saccharide or a derivative thereof.
  • the term "surface-bound", when referring to the at least one saccharide, means that said at least one saccharide is bound to and exposed at the outer surface of the AAV vector.
  • Suitable examples of saccharides include, but are not limited to, monosaccharides, oligosaccharides, polysaccharides, and derivatives thereof.
  • B. AAV constructs [0249]
  • the present disclosure provides polynucleotide vectors (e.g., polynucleotide constructs) that comprise a nucleotide sequence encoding an engineered Cas12f protein and/or engineered sgRNA oligonucleotide.
  • a polynucleotide vector comprising a nucleotide sequence encoding an engineered Cas12f protein and/or engineered sgRNA oligonucleotide, can be comprised in an AAV capsid to produce an AAV particle (e.g., an AAV particle comprises an AAV construct comprised in an AAV capsid).
  • a polynucleotide construct comprises one or more components derived from or modified from a naturally occurring AAV genomic construct.
  • a sequence derived from an AAV construct is an AAV1 construct, an AAV2 construct, an AAV3 construct, an AAV4 construct, an AAV5 construct, an AAV6 construct, an AAV7 construct, an AAV8 construct, an AAV DJ/8 construct, an AAV9 construct, an AAV2.7m8 construct, an AAV8BP2 construct, an AAV293 construct, an AAVPhp.B construct, or AAVPhp.eB construct (see e.g., Chan et al., 2017). Additional exemplary AAV constructs that can be used herein are known in the art. See, e.g., Kanaan et al., Mol. Ther.
  • AAV derived sequences typically include the cis-acting 5' and 3' ITR sequences (see, e.g., B.
  • Typical AAV2-derived ITR sequences are about 145 nucleotides in length. In some aspects, at least or exactly 80% of a typical ITR sequence (e.g., at least or exactly 85%, at least or exactly 90%, at least or exactly 95%, or at least or exactly 100%, etc.) is incorporated into a construct provided herein. The ability to modify these ITR sequences is within the skill of the art. (See, e.g., texts such as Sambrook et al., "Molecular Cloning.
  • any of the coding sequences and/or constructs described herein are flanked by 5' and 3' AAV ITR sequences.
  • the AAV ITR sequences may be obtained from any known AAV, including presently identified AAV types.
  • polynucleotide constructs described in accordance with this disclosure and in a pattern known to the art (see, e.g., Asokan et al., Mol. Ther. 20: 699-708, 2012, which is incorporated herein by reference for the purposes described herein) are typically comprised of a coding sequence or a portion thereof, at least one regulatory and/or control sequence, and optionally 5' and 3' AAV inverted terminal repeats (ITRs).
  • provided constructs can be packaged into a capsid to create an AAV particle.
  • An AAV particle may be delivered to a selected target cell.
  • provided constructs comprise an additional optional coding sequence that is a nucleic acid sequence (e.g., inhibitory nucleic acid sequence), heterologous to the construct sequences, which encodes a polypeptide, protein, functional RNA molecule (e.g., miRNA, miRNA inhibitor) or other gene product, of interest.
  • a nucleic acid coding sequence is operatively linked to regulatory and/or control components in a manner that permits coding sequence transcription, translation, and/or expression in a cell of a target tissue.
  • an unmodified AAV endogenous genome includes two open reading frames, "cap” and "rep,” which are flanked by ITRs.
  • recombinant AAV constructs similarly comprise one or more open reading frames flanked by ITR sequences.
  • an AAV construct also comprises conventional control elements that are operably linked to the coding sequence in a manner that permits its transcription, translation and/or expression in a cell transfected with the polynucleotide construct or infected with a virus particle produced by the disclosure.
  • an AAV construct optionally comprises a promoter, an enhancer, an untranslated region (e.g., a 5' UTR, 3' UTR), a Kozak sequence, an internal ribosomal entry site (IRES), splicing sites (e.g., an acceptor site, a donor site), a polyadenylation site, or any combination thereof.
  • a construct is an AAV construct.
  • an AAV construct can include at least 500 bp, at least 1 kb, at least 1.5 kb, at least 2 kb, at least 2.5 kb, at least 3 kb, at least 3.5 kb, at least 4 kb, at least 4.5 kb, or at least 4.7 kb.
  • an AAV construct can include at most 7.5 kb, at most 7 kb, at most 6.5 kb, at most 6 kb, at most 5.5 kb, at most 5 kb, at most 4.5 kb, at most 4 kb, at most 3.5 kb, at most 3 kb, or at most 2.5 kb.
  • an AAV construct can include about 1 kb to about 2 kb, about 1 kb to about 3 kb, about 1 kb to about 4 kb, about 1 kb to about 5 kb, about 2 kb to about 3 kb, about 2 kb to about 4 kb, about 2 kb to about 5 kb, about 3 kb to about 4 kb, about 3 kb to about 5 kb, or about 4 kb to about 5 kb.
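  • The size bounds recited above amount to a simple arithmetic check. The following Python sketch sums the lengths of the parts of a construct and tests the total against one of the recited ranges (about 1 kb to about 5 kb). The component names and lengths are hypothetical placeholders chosen for illustration and are not values taken from this disclosure.

```python
from typing import Dict


def construct_length_bp(components: Dict[str, int]) -> int:
    """Return the total construct length as the sum of component lengths (bp)."""
    return sum(components.values())


def within_size_range(length_bp: int, min_bp: int = 1_000, max_bp: int = 5_000) -> bool:
    """Check a length against an inclusive range; defaults mirror 'about 1 kb to about 5 kb'."""
    return min_bp <= length_bp <= max_bp


# Hypothetical component lengths (bp), for illustration only.
example_components = {
    "5_prime_ITR": 145,
    "promoter": 600,
    "cas12f_coding_sequence": 1300,
    "sgRNA_cassette": 450,
    "polyA_signal": 225,
    "3_prime_ITR": 145,
}

total_bp = construct_length_bp(example_components)
print(total_bp, within_size_range(total_bp))
```

  • Any of the other recited bounds (for example, at most 4.7 kb) can be tested by passing different `min_bp` and `max_bp` values.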
  • any of the constructs described herein can further include regulatory and/or control sequences, e.g., a control sequence selected from the group of a transcription initiation sequence, a transcription termination sequence, a promoter sequence, an enhancer sequence, an RNA splicing sequence, a polyadenylation (poly(A)) sequence, a Kozak consensus sequence, and/or any combination thereof.
  • a promoter can be a native promoter, a constitutive promoter, an inducible promoter, and/or a tissue-specific promoter.
  • Non-limiting examples of control sequences are described herein and others are known in the art C.
  • an AAV capsid is from or is derived from an AAV capsid of an AAV2, 3, 4, 5, 6, 7, 8, 9, 10, rh8, rh10, rh39, rh43 or Ancestral serotype, or one or more hybrids thereof.
  • an AAV capsid is from an AAV ancestral serotype.
  • an AAV capsid is an ancestral (Anc) AAV capsid.
  • An Anc capsid is created from a construct sequence that is constructed using evolutionary probabilities and evolutionary modeling to determine a probable ancestral sequence.
  • AAV capsid/construct sequence is not known to have existed in nature.
  • any combination of AAV capsids and AAV constructs may be used in recombinant AAV particles of the present disclosure.
  • D. Exemplary AAV construct components
  • 1. Inverted Terminal Repeat Sequences (ITRs)
  • AAV derived sequences of a construct typically comprise the cis-acting 5' and 3' ITRs (See, e.g., B. J. Carter, in "Handbook of Parvoviruses", ed., P. Tijssen, CRC Press, pp.
  • An AAV particle of the present disclosure can comprise an AAV construct comprising a coding sequence (e.g., engineered CRISPR/Cas components, such as engineered Cas12f and/or engineered sgRNA) and associated elements flanked by 5' and 3' AAV ITR sequences.
  • an ITR is or comprises about 130 nucleotides. In some aspects, an ITR is or comprises about 145 nucleotides. In some aspects, all or substantially all of a sequence encoding an ITR is used. In some aspects, an AAV ITR sequence may be obtained from any known AAV, including presently identified mammalian AAV types. In some aspects, an ITR is an AAV2 ITR. In some aspects, an ITR is an AAV9 ITR.
  • a non-limiting example of a polynucleotide construct of the present disclosure is a "cis-acting" construct comprising a coding sequence, in which said sequence and any associated regulatory elements are flanked by 5' or "left" and 3' or "right" AAV ITR sequences. 5' and left designations refer to a position of an ITR sequence relative to an entire construct, read left to right, in a sense direction.
  • a 5' or left ITR is an ITR that is closest to a promoter (e.g., as opposed to a polyadenylation sequence) for a given construct, when a construct is depicted in a sense orientation, linearly.
  • 3' and right designations refer to a position of an ITR sequence relative to an entire construct, read left to right, in a sense direction.
  • a 3' or right ITR is an ITR that is closest to a polyadenylation sequence and/or stop codon (e.g., as opposed to a promoter sequence) for a given construct, when a construct is depicted in a sense orientation, linearly.
  • ITRs as provided herein are depicted in 5' to 3' order in accordance with a sense strand.
  • a 5' or "left" orientation ITR can also be depicted as a 3' or "right" ITR when converting from sense to antisense direction. Further, it is well within the ability of one of skill in the art to transform a given sense ITR sequence (e.g., a 5'/left AAV ITR) into an antisense sequence (e.g., 3'/right ITR sequence). One of ordinary skill in the art would understand how to modify a given ITR sequence for use as either a 5'/left or 3'/right ITR, or an antisense version thereof.
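  • Converting a sense-strand sequence into its antisense counterpart, as described above, is a reverse-complement operation. The Python sketch below shows that operation on a short made-up DNA string. The function name and the example sequence are illustrative assumptions, and no actual ITR sequence is reproduced here.

```python
# Translation table mapping each DNA base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the antisense (reverse-complement) form of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Made-up placeholder sequence, not an actual ITR.
sense = "TTGGCCACTC"
antisense = reverse_complement(sense)
print(sense, antisense)
```

  • Applying the function twice returns the original string, which is a convenient sanity check when relabeling a 5'/left sequence as a 3'/right one.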
  • the term "promoter" refers to a DNA sequence recognized by enzymes/proteins that can promote and/or initiate transcription of an operably linked gene.
  • a promoter typically refers to, e.g., a nucleotide sequence to which an RNA polymerase and/or any associated factor binds and from which it can initiate transcription.
  • a promoter is an inducible promoter, a constitutive promoter, a mammalian cell promoter, a viral promoter, a chimeric promoter, an engineered promoter, a tissue-specific promoter, or any other type of promoter known in the art.
  • a promoter is an RNA polymerase II promoter, such as a mammalian RNA polymerase II promoter.
  • a promoter is an RNA polymerase III promoter, including, but not limited to, an H1 promoter, a human U6 promoter, a mouse U6 promoter, or a swine U6 promoter.
  • a promoter will generally be one that is able to promote transcription in a mammalian cell.
  • a variety of promoters are known in the art, which in some aspects, can be used herein.
  • Nonlimiting examples of promoters that can be used herein in some aspects include: human EF1α, human cytomegalovirus (CMV) (US Patent No. 5,168,062, which is incorporated herein by reference for the purposes described herein), human ubiquitin C (UBC), mouse phosphoglycerate kinase 1, polyoma adenovirus, simian virus 40 (SV40), β-globin, β-actin, α-fetoprotein, γ-globin, β-interferon, γ-glutamyl transferase, mouse mammary tumor virus (MMTV), Rous sarcoma virus, rat insulin, glyceraldehyde-3-phosphate dehydrogenase, metallothionein II (MT II), am
  • a promoter is the CMV immediate early promoter.
  • the promoter is a CAG promoter and/or a CAG/CBA promoter.
  • the term "constitutive promoter" refers to a nucleotide sequence that, when operably linked with a nucleic acid encoding a gene (e.g., encoding engineered CRISPR/Cas components, such as engineered Cas12f and/or engineered sgRNA oligonucleotides), causes RNA to be transcribed from the nucleic acid in a cell under most or all physiological conditions.
  • constitutive promoters include, without limitation, the retroviral Rous sarcoma virus (RSV) LTR promoter, the cytomegalovirus (CMV) promoter (see, e.g., Boshart et al., Cell 41:521-530, 1985, which is incorporated herein by reference for the purposes described herein), the SV40 promoter, the dihydrofolate reductase promoter, the beta-actin promoter, the phosphoglycerate kinase (PGK) promoter, and the EF1-alpha promoter (Invitrogen).
  • Inducible promoters allow regulation of gene expression and can be regulated by exogenously supplied compounds, environmental factors such as temperature, or the presence of a specific physiological state, e.g., acute phase, a particular differentiation state of the cell, or in replicating cells only.
  • Inducible promoters and inducible systems are available from a variety of commercial sources, including, without limitation, Invitrogen, Clontech, and Ariad. Additional examples of inducible promoters are known in the art.
  • inducible promoters regulated by exogenously supplied compounds include the zinc-inducible sheep metallothionein (MT) promoter, the dexamethasone (Dex) inducible mouse mammary tumor virus (MMTV) promoter, the T7 polymerase promoter system (see e.g., WO 98/10088, which is incorporated herein by reference for the purposes described herein); the ecdysone insect promoter (see e.g., No et al., Proc. Natl. Acad Sci.
  • the term "tissue-specific promoter" refers to a promoter that is active only in certain specific cell types and/or tissues (e.g., transcription of a specific gene occurs only within cells expressing transcription regulatory and/or control proteins that bind to the tissue-specific promoter).
  • regulatory and/or control sequences impart tissue-specific gene expression capabilities.
  • tissue-specific regulatory and/or control sequences bind tissue-specific transcription factors that induce transcription in a tissue-specific manner.
  • a tissue-specific promoter is a neuron-specific promoter.
  • a tissue-specific promoter is a hematopoietic lineage cell-specific promoter.
  • a tissue-specific promoter is an immune cell-specific promoter. 3.
  • a construct can include an enhancer sequence.
  • the term "enhancer sequence" refers to a nucleotide sequence that can increase the level of transcription of a nucleic acid encoding a protein of interest (e.g., a Cas12f enzyme and/or sgRNA oligonucleotide), and/or increase or modify the translational efficiency of a transcript following transcription.
  • enhancer sequences are generally 50-1500 bp in length and provide binding sites for transcription-associated proteins (e.g., transcription factors).
  • an enhancer sequence is found within an intronic sequence. In some aspects, an enhancer sequence is found in a 3' and/or 5' UTR. In some aspects, an enhancer region is found downstream of a coding sequence comprising a transgene and proximal to a polyadenylation sequence. Unlike promoter sequences, enhancer sequences can act at a much larger distance from the transcription start site.
  • Non-limiting examples of enhancers include a woodchuck hepatitis virus post-transcriptional regulatory element (WPRE), an RSV enhancer, a CMV enhancer, and/or an SV40 enhancer. 4.
  • any of the constructs described herein can include an untranslated region (UTR), such as a 5' UTR or a 3' UTR.
  • UTRs of a gene are transcribed but not translated.
  • a 5' UTR starts at the transcription start site and continues to the start codon but does not include the start codon.
  • a 3' UTR starts immediately following the stop codon and continues until the transcriptional termination signal.
  • the regulatory and/or control features of a UTR can be incorporated into any of the constructs, particles, polynucleotides, compositions, kits, or methods as described herein to enhance or otherwise modulate the expression of a gene.
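  • The UTR boundaries defined above can be expressed as string slicing on a transcript. The Python sketch below assumes a hypothetical mRNA string that begins at the transcription start site and ends at the termination signal, plus a known start-codon position. It walks the reading frame to the first in-frame stop codon and returns the 5' UTR, the coding sequence, and the 3' UTR. The function and type names are illustrative assumptions.

```python
from typing import NamedTuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


class TranscriptRegions(NamedTuple):
    five_prime_utr: str
    cds: str
    three_prime_utr: str


def split_transcript(mrna: str, start_index: int) -> TranscriptRegions:
    """Split an mRNA into 5' UTR, CDS (start codon through stop codon), and 3' UTR.

    start_index is the 0-based position of the A in the AUG start codon.
    The 5' UTR excludes the start codon; the 3' UTR begins right after the stop codon.
    """
    rna = mrna.upper().replace("T", "U")
    if rna[start_index:start_index + 3] != "AUG":
        raise ValueError("start_index does not point at an AUG start codon")
    # Step codon by codon until the first in-frame stop codon.
    for i in range(start_index, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            stop_end = i + 3
            return TranscriptRegions(rna[:start_index], rna[start_index:stop_end], rna[stop_end:])
    raise ValueError("no in-frame stop codon found")


# Made-up transcript for illustration only.
regions = split_transcript("GGCACCAUGGCUUAACUUAAUAAA", 6)
print(regions)
```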
  • Natural 5' UTRs include a sequence that plays a role in translation initiation.
  • a 5' UTR can comprise sequences, like Kozak sequences, which are commonly known to be involved in the process by which the ribosome initiates translation of many genes.
  • Kozak sequences have the consensus sequence CCR(A/G)CCAUGG, where R is a purine (A or G) three bases upstream of the start codon (AUG), and the start codon is followed by another “G”.
  • 5' UTRs also form secondary structures that are involved in elongation factor binding.
  • a 5' UTR is included in any of the constructs described herein.
  • 5' UTRs, including without limitation those from the following genes: albumin, serum amyloid A, Apolipoprotein A/B/E, transferrin, alpha fetoprotein, erythropoietin, and Factor VIII, can be used to enhance expression of a nucleic acid molecule, such as an mRNA.
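  • The Kozak consensus recited above (CC, a purine R, CC, the AUG start codon, then G) can be located with a regular expression. The Python sketch below is one minimal way to scan a 5' region for exact matches to that consensus. The function name is an illustrative assumption, and real Kozak contexts often deviate from the strict consensus, so an exact-match scan will miss weaker sites.

```python
import re
from typing import List

# CC + purine (A or G) + CC + AUG + G; the purine sits three bases upstream of the A in AUG.
KOZAK_CONSENSUS = re.compile(r"CC[AG]CCAUGG")


def find_kozak_start_codons(seq: str) -> List[int]:
    """Return 0-based positions of the A of each AUG embedded in an exact Kozak consensus.

    DNA input is accepted and converted to RNA (T -> U) before scanning.
    """
    rna = seq.upper().replace("T", "U")
    # The AUG begins 5 bases into the 9-base consensus match.
    return [match.start() + 5 for match in KOZAK_CONSENSUS.finditer(rna)]


# Made-up sequences for illustration only.
print(find_kozak_start_codons("GGCCACCAUGGCU"))
print(find_kozak_start_codons("GGCCUCCAUGGCU"))
```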
  • 3' UTRs are known to have stretches of adenosines and uridines (in the RNA form) or thymidines (in the DNA form) embedded in them. These AU-rich signatures are particularly prevalent in genes with high rates of turnover.
  • AU-rich elements (AREs) can be separated into three classes (see e.g., Chen et al., Mol. Cell. Biol. 15:5777-5788, 1995; Chen et al., Mol. Cell Biol. 15:2010-2018, 1995, each of which is incorporated herein by reference for the purposes described herein): Class I AREs contain several dispersed copies of an AUUUA motif within U-rich regions. For example, c-Myc and MyoD mRNAs contain class I AREs. Class II AREs possess two or more overlapping UUAUUUA(U/A)(U/A) nonamers.
  • GM-CSF and TNF-alpha mRNAs are examples that contain class II AREs.
  • Class III AREs are less well defined. These U-rich regions do not contain an AUUUA motif; two well-studied examples of this class are c-Jun and myogenin mRNAs.
  • Most proteins binding to the AREs are known to destabilize the messenger, whereas members of the ELAV family, most notably HuR, have been documented to increase the stability of mRNA.
  • HuR binds to AREs of all the three classes. Engineering the HuR specific binding sites into the 3' UTR of nucleic acid molecules may lead to HuR binding and thus, stabilization of the message in vivo.
  • the introduction, removal, or modification of 3' UTR AREs can be used to modulate the stability of an mRNA encoding a gene of interest.
  • AREs can be removed or mutated to increase the intracellular stability and thus increase translation and production of a protein of interest.
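  • The three ARE classes described above can be approximated with simple motif searches. The Python sketch below is a rough heuristic rather than an established classifier: it calls a sequence class II when at least two UUAUUUA(U/A)(U/A) nonamers are found (overlaps included), class I when an AUUUA motif is present, and class III when the sequence is U-rich but lacks AUUUA. The U-content threshold of 0.5 and the function name are assumptions made for illustration.

```python
import re
from typing import Optional

# Zero-width lookahead so that overlapping nonamers are each counted.
NONAMER = re.compile(r"(?=UUAUUUA[UA][UA])")
PENTAMER = "AUUUA"


def classify_are(utr: str, u_rich_threshold: float = 0.5) -> Optional[str]:
    """Heuristically assign a 3' UTR fragment to ARE class I, II, or III (or None)."""
    rna = utr.upper().replace("T", "U")
    if len(NONAMER.findall(rna)) >= 2:
        return "class II"
    if PENTAMER in rna:
        return "class I"
    if rna and rna.count("U") / len(rna) >= u_rich_threshold:
        return "class III"
    return None


# Made-up fragments for illustration only.
print(classify_are("UUAUUUAUUUAUUUAUU"))
print(classify_are("GCUUUAUUUAGC"))
print(classify_are("GCUUUUUCUUUUGUUUUC"))
```

  • A scan of this kind could be run before and after editing a 3' UTR to confirm that an intended ARE was removed or introduced.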
  • non-ARE sequences may be incorporated into the 5' or 3' UTRs.
  • introns or portions of intron sequences may be incorporated into the flanking regions of the polynucleotides in any of the constructs, particles, polynucleotides, compositions, kits, and methods provided herein. Incorporation of intronic sequences may increase protein production as well as mRNA levels. 5.
  • a construct described herein can include an internal ribosome entry site (IRES).
  • an IRES forms a complex secondary structure that allows translation initiation to occur from any position within an mRNA immediately downstream from where the IRES is located (see, e.g., Pelletier and Sonenberg, Mol. Cell. Biol. 8(3): 1103-1112, 1988, which is incorporated herein by reference for the purposes described herein).
  • there are several IRES sequences known to those skilled in the art, including those from, e.g., foot and mouth disease virus (FMDV), encephalomyocarditis virus (EMCV), human rhinovirus (HRV), cricket paralysis virus, human immunodeficiency virus (HIV), hepatitis A virus (HAV), hepatitis C virus (HCV), and poliovirus (PV) (see e.g., Alberts, Molecular Biology of the Cell, Garland Science, 2002; and Hellen et al., Genes Dev. 15(13):1593-612, 2001, each of which is incorporated herein by reference for the purposes described herein).
  • an IRES sequence that is incorporated into a construct described herein is the foot and mouth disease virus (FMDV) 2A sequence.
  • the Foot and Mouth Disease Virus 2A sequence is a small peptide (approximately 18 amino acids in length) that has been shown to mediate the cleavage of polyproteins (see e.g., Ryan, MD et al., EMBO 4:928-933, 1994; Mattion et al., J Virology 70:8124-8127, 1996; Furler et al., Gene Therapy 8:864-873, 2001; and Halpin et al., Plant Journal 4:453-459, 1999, each of which is incorporated herein by reference for the purposes described herein).
  • the cleavage activity of the 2A sequence has previously been demonstrated in artificial systems including plasmids and gene therapy constructs (e.g., AAV and retroviruses) (see e.g., Ryan et al., EMBO 4:928-933, 1994; Mattion et al., J Virology 70:8124-8127, 1996; Furler et al., Gene Therapy 8:864-873, 2001; and Halpin et al., Plant Journal 4:453-459, 1999; de Felipe et al., Gene Therapy 6:198-208, 1999; de Felipe et al., Human Gene Therapy 11:1921-1931, 2000; and Klump et al., Gene Therapy 8:811-817, 2001, each of which is incorporated herein by reference for the purposes described herein).
  • an IRES can be utilized in an AAV construct.
  • a construct can include a polynucleotide internal ribosome entry site (IRES).
  • an IRES can be part of a composition comprising more than one construct.
  • an IRES is used to produce more than one polypeptide from a single gene transcript.
  • 6. Splice sites
  • any of the constructs provided herein can include splice donor and/or splice acceptor sequences, which are functional during RNA processing occurring during transcription. In some aspects, splice sites are involved in trans-splicing. 7.
  • a construct provided herein can include a polyadenylation (poly(A)) signal sequence.
  • a poly(A) tail confers mRNA stability and transferability (see e.g., Molecular Biology of the Cell, Third Edition by B.
  • the term "polyadenylation" refers to the covalent linkage of a polyadenylyl moiety, or its modified variant, to a messenger RNA molecule. In eukaryotic organisms, most messenger RNA (mRNA) molecules are polyadenylated at the 3' end.
  • a 3' poly(A) tail is a long sequence of adenine nucleotides (e.g., 50, 60, 70, 100, 200, 500, 1000, 2000, 3000, 4000, or 5000) added to the pre-mRNA through the action of an enzyme, polyadenylate polymerase.
  • a poly(A) tail is added onto transcripts that contain a specific sequence, e.g., a poly(A) signal.
  • a poly(A) tail and associated proteins aid in protecting mRNA from degradation by exonucleases. Polyadenylation also plays a role in transcription termination, export of the mRNA from the nucleus, and translation.
  • Polyadenylation typically occurs in the nucleus immediately after transcription of DNA into RNA, but also can occur later in the cytoplasm.
  • an mRNA chain is cleaved through the action of an endonuclease complex associated with RNA polymerase.
  • a cleavage site is usually characterized by the presence of the base sequence AAUAAA near the cleavage site.
  • adenosine residues are added to the free 3' end at the cleavage site.
  • a "poly(A) signal sequence” or “polyadenylation signal sequence” is a sequence that triggers the endonuclease cleavage of an mRNA and the addition of a series of adenosines to the 3' end of the cleaved mRNA.
  • there are several poly(A) signal sequences that can be used in some aspects, including those derived from bovine growth hormone (bGH) (Woychik et al., Proc. Natl. Acad. Sci. U.S.A. 81(13):3944-3948, 1984; U.S. Patent No. 5,122,458, each of which is incorporated herein by reference for the purposes described herein), mouse-β-globin, mouse-α-globin, human collagen, polyoma virus, the Herpes simplex virus thymidine kinase gene (HSV TK), the IgG heavy-chain gene polyadenylation signal (US 2006/0040354, which is incorporated herein by reference for the purposes described herein), human growth hormone (hGH), and an SV40 poly(A) site, such as the SV40 late and early poly(A) site (see e.g., Schek et al., Mol Cell Biol.
  • the poly(A) signal sequence can be AATAAA.
  • the AATAAA sequence may be substituted with other hexanucleotide sequences with homology to AATAAA and that are capable of signaling polyadenylation, including ATTAAA, AGTAAA, CATAAA, TATAAA, GATAAA, ACTAAA, AATATA, AAGAAA, AATAAT, AAAAAA, AATGAA, AATCAA, AACAAA, AATCAA, AATAAC, AATAGA, AATTAA, or AATAAG (see, e.g., WO 06/12414, which is incorporated herein by reference for the purposes described herein).
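  • The canonical hexanucleotide and the substitute hexanucleotides listed above can be searched for directly in a DNA sequence. The Python sketch below collects the recited hexamers into a set and reports every position at which one occurs. The function name and the example sequences are illustrative assumptions, and a hexamer match alone does not establish that a site is functional.

```python
from typing import List, Tuple

CANONICAL_SIGNAL = "AATAAA"
# Substitute hexanucleotides as recited in the text above.
VARIANT_SIGNALS = (
    "ATTAAA", "AGTAAA", "CATAAA", "TATAAA", "GATAAA", "ACTAAA",
    "AATATA", "AAGAAA", "AATAAT", "AAAAAA", "AATGAA", "AATCAA",
    "AACAAA", "AATAAC", "AATAGA", "AATTAA", "AATAAG",
)
ALL_SIGNALS = frozenset((CANONICAL_SIGNAL,) + VARIANT_SIGNALS)


def find_polya_signals(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based position, hexamer) for each listed poly(A) signal in a sequence.

    RNA input is accepted and converted to DNA (U -> T) before scanning.
    """
    dna = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(dna) - 5):
        hexamer = dna[i:i + 6]
        if hexamer in ALL_SIGNALS:
            hits.append((i, hexamer))
    return hits


# Made-up sequences for illustration only.
print(find_polya_signals("GGCAATAAAGGC"))
print(find_polya_signals("ccattaaagg"))
```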
  • a poly(A) signal sequence can be a synthetic polyadenylation site (see, e.g., the pCI-neo expression construct of Promega that is based on Levitt et al., Genes Dev. 3(7):1019-1025, 1989, which is incorporated herein by reference for the purposes described herein).
  • 8. Additional sequences
  • constructs of the present disclosure may comprise a 2A element or sequence.
  • constructs of the present disclosure may include one or more cloning sites. In some such aspects, cloning sites may not be fully removed prior to manufacturing for administration to a subject.
  • cloning sites may have functional roles including as linker sequences, or as portions of a Kozak site. As will be appreciated by those skilled in the art, cloning sites may vary significantly in primary sequence while retaining their desired function.
  • a 2A element is a T2A, P2A, E2A, and/or F2A element.
  • a 2A sequence may comprise an optional 5' linker sequence, such as but not limited to GSG (e.g., Glycine, Serine, Glycine). 9.
  • any of the constructs provided herein can optionally include a sequence encoding a destabilizing domain ("a destabilizing sequence") for temporal and/or spatial control of protein expression.
  • destabilizing sequences include sequences encoding an FK506 sequence, a dihydrofolate reductase (DHFR) sequence, or other exemplary destabilizing sequences.
  • protein degradation is inhibited, thereby allowing the protein sequence operatively linked to the destabilizing sequence to be actively expressed.
  • protein expression can be detected by conventional means, including enzymatic, radiographic, colorimetric, fluorescence, or other spectrographic assays, fluorescence-activated cell sorting (FACS) assays, and/or immunological assays (e.g., enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and immunohistochemistry).
  • the destabilizing sequence is an FK506- and rapamycin-binding protein (FKBP12) sequence, and the stabilizing ligand is Shield-1 (Shld1).
  • a destabilizing sequence is a DHFR sequence, and a stabilizing ligand is trimethoprim (TMP) (see e.g., Iwamoto et al., (2010) Chem Biol 17:981-988, which is incorporated herein by reference for the purposes described herein).
  • constructs provided herein can optionally include a sequence encoding a reporter polypeptide and/or protein ("a reporter sequence").
  • reporter sequences include DNA sequences encoding: a beta-lactamase, a beta-galactosidase (LacZ), an alkaline phosphatase, a thymidine kinase, a green fluorescent protein (GFP), a red fluorescent protein, an mCherry fluorescent protein, a yellow fluorescent protein, a chloramphenicol acetyltransferase (CAT), and a luciferase. Additional examples of reporter sequences are known in the art.
  • When associated with control elements that drive its expression, the reporter sequence can provide signals detectable by conventional means, including enzymatic, radiographic, colorimetric, fluorescence, or other spectrographic assays, fluorescence-activated cell sorting (FACS) assays, and/or immunological assays (e.g., enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and immunohistochemistry).
  • a reporter sequence is a FLAG tag (e.g., a 3xFLAG tag), and the presence of a construct carrying the FLAG tag in a cell is detected by protein binding or detection assays (e.g., Western blots, immunohistochemistry, radioimmunoassay (RIA), mass spectrometry).
  • a reporter sequence is the LacZ gene, and the presence of a construct carrying the LacZ gene in a cell is detected by assays for beta-galactosidase activity.
  • a reporter sequence is a fluorescent protein (e.g., green fluorescent protein (GFP)) or luciferase.
  • the presence of a construct carrying the fluorescent protein or luciferase in a cell may be measured by fluorescent imaging techniques (e.g., fluorescent microscopy or FACS) or light production in a luminometer (e.g., a spectrophotometer or an IVIS imaging instrument).
  • kits comprising proteins, polypeptides, polynucleotides, oligonucleotides, vectors, particles, and/or compositions described herein.
  • Each kit may also include additional components that are useful for in vivo and/or in vitro utilization of proteins, polypeptides, polynucleotides, oligonucleotides, vectors, particles, and/or compositions disclosed herein.
  • kits may optionally provide additional components that are useful in instructing a practitioner regarding proper use of methods comprising use of proteins, polypeptides, polynucleotides, oligonucleotides, vectors, particles, and/or compositions described herein.
  • a kit may also include components such as but not limited to, buffers, capture reagents, developing reagents, labels, reacting surfaces, means for detection, control samples, instructions, and interpretive information.
  • the kit may also include reagents for polynucleotide, oligonucleotide, polypeptide, protein, vector, and/or particle isolation and/or purification. V.
  • constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be comprised in a formulation with one or more additional therapeutic agents.
  • constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be comprised in a formulation wherein the formulation comprises pharmaceutically acceptable excipients.
  • constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be administered to a cell in an in vitro environment.
  • a cell may be derived from a subject.
  • a cell is an immune cell, a stem cell, an induced pluripotent stem cell, a precursor cell, and/or a terminally differentiated cell.
  • constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be administered to a cell in vivo via administration to a subject.
  • constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein are administered to a subject in need thereof.
  • a subject may have, may be diagnosed with, or may be susceptible to a disease, such as an infectious disease, a genetic disorder, an autoimmune disease, and/or cancer.
  • a subject is a mammal.
  • a subject is a domestic animal.
  • a subject is a farm animal.
  • a subject is a zoo animal.
  • a subject is a dog or a cat.
  • a subject is a cow, a horse, a sheep, or a goat.
  • a subject can be but is not limited to, a dog, cat, ferret, rabbit, cow, duck, pig, goat, chicken, horse, llama, camel, ostrich, deer, turkey, dove, sheep, goose, oxen, and/or reindeer.
  • a subject is a human.
  • a subject is equal to, less than, or greater than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, or 100 years of age.
  • administration regimens comprising constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein comprise administering more than one composition, such as 2 compositions, 3 compositions, 4 compositions, or more than 4 compositions.
  • constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions of the disclosure may be administered by the same route of administration or by different routes of administration.
  • agents described herein and/or additional therapeutic agents are administered intravenously, intramuscularly, subcutaneously, topically, orally, transdermally, intraperitoneally, intraorbitally, by implantation, by inhalation, intrathecally, intraventricularly, or intranasally.
  • constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein and/or additional therapeutic agents are administered intravenously, intramuscularly, subcutaneously, topically, orally, transdermally, intraperitoneally, intraorbitally, by implantation, by inhalation, intrathecally, intraventricularly, or intranasally.
  • an appropriate dosage may be determined based on the type of disease to be treated and/or prevented, severity, and/or course of the disease, the clinical condition of the individual, the individual's clinical history and response to the treatment, and/or at the discretion of the attending physician.
  • administration to a subject may include various "unit doses." Unit dose is defined as containing a predetermined quantity of the therapeutic composition. The quantity to be administered, and the particular route and formulation, is within the skill of determination of those in the clinical arts. A unit dose need not be administered as a single injection but may comprise continuous infusion over a set period of time. In some aspects, a unit dose comprises a single administrable dose.
  • the quantity to be administered depends on the treatment effect desired.
  • An effective dose is understood to refer to an amount necessary to achieve a particular effect.
  • doses in the range from 0.10 mg/kg to 200 mg/kg can affect the functionality of the described agents.
  • doses may comprise a composition comprising an AAV particle in a concentration of about 10^8 to about 10^14 viral genomes per ml.
  • such doses can be administered at multiple times during a day, and/or on multiple days, weeks, or months.
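  • The dose figures above are stated per kilogram of body weight or per milliliter of composition, so a total amount follows from one multiplication. The Python sketch below shows that arithmetic and a check against the recited ranges. The body weight, titer, and volume used in the example are hypothetical values chosen for illustration and are not dosing recommendations.

```python
def total_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Total mass dose (mg) from a weight-normalized dose and a body weight."""
    return dose_mg_per_kg * body_weight_kg


def total_viral_genomes(titer_vg_per_ml: float, volume_ml: float) -> float:
    """Total viral genomes delivered from a titer (vg/ml) and an administered volume (ml)."""
    return titer_vg_per_ml * volume_ml


def in_recited_range(value: float, low: float, high: float) -> bool:
    """Inclusive range check, e.g. 0.10-200 mg/kg or 1e8-1e14 vg/ml."""
    return low <= value <= high


# Hypothetical example values, for illustration only.
print(total_dose_mg(2.0, 70.0))           # weight-normalized dose times body weight
print(total_viral_genomes(1e12, 0.5))     # titer times volume
print(in_recited_range(2.0, 0.10, 200.0), in_recited_range(1e12, 1e8, 1e14))
```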
  • precise amounts of the therapeutic composition also depend on the judgment of the practitioner and are peculiar to each individual.
  • Factors affecting dose include physical and clinical state of the patient, the route of administration, the intended goal of treatment (alleviation of symptoms versus cure) and the potency, stability and toxicity of the particular therapeutic substance or other therapies a subject may be undergoing.
  • uptake is species and organ/tissue dependent. The applicable conversion factors and physiological assumptions to be made concerning uptake and concentration measurement are well-known and would permit those of skill in the art to convert one concentration measurement to another and make reasonable comparisons and conclusions regarding the doses, efficacies and results described herein.
  • it will be desirable to have multiple administrations of the composition e.g., 2, 3, 4, 5, 6 or more administrations.
  • the administrations can be at 1, 2, 3, 4, 5, 6, 7, 8, to 5, 6, 7, 8, 9, 10, 11, 12 week, or more than 12 week intervals, including all ranges there between.
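The dose ranges above are stated per kilogram of body weight or per millilitre of preparation, so a total dose follows from simple multiplication. The sketch below is only an illustration of that arithmetic. The function names, the 70 kg body weight, and the 0.5 mL volume are hypothetical and are not taken from the specification.

```python
def weight_based_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Total dose in mg for a weight-normalized dose (mg/kg x kg)."""
    if dose_mg_per_kg <= 0 or body_weight_kg <= 0:
        raise ValueError("dose and body weight must be positive")
    return dose_mg_per_kg * body_weight_kg


def total_viral_genomes(conc_vg_per_ml: float, volume_ml: float) -> float:
    """Total viral genomes delivered (vg/mL x mL)."""
    if conc_vg_per_ml <= 0 or volume_ml <= 0:
        raise ValueError("concentration and volume must be positive")
    return conc_vg_per_ml * volume_ml


if __name__ == "__main__":
    # Hypothetical subject and volume, chosen only to show the calculation.
    print(weight_based_dose_mg(0.10, 70.0))   # lower end of the stated mg/kg range
    print(total_viral_genomes(1e12, 0.5))     # a concentration inside the stated vg/mL range
```

Actual dosing remains a clinical judgment, as the surrounding text states; the calculation only converts between the units used here.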
  • the phrase “pharmaceutically acceptable” refers to molecular entities and compositions that do not produce an adverse, allergic, or other untoward reaction when administered to an animal or human.
  • pharmaceutically acceptable carrier includes any and all solvents, dispersion media, coatings, anti-bacterial and anti-fungal agents, isotonic and absorption delaying agents, and the like. The use of such media and agents for pharmaceutical active substances is well known in the art.
  • the active compounds can be formulated for parenteral administration, e.g., formulated for injection via the intravenous, intramuscular, subcutaneous, or intraperitoneal routes.
  • such compositions can be prepared as either liquid solutions or suspensions; solid forms suitable for use to prepare solutions or suspensions upon the addition of a liquid prior to injection can also be prepared; and, the preparations can also be emulsified.
  • the pharmaceutical forms suitable for injectable use include sterile aqueous solutions or dispersions; formulations including, for example, aqueous propylene glycol; and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions.
  • the form must be sterile and must be fluid to the extent that it may be easily injected. It also should be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms, such as bacteria and fungi.
  • a composition is proteinaceous
  • the proteinaceous compositions may be formulated into a neutral or salt form.
  • Pharmaceutically acceptable salts include the acid addition salts (formed with the free amino groups of the protein) and which are formed with inorganic acids such as, for example, hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic, tartaric, mandelic, and the like. Salts formed with the free carboxyl groups can also be derived from inorganic bases such as, for example, sodium, potassium, ammonium, calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine, histidine, procaine and the like.
  • a pharmaceutical composition can include a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), suitable mixtures thereof, and vegetable oils.
  • the proper fluidity can be maintained, for example, by the use of a coating, such as lecithin, by the maintenance of the required particle size in the case of dispersion, and by the use of surfactants.
  • the prevention of the action of microorganisms can be brought about by various anti-bacterial and anti-fungal agents, for example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal, and the like.
  • sterile injectable solutions are prepared by incorporating the active compounds in the required amount in the appropriate solvent with various other ingredients enumerated above, as required, followed by filtered sterilization or an equivalent procedure. Generally, dispersions are prepared by incorporating the various sterilized active ingredients into a sterile vehicle which contains the basic dispersion medium and the required other ingredients from those enumerated above.
  • compositions described herein may be administered in a manner compatible with the dosage formulation and in such amount as is therapeutically or prophylactically effective.
  • formulations are administered in a variety of dosage forms, such as the type of injectable solutions described above.
  • constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be used in a method of preventing, treating, reducing the progression of, and/or reducing the risk of a disease or disorder.
  • constructs, vectors, particles, polypeptides, polynucleotides, and/or compositions described herein may be used in treating a disease or disorder, wherein the disease or disorder is a neurodegenerative disease, an inflammatory disease, an autoimmune disease, a metabolic syndrome, a cancer, a vascular disease, a fibrotic disease, a viral infection, a bacterial infection, a fungal infection, a parasitic infection, a musculoskeletal disease (such as a myopathy), an ocular disease, or a genetic disorder.
  • the disease or disorder is an inflammatory disease.
  • the inflammatory disease is arthritis, psoriatic arthritis, psoriasis, juvenile idiopathic arthritis, asthma, allergic asthma, bronchial asthma, tuberculosis, chronic airway disorder, cystic fibrosis, glomerulonephritis, membranous nephropathy, sarcoidosis, vasculitis, ichthyosis, transplant rejection, interstitial cystitis, atopic dermatitis, or inflammatory bowel disease.
  • the inflammatory bowel disease is Crohn’s disease, ulcerative colitis, inflammatory bowel disease, or celiac disease.
  • the disease or disorder is an autoimmune disease.
  • the autoimmune disease is systemic lupus erythematosus, type 1 diabetes, multiple sclerosis, psoriasis/psoriatic arthritis, inflammatory bowel disease, Addison’s disease, Graves’ disease, Sjogren’s syndrome, Hashimoto’s thyroiditis, Myasthenia gravis, autoimmune vasculitis, pernicious anemia, celiac disease, or rheumatoid arthritis.
  • the disease or disorder is a metabolic syndrome.
  • the metabolic syndrome is acute pancreatitis, chronic pancreatitis, alcoholic liver steatosis, obesity, glucose intolerance, insulin resistance, hyperglycemia, fatty liver, dyslipidemia, hyperlipidemia, hyperhomocysteinemia, or type 2 diabetes.
  • the metabolic syndrome is alcoholic liver steatosis, obesity, glucose intolerance, insulin resistance, hyperglycemia, fatty liver, dyslipidemia, hyperlipidemia, hyperhomocysteinemia, or type 2 diabetes.
  • the disease or disorder is a cancer.
  • the cancer is pancreatic cancer, breast cancer, kidney cancer, bladder cancer, prostate cancer, testicular cancer, urothelial cancer, endometrial cancer, ovarian cancer, cervical cancer, renal cancer, esophageal cancer, gastrointestinal stromal tumor (GIST), multiple myeloma, cancer of secretory cells, thyroid cancer, gastrointestinal carcinoma, chronic myeloid leukemia, hepatocellular carcinoma, colon cancer, melanoma, malignant glioma, glioblastoma, glioblastoma multiforme, astrocytoma, dysplastic gangliocytoma of the cerebellum, Ewing’s sarcoma, rhabdomyosarcoma, ependymoma, medulloblastoma, ductal adenocarcinoma, adenosquamous carcinoma, nephroblastoma, acinar cell carcinoma, neuroblastoma, or lung cancer.
  • the cancer of secretory cells is non-Hodgkin’s lymphoma, Burkitt’s lymphoma, chronic lymphocytic leukemia, monoclonal gammopathy of undetermined significance (MGUS), plasmacytoma, lymphoplasmacytic lymphoma or acute lymphoblastic leukemia.
  • the disease or disorder is a musculoskeletal disease (such as a myopathy).
  • the musculoskeletal disease is a myopathy, a muscular dystrophy, a muscular atrophy, a muscular wasting, or sarcopenia.
  • the muscular dystrophy is Duchenne muscular dystrophy (DMD), Becker’s disease, myotonic dystrophy, X-linked dilated cardiomyopathy, spinal muscular atrophy (SMA), or metaphyseal chondrodysplasia, Schmid type (MCDS).
  • the myopathy is a skeletal muscle atrophy.
  • the musculoskeletal disease (such as the skeletal muscle atrophy) is triggered by ageing, chronic diseases, stroke, malnutrition, bedrest, orthopedic injury, bone fracture, cachexia, starvation, heart failure, obstructive lung disease, renal failure, Acquired Immunodeficiency Syndrome (AIDS), sepsis, an immune disorder, a cancer, ALS, a burn injury, denervation, diabetes, muscle disuse, limb immobilization, mechanical unload, myositis, or a dystrophy.
  • the disease or disorder is a musculoskeletal disease.
  • skeletal muscle mass, quality and/or strength are increased.
  • the disease or disorder is a vascular disease.
  • the vascular disease is atherosclerosis, abdominal aortic aneurysm, carotid artery disease, deep vein thrombosis, Buerger’s disease, chronic venous hypertension, vascular calcification, telangiectasia or lymphoedema.
  • the disease or disorder is a genetic disorder.
  • a genetic disorder is arrhythmogenic right ventricular dysplasia/cardiomyopathy, Brugada Syndrome, Charcot-Marie-Tooth Disease, Cleft Lip and Palate, Cleidocranial Dysplasia, Cystic Fibrosis, Familial Adenomatous Polyposis, Hirschsprung’s Disease, Huntington’s Disease, Klinefelter Syndrome, Kniest Syndrome, Marfan Syndrome, Mucopolysaccharidoses, Muscular Dystrophy, Sickle Cell Disease, Von Hippel-Lindau Syndrome, Congenital Deafness, Familial Hypercholesterolemia, Hemochromatosis, Neurofibromatosis type 1, Tay-Sachs Disease, Usher Syndrome, AA amyloidosis, Adrenoleukodystrophy, Ehlers-Danlos Syndrome, Lysosomal disorders, and/or Mitochondrial disorders.
  • the disease or disorder is an ocular disease.
  • the ocular disease is glaucoma, age-related macular degeneration, inflammatory retinal disease, retinal vascular disease, diabetic retinopathy, uveitis, rosacea, Sjogren’s syndrome, retinitis pigmentosa, retinoschisis, Stargardt disease, Leber congenital amaurosis, or neovascularization in proliferative retinopathy.
  • compositions for use in the methods, such as methods of targeted polynucleotide cleavage, are suitably contained in a pharmaceutically acceptable carrier.
  • the carrier is non-toxic, biocompatible and is selected so as not to detrimentally affect the biological activity of the agent.
  • agents may be formulated into preparations for local delivery (i.e.
  • compositions by coating medical devices and the like.
  • suitable carriers for parenteral delivery via injectable, infusion or irrigation and topical delivery include distilled water, physiological phosphate-buffered saline, normal or lactated Ringer's solutions, dextrose solution, Hank's solution, or propanediol.
  • sterile, fixed oils may be employed as a solvent or suspending medium.
  • any biocompatible oil may be employed including synthetic mono- or diglycerides.
  • fatty acids such as oleic acid find use in the preparation of injectables.
  • the carrier and agent may be compounded as a liquid, suspension, polymerizable or non-polymerizable gel, paste or salve.
  • the carrier may also comprise a delivery vehicle to sustain (i.e., extend, delay or regulate) the delivery of the agent(s) or to enhance the delivery, uptake, stability or pharmacokinetics of the therapeutic agent(s).
  • a delivery vehicle may include, by way of non-limiting examples, microparticles, microspheres, nanospheres or nanoparticles composed of proteins, liposomes, carbohydrates, synthetic organic compounds, inorganic compounds, polymeric or copolymeric hydrogels and polymeric micelles.
  • the actual dosage amount of a composition administered to a patient or subject can be determined by physical and physiological factors such as body weight, severity of condition, the type of disease being treated, previous or concurrent therapeutic interventions, idiopathy of the patient and on the route of administration. The practitioner responsible for administration will, in any event, determine the concentration of active ingredient(s) in a composition and appropriate dose(s) for the individual subject.
  • solutions of pharmaceutical compositions can be prepared in water suitably mixed with a surfactant, such as hydroxypropylcellulose.
  • Dispersions also can be prepared in glycerol, liquid polyethylene glycols, mixtures thereof and in oils.
  • compositions are advantageously administered in the form of injectable compositions either as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid prior to injection may also be prepared. These preparations also may be emulsified.
  • a typical composition for such purpose comprises a pharmaceutically acceptable carrier.
  • the composition may contain 10 mg or less, 25 mg, 50 mg or up to about 100 mg of human serum albumin per milliliter of phosphate buffered saline.
  • other pharmaceutically acceptable carriers include aqueous solutions, non-toxic excipients, including salts, preservatives, buffers and the like.
  • non-limiting examples of non-aqueous solvents are propylene glycol, polyethylene glycol, vegetable oil and injectable organic esters such as ethyl oleate.
  • non-limiting examples of aqueous carriers include water, alcoholic/aqueous solutions, saline solutions, parenteral vehicles such as sodium chloride, Ringer's dextrose, etc.
  • intravenous vehicles include fluid and nutrient replenishers.
  • Preservatives include antimicrobial agents, antifungal agents, anti-oxidants, chelating agents and inert gases.
  • formulations comprising constructs described herein and/or co-administered formulations may be suitable for oral administration.
  • oral formulations include such typical excipients as, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharine, cellulose, magnesium carbonate and the like.
  • the compositions take the form of solutions, suspensions, tablets, pills, capsules, sustained release formulations or powders.
  • An effective amount of the pharmaceutical composition is determined based on the intended goal.
  • unit dose refers to physically discrete units suitable for use in a subject, each unit containing a predetermined-quantity of the pharmaceutical composition calculated to produce the desired responses discussed above in association with its administration, i.e., the appropriate route and treatment regimen.
  • the quantity to be administered depends on the protection or effect desired.
  • Precise amounts of the pharmaceutical composition also depend on the judgment of the practitioner and are peculiar to each individual. Factors affecting the dose include the physical and clinical state of the patient, the route of administration, the intended goal of treatment (e.g., alleviation of symptoms versus cure) and the potency, stability and toxicity of the particular therapeutic substance.
  • AsCas12f1 gene fragments codon-optimized for Escherichia coli and human expression were synthesized by Genewiz. Oligonucleotides were ordered from Integrated DNA Technologies. For recombinant AsCas12f1 expression and purification, Escherichia coli codon-optimized AsCas12f1 was cloned into a pET47b vector following an N-terminal His6-tag. For genome editing in human cells, CMV-driven AsCas12f1 and U6-driven sgRNA were cloned into two separate plasmids of pBR322 origins.
  • For CRISPRa, catalytically inactive Cas proteins were fused to VPR with an SV40 NLS linker and cloned into the same vector.
  • DNA fragments for plasmid construction were PCR amplified using Phusion U DNA Polymerase (Thermo Fisher, F555S) and assembled by USER enzyme mix (New England Biolabs, M5505L).
  • AsCas12f1 mutants and sgRNA plasmids were generated by site-directed mutagenesis. Key plasmids used in this work will be deposited to Addgene.
  • HEK293T cells were purchased from ATCC (CRL11268) and were cultured in DMEM (Gibco 11995) supplemented with 10% (v/v) fetal bovine serum (Gibco), 1% penicillin and streptomycin (Gibco).
  • HeLa cells were purchased from ATCC (CCL2) and were cultured in DMEM (Gibco 11965) supplemented with 10% (v/v) fetal bovine serum (Gibco), 1% penicillin and streptomycin (Gibco).
  • HCT116 cells were purchased from ATCC (CCL-247) and were cultured in McCoy’s 5A (Gibco 16600) supplemented with 10% (v/v) fetal bovine serum (Gibco), 1% penicillin and streptomycin (Gibco). Cells were grown at 37 °C with 5% CO2. One day before transfection, cells were trypsinized and seeded into 96-well plates with 8,000 to 15,000 cells per well. Transfection was carried out when cells reached ~70% confluency.
  • 120 ng plasmid encoding AsCas12f1 and 120 ng plasmid encoding the sgRNA were transfected using 0.5 µL Lipofectamine 2000 reagent (Thermo Fisher, 11668019) in 50 µL Opti-MEM (Gibco) following the manufacturer’s instructions.
  • Cells were harvested 3 days after transfection for indel analysis or 2 days after transfection for CRISPRa.
  • Evaluation of indel frequency. Cells were lysed with 50 µL lysis buffer (10 mM Tris-HCl pH 8.0, 0.05% SDS, 20 µg/mL proteinase K (Thermo Fisher, EO0491)).
  • the lysate was incubated for 60 min at 37 °C, followed by 40 min at 55 °C, 30 min at 85 °C, and 10 min at 95 °C.
  • Target-specific primers were used to amplify 200-400 bp regions surrounding the target site using Taq DNA polymerase (New England Biolabs, M0273L), with 1 µL cell lysate supplied as template. Spacer sequences for all genomic target sites are listed in Table 1. Amplicons were further tagged with Illumina TruSeq indexes through PCR. The final PCR products were gel-purified and subjected to 150-bp paired-end sequencing on an Illumina MiSeq platform.
  • Indel frequencies were calculated by CRISPResso2 (see e.g., Clement, K. et al. 2019) using the Cpf1 mode (for AsCas12f and AsCas12a) or the Cas9 mode (for SpCas9) with 2 bp quantification windows. Shown in FIG. 10B are expanded sequences surrounding the targeting site utilized to exemplify indel creation in HEXA. HEXA reference sequence (SEQ ID NO: 57): TTTTGTATACGCTTCCACAGAAAGGAGCTCTACACCACACCCAA.
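To make the idea of a quantification window concrete, the following is a minimal sketch of how an indel frequency could be tallied once reads have been aligned. It is not the CRISPResso2 implementation, whose windowing logic may differ. The read representation (a list of indel intervals per read), the function names, and the example cut-site coordinate are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: each read is a list of indels, each indel a
# half-open reference interval [start, end). By convention in this sketch an
# insertion is recorded as a 1-bp interval at the insertion point.
Indel = Tuple[int, int]


def read_is_edited(indels: List[Indel], cut_site: int, window: int) -> bool:
    """True if any indel overlaps [cut_site - window, cut_site + window)."""
    lo, hi = cut_site - window, cut_site + window
    return any(start < hi and end > lo for start, end in indels)


def indel_frequency(reads: List[List[Indel]], cut_site: int, window: int = 2) -> float:
    """Percentage of reads carrying an indel inside the quantification window."""
    if not reads:
        raise ValueError("no reads supplied")
    edited = sum(read_is_edited(r, cut_site, window) for r in reads)
    return 100.0 * edited / len(reads)


if __name__ == "__main__":
    toy_reads = [
        [],            # unmodified read
        [(20, 23)],    # 3-bp deletion overlapping the window
        [(5, 6)],      # change far from the cut site, ignored
        [(22, 23)],    # 1-bp event at the cut site
    ]
    print(indel_frequency(toy_reads, cut_site=22, window=2))
```

Restricting the count to a small window around the expected cut site keeps sequencing errors and unrelated polymorphisms elsewhere in the amplicon from inflating the reported frequency.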
  • sgRNAs were prepared by in vitro transcription using T7 RNA polymerase (New England Biolabs, M0251L) following the manufacturer’s protocol. In general, 50 µL reactions were set up with 2 µg DNA template, 2 mM NTP mix, 5 mM DTT, and 5 µL T7 RNA polymerase. Reactions were incubated at 37 °C overnight before being treated with 0.2 U/µL Turbo DNase (Thermo Fisher, AM2238) at 37 °C for 15 min.
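The reaction composition above is given as final amounts, so pipetting volumes depend on the stock concentrations at hand. The sketch below applies C1·V1 = C2·V2 to the stated 50 µL reaction. The stock concentrations (25 mM NTP mix, 100 mM DTT, 500 ng/µL template) are assumptions for illustration only and are not specified in the text.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, reaction_volume_ul: float) -> float:
    """Volume of stock needed so that the final concentration is reached (C1*V1 = C2*V2)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * reaction_volume_ul / stock_conc


def ivt_plan(reaction_volume_ul: float = 50.0,
             ntp_stock_mM: float = 25.0,          # assumed stock, not from the source
             dtt_stock_mM: float = 100.0,         # assumed stock, not from the source
             template_ng_per_ul: float = 500.0):  # assumed stock, not from the source
    """Volumes (in µL) for the transcription reaction described in the text."""
    plan = {
        "ntp_ul": stock_volume_ul(2.0, ntp_stock_mM, reaction_volume_ul),  # 2 mM NTP mix
        "dtt_ul": stock_volume_ul(5.0, dtt_stock_mM, reaction_volume_ul),  # 5 mM DTT
        "template_ul": 2000.0 / template_ng_per_ul,                         # 2 µg template
        "t7_ul": 5.0,                                                       # 5 µL polymerase
    }
    remaining = reaction_volume_ul - sum(plan.values())
    if remaining < 0:
        raise ValueError("components exceed the reaction volume")
    plan["buffer_and_water_ul"] = remaining
    return plan


if __name__ == "__main__":
    for name, volume in ivt_plan().items():
        print(f"{name}: {volume:.1f}")
```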
  • sgRNAs were then purified using the RNA Clean & Concentrator kit (Zymo Research, R1014). Table 2 - Certain oligonucleotides utilized. /5Phos/ indicates 5′ phosphorylation; an asterisk indicates a phosphorothioate linkage.
  • Protein expression. N-terminal His-tagged AsCas12f variants were overexpressed in E. coli BL21(DE3). E. coli harboring the expression plasmid were cultured in Terrific Broth at 37 °C until OD600 reached 1.0. Protein expression was induced by isopropylthio-β-galactoside (IPTG) at 0.25 mM.
  • Bacteria were further cultured at 16 °C for 24 h before harvest.
  • Approximately 50 g of cell pellet was resuspended in 300 mL lysis buffer (20 mM Tris-HCl, pH 7.5, 1 M NaCl, 15 mM imidazole, 1 mM DTT) and lysed by sonication. Lysates were cleared by centrifugation and incubated with 3 mL Ni-NTA beads (QIAGEN) that were pre-equilibrated in lysis buffer.
  • the beads were packed into a gravity column and washed with 30 mL wash buffer (20 mM Tris-HCl pH 7.5, 1 M NaCl, 50 mM imidazole, 1 mM DTT).
  • Proteins were eluted with 15 mL elution buffer (20 mM Tris-HCl, pH 7.5, 1 M NaCl, 250 mM imidazole, 1 mM DTT), immediately diluted by adding a 2-fold volume of dilution buffer (20 mM Tris-HCl pH 7.5, 1 M NaCl, 1 mM DTT), and concentrated using a 30 kDa Amicon Ultra-15 Centrifugal Filter (Millipore Sigma). Proteins were further purified by size exclusion chromatography on a Superdex 200 Increase 10/300 GL column (GE Healthcare) using a buffer containing 20 mM Tris-HCl pH 7.5, 1 M NaCl, and 1 mM DTT.
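The dilution step lowers the imidazole concentration of the eluate while leaving the other buffer components unchanged, since the dilution buffer matches the elution buffer apart from imidazole. A short sketch of the mixing arithmetic follows, assuming that "a 2-fold volume" means 30 mL of dilution buffer added to the 15 mL eluate; that reading, and the function name, are assumptions.

```python
from typing import List, Tuple


def mix(components: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (volume_mL, concentration_mM) pairs.

    Returns (total_volume_mL, final_concentration_mM) as a volume-weighted average.
    """
    total_volume = sum(volume for volume, _ in components)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    total_amount = sum(volume * conc for volume, conc in components)
    return total_volume, total_amount / total_volume


if __name__ == "__main__":
    # 15 mL eluate at 250 mM imidazole plus 30 mL dilution buffer without imidazole.
    volume, imidazole = mix([(15.0, 250.0), (30.0, 0.0)])
    print(f"{volume:.0f} mL at {imidazole:.1f} mM imidazole")
    # NaCl is 1 M (1000 mM) in both buffers, so it is unchanged by the dilution.
    print(mix([(15.0, 1000.0), (30.0, 1000.0)]))
```

Under that reading the imidazole concentration falls to one third of its elution value before concentration and size exclusion.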
  • dsDNA substrate was prepared by PCR amplification of a 954 bp region spanning the TP53-1 site.
  • In vitro DNA cleavage reactions were set up by mixing gel-purified dsDNA substrate (48 nM), sgRNA (900 µM), and the wild-type or engineered AsCas12f protein (900 µM) in 20 µL 1× reaction buffer (10 mM Tris-HCl pH 7.5, 10 mM MgCl2 and 50 mM NaCl).
  • Sequences of the sgRNA and the target DNA are provided in Table 2.
  • the mixture was incubated on ice for 30 min before being loaded onto a Superdex 200 Increase 10/300 column (GE Healthcare) equilibrated with buffer D (50 mM Tris-HCl, pH 8.0, 50 mM NaCl, 5 mM MgCl2, and 0.5 mM TCEP).
  • Fractions that contained the pure AsCas12f-sgRNA-DNA complex were pooled and concentrated to roughly 2.5 mg/mL.
  • Sample vitrification was performed using a Vitrobot Mark IV (Thermo Fisher) operating at 8 °C and 100% humidity.
  • Motion-corrected micrographs were then imported to a cryoSPARC live session (see e.g., Punjani, A., et al., 2017) for CTF determination and particle picking. Particles were automatically picked using 2D class averages as templates, which were generated from blob picking. The extracted particles were imported to cryoSPARC for further processing. After 2D classification, contamination and poorly aligned classes were discarded. The resulting 3,370,441 particles were used to generate three initial models by ab initio reconstruction. 3D classification was then performed in cryoSPARC using the three initial models as the starting points.
  • the coordinates of the particles from the best class (1,576,757 particles) were imported into RELION (see e.g., Scheres, S.H.W., 2012) for particle re-extraction.
  • Another round of 3D classification was performed using the map generated from cryoSPARC as the initial model.
  • the best class was subjected to 3D refinement, CTF refinement, Bayesian polishing, and postprocessing.
  • Model building was performed in COOT (see e.g., Emsley, P., & Cowtan, K., 2004) using a starting model of AsCas12f predicted by AlphaFold2 (see e.g., Jumper, J., et al., 2021). One full copy of AsCas12f and a second copy of the N-lobe were identified in the cryo-EM map and modeled.
  • DNA and sgRNA were built into the map based on the knowledge of sequence complementarity, secondary structure prediction by IPknot (see e.g., Sato, K., et al., 2011), and fragment RNA model generated by RNAComposer (see e.g., Popenda, M., et al., 2012).
  • the final model was refined in real space and validated using PHENIX (see e.g., Adams, P.D., et al., 2010).
  • the statistics of model refinement and geometry are available in Table 3.
  • GUIDE-seq experiments were performed following a reported protocol (see e.g., Tsai, S.Q. et al., 2015; and Malinin, N.L., et al., 2021).
  • 1.8 µg plasmid encoding AsCas12f1, 1.8 µg plasmid encoding the sgRNA, and 5 µL end-protected double-stranded oligodeoxynucleotide (dsODN, 100 µM) were added to one million HEK293T cells in 100 µL nucleofection buffer. Nucleofection was performed on a 4D-Nucleofector (Lonza) according to the manufacturer's instructions. Full-length sgRNAs were applied in all GUIDE-seq experiments. Cells were harvested 3 days post-nucleofection and were subjected to genomic DNA (gDNA) isolation.
  • Targeted deep sequencing was performed as described above to analyze indel and dsODN incorporation frequencies. 1 µg gDNA was subjected to fragmentation, end-repair, A-tailing, adapter ligation, and dsODN-specific amplification. The libraries were sequenced for 150 cycles on an Illumina NextSeq platform. Data were analyzed and visualized using open-source guideseq software (see e.g., Tsai, S.Q., et al., 2016). DNA oligos used for GUIDE-seq are provided in Table 2. All sequencing data are available at NCBI Gene Expression Omnibus under accession number GSE211600.
  • DSBs are sensed by DNA damage repair machinery and are fixed primarily by nonhomologous end-joining (NHEJ), resulting in the formation of random insertions and deletions (indels) (see e.g., Anzalone, A.V., et al., 2020).
  • Although AsCas12f has been demonstrated to be capable of introducing DSBs in the human genome, its activity is modest and varies substantially among different loci (see e.g., Wu, Z., et al., 2021; Xu, X., et al., 2021; and Kim, D.Y., et al., 2022a).
  • the inventors reasoned that AsCas12f-mediated DNA targeting and cleavage could potentially be improved by increasing the affinity of AsCas12f to a gRNA and/or target DNA.
  • Protein-nucleic acid engagement is frequently mediated by electrostatic interactions between phosphodiester backbones of nucleic acids and positively charged patches on proteins (see e.g., Marcovitz, A. & Levy, Y., 2011).
  • introduction of basic residues such as lysine (K) and/or arginine (R) into Cas proteins may increase their affinity to nucleic acids, resulting in boosted DNA-targeting and/or cleavage activity.
  • NLS nuclear localization signal
  • AsCas12f variants showed similar or lower activity when compared to the wild-type protein
  • eight single-point mutations, including D196K, N199K, G276R, D281K, T327K, N328G, D364K, and D364R, increased the indel frequency at one or both target sites (see FIGs. 1D-1E, and FIG. 8).
  • AsCas12f activity could be further increased by combining activity increasing mutations (e.g., D196K, N199K, G276R, D281K, T327K, N328G, D364K, and/or D364R).
  • AsCas12f variants harboring double, triple, quadruple, and quintuple mutations were generated, and the editing activities of these enzymes were tested. Many assayed combinations gave rise to greater levels of indels (up to 73.6% indel creation rate), with the most robust variant being AsCas12f-v5.2, comprising engineered mutations D196K, N199K, G276R, N328G, and D364R. AsCas12f-v5.2 exhibited at least 2.5- to 3.5-fold higher gene-editing activity at all three tested target sites (see FIGs. 1D-1E, and FIGs. 9A-9B).
  • This engineered variant is also termed herein as enhanced AsCas12f (enAsCas12f).
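The design rationale above is that added basic residues strengthen electrostatic contacts with nucleic acid backbones. The sketch below parses substitution labels such as D196K, applies them to a sequence, and tallies the resulting change in formal side-chain charge. The charge table (K/R as +1, D/E as −1, all other residues including histidine as 0), the function names, and the four-residue toy sequence are assumptions for illustration; the actual AsCas12f sequence is not reproduced here.

```python
import re
from typing import List, Tuple

# Simplified formal side-chain charges near neutral pH (an assumption of this sketch).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Substitutions listed in the text for AsCas12f-v5.2 (enAsCas12f).
EN_ASCAS12F = ["D196K", "N199K", "G276R", "N328G", "D364R"]


def parse_mutation(label: str) -> Tuple[str, int, str]:
    """Split a label like 'D196K' into (wild-type residue, 1-based position, new residue)."""
    match = MUTATION_RE.match(label)
    if not match:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def net_charge_change(labels: List[str]) -> int:
    """Sum of formal charge differences introduced by the substitutions."""
    total = 0
    for label in labels:
        wild_type, _, new = parse_mutation(label)
        total += CHARGE.get(new, 0) - CHARGE.get(wild_type, 0)
    return total


def apply_mutations(sequence: str, labels: List[str]) -> str:
    """Return the mutated sequence, checking each wild-type residue first."""
    residues = list(sequence)
    for label in labels:
        wild_type, position, new = parse_mutation(label)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: expected {wild_type}, found {residues[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    print(net_charge_change(EN_ASCAS12F))
    print(apply_mutations("MDNG", ["D2K", "N3K"]))  # toy sequence, not AsCas12f
```

Under this simplified charge model, replacing an acidic residue with a basic one contributes twice as much as replacing a neutral one, and N328G contributes nothing, which suggests its benefit arises from something other than added charge.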
  • the observed improvement in activity could not be attributed to differences in protein expression or stability, as Flag-tagged wild-type AsCas12f, AsCas12f-v4.1, and enAsCas12f exhibited similar protein levels in HEK293T cells (FIG. 20A). Wild-type AsCas12f and enAsCas12f were then heterologously expressed and purified from Escherichia coli.
  • enAsCas12f was more active than wild-type AsCas12f in cleaving dsDNA at both 37 °C and 50 °C (FIGs. 1F, 1G, and FIGs. 20B-20D).
  • enAsCas12f formed a similar indel pattern as wild-type AsCas12f, indicating that the preferred sites of cleavage in the target DNA remained unchanged (see FIGs. 10A-10B).
  • Deletion signals centered at 19-24 bp downstream of the PAM and extended beyond the 3′ end of the protospacer (see FIGs. 10A-10C).
  • AsCas12f-v3.2 comprising mutations D196K, N199K, and N328G
  • AsCas12f-v4.1 comprising mutations D196K, N199K, N328G, and D364R
  • Another Cas12f family member, UnCas12f, was previously shown to have minimal gene-editing activity when assayed in mammalian cells (see e.g., Kim, D.Y., et al., 2022).
  • sgRNA “ge4.1” SEQ ID NO: 56
  • CasMINI likely had a higher level of sequence context dependence, and thus may be less suitable for use with a wide array of genomic target loci when compared to enAsCas12f.
  • the inventors also compared enAsCas12f with two commonly used Cas proteins, AsCas12a and SpCas9. AsCas12a recognizes T-rich PAMs located upstream of the protospacer, and thus direct comparisons are difficult. Therefore the inventors assayed AsCas12a-mediated indel formation directly at target sites designed for AsCas12f.
  • AsCas12a generated 12.2 ± 9.4% indels across 17 target sites, which was significantly lower than enAsCas12f in all cases (FIG. 2D, and FIG. 21A).
  • SpCas9 recognizes 5′-NGG PAMs at the 3′ end of target sites.
  • the inventors selected five loci that carry 5′-NGG PAMs among the 17 loci and designed three additional spacers for SpCas9 that recognize sites adjacent to those targeted by AsCas12f.
  • SpCas9 showed activity comparable to or higher than that of enAsCas12f at all eight target sites (62.2 ± 10.0% indels; FIG. 2E, and FIG. 21B).
  • Cas12f systems were recently reported as being repurposed for base editing, transcription repression, and transcription activation (see e.g., Xu X., et al., 2021; Kim, D.Y., et al., 2022b; Xin, C., et al., 2022; and Zhang, S., et al., 2023), offering a versatile toolbox for genome engineering.
  • transcription activation of endogenous genes using AsCas12f has not been reported.
  • CRISPRa CRISPR activation
  • VPR transcription activator complex VP64-p65-Rta
  • D225A dead AsCas12f variants
  • FIG.21C dead AsCas9- and dCas12a-based CRISPRa systems at three genomic loci.
  • a catalytically dead variant of wild-type AsCas12f modestly activated transcription of HBB in HEK293T cells, but was unable to activate transcription of HBG and IL1RN loci.
  • enAsCas12f was shown to be a potent gene-editing agent that functioned broadly in human cells.
  • Example 3 Cryo-EM structure of the AsCas12f complex
  • AsCas12f-D225A variant nuclease-deficient AsCas12f
  • sgRNA (193 nt) nuclease-deficient AsCas12f
  • target dsDNA 42 bp
  • Cryo-electron microscopy cryo-EM
  • single-particle analysis were performed (see FIG.12A, and Table 3).
  • the C-lobe of AsCas12f.2 was shown to be situated in close proximity to the cleavage site of the target DNA, but was poorly resolved due to structural flexibility (see FIGs.13A-13B).
  • the folding of monomeric AsCas12f and UnCas12f was shown to be similar (see FIGs. 14A-14C), while AsCas12f was confirmed as being smaller than UnCas12f because the REC domain of UnCas12f hosts an additional zinc finger motif (78 aa) close to its N-terminus (see FIG. 14C).
  • AsCas12f was shown to dimerize through an extensive interface in the REC domain (see FIG. 15A).
  • AsCas12f was shown to recognize T-rich PAMs by interacting with both the non-target strand (e.g., through amino acids K80 and/or S92 in REC.1, see FIG. 16A) and the target strand (e.g., through amino acids K96 and/or S92 in REC.1, see FIG. 16B). Apart from the PAM, AsCas12f was also shown to pervasively interact with the phosphodiester groups of the target DNA (see FIGs. 16A-16B). Mutating residues that interacted with the target DNA (e.g., Y76, S92, R101, R298, and/or Y343) again led to reduced indel frequencies (see FIG.
  • the sgRNA of AsCas12f initially created by fusing a 49-nt CRISPR RNA (crRNA) and a 138-nt trans-activating CRISPR RNA (tracrRNA) was termed as a wild type sgRNA, and comprised five stem loops (see FIGs. 17A-17B). Among them, stem 2 was found to engage both AsCas12f monomers, making major contributions to protein-sgRNA assembly (see FIG. 3C). Notably, AsCas12f sgRNA was much longer and adopted a tertiary structure distinct from UnCas12f sgRNA (see FIGs.17B-17D).
  • Example 4 Structure-guided sgRNA engineering
  • modifications to gRNAs have reportedly improved the gene-editing performance of several CRISPR systems (see e.g., Lee, H.J., et al., 2022; Xu, X., et al., 2021; Kim, D.Y., et al., 2022a; Dang, Y., et al., 2015; and Moon, S.B., et al., 2019).
  • Based on the inventors' cryo-EM structure of AsCas12f ribonucleoprotein-DNA complexes, the inventors reasoned that truncation of the sgRNA, especially in regions that do not directly interact with AsCas12f, could reduce the flexibility of the complex and consolidate key interactions in the sgRNA-enAsCas12f complex.
  • the poorly resolved cryo-EM density of U(–47)-U(–15) in stem 5 (see FIGs. 13A-13B, and FIG. 17A, grey box) suggested that this segment was flexible and did not intimately interact with AsCas12f.
  • This new sgRNA, termed herein sgRNA-v2, is 72 nt shorter than the original sequence (e.g., a >33% decrease in molecular weight, see FIG. 4B).
  • When complexed with enAsCas12f, sgRNA-v2 showed DNA cleavage activity on par with or slightly higher than that of the full-length sgRNA in vitro (FIG. 4C and FIG. 19E).
  • the robust activity of sgRNA-v2 extended to indel formation across eight assayed target sites in HEK293T cells (see FIGs. 4A, 4D, and 4E).
  • As the U6 promoter may favor shorter, less structured transcripts, the abundances of full-length sgRNA and sgRNA-v2 in transfected HEK293T cells were examined using quantitative reverse transcription PCR (RT-qPCR). sgRNA-v2 showed ~4-fold higher expression than the full-length sgRNA (FIG. 4F).
  • the improved expression did not translate into higher indel formation frequencies, indicating that the cellular activity of the enAsCas12f system was not limited by sgRNA expression levels.
  • cryo-EM structure of the AsCas12f-sgRNA-DNA complex enabled rational gRNA engineering, yielding a more compact and potent AsCas12f system.
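The size and expression figures quoted in Example 4 can be checked with simple arithmetic. The sketch below is illustrative only. It assumes the 193 nt length listed in Example 3 is the full-length sgRNA, and that molecular weight scales roughly with length. It also assumes RT-qPCR fold change is computed by the common 2^-ΔΔCt method, which the excerpt does not state. The Ct values and function names are hypothetical placeholders, not data from the patent.

```python
# Illustrative arithmetic only; all Ct values below are hypothetical placeholders.

FULL_LENGTH_NT = 193  # full-length sgRNA length listed in Example 3 (assumed baseline)
REMOVED_NT = 72       # nucleotides removed to give sgRNA-v2 (Example 4)


def length_reduction(full_nt: int, removed_nt: int) -> tuple[int, float]:
    """Return the truncated length and the fractional reduction in length."""
    if not 0 <= removed_nt < full_nt:
        raise ValueError("removed_nt must satisfy 0 <= removed_nt < full_nt")
    return full_nt - removed_nt, removed_nt / full_nt


def fold_change_ddct(ct_target_test: float, ct_ref_test: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression by 2^-ddCt (assumes ~100% amplification efficiency)."""
    delta_test = ct_target_test - ct_ref_test   # normalize test sample to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # normalize control sample to reference gene
    return 2.0 ** -(delta_test - delta_ctrl)


v2_len, reduction = length_reduction(FULL_LENGTH_NT, REMOVED_NT)
print(f"sgRNA-v2 length: {v2_len} nt, reduction: {reduction:.1%}")

# Hypothetical Ct values: a 2-cycle shift corresponds to a 4-fold difference.
print(fold_change_ddct(20.0, 18.0, 22.0, 18.0))
```

Under these assumptions the truncation removes a little over a third of the sequence, which is consistent with the ">33%" figure in the text.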
  • Example 5 Off-target effects of the engineered Cas12f systems
  • the inventors interrogated the genome-wide specificity of engineered AsCas12f variants using GUIDE-seq, in which DNA breakage sites are mapped by integration of a double-stranded oligonucleotide (dsODN).
  • GUIDE-seq has previously been applied to identify off-target effects for various Type V CRISPR systems, including AsCas12a and LbCas12a, enzymes that introduce DSBs with patterns similar to those of AsCas12f (see e.g., Kleinstiver, B.P., et al., 2016).
  • the inventors analyzed 17 target sites assayed in this study using Cas-OFFinder (see e.g., Bae, S., et al., 2014) and selected the five sites with the largest number of potential off-target sites for GUIDE-seq (see Table 4). Consistent with results obtained from lipid-mediated transfection (see FIGs.
  • AsCas12f-v4.1 and enAsCas12f showed high potency at on-target sites, yielding indel frequencies of up to 20.7% and up to 34.6%, respectively (see FIG. 5A). In sharp contrast, much lower indel frequencies of up to 2.2% were observed with wild-type AsCas12f. It was noted that the overall indel rates were lower in GUIDE-seq assays, potentially because delivery was compromised by co-electroporation of large amounts of dsODN. dsODN-bearing reads constituted 0.8-7.5% of indel-containing reads among GUIDE-seq samples (see FIGs. 5A and 22A).
  • beneficial mutations at a rate of 8 out of 32 (25%), notably higher than would be expected for random mutagenesis and/or some structure-guided approaches.
  • beneficial mutations, when combined, lead to the creation of complex engineered AsCas12f variants, such as enAsCas12f.
  • enAsCas12f was found to be up to 11.3-fold more potent in editing the human genome than wild-type AsCas12f.
  • enAsCas12f was found to generate high indel frequencies at PAM-distal regions, similar to wild-type AsCas12f and UnCas12f.
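Example 5 ranks target sites by their number of potential off-target sites. The sketch below shows the general idea of such a mismatch-tolerant search: scan both strands for a PAM followed by a protospacer within a mismatch budget. It is a toy illustration, not Cas-OFFinder's implementation. The PAM pattern `TTTR`, the 8-nt spacer, the test sequence, and all function names are hypothetical; the excerpt states only that AsCas12f recognizes T-rich PAMs.

```python
# Toy mismatch-tolerant site scan; not Cas-OFFinder itself.
# PAM pattern, spacer, and sequence below are hypothetical placeholders.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_matches(pam_pattern: str, pam_seq: str) -> bool:
    """True if pam_seq satisfies the IUPAC-coded pam_pattern."""
    return len(pam_pattern) == len(pam_seq) and all(
        base in IUPAC[code] for code, base in zip(pam_pattern, pam_seq)
    )


def find_sites(genome: str, spacer: str, pam_pattern: str, max_mismatches: int):
    """Return (strand, start, mismatches) for each 5'-PAM + protospacer hit.

    For the '-' strand, start is an offset into the reverse complement.
    """
    pam_len = len(pam_pattern)
    site_len = pam_len + len(spacer)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - site_len + 1):
            if not pam_matches(pam_pattern, seq[i:i + pam_len]):
                continue
            protospacer = seq[i + pam_len:i + site_len]
            mismatches = sum(a != b for a, b in zip(spacer, protospacer))
            if mismatches <= max_mismatches:
                hits.append((strand, i, mismatches))
    return hits


toy_genome = "CCTTTAACGTACGTGGTTTGACGAACGTCC"
for hit in find_sites(toy_genome, "ACGTACGT", "TTTR", max_mismatches=1):
    print(hit)
```

A brute-force scan like this is quadratic in practice and only suitable for short sequences; dedicated tools are needed for whole genomes.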
  • CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709-1712 (2007). 3. Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc. Natl. Acad. Sci. USA 109, E2579-E2586 (2012). 4. Jinek, M. et al. A programmable dual-RNA–guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821 (2012). 5. Anzalone, A.V., Koblan, L.W. & Liu, D.R.
  • GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187-197 (2015). 46. Marcovitz, A. & Levy, Y. Frustration in protein–DNA binding influences conformational switching and target search kinetics. Proc. Natl. Acad. Sci. USA 108, 17957-17962 (2011). 47. Kleinstiver, B.P. et al. Engineered CRISPR–Cas12a variants with increased activities and improved targeting ranges for gene, epigenetic and base editing. Nat. Biotechnol. 37, 276-282 (2019). 48. Clement, K. et al.
  • CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224-226 (2019). 49. Xin, C. et al. Comprehensive assessment of miniature CRISPR-Cas12f nucleases for gene disruption. Nat. Commun. 13, 5623 (2022). 50. Zhang, S. et al. TadA reprogramming to generate potent miniature base editors with high precision. Nat. Commun. 14, 413 (2023). 51. Chavez, A. et al. Highly efficient Cas9-mediated transcriptional programming. Nat. Methods 12, 326-328 (2015). 52. Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR-Cas9 knockout efficiency.
  • Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473-1475 (2014). 56. Zheng, S.Q. et al. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat. Methods 14, 331-332 (2017). 57. Punjani, A., Rubinstein, J.L., Fleet, D.J. & Brubaker, M.A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290-296 (2017). 58. Scheres, S.H.W.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Organic Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Medicinal Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)

Abstract

The present invention relates to novel compact Cas12f CRISPR/Cas systems having improved activities, such as improved endonuclease activity. Aspects of the invention relate to engineered AsCas12f polypeptides comprising one or more amino acid substitutions relative to SEQ ID NO: 1, engineered auxiliary CRISPR/Cas system component oligonucleotides such as sgRNA-v2, and polynucleotides encoding said polypeptides and/or said functional RNA species. The invention also relates to vectors, compositions, and methods comprising and/or involving the use thereof.
PCT/US2024/027039 2023-05-01 2024-04-30 Engineered compact CRISPR-Cas12f system Ceased WO2024229018A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363499365P 2023-05-01 2023-05-01
US63/499,365 2023-05-01

Publications (2)

Publication Number Publication Date
WO2024229018A2 (fr) 2024-11-07
WO2024229018A3 (fr) 2025-04-17

Family

ID=93333291

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/027039 Ceased WO2024229018A2 (fr) 2023-05-01 2024-04-30 Engineered compact CRISPR-Cas12f system

Country Status (1)

Country Link
WO (1) WO2024229018A2 (fr)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020123887A2 (fr) * 2018-12-14 2020-06-18 Pioneer Hi-Bred International, Inc. Novel CRISPR-Cas systems for genome editing
US20230212612A1 (en) * 2020-05-28 2023-07-06 Shanghaitech University Genome editing system and method
WO2022082179A2 (fr) * 2020-10-14 2022-04-21 Pioneer Hi-Bred International, Inc. Engineered Cas endonuclease variants for improved genome editing
KR20240011120A (ko) * 2020-12-22 2024-01-25 Chroma Medicine, Inc. Compositions and methods for epigenetic editing

Also Published As

Publication number Publication date
WO2024229018A3 (fr) 2025-04-17

Similar Documents

Publication Publication Date Title
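CN114040970B (zh) Methods of editing disease-associated genes using adenosine deaminase base editors, including treatment of genetic diseases
CN116497067B (zh) Compositions and methods for treating hemoglobinopathies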
Herrmann et al. A robust and all-inclusive pipeline for shuffling of adeno-associated viruses
WO2016174056A1 (fr) Compositions and methods for the treatment of nucleotide repeat expansion disorders
US20230203463A1 (en) Rna-guided nucleases and active fragments and variants thereof and methods of use
JP2022507402A (ja) Liver-specific viral promoters and methods of using the same
EP4470612A2 (fr) RNA-guided nucleases and active fragments and variants thereof and methods of use
US20220298500A1 (en) Compositions for regulating and self-inactivating enzyme expression and methods for modulating off-target activity of enzymes
KR20250075747A (ko) Chemical modification of guide RNAs with locked nucleic acids for RNA-guided nuclease-mediated gene editing
US12378549B2 (en) CRISPR-cas9 system and uses thereof
WO2020150338A1 (fr) HTT repressors and uses thereof
WO2024229018A2 (fr) Engineered compact CRISPR-Cas12f system
TW202546224A (zh) Novel RNA-guided nucleases and proteins for polymerase editing
WO2024259332A2 (fr) Methods and compositions for regulating gene expression
WO2024173573A1 (fr) Transposon-CRISPR systems and components
CN120677236A (zh) Engineered OMNI-50 nuclease variants
JP2025510622A (ja) Engineered nucleases and chimeric nucleases
WO2023147558A2 (fr) CRISPR methods for correcting BAG3 gene mutations in vivo
US20250135032A1 (en) Crispr methods for correcting bag3 gene mutations in vivo
Luk Development of CRISPR-Cas Editing Tools for Therapeutic Genome Editing
Ibraheim Genome Engineering Goes Viral: Repurposing of Adeno-associated Viral Vectors for CRISPR-mediated in Vivo Genome Engineering
EP4689103A2 (fr) Compounds for genome editing
WO2025083619A1 (fr) RNA-guided nucleases and active fragments and variants thereof and methods of use
WO2024050548A2 (fr) Compact promoters for targeting hypoxia-induced genes
WO2024127370A1 (fr) Guide RNAs targeting the TRAC gene and methods of use

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24800446

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE