WO2020224611A1 - Improved gene editing system - Google Patents


Info

Publication number
WO2020224611A1
Authority
WO
WIPO (PCT)
Prior art keywords
polypeptide
cell
gene editing
editing system
deletion
Legal status
Ceased
Application number
PCT/CN2020/088887
Other languages
French (fr)
Inventor
Caixia Gao
Huawei ZHANG
Shengxing WANG
Current Assignee
Institute of Genetics and Developmental Biology of CAS
Original Assignee
Institute of Genetics and Developmental Biology of CAS
Application filed by Institute of Genetics and Developmental Biology of CAS
Priority to CN202080034110.3A (published as CN114008207B)
Priority to CN202510113127.2A (published as CN119932089A)
Priority to EP20802707.8A (published as EP3966335A4)
Priority to US17/609,640 (published as US20220251580A1)
Publication of WO2020224611A1
Current legal status: Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8213 Targeted insertion of genes into the plant genome by homologous recombination

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases [RNase]; Deoxyribonucleases [DNase]

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/24 Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2497 Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing N-glycosyl compounds (3.2.2)

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/88 Lyases (4.)

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04001 Cytosine deaminase (3.5.4.1)

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y402/00 Carbon-oxygen lyases (4.2)
    • C12Y402/99 Other carbon-oxygen lyases (4.2.99)
    • C12Y402/99018 DNA-(apurinic or apyrimidinic site)lyase (4.2.99.18)

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]

Definitions

  • the invention relates to the field of genetic engineering.
  • the present invention relates to an improved gene editing system.
  • the present invention relates to a gene editing system capable of providing accurate editing, particularly predictable accurate polynucleotide deletion, to the genome of a eukaryotic cell.
  • the present invention provides a gene editing system for editing a target sequence in the genome of a cell, comprising:
  • a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide
  • a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide
  • a guide RNA and/or an expression construct comprising a nucleotide sequence encoding the guide RNA
  • the first polypeptide comprises a CRISPR nuclease, a cytosine deaminase and optionally a uracil-DNA glycosylase (UDG)
  • the second polypeptide comprises an AP lyase
  • the guide RNA is capable of targeting the first polypeptide to the target sequence in the genome of the cell.
  • the present invention provides a gene editing system for editing a target sequence in a cell genome, comprising:
  • a polypeptide and/or an expression construct comprising a nucleotide sequence encoding the polypeptide
  • a guide RNA and/or an expression construct comprising a nucleotide sequence encoding the guide RNA
  • the polypeptide comprises a CRISPR nuclease, a cytosine deaminase, an AP lyase and optionally a uracil-DNA glycosylase (UDG), wherein the guide RNA is capable of targeting the polypeptide to the target sequence in the cell genome.
  • the present invention provides a method for producing a genetically modified cell, comprising introducing the gene editing system of the present invention into the cell.
  • the present invention provides a kit comprising the gene editing system of the invention and instructions for use.
  • FIG. 1 shows the working mode of an ACD system.
  • FIG. 2 shows a comparative analysis of the efficiency of InDel generation at different targeting sites between SpCas9 and ACD systems.
  • FIG. 3 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgF3HT4 site.
  • FIG. 4 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgLART4 site.
  • FIG. 5 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgMYBT2 site.
  • FIG. 6 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgPMKT1 site.
  • FIG. 7 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgVRN1T1 site.
  • FIG. 8 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgGS6T2 site.
  • Figure 9 shows the differences in deamination activity and deamination window among different cytosine deaminases.
  • Figure 10 shows the schematic diagram of the vector construction of two different types of AFID systems.
  • Figure 11 shows the deletion efficiency of Cas9, AFID-3, and eAFID-3 on different rice endogenous targets.
  • Figure 12 shows the deletion efficiency of Cas9, AFID-3, and eAFID-3 on different wheat endogenous targets.
  • Figure 13 shows types and proportions of deletion mutations of AFID-3 and eAFID-3 on rice endogenous targets.
  • Figure 14 shows types and proportions of deletion mutations of AFID-3 and eAFID-3 on wheat endogenous targets.
  • Figure 15 shows the preference of AFID-3 and eAFID-3 for cytosine bases where the predictable fragment deletion starts.
  • Figure 16 shows the types and proportions of the desired predictable in-frame deletions generated by Cas9, AFID-3, and eAFID-3 at the miR396h binding site of the rice OsGRF1 gene and at the miR156 binding site of the OsIPA1 gene, respectively.
  • Figure 17 shows the schematic diagram of the construction of the AFID-3 vector used for Agrobacterium infection of rice.
  • Figure 18 shows the types of regenerated plant mutants produced by Cas9 and AFID-3 on the rice OsCDC48 gene.
  • the term “and/or” encompasses all combinations of items connected by the term, and each combination should be regarded as individually listed herein.
  • “A and/or B” covers “A”, “A and B”, and “B”.
  • “A, B, and/or C” covers “A”, “B”, “C”, “A and B”, “A and C”, “B and C”, and “A and B and C”.
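The combinatorial reading of "and/or" above can be stated mechanically: for n items it covers every non-empty subset, i.e. 2^n - 1 combinations. The Python sketch below enumerates them; the function name `and_or_combinations` is purely illustrative and not part of the patent.

```python
from itertools import combinations


def and_or_combinations(items):
    """Return every non-empty combination covered by "X, Y, and/or Z".

    Combinations are listed by size, keeping the order of `items`.
    """
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            result.append(" and ".join(combo))
    return result


print(and_or_combinations(["A", "B", "C"]))
```

For three items this yields the seven combinations listed in the definition.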
  • a protein or nucleic acid “comprising” a sequence may consist of that sequence, or may have additional amino acids or nucleotides at one or both ends, while still having the activity described in this invention.
  • those skilled in the art know that the methionine encoded by the start codon at the N-terminus of the polypeptide will be retained under certain practical conditions (for example, when expressed in a specific expression system) , but does not substantially affect the function of the polypeptide.
  • “Genome” as used herein encompasses not only chromosomal DNA present in the nucleus, but also organellar DNA present in the subcellular components (e.g., mitochondria, plastids) of the cell.
  • “organism” includes any organism, preferably a eukaryotic organism, that is suitable for genome editing.
  • organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, cats; poultry such as chickens, ducks, geese; plants including monocots and dicots such as rice, corn, wheat, sorghum, barley, soybean, peanut, arabidopsis and the like.
  • a “genetically modified organism” or “genetically modified cell” includes the organism or the cell which comprises within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence.
  • the exogenous polynucleotide is stably integrated within the genome of the organism or the cell such that the polynucleotide is passed on to successive generations.
  • the exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct.
  • a modified gene or expression regulatory sequence is one that, in the organism genome or the cell genome, comprises one or more nucleotide substitutions, deletions, or additions.
  • “Exogenous” in reference to a sequence means a sequence from a foreign species or, if from the same species, a sequence whose composition and/or locus has been significantly changed from its native form through deliberate human intervention.
  • “nucleic acid sequence” refers to RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases.
  • Nucleotides are referred to by their single letter names as follows: “A” is adenosine or deoxyadenosine (corresponding to RNA or DNA, respectively) , “C” means cytidine or deoxycytidine, “G” means guanosine or deoxyguanosine, “U” represents uridine, “T” means deoxythymidine, “R” means purine (A or G) , “Y” means pyrimidine (C or T) , “K” means G or T, “H” means A or C or T, “I” means inosine, and “N” means any nucleotide.
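The degenerate codes above can be used directly for pattern matching. The following Python sketch maps each listed code to the DNA bases it stands for and checks a sequence against a pattern; `CODE_TO_BASES` and `matches` are hypothetical names, U is read as T, and inosine ("I") is left out because it has no single DNA-base equivalent. Codes not listed in the text (e.g. M, S, W) are deliberately not covered.

```python
# Degenerate codes listed in the text, mapped to DNA bases (U is read as T).
CODE_TO_BASES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"}, "U": {"T"},
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "K": {"G", "T"},
    "H": {"A", "C", "T"},
    "N": {"A", "C", "G", "T"},  # any nucleotide
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if `sequence` is compatible with the degenerate `pattern`."""
    pattern = pattern.upper()
    sequence = sequence.upper().replace("U", "T")
    if len(pattern) != len(sequence):
        return False
    return all(base in CODE_TO_BASES[code] for code, base in zip(pattern, sequence))


print(matches("NGG", "AGG"))  # e.g. checking a candidate SpCas9 PAM
print(matches("NGG", "TGA"))
```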
  • “Polypeptide”, “peptide”, and “protein” are used interchangeably in the present invention to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers.
  • “polypeptide”, “peptide”, “amino acid sequence”, and “protein” may also include modified forms including, but not limited to, glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, and ADP-ribosylation.
  • “sequence identity” has its recognized meaning in the art, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the entire length of a polynucleotide or polypeptide or along a region of the molecule.
  • methods for determining identity are well known to the skilled person (Carrillo, H. & Lipman, D., SIAM J Applied Math 48: 1073 (1988)).
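As a minimal illustration of percent identity, the Python sketch below assumes the two sequences have already been aligned (with "-" marking gaps) and divides identical positions by the number of alignment columns. This denominator is an assumption; other conventions (e.g. dividing by the shorter ungapped length) are also in use, and the alignment step itself is not shown. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    identity = identical, non-gap columns / total alignment columns
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


# Hypothetical toy alignment: 9 of 10 columns identical.
print(percent_identity("MKTAYIAKQR", "MKTAYIAKQK"))
```

A value computed this way could then be compared with thresholds such as "at least 80%" used elsewhere in this document.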
  • suitable conservative amino acid substitutions in peptides or proteins are known to those skilled in the art and can generally be carried out without altering the biological activity of the resulting molecule.
  • one skilled in the art recognizes that a single amino acid replacement in a non-essential region of a polypeptide does not substantially alter biological activity (See, for example, Watson et al., Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings Pub. co., p. 224) .
  • “expression construct” refers to a vector, such as a recombinant vector, that is suitable for expression of a nucleotide sequence of interest in an organism.
  • “Expression” refers to the production of a functional product.
  • expression of a nucleotide sequence may refer to the transcription of the nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or the translation of an RNA into a precursor or mature protein.
  • the "expression construct" of the present invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector or, in some embodiments, an RNA that is capable of translation (such as mRNA) .
  • the "expression construct" of the present invention may comprise regulatory sequences and nucleotide sequences of interest from different origins, or regulatory sequences and nucleotide sequences of interest from the same source but arranged in a manner different from that normally occurring in nature.
  • “regulatory sequence” and “regulatory element” are used interchangeably to refer to a nucleotide sequence that is located upstream (5' non-coding sequence), within, or downstream (3' non-coding sequence) of a coding sequence and affects the transcription, RNA processing, stability, or translation of the associated coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leaders, introns and polyadenylation recognition sequences.
  • Promoter refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment.
  • the promoter is a promoter capable of controlling the transcription of a gene in a cell, whether or not it is derived from the cell.
  • the promoter may be a constitutive promoter or tissue-specific promoter or developmentally-regulated promoter or inducible promoter.
  • “constitutive promoter” refers to a promoter that generally causes a gene to be expressed in most cell types under most conditions.
  • “tissue-specific promoter” and “tissue-preferred promoter” are used interchangeably and refer to a promoter that drives expression mainly, but not necessarily exclusively, in one tissue or organ, or in a specific cell or cell type.
  • “developmentally-regulated promoter” refers to a promoter whose activity is dictated by developmental events.
  • an “inducible promoter” selectively expresses an operably linked DNA sequence in response to an endogenous or exogenous stimulus (environment, hormones, chemical signals, etc.).
  • examples of promoters include, but are not limited to, polymerase (pol) I, pol II and pol III promoters.
  • examples of pol I promoters include the chicken (Gallus) RNA pol I promoter.
  • the pol II promoters include, but are not limited to, the immediate-early cytomegalovirus (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter, and the immediate-early simian virus 40 (SV40) promoter.
  • pol III promoters include the U6 and H1 promoters.
  • An inducible promoter such as a metallothionein promoter can be used.
  • promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter, the Sp6 phage promoter, and the like.
  • Promoters that can be used in plants include, but are not limited to, cauliflower mosaic virus 35S promoter, maize Ubi-1 promoter, wheat U6 promoter, rice U3 promoter, and rice actin promoter, and the like.
  • operably linked refers to the linkage of a regulatory element (e.g., but not limited to, a promoter sequence, a transcription termination sequence, etc. ) to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element.
  • “Introduction” of a nucleic acid molecule (e.g., plasmid, linear nucleic acid fragment, RNA, etc.) or a protein into an organism means that the nucleic acid or protein is used to transform a cell of the organism such that the nucleic acid or protein is capable of functioning in the cell.
  • “transformation” includes both stable and transient transformations.
  • “Stable transformation” refers to the introduction of exogenous nucleotide sequences into the genome, resulting in the stable inheritance of foreign genes. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any of its successive generations.
  • Transient transformation refers to the introduction of a nucleic acid molecule or protein into a cell, performing a function without the stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequences are not integrated into the genome.
  • the present inventors have surprisingly discovered that a precise deletion, spanning from the DSB site to the C nucleotide in the target sequence, can be produced as follows: the guide RNA targets the CRISPR nuclease to the target sequence in the cell genome, where the nuclease forms a double-stranded break (DSB); the cytosine deaminase fused to the CRISPR nuclease converts a C in the target sequence or its complementary sequence to U; and endogenous or exogenous uracil-DNA glycosylase (UDG) and AP lyase then act in combination on the resulting U.
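The sequence of events described above can be traced on a single DNA strand. The Python sketch below is a deliberately simplified toy model, not the patent's method: it assumes one C located 5' of the cut, shows the strand after deamination (C to U) and after UDG excision (AP site drawn as "_"), and then assumes that AP-lyase cleavage plus the Cas9 DSB remove exactly the stretch from the AP site up to the cut, with the remaining ends rejoined without further change. The function name `acd_trace` and the example strand are hypothetical.

```python
def acd_trace(strand: str, c_index: int, dsb_index: int) -> list:
    """Return the strand after each step of a toy, single-strand ACD model.

    c_index:   0-based position of the C targeted by the deaminase.
    dsb_index: the Cas9 cut falls between strand[dsb_index - 1] and strand[dsb_index].
    '_' marks the abasic (AP) site left by UDG.
    """
    if not 0 <= c_index < dsb_index <= len(strand):
        raise ValueError("the C must lie 5' of the DSB in this model")
    if strand[c_index] != "C":
        raise ValueError("no C at c_index")
    s = list(strand)
    states = [strand]
    s[c_index] = "U"                      # cytosine deaminase: C -> U
    states.append("".join(s))
    s[c_index] = "_"                      # UDG removes the uracil base (AP site)
    states.append("".join(s))
    # AP lyase cleaves at the AP site; with the Cas9 DSB this releases the
    # fragment from the AP site to the cut (assumed precise end joining).
    states.append("".join(s[:c_index] + s[dsb_index:]))
    return states


for state in acd_trace("TTACGGATCCAA", c_index=3, dsb_index=8):
    print(state)
```

In this example the five nucleotides from the targeted C up to the cut (CGGAT) are lost, which is the "DSB-to-C" deletion pattern the text describes.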
  • the present invention provides a gene editing system for editing a target sequence in the genome of a cell, comprising:
  • a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide
  • a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide
  • a guide RNA and/or an expression construct comprising a nucleotide sequence encoding the guide RNA
  • the first polypeptide comprises a cytosine deaminase, a CRISPR nuclease and optionally a uracil-DNA glycosylase (UDG)
  • the second polypeptide comprises an AP lyase
  • the guide RNA is capable of targeting the first polypeptide to the target sequence in the genome of the cell.
  • the expression construct comprising the nucleotide sequence encoding the first polypeptide, the expression construct comprising the nucleotide sequence encoding the second polypeptide, and/or the expression construct comprising the nucleotide sequence encoding the guide RNA may be different expression constructs, or any two or all of them may be the same expression construct.
  • the first polypeptide is an isolated polypeptide
  • the second polypeptide is an isolated polypeptide and/or the guide RNA is an isolated RNA.
  • “gene editing system” refers to a combination of components required for gene editing of a genome in a cell.
  • the various components of the system such as polypeptides, gRNA, etc., may exist independently of each other, or may exist in any combination thereof.
  • the gene editing system comprises at least one expression construct comprising, fused in frame, a nucleotide sequence encoding the first polypeptide, a nucleotide sequence encoding a self-cleaving peptide, and a nucleotide sequence encoding the second polypeptide.
  • the nucleotide sequence encoding the first polypeptide, the nucleotide sequence encoding the self-cleaving peptide, and the nucleotide sequence encoding the second polypeptide are arranged in the 5' to 3' direction.
  • the "self-cleaving peptide” means a peptide that may achieve self-cleavage within a cell.
  • the self-cleaving peptide may contain a protease recognition site so as to be recognized and specifically cleaved by the protease in the cell.
  • the self-cleaving peptide may be a 2A polypeptide.
  • 2A polypeptides are a class of short virus-derived peptides that self-cleave during translation. When two polypeptides of interest are linked by a 2A polypeptide and expressed in the same reading frame, the two polypeptides are produced at a ratio of nearly 1:1.
  • commonly used 2A polypeptides include P2A from porcine teschovirus-1, T2A from Thosea asigna virus, E2A from equine rhinitis A virus and F2A from foot-and-mouth disease virus. Among them, P2A has the highest cleavage efficiency and is therefore preferred.
  • the self-cleavage peptide is P2A as shown in SEQ ID NO: 9.
  • the gene editing system at least contains an expression construct, which contains a nucleotide sequence encoding the amino acid sequence shown in SEQ ID NO: 10 or SEQ ID NO: 11.
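The 5'-to-3' layout described above (first coding sequence, 2A coding sequence, second coding sequence, all in one reading frame) can be checked with a few simple rules. The Python sketch below assumes that each part is a whole number of codons, that only the final codon is a stop, and that the ORF starts with ATG; the function name and the 6-nt placeholder parts are hypothetical and are not the sequences of SEQ ID NO: 9 to 11.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list:
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def assemble_2a_orf(first_cds: str, self_cleaving_cds: str, second_cds: str) -> str:
    """Fuse first CDS, 2A-peptide CDS and second CDS 5'->3' in a single frame.

    The stop codon belongs only at the end of second_cds.
    """
    parts = [first_cds.upper(), self_cleaving_cds.upper(), second_cds.upper()]
    if any(len(p) % 3 for p in parts):
        raise ValueError("each part must be a multiple of 3 nt to stay in frame")
    orf = "".join(parts)
    if not orf.startswith("ATG"):
        raise ValueError("ORF must start with ATG")
    *internal, last = codons(orf)
    if any(c in STOP_CODONS for c in internal):
        raise ValueError("premature in-frame stop codon")
    if last not in STOP_CODONS:
        raise ValueError("ORF must end with a stop codon")
    return orf


# Hypothetical 6-nt placeholders standing in for the real coding sequences.
print(assemble_2a_orf("ATGGCT", "GGATCC", "AAATAA"))
```

A frame-shifting part or an internal stop codon would silently truncate or garble the second polypeptide, which is why the sketch rejects both.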
  • the present invention provides a gene editing system for editing a target sequence in the genome of a cell, comprising:
  • a polypeptide and/or an expression construct comprising a nucleotide sequence encoding the polypeptide
  • a guide RNA and/or an expression construct comprising a nucleotide sequence encoding the guide RNA
  • the polypeptide comprises cytosine deaminase, CRISPR nuclease, AP lyase and optionally uracil-DNA glycosylase (UDG) , wherein the guide RNA is capable of targeting the polypeptide to the target sequence in the genome of the cell.
  • the expression construct comprising the nucleotide sequence encoding the polypeptide and the expression construct comprising the nucleotide sequence encoding the guide RNA may be different expression constructs, or may be the same expression construct.
  • the polypeptide is an isolated polypeptide, and/or the guide RNA is an isolated RNA.
  • the polypeptide contains the amino acid sequence shown in SEQ ID NO: 10 or SEQ ID NO: 11.
  • CRISPR nuclease generally refers to nucleases present in the naturally occurring CRISPR systems, as well as modified forms thereof, variants thereof, or catalytically active fragments thereof.
  • the CRISPR nuclease may recognize, bind, and/or cleave the target nucleic acid structure by interacting with the guide RNA.
  • This term encompasses any CRISPR system based nuclease or functional variant capable of gene editing in the cell.
  • the functional variant retains its double-stranded cleavage activity, i.e., the ability to form a double-stranded break (DSB) in the target sequence.
  • the CRISPR nuclease used in the gene editing system of the present invention may be selected from, for example, Cas3, Cas8a, Cas5, Cas8b, Cas8c, Cas10d, Cse1, Cse2, Csy1, Csy2, Csy3, GSU0054, Cas10, Csm2, Cmr5, Cas10, Csx11, Csx10, Csf1, Cas9, Csn2, Cas4, Cpf1, C2c1, C2c3 or C2c2 proteins, or functional variants of these nucleases.
  • the CRISPR nuclease comprises Cas9 nuclease or a variant thereof.
  • the Cas9 nuclease may be a Cas9 nuclease originating from different species, such as SpCas9 from Streptococcus pyogenes.
  • the Cas9 nuclease variant may comprise, for example, a highly specific variant of Cas9 nuclease, such as the Cas9 nuclease variant eSpCas9 (1.0) (K810A/K1003A/R1060A) , eSpCas9 (1.1) (K848A/K1003A/R1060A) of Feng Zhang et al., and the Cas9 nuclease variant SpCas9-HF1 (N497A/R661A/Q695A/Q926A) developed by J. Keith Joung et al.
  • the CRISPR nuclease has the amino acid sequence shown in SEQ ID NO: 1.
  • the CRISPR nuclease comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 1, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 1.
  • the CRISPR nuclease may also comprise Cpf1 nuclease or a variant thereof, such as a highly specific variant.
  • the Cpf1 nuclease may be a Cpf1 nuclease originating from different species, such as the Cpf1 nucleases from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006.
  • “cytosine deaminase” refers to a deaminase that can accept single-stranded DNA as a substrate and catalyze the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively.
  • examples of cytosine deaminases include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase, and APOBEC3B deaminase (e.g., truncated APOBEC3B deaminase).
  • the cytosine deaminase is human APOBEC3A deaminase, for example, having an amino acid sequence shown in SEQ ID NO: 2.
  • the cytosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 2, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 2.
  • the cytosine deaminase is truncated APOBEC3B deaminase (APOBEC3Bctd) , for example, the amino acid sequence thereof is shown in SEQ ID NO: 7.
  • the cytosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 7, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 7.
  • “uracil-DNA glycosylase” (UDG), also called uracil-N-glycosylase (UNG), refers to an enzyme capable of recognizing a U base in DNA and cleaving its N-glycosidic bond, removing the base and leaving an apurinic or apyrimidinic (AP) site.
  • the UDG may originate from different sources, for example from E. coli.
  • the UDG has the amino acid sequence shown in SEQ ID NO: 3.
  • the UDG comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 3, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 3.
  • “apurinic/apyrimidinic lyase” (AP lyase), also referred to as AP endonuclease, refers to an enzyme capable of recognizing an apurinic or apyrimidinic site in a nucleic acid and cleaving the nucleic acid at that site.
  • the AP lyase may originate from different sources, for example from E. coli.
  • the AP lyase has the amino acid sequence shown in SEQ ID NO: 4.
  • the AP lyase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 4, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 4.
  • “gRNA” and “guide RNA” are used interchangeably and refer to an RNA molecule capable of forming a complex with a CRISPR nuclease and of targeting the complex to the target sequence owing to its complementarity with the target sequence.
  • for Cas9, the gRNA typically consists of crRNA and tracrRNA molecules that are partially complementary to each other and form a complex, where the crRNA contains a sequence sufficiently complementary to the target sequence to hybridize with it and direct the CRISPR complex (Cas9+crRNA+tracrRNA) to bind specifically to the target sequence.
  • the crRNA and tracrRNA may also be fused into a single guide RNA (sgRNA).
  • for Cpf1, the gRNA typically consists only of a mature crRNA molecule, where the crRNA contains a sequence sufficiently identical to the target sequence to hybridize with the complementary sequence of the target sequence and guide the complex (Cpf1+crRNA) to bind specifically to the target sequence. It is within the ability of those skilled in the art to design a suitable gRNA based on the CRISPR nuclease used and the target sequence to be edited.
  • a “target sequence” is a sequence that is complementary or identical (depending on the CRISPR nuclease) to the guide sequence of about 20 nucleotides contained in the guide RNA.
  • the guide RNA targets the target sequence by base pairing with the target sequence or its complementary strand.
  • the gene editing results in the deletion of one or more nucleotides in the target sequence, preferably results in the deletion of multiple consecutive nucleotides in the target sequence.
  • the type and length of the deletion depend on the position of the double-strand break (DSB) caused by the CRISPR nuclease and on the number and positions of cytosine (C) bases present in the target sequence or its complementary sequence.
  • the length of the deletion does not exceed the length of the target sequence.
  • the deletion may be deletion of about 1-17 nucleotides, such as 10-17 nucleotides, such as 10, 11, 12, 13, 14, 15, 16, 17 nucleotides.
  • the cytosine deaminase is fused to the N terminal of the CRISPR nuclease.
  • the cytosine deaminase, the CRISPR nuclease, the UDG and/or the AP lyase are directly linked.
  • the cytosine deaminase, the CRISPR nuclease, the UDG and/or the AP lyase are linked with linkers.
  • the linkers may be non-functional amino acid sequences of 1-50 (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20-25, or 25-50) or more amino acids, without any secondary or higher-order structure.
  • the linker may be a flexible linker, such as GGGGS, GS, GAP, (GGGGS) x 3, GGS, (GGS) x 7, and the like.
  • the linker contains the amino acid sequence shown in SEQ ID NO: 8.
  • the polypeptide of the invention further comprises a nuclear localization sequence (NLS) .
  • one or more NLS in the polypeptide should be of sufficient strength to drive the accumulation of the polypeptide in the nucleus of the cell in an amount capable of performing its gene editing function.
  • the strength of nuclear localization activity may be determined by the number and positions of the NLSs in the polypeptide, the specific NLS or NLSs used, or a combination of these factors.
  • the NLS of the polypeptide of the present invention may be at the N terminus and/or the C terminus. In some embodiments of the invention, the NLS may be located between the cytosine deaminase, the CRISPR nuclease, the UDG and/or the AP lyase. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the N terminus. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more NLSs at or near the C terminus. In some embodiments, the polypeptide comprises a combination of the above, such as one or more NLSs at the N terminus and one or more NLSs at the C terminus. When there is more than one NLS, each may be selected independently of the others.
  • An NLS typically consists of one or more short sequences of positively charged lysine or arginine residues exposed on the protein surface, but other types of NLS are also known.
  • Non-limiting examples of NLS include: KKRKV (nucleotide sequence 5'-AAGAAGAGAAAGGTC-3') , PKKKRKV (nucleotide sequence 5'-CCCAAGAAGAAGAGGAAGGTG-3' or CCAAAGAAGAAGAGGAAGGTT) , or SGGSPKKKRKV (nucleotide sequence 5'-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG-3') .
  • the polypeptide of the present invention may also include other localization sequences, such as a cytoplasmic localization sequence, a chloroplast localization sequence, a mitochondrial localization sequence, and the like.
  • the first polypeptide comprises the amino acid sequence shown in SEQ ID NO: 5.
  • the second polypeptide comprises the amino acid sequence shown in SEQ ID NO: 6.
  • the nucleotide sequence encoding the polypeptide is codon optimized for the organism from which the cell to be gene-edited originates.
  • the codon optimization refers to a method of replacing at least one codon in the native sequence (for example, about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with a codon that is used more frequently or most frequently in the genes of the host cell, thereby modifying the nucleic acid sequence to enhance expression in the host cell of interest while maintaining the native amino acid sequence.
  • Different species exhibit specific preferences for certain codons of specific amino acids. This codon bias (the difference in codon usage between organisms) is often correlated with the translation efficiency of messenger RNA (mRNA), which is thought to depend on, among other things, the properties of the codons being translated and the availability of specific transfer RNA (tRNA) molecules.
  • codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables may be adapted and applied in different ways. See Nakamura, Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000." Nucl. Acids Res. 28: 292 (2000).
  • the organism from which the cells to be edited with the system of the present invention originate is preferably a eukaryote, including but not limited to mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle and cat; poultry such as chicken, duck and goose; and plants, including monocotyledons and dicotyledons, such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis, etc.
  • the present invention provides a method for modifying a target sequence in the genome of a cell, comprising introducing the gene editing system of the present invention into the cell.
  • the modification results in the deletion of one or more nucleotides in the target sequence, preferably results in the deletion of multiple consecutive nucleotides in the target sequence.
  • the type and length of the deletion depend on the position of the double-strand break (DSB) caused by the CRISPR nuclease and on the number and positions of cytosine (C) bases present in the target sequence or its complementary sequence.
  • the deletion is within the target sequence.
  • the modification does not include insertion and/or substitution mutation.
  • the present invention further provides a method for producing a genetically modified cell, comprising introducing the gene editing system of the present invention into the cell.
  • the present invention further provides a genetically modified organism, comprising a genetically modified cell or progeny cell thereof produced by the method of the present invention.
  • the target sequence to be modified may be located at any position in the genome, for example, within a functional gene such as a protein-encoding gene, or for example, may be located in a gene expression regulatory region such as a promoter region or an enhancer region, so as to provide modification of the gene function or modification of gene expression.
  • the modification in the target sequence of the cell may be detected by T7EI, PCR/RE or sequencing methods.
  • the gene editing system may be introduced into the cell by various methods well known to those skilled in the art.
  • Methods that may be used to introduce the gene editing system of the present invention into the cell include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), the gene gun method, PEG-mediated protoplast transformation, and Agrobacterium-mediated transformation.
  • the cells that may be genetically edited by the method of the present invention may be derived from, for example, mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle, and cat; poultry such as chicken, duck, geese; and plants, including monocotyledons and dicotyledons, such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis, etc.
  • the method of the invention is performed in vitro.
  • the cell is an isolated cell, or a cell in an isolated tissue or organ.
  • the method of the invention may also be performed in vivo.
  • the cell is a cell in an organism, and the system of the present invention may be introduced into the cell in vivo by, for example, a virus or Agrobacterium-mediated method.
  • the present invention further provides a kit for use in the method of the present invention, the kit comprising the gene editing system of the present invention and instructions for use.
  • the kit generally comprises a label indicating the intended use and/or method of use for the contents of the kit.
  • the term label includes any written or recorded material provided on or with the kit or otherwise provided with the kit.
  • the E. coli UDG and AP lyase sequences were obtained from NCBI (accession numbers AMB53293.1 and WP_115209270.1, respectively), codon-optimized for rice, and synthesized by GENEWIZ, Inc. (Suzhou).
  • the gene fragment encoding the APOBEC3A-Cas9-UDG fusion protein and the gene fragment encoding AP lyase were separately cloned into the pJIT163 vector backbone to obtain the pA3A-SpCas9-UDG and pJIT163-Ubi-AP vectors.
  • APOBEC3A was fused to the N-terminus of Cas9 with an XTEN linker
  • UDG was fused to the C-terminus of Cas9
  • AP lyase was linked to the C-terminus of UDG via a self-cleaving 2A peptide (P2A)
  • APOBEC3Bctd was derived from the human APOBEC3B sequence (accession number NM_004900.5), which was truncated to obtain the C-terminal catalytic domain of APOBEC3B (APOBEC3Bctd), and was used to construct eAFID-3.
  • A stable transformation vector, pH-AFID-3, was constructed and used for Agrobacterium-mediated genetic transformation of rice.
  • Target sites were designed in the following wheat genes: the seed coat pigment genes flavanone 3-hydroxylase (TaF3H-A1/B1/D1) and leucoanthocyanidin reductase (TaLAR-A1/B1/D1) and their regulatory genes (TaMYB10-A1/B1/D1), the plasma membrane kinase gene associated with plant disease resistance (TaPMK-A1/B1/D1), the vernalization-associated genes (TaVRN1-A1/B1/D1), and the gibberellin-stimulated regulatory factor genes associated with growth and development (TaGASR6-A1/B1/D1). The corresponding gene editing target site sequences (sgF3HT4, sgLART4, sgMYBT2, sgPMKT1, sgVRN1T1 and sgGS6T2; see Table 1 for the detailed sequences) were obtained, and the sgRNA target site primers were synthesized.
  • The primers were ligated into the pTaU6-sgRNA vector using T4 ligase to obtain the pTaU6-sgF3HT4, pTaU6-sgLART4, pTaU6-sgMYBT2, pTaU6-sgPMKT1, pTaU6-sgVRN1T1 and pTaU6-sgGS6T2 vectors, respectively.
  • endogenous targets were selected from 7 rice genes (OsAAT, OsACC, OsCDC48, OsNRT1.1B, OsPDS, OsGRF1, and OsSPL14/OsIPA1) to construct pOsU3-sgRNA vectors, and 4 endogenous targets were selected from 4 wheat genes (TaF3H, TaGASR6, TaMYB10 and TamiR396) to construct pTaU6-sgRNA vectors. See Table 2 for the sequences of all target sites. The sgRNA target site primers were synthesized, annealed, and ligated into the sgRNA vectors using T4 ligase.
  • Table 2 (sgRNA target sequences):
      sgRNA            Target sequence
      sgOsAAT          CAAGGATCCCAGCCCCGTGAAGG
      sgOsACC          TCCACAGCTATCACACCCACTGG
      sgOsCDC48-T1     GACCAGCCAGCGTCTGGCGCCGG
      sgOsCDC48-T2     CCAGATATCATTGACCCTGCCTT
      sgOsNRT1.1B      ACTAGATATCTAAACCATTAAGG
      sgOsPDS          GTTGGTCTTTGCTCCTGCAGAGG
      sgOsSPL14        CCAGGCGATCGGATCTCCGGTGG
      sgOsGRF1-miRT    GAACCGTTCAAGAAAGCCTGTGG
      sgOsIPA1-miRT    CTCTTCTGTCAACCCAGCCATGG
      sgTaF3H          CCGAGATCCGGGACCGCGTGGCG
      sgTaGASR6        CCCGGCACCGCCGGCAACGAGGA
      sgTaMYB10        TGGCTCAACTACCTCCGGCCGGG
      sgT
  • the PAM sequence is shown in bold.
  • Seeds of rice cultivar Zhonghua 11 were rinsed with 75% ethanol for 1 minute, treated with 4% sodium hypochlorite for 30 minutes, and washed with sterile water more than 5 times. They were then placed on M6 medium and cultured at 26 °C in the dark for 3-4 weeks.
  • Wheat seeds were sown in pots and grown in a cultivation room for about 1-2 weeks (about 10 days) at 25 ± 2 °C, with a light intensity of 1000 lx and a photoperiod of 14-16 h/d.
  • The protoplasts were collected in a 2 mL centrifuge tube, and protoplast DNA (about 30 µL) was extracted using the CTAB method; its concentration (30-60 ng/µL) was measured with a NanoDrop ultramicro spectrophotometer, and the DNA was stored at -20 °C.
  • The 20 µL amplification system contains 4 µL 5× FastPfu buffer, 1.6 µL dNTPs (2.5 mM), 0.4 µL forward primer (10 µM), 0.4 µL reverse primer (10 µM), 0.4 µL FastPfu polymerase (2.5 U/µL), and 2 µL DNA template (about 60 ng).
  • Amplification conditions: pre-denaturation at 95 °C for 5 min; 35 cycles of denaturation at 95 °C for 30 s, annealing at 50-64 °C for 30 s and extension at 72 °C for 30 s; final extension at 72 °C for 5 min; hold at 12 °C.
  • The above amplification product was diluted 10-fold, and 1 µL was used as the template for the second round of PCR amplification.
  • The amplification primers were barcoded sequencing primers.
  • The 50 µL amplification system contains 10 µL 5× FastPfu buffer, 4 µL dNTPs (2.5 mM), 1 µL forward primer (10 µM), 1 µL reverse primer (10 µM), 1 µL FastPfu polymerase (2.5 U/µL), and 1 µL DNA template.
  • the amplification conditions are as described above, and the number of amplification cycles is 38 cycles.
  • PCR products were separated by 2% agarose gel electrophoresis, and the target fragments were recovered with the AxyPrep™ DNA Gel Extraction Kit.
  • The recovered products were quantified with a NanoDrop ultramicro spectrophotometer; 100 ng of each recovered product was pooled and sent to GENEWIZ, Inc. for amplicon sequencing library construction and amplicon sequencing analysis.
  • Single-base editing systems were established in 2016 (Komor et al., 2016; Ma et al., 2016; Nishida et al., 2016).
  • the system uses nCas9 (D10A) to guide the action of cytosine deaminase on the non-complementary strand of a DNA target site, and deaminate the cytosine (C) in a specific region into uracil (U) .
  • the uracil (U) will be replaced by thymine (T) in the process of DNA replication, thus achieving accurate single-base replacement of C-to-T.
  • In the repair processes of animal and plant organisms, uracil-DNA glycosylase (UDG) preferentially recognizes the U base and cleaves its N-glycosidic bond to form an apurinic or apyrimidinic site (AP site); the AP site is then processed by AP lyase, and the original C base is restored through base excision repair. Therefore, a uracil-DNA glycosylase inhibitor (UGI) is often included in single-base editing systems to improve C-to-T editing efficiency.
  • The inventors surprisingly found that replacing the nCas9 of the fusion protein in the single-base editing system with wild-type Cas9 restores the ability of the fusion protein to cleave both DNA strands, and that replacing UGI with UDG allows the U base to be recognized and its glycosidic bond cleaved to form an AP site, which is in turn recognized and cleaved by AP lyase. Together, these changes achieve efficient, accurate and predictable deletion of short fragments in cells.
  • The inventors thus constructed an efficient, accurate and predictable short-fragment deletion system (APOBEC3A Coupled Deletion, ACD) consisting of Cas9, APOBEC3A, UDG and AP lyase. Cas9 generates a DSB at the DNA target site, while APOBEC3A, UDG and AP lyase create multiple gaps at C bases on the non-complementary strand upstream of the DSB, resulting in the loss of single-stranded DNA fragments from the non-complementary strand; the organism's DNA repair machinery then converts this into a short double-stranded deletion (FIG. 1).
  • APOBEC3A efficiently mediates C-to-U conversion on the non-targeting strand upstream of the DSB, while UDG and AP lyase create gaps at the U bases, resulting in deletion of single-stranded DNA fragments from the non-targeting strand. This leaves a 5' overhang on the targeting strand.
  • During non-homologous end joining (NHEJ) repair, the 5' overhang is first recognized and excised by the Artemis-DNA-PK complex, and a double-stranded DNA carrying a short deletion is then formed by the ligation complex consisting of DNA Ligase IV, XRCC4, XRCC4-like factor (XLF) and the paralog of XRCC4 and XLF (PAXX) (Chang et al. 2017).
  • the efficiency of generating insertion and deletion by SpCas9 and ACD were compared and analyzed at the targeted editing sites of sgF3HT4, sgLART4, sgMYBT2, sgPMKT1, sgVRN1T1 and sgGS6T2.
  • The results showed that, compared with SpCas9, the ACD system produced a significantly lower insertion mutation rate and a significantly higher deletion mutation rate, with deletion rates 1.5-23.6 times those of SpCas9, demonstrating the high efficiency of the ACD system (FIG. 2).
  • The present invention used human APOBEC3A, which has high deamination activity and a wide deamination window, to construct the AFID-3 system, and screened APOBEC3Bctd, which has higher deamination activity and a narrower window, to replace APOBEC3A and construct the eAFID-3 system (Figure 9 and Figure 10).
  • Comparative analysis of the deletion efficiencies of Cas9, AFID-3, and eAFID-3 on rice and wheat endogenous gene targets revealed that the efficiency of generating deletion mutations via AFID-3 and eAFID-3 increased significantly compared to Cas9.
  • The average deletion mutation rates were 2.2 times and 2.6 times that of Cas9, demonstrating the high efficiency of the AFID system.
  • the types and proportions of mutations generated by AFID-3 and eAFID-3 on different endogenous targets were analyzed. The results showed that the length of the deleted fragment mainly depends on the position of the deaminated C nucleotide and its deamination activity. At the target site with strong deamination activity, the mutation type is mainly deletion mutation; but at the target site with weak deamination activity, a certain percentage of insertion mutations will appear.
  • Example 5: The AFID system mediates predictable polynucleotide deletion mutations in plants
  • To determine whether the AFID system can mediate predictable polynucleotide deletion mutations in plants, two targets (TamiR396 and TaGASR6) were selected in wheat, and Cas9 or AFID-3 was delivered together with the corresponding sgRNA into immature wheat embryos by gene gun bombardment; three targets (OsCDC48-T2, OsSPL14, and OsPDS) were selected in rice to construct the corresponding pH-Cas9 and pH-AFID-3 Agrobacterium vectors (Figure 17), and rice callus was transformed by Agrobacterium infection.
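The statements that the deletion length depends on the DSB position and on the C bases upstream of it, and that deletions span about 1-17 nucleotides within the target sequence, can be illustrated with a deliberately simplified model. This sketch assumes (these assumptions are not from the source) an SpCas9 target written 5' to 3' on the PAM-containing (non-targeting) strand as a 20-nt protospacer followed by an NGG PAM, a blunt cut between protospacer positions 17 and 18, and that a gap opened at a C in position p removes positions p to 17. It ignores deaminase activity and window, which the results also show shape the outcome. The function names are hypothetical.

```python
"""Simplified model of candidate deletions for a deaminase-coupled SpCas9 cut.

Assumptions (illustrative, not from the source): 20-nt protospacer + NGG PAM written
5'->3' on the non-targeting strand; blunt cut between positions 17 and 18; a gap
opened at a C in position p removes positions p..17.
"""

CUT_AFTER = 17  # SpCas9 cuts 3 nt upstream of the PAM, i.e. after protospacer position 17


def split_target(target23: str) -> tuple[str, str]:
    """Split a 23-nt target into protospacer and PAM, checking that the PAM is NGG."""
    target23 = target23.upper()
    if len(target23) != 23:
        raise ValueError("expected a 20-nt protospacer followed by a 3-nt PAM")
    protospacer, pam = target23[:20], target23[20:]
    if pam[1:] != "GG":
        raise ValueError(f"PAM {pam!r} is not NGG")
    return protospacer, pam


def candidate_deletions(target23: str) -> list[tuple[int, int, str]]:
    """Return (C position, deletion length, deleted bases) for each C upstream of the cut.

    Positions are 1-based; a C at position p gives a deletion of 18 - p nucleotides.
    """
    protospacer, _ = split_target(target23)
    results = []
    for pos, base in enumerate(protospacer[:CUT_AFTER], start=1):
        if base == "C":
            results.append((pos, CUT_AFTER - pos + 1, protospacer[pos - 1:CUT_AFTER]))
    return results


if __name__ == "__main__":
    # Two Table 2 targets whose 3' end carries an NGG PAM
    for name, seq in [("sgOsAAT", "CAAGGATCCCAGCCCCGTGAAGG"),
                      ("sgOsPDS", "GTTGGTCTTTGCTCCTGCAGAGG")]:
        for pos, length, deleted in candidate_deletions(seq):
            print(f"{name}: C{pos} -> {length}-nt deletion ({deleted})")
```

For sgOsAAT the model finds C bases at positions 1, 8, 9, 10, 13, 14, 15 and 16, giving candidate deletions of 17, 10, 9, 8, 5, 4, 3 and 2 nt; the 17-nt maximum corresponds to the upper end of the stated 1-17 nucleotide range. Targets such as sgOsCDC48-T2, whose PAM lies on the other strand, are rejected rather than silently mis-scored.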
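The NLS examples listed earlier pair each peptide (KKRKV, PKKKRKV, SGGSPKKKRKV) with one or more coding sequences. A short check with the standard genetic code (NCBI translation table 1) confirms that each listed nucleotide sequence encodes its peptide. The helper names here are illustrative only.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order for the first, second and third base.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# NLS peptides and the coding sequences given for them in the text
NLS_EXAMPLES = {
    "KKRKV": ["AAGAAGAGAAAGGTC"],
    "PKKKRKV": ["CCCAAGAAGAAGAGGAAGGTG", "CCAAAGAAGAAGAGGAAGGTT"],
    "SGGSPKKKRKV": ["TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG"],
}

if __name__ == "__main__":
    for peptide, cds_list in NLS_EXAMPLES.items():
        for cds in cds_list:
            print(peptide, cds, translate(cds) == peptide)
```

The two PKKKRKV coding sequences differ only at synonymous positions (CCC vs CCA for Pro, GTG vs GTT for Val), which is the kind of variation codon optimization exploits.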
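Codon optimization as described above (replacing native codons with codons used more or most frequently in the host while keeping the amino acid sequence unchanged) can be sketched with the simplest "most frequent codon" strategy. The usage frequencies below are made-up placeholders, not values from the Codon Usage Database or from any real organism, and practical designs often also consider factors such as GC content, which this sketch ignores. All names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# HYPOTHETICAL relative codon frequencies for an unnamed host (illustration only)
HYPOTHETICAL_USAGE = {
    "AAA": 0.40, "AAG": 0.60,               # Lys
    "AGA": 0.20, "AGG": 0.30, "CGC": 0.50,  # Arg (subset of its codons)
    "GTC": 0.30, "GTG": 0.70,               # Val (subset of its codons)
}


def preferred_codons(usage: dict[str, float]) -> dict[str, str]:
    """Map each amino acid in the usage table to its most frequent codon."""
    best: dict[str, str] = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def optimize(cds: str, usage: dict[str, float]) -> str:
    """Replace each codon with the host-preferred synonymous codon, if one is known."""
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    best = preferred_codons(usage)
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]
    new = [best.get(CODON_TABLE[c], c) for c in codons]
    # Only synonymous swaps are allowed: the encoded protein must not change.
    if [CODON_TABLE[c] for c in new] != [CODON_TABLE[c] for c in codons]:
        raise RuntimeError("optimization changed the amino acid sequence")
    return "".join(new)


if __name__ == "__main__":
    print(optimize("AAGAAGAGAAAGGTC", HYPOTHETICAL_USAGE))  # KKRKV -> AAGAAGCGCAAGGTG
```

Amino acids absent from the usage table (for example Met or Trp here) keep their original codons, so the function never changes the protein even with an incomplete table.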
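The two PCR recipes in the amplicon-sequencing protocol can be checked with C1·V1 = C2·V2. The sketch below computes final working concentrations (and total enzyme units) from the stock concentrations and volumes listed; the recipes do not name what makes up the rest of the reaction volume, so the code only reports it as an unlisted remainder. Variable and function names are illustrative.

```python
# (component, stock concentration or None, unit, volume in uL) as listed in the two recipes
ROUND_1 = (20.0, [
    ("5x FastPfu buffer", 5.0, "x", 4.0),
    ("dNTPs", 2.5, "mM", 1.6),
    ("Forward primer", 10.0, "uM", 0.4),
    ("Reverse primer", 10.0, "uM", 0.4),
    ("FastPfu polymerase", 2.5, "U/uL", 0.4),
    ("DNA template (~60 ng)", None, "uL", 2.0),
])
ROUND_2 = (50.0, [
    ("5x FastPfu buffer", 5.0, "x", 10.0),
    ("dNTPs", 2.5, "mM", 4.0),
    ("Forward primer", 10.0, "uM", 1.0),
    ("Reverse primer", 10.0, "uM", 1.0),
    ("FastPfu polymerase", 2.5, "U/uL", 1.0),
    ("Diluted first-round product", None, "uL", 1.0),
])


def summarize(total_ul, components):
    """Return {component: (value, unit)} describing the assembled reaction."""
    summary, used = {}, 0.0
    for name, stock, unit, vol in components:
        used += vol
        if stock is None:
            summary[name] = (vol, "uL added")              # concentration not specified
        elif unit == "U/uL":
            summary[name] = (stock * vol, "U total")       # enzyme units in the tube
        else:
            summary[name] = (stock * vol / total_ul, unit)  # C2 = C1 * V1 / V2
    summary["unlisted remainder"] = (total_ul - used, "uL")
    return summary


if __name__ == "__main__":
    for label, (total, parts) in [("Round 1", ROUND_1), ("Round 2", ROUND_2)]:
        print(label)
        for name, (value, unit) in summarize(total, parts).items():
            print(f"  {name}: {value:.2f} {unit}")
```

Both recipes work out to 1× buffer, 0.2 mM dNTPs and 0.2 µM of each primer; the polymerase amount is 1 U in the 20 µL first round and 2.5 U in the 50 µL second round, with 11.2 µL and 32 µL of volume not accounted for by the listed components.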

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Mycology (AREA)
  • Cell Biology (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)

Abstract

Provided is a gene editing system for editing a target sequence in the genome of a cell, comprising a CRISPR nuclease, a cytosine deaminase, an AP lyase, a guide RNA and optionally a uracil-DNA glycosylase. Also provided are a method of producing a genetically modified cell and a kit comprising the gene editing system.

Description

Improved Gene Editing System
Technical Field
The invention relates to the field of genetic engineering. In particular, the present invention relates to an improved gene editing system, and more specifically to a gene editing system capable of accurately editing the genome of a eukaryotic cell, in particular by producing predictable, precise polynucleotide deletions.
Background
In recent years, with the continuous development of genome editing technology, a large number of gene editing tools have been developed, improved and applied, from SpCas9-mediated gene knockout tools to single-base editing tools mediated by nCas9 (D10A) fused to a cytosine deaminase. Guided by the guide RNA, SpCas9 binds and cleaves double-stranded DNA to form a double-strand break (DSB). During the organism's repair process, insertions and/or deletions of fragments of different lengths are often introduced. However, such insertions and/or deletions are random, imprecise and unpredictable (Wang et al., 2014; Zhang et al., 2016).
[…] et al. (2017) significantly increased the frequency of deletion mutations and the length of the deleted fragments by fusing Cas9 with 3' repair exonuclease 2 (Trex2), but the resulting mutation types are still imprecise and unpredictable. Targeted deletion using a pair of sgRNAs can remove a specific long fragment, but it also produces inversions, small InDels, etc., which greatly reduces the efficiency of the intended deletion ([…] et al., 2017). To produce precise fragment deletions, Wolfs et al. (2016) fused Cas9 with the TevI nuclease, which recognizes its own cleavage site and cleaves the double-stranded DNA; together with the Cas9-induced DSB, this cleavage produces a 33-36 bp deletion. However, because of the cleavage-site requirement, the efficiency of this system is low. To date, no tool capable of efficient, precise and predictable short-fragment deletion within the protospacer has been developed.
Therefore, there remains a need in the art for a gene editing system capable of accurately editing the genome of a eukaryotic cell, in particular by producing predictable, precise polynucleotide deletions.
Summary of the Invention
In one aspect, the present invention provides a gene editing system for editing a target sequence in the genome of a cell, comprising:
i) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide;
ii) a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide; and
iii) a guide RNA and/or an expression construct comprising a nucleotide sequence encoding the guide RNA,
wherein the first polypeptide comprises a CRISPR nuclease, a cytosine deaminase and optionally a uracil-DNA glycosylase (UDG), and the second polypeptide comprises an AP lyase, and wherein the guide RNA is capable of targeting the first polypeptide to the target sequence in the genome of the cell.
In one aspect, the present invention provides a gene editing system for editing a target sequence in a cell genome, comprising:
i) a polypeptide and/or an expression construct comprising a nucleotide sequence encoding the polypeptide; and
ii) a guide RNA and/or an expression construct comprising a nucleotide sequence encoding the guide RNA,
wherein the polypeptide comprises a CRISPR nuclease, a cytosine deaminase, an AP lyase and optionally a uracil-DNA glycosylase (UDG), and wherein the guide RNA is capable of targeting the polypeptide to the target sequence in the cell genome.
In an aspect, the present invention provides a method for producing a genetically modified cell, comprising introducing the gene editing system of the present invention into the cell.
In an aspect, the present invention provides a kit comprising the gene editing system of the invention and instructions for use.
Brief Description of the Drawings
FIG. 1 shows the working mode of an ACD system.
FIG. 2 shows a comparative analysis of the efficiency of InDel generation at different targeting sites between SpCas9 and ACD systems.
FIG. 3 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgF3HT4 site.
FIG. 4 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgLART4 site.
FIG. 5 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgMYBT2 site.
FIG. 6 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgPMKT1 site.
FIG. 7 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgVRN1T1 site.
FIG. 8 shows the type and efficiency of the deletion mutation formed by the ACD system at the sgGS6T2 site.
Figure 9 shows the differences in deamination activity and deamination window among different cytosine deaminases.
Figure 10 shows a schematic diagram of the vector construction of two different types of AFID systems.
Figure 11 shows the deletion efficiency of Cas9, AFID-3, and eAFID-3 at different rice endogenous targets.
Figure 12 shows the deletion efficiency of Cas9, AFID-3, and eAFID-3 at different wheat endogenous targets.
Figure 13 shows the types and proportions of deletion mutations produced by AFID-3 and eAFID-3 at rice endogenous targets.
Figure 14 shows the types and proportions of deletion mutations produced by AFID-3 and eAFID-3 at wheat endogenous targets.
Figure 15 shows the preference of AFID-3 and eAFID-3 for the cytosine base at which the predictable fragment deletion starts.
Figure 16 shows the types and proportions of the desired predictable in-frame deletions generated by Cas9, AFID-3, and eAFID-3 at the miR396h binding site of the rice OsGRF1 gene and at the miR156 binding site of the OsIPA1 gene.
Figure 17 shows a schematic diagram of the construction of the AFID-3 vector used for Agrobacterium infection in rice.
Figure 18 shows the types of mutant regenerated plants produced by Cas9 and AFID-3 at the rice OsCDC48 gene.
Detailed Description of the Invention
1. Definition
In the present invention, the scientific and technical terms used herein have the meanings commonly understood by a person skilled in the art unless otherwise specified. Also, the terms relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology, and the laboratory procedures used herein, are terms and routine steps widely used in the corresponding fields. For example, the standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are more fully described in: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter "Sambrook").
As used herein, the term "and/or" encompasses all combinations of items connected by the term, and each combination should be regarded as individually listed herein. For example, "A and/or B" covers "A" , "A and B" , and "B" . For example, "A, B, and/or C" covers "A" , "B" , "C" , "A and B" , "A and C" , "B and C" , and "A and B and C" .
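The "and/or" convention amounts to every non-empty combination of the listed items, which is 2^n - 1 combinations for n items (3 for "A and/or B", 7 for "A, B, and/or C"). A small illustrative sketch with a hypothetical function name:

```python
from itertools import combinations


def and_or_cover(items: list[str]) -> list[str]:
    """List every non-empty combination covered by 'X, Y, and/or Z'."""
    return [" and ".join(combo)
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]


if __name__ == "__main__":
    print(and_or_cover(["A", "B", "C"]))
```

For ["A", "B", "C"] this yields the same seven combinations spelled out in the definition, ordered by the number of items.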
When the term "comprise" is used herein to describe the sequence of a protein or nucleic acid, the protein or nucleic acid may consist of that sequence, or may have additional amino acids or nucleotides at one or both ends while still having the activity described in this invention. In addition, those skilled in the art know that the methionine encoded by the start codon at the N-terminus of a polypeptide is retained under certain practical conditions (for example, when expressed in a specific expression system) without substantially affecting the function of the polypeptide. Therefore, when the amino acid sequence of a specific polypeptide is described in the specification and claims of the present application, the sequence containing the N-terminal methionine is also encompassed even if that methionine is not shown; correspondingly, its coding nucleotide sequence may also contain a start codon, and vice versa.
"Genome" as used herein encompasses not only chromosomal DNA present in the nucleus but also organellar DNA present in subcellular components of the cell (e.g., mitochondria, plastids).
As used herein, "organism" includes any organism, preferably eukaryotic organism that is suitable for genomic editing. Examples of organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, cats; poultry such as chickens, ducks, geese; plants including monocots and dicots such as rice, corn, wheat, sorghum, barley, soybean, peanut, arabidopsis and the like.
A “genetically modified organism” or “genetically modified cell” includes an organism or cell that comprises within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence. For example, the exogenous polynucleotide is stably integrated within the genome of the organism or cell such that the polynucleotide is passed on to successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. A modified gene or expression regulatory sequence means that, in the genome of the organism or cell, the sequence comprises one or more nucleotide substitutions, deletions, or additions.
"Exogenous" in reference to a sequence means a sequence from a foreign species or, if from the same species, a sequence whose composition and/or locus has been significantly changed from its native form by deliberate human intervention.
"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" or "nucleic acid fragment" are used interchangeably and are single-stranded or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single letter names as follows: "A" is adenosine or deoxyadenosine (corresponding to RNA or DNA, respectively), "C" means cytidine or deoxycytidine, "G" means guanosine or deoxyguanosine, "U" represents uridine, "T" means deoxythymidine, "R" means purine (A or G), "Y" means pyrimidine (C or T), "K" means G or T, "H" means A or C or T, "I" means inosine, and "N" means any nucleotide.
"Polypeptide," "peptide," and "protein" are used interchangeably in the present invention to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues is an artificial chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. The terms "polypeptide," "peptide," "amino acid sequence," and "protein" may also include modified forms including, but not limited to, glycosylation, lipid ligation, sulfation, γ-carboxylation of glutamic acid residues, and ADP-ribosylation.
Sequence "identity" has an art-recognized meaning, and the percentage of sequence identity between two nucleic acid or polypeptide molecules or regions can be calculated using published techniques. Sequence identity can be measured along the entire length of a polynucleotide or polypeptide or along a region of the molecule. (See, for example, Computational Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991.) Although there are many methods for measuring the identity between two polynucleotides or polypeptides, the term "identity" is well known to the skilled person (Carrillo, H. & Lipman, D., SIAM J Applied Math 48: 1073 (1988)).
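Identity thresholds of the kind used in this document (for example, at least 80% identity with SEQ ID NO: 3) can be evaluated once two sequences have been aligned. This sketch assumes the alignment was already produced by an external tool and, as one common convention, counts identical non-gap columns over all columns that are not gaps in both sequences; other denominators (such as the length of the reference sequence) are also used and give different values. The sequences are toy fragments, not any SEQ ID NO, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """Check a condition such as 'at least 80% sequence identity'."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("MKTAYIAK", "MKTAHIAK"))         # 7 of 8 columns identical: 87.5
    print(meets_threshold("MKT-AYIAK", "MKTQAY-AK", 80.0))  # 7 of 9 columns: False
```

Because the result depends on both the alignment and the denominator, a claim-style threshold is only meaningful once those conventions are fixed.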
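The single-letter nucleotide codes defined in this section can be applied directly when scanning DNA, for example to locate the NGG PAM of SpCas9. This minimal sketch covers only the DNA codes defined in the text (A, C, G, T, R, Y, K, H and N); inosine ("I") and the other IUPAC ambiguity codes (M, S, W, B, D, V), which the text does not define, are left out. Function names are illustrative.

```python
# Degenerate nucleotide codes as defined in the text (DNA alphabet only)
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def matches(pattern: str, seq: str) -> bool:
    """True if seq (plain A/C/G/T) fits the degenerate pattern position by position."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC_DNA[code] for code, base in zip(pattern.upper(), seq.upper()))


def find_motif(pattern: str, seq: str) -> list[int]:
    """Return 0-based start positions of every window of seq that matches pattern."""
    k = len(pattern)
    return [i for i in range(len(seq) - k + 1) if matches(pattern, seq[i:i + k])]


if __name__ == "__main__":
    # sgOsAAT target from Table 2: NGG occurs once inside the protospacer and once as the PAM
    print(find_motif("NGG", "CAAGGATCCCAGCCCCGTGAAGG"))
```

For the sgOsAAT target this reports positions 2 and 20; the hit at index 20 is the PAM at the 3' end of the 23-nt target.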
Suitable conservative amino acid substitutions in peptides or proteins are known to those skilled in the art and can generally be made without altering the biological activity of the resulting molecule. In general, one skilled in the art recognizes that a single amino acid substitution in a non-essential region of a polypeptide does not substantially alter its biological activity (see, for example, Watson et al., Molecular Biology of the Gene, 4th Edition, 1987, The Benjamin/Cummings Pub. Co., p. 224).
As used in the present invention, "expression construct" refers to a vector, such as a recombinant vector, that is suitable for expressing a nucleotide sequence of interest in an organism. "Expression" refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to the transcription of the nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or the translation of an RNA into a precursor or mature protein.
The "expression construct" of the present invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector or, in some embodiments, an RNA that is capable of translation (such as mRNA) .
The "expression construct" of the present invention may comprise regulatory sequences and nucleotide sequences of interest from different origins, or regulatory sequences and nucleotide sequences of interest from the same source but arranged in a manner different from that normally occurring in nature.
"Regulatory sequence" and "regulatory element" are used interchangeably to refer to a nucleotide sequence that is located upstream (5 'non-coding sequence) , middle or downstream (3' non-coding sequence) of a coding sequence and affects the transcription, RNA processing or stability or translation of the relevant coding sequence. Regulatory sequences may include, but are not limited to, promoters, translation leaders, introns and polyadenylation recognition sequences.
"Promoter" refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the present invention, the promoter is a promoter capable of controlling the transcription of a gene in a cell, whether or not it is derived from the cell. The promoter may be a constitutive promoter or tissue-specific promoter or developmentally-regulated promoter or inducible promoter.
"Constitutive promoter" refers to a promoter that may in general cause the gene to be expressed in most cases in most cell types. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and mean that they are expressed primarily but not necessarily exclusively in one tissue or organ, but also in a specific cell or cell type. "Developmentally-regulated promoter" refers to a promoter whose activity is dictated by developmental events. "Inducible promoter" selectively express operably linked DNA sequences in response to an endogenous or exogenous stimulus (environment, hormones, chemical signals, etc. ) .
Examples of promoters include, but are not limited to, polymerase (pol) I, pol II and pol III promoters. Examples of pol I promoters include the Gallus (chicken) RNA pol I promoter. Examples of pol II promoters include, but are not limited to, the immediate-early cytomegalovirus (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter and the immediate-early simian virus 40 (SV40) promoter. Examples of pol III promoters include the U6 and H1 promoters. An inducible promoter such as a metallothionein promoter can also be used. Other examples of promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter, the Sp6 phage promoter, and the like. Promoters that can be used in plants include, but are not limited to, the cauliflower mosaic virus 35S promoter, the maize Ubi-1 promoter, the wheat U6 promoter, the rice U3 promoter, the rice actin promoter, and the like.
As used herein, the term "operably linked" refers to the linkage of a regulatory element (e.g., but not limited to, a promoter sequence, a transcription termination sequence, etc. ) to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking regulatory element regions to nucleic acid molecules are known in the art.
"Introduction" of a nucleic acid molecule (e.g., plasmid, linear nucleic acid fragment, RNA, etc. ) or protein into an organism means that the nucleic acid or protein is used to transform an organism cell such that the nucleic acid or protein is capable of functioning in the cell. As used in the present invention, "transformation" includes both stable and transient transformations.
"Stable transformation" refers to the introduction of exogenous nucleotide sequences into the genome, resulting in the stable inheritance of foreign genes. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any of its successive generations.
"Transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell, performing a function without the stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequences are not integrated into the genome.
II. Improved Gene Editing System
The present inventors have surprisingly found that a precise deletion extending from the double-stranded break (DSB) site to a C nucleotide in the target sequence can be produced as follows: the guide RNA targets a CRISPR nuclease to the target sequence in the cell genome, where the nuclease forms a DSB; a cytosine deaminase fused to the CRISPR nuclease converts a C in the target sequence or its complementary sequence to U; and the combined action of endogenous or exogenous uracil-DNA glycosylase (UDG) and AP lyase then generates the deletion.
In one aspect, the present invention provides a gene editing system for editing a target sequence in the genome of a cell, comprising:
i) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide;
ii) a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide; and
iii) a guide RNA and/or an expression construct comprising a nucleotide sequence encoding the guide RNA,
wherein the first polypeptide comprises a cytosine deaminase, a CRISPR nuclease and optionally a uracil-DNA glycosylase (UDG), and the second polypeptide comprises an AP lyase, and wherein the guide RNA is capable of targeting the first polypeptide to the target sequence in the genome of the cell. In some embodiments, the expression construct comprising the nucleotide sequence encoding the first polypeptide, the expression construct comprising the nucleotide sequence encoding the second polypeptide and/or the expression construct comprising the nucleotide sequence encoding the guide RNA may be different expression constructs, or any two or all of them may be the same expression construct. In some embodiments, the first polypeptide is an isolated polypeptide, and the second polypeptide is an isolated polypeptide and/or the guide RNA is an isolated RNA.
As used herein, "gene editing system" refers to a combination of components required for gene editing of a genome in a cell. The various components of the system, such as polypeptides, gRNA, etc., may exist independently of each other, or may exist in any combination thereof.
In some embodiments, the gene editing system comprises at least an expression construct comprising, fused in frame, a nucleotide sequence encoding the first polypeptide, a nucleotide sequence encoding a self-cleaving peptide and a nucleotide sequence encoding the second polypeptide. In some embodiments, the nucleotide sequence encoding the first polypeptide, the nucleotide sequence encoding the self-cleaving peptide and the nucleotide sequence encoding the second polypeptide are arranged in that order in the 5' to 3' direction.
As used herein, the "self-cleaving peptide" means a peptide that may achieve self-cleavage within a cell. For example, the self-cleaving peptide may contain a protease recognition site so as to be recognized and specifically cleaved by the protease in the cell.
Alternatively, the self-cleaving peptide may be a 2A polypeptide. 2A polypeptides are a class of short, virus-derived peptides whose self-cleavage occurs during translation. When two different polypeptides of interest are linked by a 2A polypeptide and expressed in the same reading frame, the two polypeptides are generated at a ratio of nearly 1:1. Commonly used 2A polypeptides include P2A from porcine teschovirus-1, T2A from Thosea asigna virus, E2A from equine rhinitis A virus and F2A from foot-and-mouth disease virus. Among them, P2A has the highest cleavage efficiency and is therefore preferred. Various functional variants of these 2A polypeptides are also known in the art and may also be used in the present invention. In some embodiments, the self-cleaving peptide is P2A as shown in SEQ ID NO: 9.
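The 2A "cleavage" happens by ribosomal skipping during translation rather than by a protease, but its outcome can be modelled at the sequence level: the upstream product keeps the 2A residues up to the C-terminal glycine, and the downstream product begins with the final proline. The sketch below assumes the commonly cited porcine teschovirus-1 P2A sequence ATNFSLLKQAGDVEENPGP; SEQ ID NO: 9 is not reproduced in this text and may differ, and the function name is hypothetical.

```python
# Assumption: commonly cited P2A sequence; SEQ ID NO: 9 may differ.
P2A = "ATNFSLLKQAGDVEENPGP"


def split_at_2a(fused: str, motif: str = P2A) -> tuple[str, str]:
    """Model 2A separation as a split between the motif's final Gly and Pro.

    The upstream product keeps the 2A residues up to the Gly;
    the downstream product starts with the Pro.
    """
    start = fused.find(motif)
    if start == -1:
        raise ValueError("2A motif not found")
    cut = start + len(motif) - 1  # index of the motif's final Pro
    return fused[:cut], fused[cut:]


upstream, downstream = split_at_2a("MAAA" + P2A + "MKKK")
print(upstream)    # MAAAATNFSLLKQAGDVEENPG
print(downstream)  # PMKKK
```

In the constructs described here, the upstream product would correspond to the first polypeptide and the downstream product to the second polypeptide (AP lyase).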
In some embodiments, the gene editing system comprises at least one expression construct containing a nucleotide sequence that encodes the amino acid sequence shown in SEQ ID NO: 10 or SEQ ID NO: 11.
In another aspect, the present invention provides a gene editing system for editing a target sequence in the genome of a cell, comprising:
i) a polypeptide and/or an expression construct comprising a nucleotide sequence encoding the polypeptide; and
ii) a guide RNA and/or an expression construct comprising a nucleotide sequence encoding the guide RNA,
wherein the polypeptide comprises a cytosine deaminase, a CRISPR nuclease, an AP lyase and optionally a uracil-DNA glycosylase (UDG), and wherein the guide RNA is capable of targeting the polypeptide to the target sequence in the genome of the cell. In some embodiments, the expression construct comprising the nucleotide sequence encoding the polypeptide and the expression construct comprising the nucleotide sequence encoding the guide RNA may be different expression constructs, or may be the same expression construct. In some embodiments, the polypeptide is an isolated polypeptide, and/or the guide RNA is an isolated RNA. In some embodiments, the polypeptide contains the amino acid sequence shown in SEQ ID NO: 10 or SEQ ID NO: 11.
As used herein, the term "CRISPR nuclease" generally refers to nucleases present in the naturally occurring CRISPR systems, as well as modified forms thereof, variants thereof, or catalytically active fragments thereof. The CRISPR nuclease may recognize, bind, and/or cleave the target nucleic acid structure by interacting with the guide RNA. This term encompasses any CRISPR system based nuclease or functional variant capable of gene editing in the cell. In some embodiments, the functional variant retains its double-stranded cleavage activity, i.e., the ability to form a double-stranded break (DSB) in the target sequence.
The CRISPR nuclease used in the gene editing system of the present invention may be selected from, for example, Cas3, Cas8a, Cas5, Cas8b, Cas8c, Cas10d, Cse1, Cse2, Csy1, Csy2, Csy3, GSU0054, Cas10, Csm2, Cmr5, Cas10, Csx11, Csx10, Csf1, Cas9, Csn2, Cas4, Cpf1, C2c1, C2c3 or C2c2 proteins, or functional variants of these nucleases.
In some embodiments, the CRISPR nuclease comprises a Cas9 nuclease or a variant thereof. The Cas9 nuclease may be derived from any of various species, such as SpCas9 from Streptococcus pyogenes. Cas9 nuclease variants include, for example, high-specificity variants such as eSpCas9(1.0) (K810A/K1003A/R1060A) and eSpCas9(1.1) (K848A/K1003A/R1060A) from Feng Zhang et al., and SpCas9-HF1 (N497A/R661A/Q695A/Q926A) developed by J. Keith Joung et al. In some specific embodiments, the CRISPR nuclease has the amino acid sequence shown in SEQ ID NO: 1. In some specific embodiments, the CRISPR nuclease comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity with SEQ ID NO: 1, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 1.
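Variant names such as eSpCas9(1.1) (K848A/K1003A/R1060A) use the usual point-substitution notation: wild-type residue, 1-based position, replacement residue. A minimal sketch, with a hypothetical function name, applies such a list to a protein sequence and refuses to proceed if the stated wild-type residue is not at that position. Applying it to SEQ ID NO: 1 would assume that sequence is numbered like wild-type SpCas9, which this text does not state.

```python
import re

# <wild-type residue><1-based position><new residue>, e.g. K848A
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(protein: str, mutations: list[str]) -> str:
    """Apply point substitutions, checking the wild-type residue at each position."""
    residues = list(protein)
    for mutation in mutations:
        match = _SUBSTITUTION.fullmatch(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {mutation}")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"expected {wild_type} at {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


print(apply_substitutions("MKRE", ["K2A", "E4Q"]))  # MARQ
```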
In some embodiments, the CRISPR nuclease may also comprise Cpf1 nuclease or a variant thereof, such as a highly specific variant. The Cpf1 nuclease may be a Cpf1 nuclease originated from different species, such as Cpf1 nuclease originated from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006.
As used herein, "cytosine deaminase" refers to a deaminase that can accept single-stranded DNA as a substrate and can catalyze the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. Examples of cytosine deaminases include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G, CDA1, human APOBEC3A deaminase and APOBEC3B deaminase (e.g., truncated APOBEC3B deaminase). In some embodiments, the cytosine deaminase is human APOBEC3A deaminase, for example having the amino acid sequence shown in SEQ ID NO: 2. In some specific embodiments, the cytosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity with SEQ ID NO: 2, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 2. In some embodiments, the cytosine deaminase is truncated APOBEC3B deaminase (APOBEC3Bctd), for example having the amino acid sequence shown in SEQ ID NO: 7. In some specific embodiments, the cytosine deaminase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity with SEQ ID NO: 7, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 7.
As used herein, uracil-DNA glycosylase (UDG), also called uracil-N-glycosylase (UNG), refers to an enzyme capable of recognizing a uracil (U) base in DNA and cleaving its N-glycosidic bond, removing the base to form an apurinic/apyrimidinic (AP) site. The UDG may be derived from different sources, for example from E. coli. In some specific embodiments, the UDG has the amino acid sequence shown in SEQ ID NO: 3. In some specific embodiments, the UDG comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity with SEQ ID NO: 3, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 3.
"AP lyase" , AP endonuclease and "apurinic pyrimidine lyase" are used interchangeably  herein, and refer to an enzyme capable of recognizing the apurinic or apyrimidinic site on the nucleic acid and cleaving the nucleic acid. The AP lyase may originate from different sources, for example from E. coli. In some specific embodiments, the AP lyase has the amino acid sequence shown in SEQ ID NO: 4. In some specific embodiments, the AP lyase comprises an amino acid sequence having at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%sequence identity with SEQ ID NO: 4, or having one or more conservative amino acid substitutions relative to SEQ ID NO: 4.
As used herein, "gRNA" and "guide RNA" are used interchangeably, and refer to RNA molecule capable of forming a complex with CRISPR nuclease and capable of targeting the complex to the target sequence due to certain complementarity with the target sequence. For example, in a Cas9-based gene editing system, gRNA typically consists of crRNA and tracrRNA molecules that are partially complementary to form a complex, where the crRNA contains a sequence that is sufficiently complementary to the target sequence so as to hybridize to the target sequence and direct the CRISPR complex (Cas9+crRNA+tracrRNA) to specifically bind to this target sequence. However, it is known in the art that single guide RNA (sgRNA) containing both characteristics of crRNA and tracrRNA can be designed. In the Cpf1-based genome editing system, gRNA typically only consists of mature crRNA molecules, where the sequence contained in the crRNA is sufficiently identical to the target sequence so as to hybridize with the complementary sequence of the target sequence and guide the complex (Cpf1+crRNA) to bind specifically with the target sequence. It is within the ability of those skilled in the art to design a suitable gRNA based on the CRISPR nuclease used and the target sequence to be edited.
As used herein, a "target sequence" is a sequence complementary or identical to (depending on different CRISPR nucleases) the guide sequence having about 20 nucleotides contained in the guide RNA. The guide RNA targets the target sequence by base pairing with the target sequence or its complementary strand.
In some embodiments of the invention, the gene editing results in the deletion of one or more nucleotides in the target sequence, preferably the deletion of multiple consecutive nucleotides. The type and length of the deletion depend on the position of the double-stranded break (DSB) made by the CRISPR nuclease and on the number and positions of cytosine (C) bases in the target sequence or its complementary sequence. In some embodiments, the length of the deletion does not exceed the length of the target sequence. For example, the deletion may be about 1-17 nucleotides, such as 10-17 nucleotides, e.g., 10, 11, 12, 13, 14, 15, 16 or 17 nucleotides.
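The dependence on the DSB position and on the C positions can be made concrete with a simplified model. The sketch assumes that SpCas9 makes a blunt cut 3 bp upstream of the PAM, between protospacer positions 17 and 18 counted from the PAM-distal end, and that the deletion runs exactly from a deaminated C up to that cut. It considers only Cs on the protospacer strand, although the text also allows Cs on the complementary strand, and it ignores the deaminase's actual activity window. Under these assumptions, a C at position 1 gives a 17-nt deletion, consistent with the 1-17 nt range above. The function name is hypothetical.

```python
# Assumption: SpCas9 blunt cut between protospacer positions 17 and 18 (1-based).
CUT_AFTER = 17


def predicted_deletions(protospacer: str) -> list[tuple[int, int, str]]:
    """For each C at or before the cut, return (C position, deletion length, deleted bases).

    Simplified model: the deletion spans from the deaminated C to the DSB,
    and only Cs on the protospacer strand are considered.
    """
    results = []
    for position, base in enumerate(protospacer[:CUT_AFTER].upper(), start=1):
        if base == "C":
            deleted = protospacer[position - 1:CUT_AFTER]
            results.append((position, len(deleted), deleted))
    return results


# Protospacer of the sgOsAAT target from Table 2 (PAM AGG removed).
for c_position, length, deleted in predicted_deletions("CAAGGATCCCAGCCCCGTGA"):
    print(c_position, length, deleted)
```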
In some embodiments of the invention, the cytosine deaminase is fused to the N terminal of the CRISPR nuclease.
In some embodiments of the invention, the cytosine deaminase, the CRISPR nuclease, the UDG and/or the AP lyase are directly linked.
In some embodiments of the invention, the cytosine deaminase, the CRISPR nuclease, the UDG and/or the AP lyase are linked via linkers. The linkers may be non-functional amino acid sequences of 1-50 (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20-25 or 25-50) or more amino acids, without secondary or higher-order structure. For example, the linker may be a flexible linker such as GGGGS, GS, GAP, (GGGGS) x 3, GGS, (GGS) x 7, and the like. In some embodiments, the linker contains the amino acid sequence shown in SEQ ID NO: 8.
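Linkers are often written in shorthand such as (GGGGS) x 3. A small sketch, with hypothetical helper names, expands that shorthand and joins domain sequences N- to C-terminally with the same linker between neighbours. In real use the domain strings would be the actual deaminase, Cas9, UDG and AP lyase sequences, which are not given in this text.

```python
import re

# Shorthand such as "(GGGGS) x 3" or "(GGS) x 7".
_REPEAT = re.compile(r"\(([A-Z]+)\)\s*x\s*(\d+)")


def expand_linker(notation: str) -> str:
    """Expand '(UNIT) x N' shorthand; plain sequences such as 'GGS' pass through."""
    notation = notation.strip()
    match = _REPEAT.fullmatch(notation)
    if match:
        return match.group(1) * int(match.group(2))
    if not notation.isalpha():
        raise ValueError(f"unrecognised linker: {notation}")
    return notation.upper()


def fuse(domains: list[str], linker: str = "") -> str:
    """Join domain sequences N- to C-terminally, optionally with a linker between each pair."""
    return expand_linker(linker).join(domains) if linker else "".join(domains)


print(fuse(["MAAA", "KKK"], "(GGGGS) x 3"))  # MAAAGGGGSGGGGSGGGGSKKK
```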
In some embodiments of the invention, the polypeptide of the invention further comprises a nuclear localization sequence (NLS). In general, the one or more NLSs in the polypeptide should be strong enough to drive accumulation of the polypeptide in the cell nucleus in an amount sufficient for its gene editing function. The strength of nuclear localization activity is generally determined by the number and positions of the NLSs in the polypeptide, the particular NLS(s) used, or a combination of these factors.
In some embodiments of the present invention, the NLS of the polypeptide of the present invention may be at the N terminus and/or the C terminus. In some embodiments of the invention, the NLS of the polypeptide of the present invention may be located between the cytosine deaminase, the CRISPR nuclease, the UDG and/or the AP lyase. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N terminus. In some embodiments, the polypeptide comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C terminus. In some embodiments, the polypeptide comprises a combination of the above, such as one or more NLSs at the N terminus and one or more NLSs at the C terminus. When there is more than one NLS, each may be selected independently of the others.
In general, an NLS consists of one or more short stretches of positively charged lysine or arginine residues exposed on the protein surface, but other types of NLS are also known. Non-limiting examples of NLSs include: KKRKV (nucleotide sequence 5'-AAGAAGAGAAAGGTC-3'), PKKKRKV (nucleotide sequence 5'-CCCAAGAAGAAGAGGAAGGTG-3' or 5'-CCAAAGAAGAAGAGGAAGGTT-3'), and SGGSPKKKRKV (nucleotide sequence 5'-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG-3').
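The nucleotide sequences listed for each NLS can be checked against their peptides by translating them with the standard genetic code. The sketch below builds the standard codon table compactly and translates each listed sequence; the helper names are illustrative. On our reading of the listed sequences, each one translates to its stated peptide.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, ignoring optional 5'-/-3' labels."""
    seq = dna.upper().replace("5'-", "").replace("-3'", "")
    if len(seq) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


NLS_ENCODINGS = {
    "5'-AAGAAGAGAAAGGTC-3'": "KKRKV",
    "5'-CCCAAGAAGAAGAGGAAGGTG-3'": "PKKKRKV",
    "CCAAAGAAGAAGAGGAAGGTT": "PKKKRKV",
    "5'-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG-3'": "SGGSPKKKRKV",
}

for dna, peptide in NLS_ENCODINGS.items():
    print(peptide, translate(dna) == peptide)
```

The two PKKKRKV encodings differ only at synonymous positions (CCC/CCA for P, GTG/GTT for V), which is the kind of freedom that codon optimization exploits.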
In addition, depending on the location of the DNA to be edited, the polypeptide of the present invention may also include other localization sequences, such as a cytoplasmic localization sequence, a chloroplast localization sequence, a mitochondrial localization sequence, and the like.
In some specific embodiments of the invention, the first polypeptide comprises the amino acid sequence shown in SEQ ID NO: 5. In some specific embodiments of the invention, the second polypeptide comprises the amino acid sequence shown in SEQ ID NO: 6.
In order to get efficient expression in the cell, in some embodiments of the present invention, the nucleotide sequence encoding the polypeptide is codon optimized for the organism from which the cell to be gene-edited originates.
Codon optimization refers to a method of modifying a nucleic acid sequence to enhance its expression in a host cell of interest by replacing at least one codon of the native sequence (for example, about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with a codon that is used more frequently or most frequently in the genes of that host cell, while maintaining the native amino acid sequence. Different species exhibit specific preferences for certain codons of particular amino acids. This codon bias (the difference in codon usage between organisms) often correlates with the translation efficiency of messenger RNA (mRNA), which is thought to depend on the properties of the codons being translated and on the availability of specific transfer RNA (tRNA) molecules. The predominance of particular tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Genes can therefore be tailored for optimal expression in a given organism by codon optimization. Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res., 28: 292 (2000).
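The simplest form of the replacement described above picks, for every residue, the synonymous codon used most often in the host. The sketch below assumes a small, entirely hypothetical usage table covering only K, P, R and V; real frequencies would come from a codon usage table for the host organism, and the function name is illustrative.

```python
# Hypothetical per-amino-acid codon frequencies for an imaginary host organism.
HYPOTHETICAL_USAGE = {
    "K": {"AAA": 0.35, "AAG": 0.65},
    "P": {"CCA": 0.25, "CCC": 0.20, "CCG": 0.35, "CCT": 0.20},
    "R": {"AGA": 0.15, "AGG": 0.20, "CGA": 0.05, "CGC": 0.30, "CGG": 0.20, "CGT": 0.10},
    "V": {"GTA": 0.10, "GTC": 0.30, "GTG": 0.40, "GTT": 0.20},
}


def most_frequent_codon_design(protein: str, usage: dict[str, dict[str, float]]) -> str:
    """Back-translate a protein, choosing the most frequently used codon for each residue."""
    codons = []
    for residue in protein.upper():
        if residue not in usage:
            raise KeyError(f"no codon usage data for {residue}")
        options = usage[residue]
        codons.append(max(options, key=options.get))
    return "".join(codons)


print(most_frequent_codon_design("PKKKRKV", HYPOTHETICAL_USAGE))  # CCGAAGAAGAAGCGCAAGGTG
```

Because every chosen codon comes from the residue's own synonymous set, the encoded amino acid sequence is unchanged. Choosing only the single most frequent codon is the most basic strategy; other strategies exist, such as sampling codons in proportion to their usage.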
The cells that can be genetically edited by the system of the present invention are preferably derived from eukaryotes, including but not limited to mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle and cat; poultry such as chicken, duck and goose; and plants, including monocotyledons and dicotyledons, such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis, etc.
III. Method for modifying the target sequence in the genome of a cell
In another aspect, the present invention provides a method for modifying a target sequence in the genome of a cell, comprising introducing the gene editing system of the present invention into the cell.
In some embodiments of the invention, the modification results in the deletion of one or more nucleotides in the target sequence, preferably the deletion of multiple consecutive nucleotides. In the present invention, the type and length of the deletion depend on the position of the double-stranded break (DSB) made by the CRISPR nuclease and on the number and positions of cytosine (C) bases in the target sequence or its complementary sequence. In some embodiments, the deletion lies within the target sequence. In some embodiments, the modification does not include insertion and/or substitution mutations.
In another aspect, the present invention further provides a method for producing a genetically modified cell, comprising introducing the gene editing system of the present invention into the cell.
In another aspect, the present invention further provides a genetically modified organism, comprising a genetically modified cell or progeny cell thereof produced by the method of the present invention.
In the present invention, the target sequence to be modified may be located anywhere in the genome, for example within a functional gene such as a protein-coding gene, or in a gene expression regulatory region such as a promoter or enhancer region, so that gene function or gene expression is modified. The modification in the target sequence of the cell can be detected by the T7 endonuclease I (T7EI) assay, the PCR/restriction enzyme (PCR/RE) assay or sequencing.
In the method of the present invention, the gene editing system may be introduced into the cell by various methods well known to those skilled in the art.
Methods that may be used to introduce the gene editing system of the present invention into the cell include, but are not limited to: calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses), the gene gun method, PEG-mediated protoplast transformation and Agrobacterium-mediated transformation.
The cells that may be genetically edited by the method of the present invention may be derived from, for example, mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle, and cat; poultry such as chicken, duck, geese; and plants, including monocotyledons and dicotyledons, such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis, etc.
In some embodiments, the method of the invention is performed in vitro. For example, the cell is an isolated cell, or a cell in an isolated tissue or organ.
In yet other embodiments, the method of the invention may also be performed in vivo. For example, the cell is a cell in an organism, and the system of the present invention may be introduced into the cell in vivo by, for example, a virus or Agrobacterium-mediated method.
IV. Kit
The present invention further provides a kit for use in the method of the present invention, the kit comprising the gene editing system of the present invention and instructions for use. The kit generally comprises a label indicating the intended use and/or method of use of its contents. The term "label" includes any written or recorded material provided on or with the kit or otherwise provided with the kit.
Examples
Materials and Methods
1. Construction of Vector
To construct the pA3A-Cas9-UDG and pJIT163-Ubi-AP vectors, the E. coli UDG and AP lyase sequences were obtained from NCBI (accession numbers AMB53293.1 and WP_115209270.1, respectively), codon-optimized for rice and synthesized by GENEWIZ, Inc. (Suzhou). The gene fragment encoding the APOBEC3A-Cas9-UDG fusion protein and the gene fragment encoding AP lyase were then each introduced into the pJIT163 vector backbone to obtain the pA3A-SpCas9-UDG and pJIT163-Ubi-AP vectors.
In addition, APOBEC3A was fused to the N-terminus of Cas9 with an XTEN linker, UDG was fused to the C-terminus of Cas9, and AP lyase was linked to the C-terminus of UDG via a self-cleaving 2A polypeptide (P2A); the gene fragment encoding this fusion was introduced into the pJIT163 vector backbone to construct the transient transformation vector AFID-3. The APOBEC3A in AFID-3 was then replaced by APOBEC3Bctd, the C-terminal catalytic domain of APOBEC3B obtained by truncating the human APOBEC3B sequence (accession number NM_004900.5), to construct eAFID-3. In addition, the fusion gene fragment containing APOBEC3A was integrated by Gibson assembly into the pHUE411 backbone carrying sgRNA expression components to construct the stable transformation vector pH-AFID-3, which was used for Agrobacterium-mediated genetic transformation of rice.
For key enzyme genes in wheat seed coat pigment synthesis (the flavanone 3-hydroxylase genes TaF3H-A1/B1/D1 and the leucoanthocyanidin reductase genes TaLAR-A1/B1/D1) and their regulatory genes (TaMYB10-A1/B1/D1), plant disease resistance-associated plasma membrane kinase genes (TaPMK-A1/B1/D1), vernalization-associated genes (TaVRN1-A1/B1/D1) and growth- and development-associated gibberellin-stimulated regulatory factor genes (TaGASR6-A1/B1/D1), gene editing target sites (sgF3HT4, sgLART4, sgMYBT2, sgPMKT1, sgVRN1T1 and sgGS6T2; see Table 1 for the detailed sequences) were selected and the corresponding sgRNA target-site primers were synthesized. The primers were annealed and ligated into the pTaU6-sgRNA vector using T4 DNA ligase to obtain the pTaU6-sgF3HT4, pTaU6-sgLART4, pTaU6-sgMYBT2, pTaU6-sgPMKT1, pTaU6-sgVRN1T1 and pTaU6-sgGS6T2 vectors, respectively.
Table 1. sgRNA target primers
[Table 1 image: PCTCN2020088887-appb-000003]
Nine endogenous targets were selected from 7 rice genes (OsAAT, OsACC, OsCDC48, OsNRT1.1B, OsPDS, OsGRF1 and OsSPL14/OsIPA1) to construct pOsU3-sgRNA vectors, and 4 endogenous targets were selected from 4 wheat genes (TaF3H, TaGASR6, TaMYB10 and TamiR396) to construct pTaU6-sgRNA vectors. See Table 2 for the sequences of all target sites. The sgRNA target-site primers were synthesized, annealed and ligated into the sgRNA vectors using T4 DNA ligase.
Table 2. sgRNA target sites and sequences
sgRNA Target sequence
sgOsAAT CAAGGATCCCAGCCCCGTGAAGG
sgOsACC TCCACAGCTATCACACCCACTGG
sgOsCDC48-T1 GACCAGCCAGCGTCTGGCGCCGG
sgOsCDC48-T2 CCAGATATCATTGACCCTGCCTT
sgOsNRT1.1B ACTAGATATCTAAACCATTAAGG
sgOsPDS GTTGGTCTTTGCTCCTGCAGAGG
sgOsSPL14 CCAGGCGATCGGATCTCCGGTGG
sgOsGRF1-miRT GAACCGTTCAAGAAAGCCTGTGG
sgOsIPA1-miRT CTCTTCTGTCAACCCAGCCATGG
sgTaF3H CCGAGATCCGGGACCGCGTGGCG
sgTaGASR6 CCCGGCACCGCCGGCAACGAGGA
sgTaMYB10 TGGCTCAACTACCTCCGGCCGGG
sgTamiR396 ACTGTGAACTCGCGGGGATGGGG
The PAM sequence is shown in bold.
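Because the bold marking of the PAM does not survive in plain text, the PAM can be recovered from each 23-nt entry under one assumption: each entry is either a 20-nt protospacer followed by an NGG PAM as written, or the reverse complement of one (beginning with CCN). Entries such as sgOsCDC48-T2, sgTaF3H and sgTaGASR6 end without GG but begin with CC, which fits the second reading. Which reading applies to an entry such as sgOsSPL14, which both begins with CC and ends with GG, cannot be decided from the sequence alone; the sketch reports the plus-strand reading in that case. The function name is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def locate_pam(target: str) -> tuple[str, str, str]:
    """Split a 23-nt entry into (strand, 20-nt protospacer, PAM read 5'->3').

    '+': entry is protospacer + NGG as written.
    '-': entry starts with CCN, i.e. it is the reverse complement of protospacer + NGG.
    If both readings fit, '+' is returned.
    """
    t = target.upper()
    if len(t) != 23:
        raise ValueError("expected a 23-nt target")
    if t[21:] == "GG":
        return "+", t[:20], t[20:]
    if t[:2] == "CC":
        return "-", reverse_complement(t[3:]), reverse_complement(t[:3])
    raise ValueError("no NGG at the 3' end or CCN at the 5' end")


TARGETS = {
    "sgOsAAT": "CAAGGATCCCAGCCCCGTGAAGG",
    "sgOsCDC48-T2": "CCAGATATCATTGACCCTGCCTT",
    "sgTaF3H": "CCGAGATCCGGGACCGCGTGGCG",
    "sgOsSPL14": "CCAGGCGATCGGATCTCCGGTGG",  # fits both readings; '+' is reported
}

for name, target in TARGETS.items():
    print(name, *locate_pam(target))
```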
2. Isolation and transformation of protoplasts (4 biological replicates)
2.1 Rice or wheat seedling cultivation
Rice seeds (cultivar Zhonghua 11) were rinsed with 75% ethanol for 1 minute, treated with 4% sodium hypochlorite for 30 minutes and washed with sterile water more than 5 times. They were then placed on M6 medium and grown at 26 ℃ in the dark for 3-4 weeks.
Wheat seeds were sown in pots and grown in a growth room for about 1-2 weeks (about 10 days) at 25 ± 2 ℃, with a light intensity of 1000 lx and a photoperiod of 14-16 h/d.
2.2 Isolation of Protoplast
(1) Young leaves of rice or wheat were taken, and the middle portion was cut into 0.5-1 mm strips with a blade. The strips were placed in 0.6 M mannitol solution in the dark for 10 min, collected with a filter screen and transferred into 50 mL of enzymolysis solution (filtered through a 0.45 μm membrane), then vacuum-infiltrated (at about 15 kPa) for 30 min. They were then digested on a shaker (10 rpm) for 5 h at room temperature. (2) The digest was diluted with 30-50 mL of W5 solution and filtered through a 75 μm nylon membrane into a 50 mL round-bottom centrifuge tube. (3) The filtrate was centrifuged at 100 × g (rcf) for 3 min at 23 ℃ with acceleration and deceleration both set to 3, and the supernatant was discarded. (4) The pellet was gently resuspended in 10 mL of W5 and placed on ice for 30 min; once the protoplasts had settled, the supernatant was discarded. (5) The protoplasts were resuspended in an appropriate volume of MMG solution and kept on ice until transformation.
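The protocol specifies centrifugation as relative centrifugal force (100 × g) because the rotor speed needed to reach it depends on the rotor. A sketch of the standard conversion RCF = 1.118e-5 × r × N², with r the rotor radius in cm and N the speed in rpm; the 10 cm radius in the example is hypothetical, and the instrument-specific acceleration and deceleration settings are not modelled.

```python
import math

# RCF = 1.118e-5 * r * N^2, r = rotor radius in cm, N = speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical 10 cm rotor radius; substitute the radius of the rotor actually used.
print(round(rpm_from_rcf(100, radius_cm=10)))  # 946
```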
2.3 Protoplast Transformation
(1) 10 μg of each vector to be transformed was added to a 2 mL centrifuge tube, and 200 μL of protoplasts, mixed well and drawn with a cut-off (wide-bore) pipette tip, was added and mixed by gentle flicking. 250 μL of PEG4000 solution was then added immediately and mixed by gentle flicking, and transformation was allowed to proceed at room temperature in the dark for 20-30 min. (2) 800 μL of W5 (at room temperature) was added and mixed by gentle inversion, the mixture was centrifuged at 100 × g (rcf) for 3 min with acceleration and deceleration both set to 3, and the supernatant was discarded. (3) The protoplasts were resuspended in 1 mL of W5 by gentle inversion and transferred gently to a 6-well plate to which 1 mL of W5 had been added in advance. The 6-well plate was wrapped in tin foil and incubated at 23 ℃ in the dark for 48 h.
3. Extraction of protoplast DNA and amplicon sequencing analysis
3.1 Extraction of protoplast DNA
Protoplasts were collected in a 2 mL centrifuge tube, and protoplast DNA (about 30 μL) was extracted using the CTAB method. The DNA concentration (30-60 ng/μL) was measured with a NanoDrop ultra-micro spectrophotometer, and the DNA was stored at -20 ℃.
3.2 Amplicon sequencing analysis
(1) The protoplast DNA template was amplified by PCR using universal genomic primers. The 20 μL reaction contained 4 μL 5 × FastPfu buffer, 1.6 μL dNTPs (2.5 mM), 0.4 μL forward primer (10 μM), 0.4 μL reverse primer (10 μM), 0.4 μL FastPfu polymerase (2.5 U/μL) and 2 μL DNA template (about 60 ng). Amplification conditions: pre-denaturation at 95 ℃ for 5 min; 35 cycles of denaturation at 95 ℃ for 30 s, annealing at 50-64 ℃ for 30 s and extension at 72 ℃ for 30 s; final extension at 72 ℃ for 5 min; hold at 12 ℃.
(2) The above amplification product was diluted 10-fold, and 1 μL was used as the template for a second round of PCR with barcoded sequencing primers. The 50 μL reaction contained 10 μL 5 × FastPfu buffer, 4 μL dNTPs (2.5 mM), 1 μL forward primer (10 μM), 1 μL reverse primer (10 μM), 1 μL FastPfu polymerase (2.5 U/μL) and 1 μL DNA template. Amplification conditions were as described above, except that 38 cycles were used.
(3) The PCR products were separated by 2% agarose gel electrophoresis, and the target fragments were recovered with the AxyPrep™ DNA Gel Extraction Kit. The recovered products were quantified with a NanoDrop ultra-micro spectrophotometer, and 100 ng of each recovered product was pooled and sent to GENEWIZ, Inc. for amplicon sequencing library construction and amplicon sequencing analysis.
(4) After sequencing, the raw data were split by sample according to the sequencing primers. Using the sgRNA sequence and its flanking sequences as the reference and the WT as the control, the types and efficiencies of gene editing at the different target sites were compared across the 4 test replicates.
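The read analysis in step (4) can be sketched roughly as follows. This is a simplified illustration, not the pipeline used for the reported data: the function names, the use of two unique flanking sequences to cut out the editing window, the rule that calls insertions and deletions from the net length change of that window, and the subtraction of the WT rate as background are all assumptions.

```python
from collections import Counter


def _window(seq, left_flank, right_flank):
    """Sequence between the first left_flank and the next right_flank, or None."""
    start = seq.find(left_flank)
    if start < 0:
        return None
    start += len(left_flank)
    end = seq.find(right_flank, start)
    if end < 0:
        return None
    return seq[start:end]


def classify_read(read, reference, left_flank, right_flank):
    """Return 'wt', 'insertion', 'deletion', 'substitution' or 'unassigned'.

    Only the net length change of the window is used, so a read carrying an
    insertion and a deletion of equal size is reported as 'substitution'.
    """
    ref_win = _window(reference, left_flank, right_flank)
    if ref_win is None:
        raise ValueError("flanks not found in reference")
    read_win = _window(read, left_flank, right_flank)
    if read_win is None:
        return "unassigned"
    if len(read_win) < len(ref_win):
        return "deletion"
    if len(read_win) > len(ref_win):
        return "insertion"
    return "wt" if read_win == ref_win else "substitution"


def indel_rates(reads, reference, left_flank, right_flank):
    """Fraction of assigned reads carrying a net insertion or deletion."""
    counts = Counter(classify_read(r, reference, left_flank, right_flank) for r in reads)
    assigned = sum(n for label, n in counts.items() if label != "unassigned")
    if assigned == 0:
        return {"insertion": 0.0, "deletion": 0.0}
    return {label: counts[label] / assigned for label in ("insertion", "deletion")}


def background_corrected(sample, wt_control):
    """Subtract the WT-control rate from the sample rate, floored at zero (assumed correction)."""
    return {k: max(sample[k] - wt_control.get(k, 0.0), 0.0) for k in sample}


if __name__ == "__main__":
    # Hypothetical amplicon: unique flanks around a 10-nt window.
    ref = "GGCC" + "ACGTACGTAC" + "AATT"
    reads = [ref, ref, "GGCCACGTACAATT", "GGCCACGTACGTACGAATT"]
    print(indel_rates(reads, ref, "GGCC", "AATT"))
```

Real amplicon data would normally be aligned to the reference rather than windowed by exact flank matches; the windowing here only keeps the example short.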
Example 1. Construction of gene editing system (ACD) for precise short fragment deletion
Single-base editing systems were established in 2016 (Komor et al., 2016; Ma et al., 2016; Nishida et al., 2016). These systems use nCas9 (D10A) to direct a cytosine deaminase to the non-complementary strand of a DNA target site, where cytosine (C) within a specific region is deaminated to uracil (U). The U is replaced by thymine (T) during DNA replication, achieving precise C-to-T single-base substitution. During DNA repair in animals and plants, however, uracil-DNA glycosylase (UDG) preferentially recognizes the U base and cleaves its N-glycosidic bond to form an apurinic/apyrimidinic site (AP site); the AP site is then processed by AP lyase and restored to the original C base through base excision repair. A uracil-DNA glycosylase inhibitor (UGI) is therefore often included in single-base editing systems to improve C-to-T editing efficiency.
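As a rough illustration of the C-to-T logic described above, the sketch below marks every C inside an editing window as U and then models replication as reading U as T, which is the outcome expected when UGI blocks UDG. The function name is hypothetical, and the default window (protospacer positions 4-8) is an assumption borrowed from commonly reported cytosine base editor windows; it is not a value given in this document, which notes that APOBEC3A has a wide window.

```python
def base_edit_c_to_t(protospacer: str, window=(4, 8)):
    """Model CBE outcome on the non-target (protospacer) strand.

    Cs at 1-based positions inside `window` are deaminated to U; replication
    then reads U as T. The window bounds are an assumption, not a measured value.
    Returns (deaminated_strand, strand_after_replication).
    """
    start, end = window
    deaminated = "".join(
        "U" if base == "C" and start <= pos <= end else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    )
    replicated = deaminated.replace("U", "T")
    return deaminated, replicated


if __name__ == "__main__":
    # Hypothetical 20-nt protospacer with Cs at positions 4-9.
    site = "AAA" + "C" * 6 + "A" * 11
    print(base_edit_c_to_t(site))
```

Only the C at position 9 survives here because it lies outside the assumed window; widening `window` models a deaminase with a broader activity range.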
The inventors surprisingly found that replacing the nCas9 in the fusion protein of the single-base editing system with wild-type Cas9 restores the fusion protein's ability to cleave both DNA strands, and that replacing UGI with UDG allows the U base to be recognized and removed by cleavage of its N-glycosidic bond, forming an AP site that is in turn recognized and cleaved by AP lyase. Together, these changes achieve efficient, accurate and predictable deletion of short fragments in cells. The inventors thus constructed an efficient, accurate and predictable short-fragment deletion system, ACD (APOBEC3A Coupled Deletion), consisting of Cas9, APOBEC3A, UDG and AP lyase. Cas9 generates a DSB at the DNA target site, while APOBEC3A, UDG and AP lyase create multiple gaps at C bases on the non-complementary strand upstream of the DSB. This removes a single-stranded DNA fragment from the non-complementary strand, which the cell's DNA repair then converts into a short double-stranded deletion (FIG. 1). Without being bound by any theory, APOBEC3A may efficiently convert C to U on the non-targeting strand upstream of the DSB, while UDG and AP lyase create gaps at the U bases, deleting a single-stranded DNA fragment from the non-targeting strand and leaving a 5' overhang on the targeting strand. During non-homologous end joining (NHEJ) repair, this overhang is first recognized and excised by the Artemis-DNA-PK complex, and the short double-stranded deletion is then completed by the ligation complex consisting of DNA Ligase IV, XRCC4, XRCC4-like factor (XLF) and the paralog of XRCC4 and XLF (PAXX) (Chang et al. 2017).
The efficiencies of insertion and deletion generation by SpCas9 and ACD were compared at the target sites sgF3HT4, sgLART4, sgMYBT2, sgPMKT1, sgVRN1T1 and sgGS6T2. Compared with SpCas9, the ACD system produced significantly fewer insertion mutations and significantly more deletion mutations, with deletion rates 1.5-23.6 times those of SpCas9, demonstrating the high efficiency of the ACD system (FIG. 2).
Example 2. Analysis of the types of deletions generated by the ACD system
Sequence analysis was carried out on the deletion mutations generated by the ACD system at different target sites (Figures 3-8). Apart from a few types, most mutations were as expected: deletions spanning from the base deaminated by APOBEC3A (the C base for an NGG PAM; the G base for a CCN PAM) to the Cas9 cleavage site. Because Cas9 does not always cut the two strands symmetrically, cleavage occurs between positions 3-4 or 4-5 counting from the PAM. In addition, during repair, 1-2 bases at the deaminated position on the non-targeting strand may be filled in using the target strand as a template, so 1-2 bases complementary to the target strand may also be introduced.
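The expected deletion spans described above can be enumerated from the protospacer alone. Below is a minimal sketch, assuming that each deletion starts at (and includes) a deaminated C and ends at a Cas9 cut 3 or 4 nt from the PAM; the function names are hypothetical. For a CCN-PAM site, where the text says the G bases are the relevant ones, the reverse complement of the site can be passed instead so that the PAM reads NGG.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (use for CCN-PAM sites)."""
    return seq.translate(COMPLEMENT)[::-1]


def predicted_deletions(protospacer: str, cut_offsets=(3, 4)):
    """List (c_pos, cut_after, deletion_length) for an NGG-PAM protospacer.

    protospacer: non-target-strand sequence 5'->3', PAM excluded.
    cut_offsets: distance of the Cas9 cut from the PAM; 3 means the cut lies
    between positions 17 and 18 of a 20-mer. Positions are 1-based and each
    deletion is assumed to run from the deaminated C through `cut_after`.
    """
    seq = protospacer.upper()
    n = len(seq)
    spans = []
    for pos, base in enumerate(seq, start=1):
        if base != "C":
            continue
        for offset in cut_offsets:
            cut_after = n - offset
            if pos <= cut_after:  # only Cs 5' of the cut can start a deletion
                spans.append((pos, cut_after, cut_after - pos + 1))
    return spans


if __name__ == "__main__":
    # Hypothetical 20-nt protospacer.
    for span in predicted_deletions("ACGTACGTACGTACGTACGT"):
        print(span)
```

With a 20-nt protospacer, a C at position 1 gives the longest possible span (17 nt), and the lengths shrink as the C moves toward the PAM. The 1-2 templated bases mentioned above would shorten each span by the same amount.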
The ACD system generates insertions at very low efficiency but deletions at very high efficiency, and the deletions occur only within the 20-bp protospacer sequence. At these target sites, most deletions are 10-17 nt long, and the different deletion types could be stably detected in more than 3 biological replicate experiments, which is not possible with SpCas9 and other tools. This reflects the accuracy and predictability of the ACD system.
Example 3: Construction of AFID (APOBEC-Cas9 Fusion-Induced Deletion) system
In the present invention, human APOBEC3A, which has high deamination activity and a wide deamination window, was selected to construct the AFID-3 system, and APOBEC3Bctd, which has higher deamination activity and a narrower window, was identified by screening and used in place of APOBEC3A to construct the eAFID-3 system (Figures 9 and 10). A comparison of the deletion efficiencies of Cas9, AFID-3 and eAFID-3 at endogenous rice and wheat gene targets showed that AFID-3 and eAFID-3 generated deletion mutations significantly more efficiently than Cas9, with average deletion mutation rates 2.2 and 2.6 times that of Cas9, respectively, demonstrating the high efficiency of the AFID system.
Example 4: Analysis of mutation types produced by AFID system
The types and proportions of mutations generated by AFID-3 and eAFID-3 at different endogenous targets were analyzed. The results showed that the length of the deleted fragment depends mainly on the position of the deaminated C nucleotide and on its deamination activity. At target sites with strong deamination activity, deletions predominate, whereas at sites with weak deamination activity a certain percentage of insertions appears. A large proportion of the mutations are predictable polynucleotide deletions between the C nucleotide acted on by the deaminase and the Cas9 cleavage site (because Cas9 cleaves the two strands asymmetrically, the cleavage site falls between positions 3-4 or 4-5 counting from the PAM) (see Figures 13 and 14). It was also found that during NHEJ repair, a templated insertion of C nucleotides occurs at the deaminated C nucleotides (Figures 13 and 14). This is mainly because, while the 5' overhang of the target strand is being excised, DNA polymerase can readily fill in the non-target strand using the 5' overhang as a template.
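To make the templated-insertion observation concrete, the sketch below builds the expected alleles for one deaminated C: the pure deletion, plus variants in which the first 1 or 2 bases of the gap are restored by fill-in from the target strand (the first restored base is the C itself). This is a hypothetical model for illustration; the function name, the 1-based coordinates and the cap of 2 templated bases are assumptions drawn from the description above, not a published algorithm.

```python
def expected_alleles(target: str, c_pos: int, cut_after: int, max_templated: int = 2):
    """Predicted non-target-strand alleles for one deaminated C.

    target: protospacer plus PAM (and optional flanks), 5'->3'.
    c_pos: 1-based position of the deaminated C.
    cut_after: Cas9 cuts between cut_after and cut_after + 1 (1-based).
    Returns {n_templated: allele}; n_templated = 0 is the pure deletion.
    """
    if target[c_pos - 1].upper() != "C":
        raise ValueError("c_pos must point at a C")
    if not 1 <= c_pos <= cut_after < len(target):
        raise ValueError("need c_pos <= cut_after < len(target)")
    alleles = {}
    for k in range(max_templated + 1):
        keep_until = c_pos - 1 + k  # bases retained 5' of the gap (slice end)
        if keep_until >= cut_after:  # gap fully refilled: no deletion left
            break
        alleles[k] = target[:keep_until] + target[cut_after:]
    return alleles


if __name__ == "__main__":
    # Hypothetical 20-nt protospacer followed by a TGG PAM; C at 6, cut after 17.
    site = "ACGTACGTACGTACGTACGT" + "TGG"
    for k, allele in expected_alleles(site, c_pos=6, cut_after=17).items():
        print(k, allele, f"({len(site) - len(allele)} nt deleted)")
```

Each restored base shortens the net deletion by one, which is one way an in-frame deletion can arise from a C whose pure deletion would be out of frame.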
To determine which C context AFID-3 and eAFID-3 prefer as the starting point of predictable deletions, the proportions of deletions spanning from AC, TC, CC and GC motifs to the DSB were counted at different targets. AFID-3 mediated predictable deletions from AC, TC, CC and GC motifs to the DSB, whereas eAFID-3 showed a stronger preference for TC than AFID-3, with most of its predictable deletions spanning from a TC motif to the DSB (Figure 15). In addition, the types and proportions of the desired predictable in-frame deletions generated by Cas9, AFID-3 and eAFID-3 at the miR396h binding site of the rice OsGRF1 gene and the miR156 binding site of the OsIPA1 gene were analyzed. Cas9 rarely generated the predictable in-frame deletion, whereas both AFID-3 and eAFID-3 produced it, with eAFID-3 producing a significantly higher proportion than AFID-3 (Figure 16). This further reflects the accuracy and predictability of the AFID system.
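The motif and reading-frame analysis above can be sketched as two small helpers: one that labels each candidate C in a protospacer with its dinucleotide context (AC, TC, CC or GC) and checks whether its predicted deletion would be in frame, and one that turns a list of observed starting motifs into proportions. The names and the single cut position (after protospacer position 17) are assumptions. A C at position 1 is skipped because its 5' neighbour lies outside the protospacer string.

```python
from collections import Counter

MOTIFS = ("AC", "TC", "CC", "GC")


def deletion_motif_table(protospacer: str, cut_after: int = 17):
    """For each C 5' of the cut, report motif, predicted deletion length and frame."""
    seq = protospacer.upper()
    rows = []
    for pos in range(2, cut_after + 1):  # 1-based; needs a 5' neighbour
        if seq[pos - 1] != "C":
            continue
        length = cut_after - pos + 1
        rows.append({
            "c_pos": pos,
            "motif": seq[pos - 2:pos],   # 5' neighbour + C
            "length": length,
            "in_frame": length % 3 == 0,  # deletion preserves the reading frame
        })
    return rows


def motif_proportions(observed_motifs):
    """Share of observed predictable deletions starting at each motif."""
    counts = Counter(m for m in observed_motifs if m in MOTIFS)
    total = sum(counts.values())
    return {m: (counts[m] / total if total else 0.0) for m in MOTIFS}


if __name__ == "__main__":
    # Hypothetical protospacer, not one of the targets analysed above.
    for row in deletion_motif_table("TCAGCCTTGCATCAGTACGG"):
        print(row)
```

In this hypothetical example only the C at position 6 gives an in-frame (12-nt) predicted deletion, so an editor that favours a different motif would rarely produce the in-frame allele at such a site.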
Example 5. AFID system mediates predictable polynucleotide deletion mutations in plants
To determine whether the AFID system can mediate predictable polynucleotide deletions in plants, two targets (TamiR396 and TaGASR6) were selected in wheat, and Cas9 or AFID-3 was delivered together with the corresponding sgRNA into immature wheat embryos by particle bombardment (gene gun). Three targets (OsCDC48-T2, OsSPL14 and OsPDS) were selected in rice; the corresponding pH-Cas9 and pH-AFID-3 Agrobacterium vectors were constructed (Figure 17) and used to transform rice callus by Agrobacterium infection. At the tested targets, Cas9 produced no predictable polynucleotide deletion mutants, its mutations being mainly 1-bp insertions and 1-3 bp deletions, whereas AFID-3 produced mostly polynucleotide deletion mutants, with predictable deletion mutants accounting for 25.0-55.5% (Table 3, Figure 18). Thus, the AFID system can mediate predictable polynucleotide deletions in plants.
Table 3 Statistics of predictable deletion mutants in plant generated by AFID-3
Figure PCTCN2020088887-appb-000004
Figure PCTCN2020088887-appb-000005
Sequence List
SEQ ID NO: 1 SpCas9
Figure PCTCN2020088887-appb-000006
SEQ ID NO: 2 APOBEC3A
Figure PCTCN2020088887-appb-000007
SEQ ID NO: 3 UDG
Figure PCTCN2020088887-appb-000008
SEQ ID NO: 4 AP lyase
Figure PCTCN2020088887-appb-000009
SEQ ID NO: 5 Exemplary first polypeptide
Figure PCTCN2020088887-appb-000010
Figure PCTCN2020088887-appb-000011
SEQ ID NO: 6 Exemplary second polypeptide
Figure PCTCN2020088887-appb-000012
SEQ ID NO: 7 APOBEC3Bctd
Figure PCTCN2020088887-appb-000013
SEQ ID NO: 8 XTEN linker
Figure PCTCN2020088887-appb-000014
SEQ ID NO: 9 P2A
Figure PCTCN2020088887-appb-000015
SEQ ID NO: 10 AFID-3
Figure PCTCN2020088887-appb-000016
Figure PCTCN2020088887-appb-000017
SEQ ID NO: 11 eAFID-3
Figure PCTCN2020088887-appb-000018
Figure PCTCN2020088887-appb-000019

Claims (11)

  1. A gene editing system for editing a target sequence in the genome of a cell, comprising:
    i) a first polypeptide and/or an expression construct comprising a nucleotide sequence encoding the first polypeptide;
    ii) a second polypeptide and/or an expression construct comprising a nucleotide sequence encoding the second polypeptide; and
    iii) a guide RNA and/or an expression construct comprising a nucleotide sequence encoding the guide RNA,
    wherein the first polypeptide comprises a CRISPR nuclease, a cytosine deaminase, and optionally a uracil-DNA glycosylase (UDG), wherein the second polypeptide comprises an AP lyase, and wherein the guide RNA is capable of targeting the first polypeptide to the target sequence in the genome of the cell.
  2. A gene editing system for editing a target sequence in the genome of a cell, comprising:
    i) a polypeptide and/or an expression construct comprising a nucleotide sequence encoding the polypeptide; and
    ii) a guide RNA and/or an expression construct comprising a nucleotide sequence encoding the guide RNA,
    wherein the polypeptide comprises a CRISPR nuclease, a cytosine deaminase, an AP lyase and optionally a uracil-DNA glycosylase (UDG), and wherein the guide RNA is capable of targeting the polypeptide to the target sequence in the genome of the cell.
  3. The gene editing system of claim 1 or 2, wherein the CRISPR nuclease is a Cas9 nuclease, such as SpCas9.
  4. The gene editing system of claim 1 or 2, wherein the cytosine deaminase is APOBEC3A deaminase.
  5. The gene editing system of claim 1 or 2, wherein the UDG comprises the amino acid sequence shown in SEQ ID NO. 3.
  6. The gene editing system of claim 1 or 2, wherein the AP lyase comprises the amino acid sequence shown in SEQ ID NO. 4.
  7. The gene editing system of claim 1, wherein the first polypeptide comprises the amino acid sequence shown in SEQ ID NO. 5, and the second polypeptide comprises the amino acid sequence shown in SEQ ID NO. 6.
  8. A method of producing a genetically modified cell, comprising introducing the gene editing system of any one of claims 1-7 into the cell.
  9. The method of claim 8, wherein the genetic modification is deletion of one or more nucleotides in the target sequence, preferably deletion of multiple consecutive nucleotides.
  10. The method of claim 8 or 9, wherein the cell is derived from, for example, mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle, cat; poultry such as chicken, duck, goose; and plants, including monocotyledons and dicotyledons, such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis.
  11. A kit comprising the gene editing system of any one of claims 1-7, and instructions for use.
PCT/CN2020/088887 2019-05-07 2020-05-07 Improved gene editing system Ceased WO2020224611A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202080034110.3A CN114008207B (en) 2019-05-07 2020-05-07 Improved gene editing system
CN202510113127.2A CN119932089A (en) 2019-05-07 2020-05-07 Improved gene editing system
EP20802707.8A EP3966335A4 (en) 2019-05-07 2020-05-07 Improved gene editing system
US17/609,640 US20220251580A1 (en) 2019-05-07 2020-05-07 Improved gene editing system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910375061.9 2019-05-07
CN201910375061 2019-05-07

Publications (1)

Publication Number Publication Date
WO2020224611A1 (en) 2020-11-12

Family

ID=73051415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/088887 Ceased WO2020224611A1 (en) 2019-05-07 2020-05-07 Improved gene editing system

Country Status (5)

Country Link
US (1) US20220251580A1 (en)
EP (1) EP3966335A4 (en)
CN (2) CN119932089A (en)
AR (1) AR123675A1 (en)
WO (1) WO2020224611A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114134149A (en) * 2021-11-30 2022-03-04 中国农业科学院深圳农业基因组研究所 gRNA sequence for rapidly increasing anthocyanin content of crops and application thereof
CN114214329A (en) * 2021-12-21 2022-03-22 中国农业科学院深圳农业基因组研究所 gRNA sequence for rapidly improving bud resistance on ear and application thereof
WO2022188816A1 (en) * 2021-03-09 2022-09-15 苏州齐禾生科生物科技有限公司 Improved cg base editing system
CN115261363B (en) * 2021-04-29 2024-01-30 中国科学院分子植物科学卓越创新中心 Method for measuring RNA deaminase activity of APOBEC3A and RNA high-activity APOBEC3A variant

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN119530192A (en) * 2023-09-08 2025-02-28 北京齐禾生科生物科技有限公司 A modular gene editing tool and its application
CN120813698A (en) * 2023-09-22 2025-10-17 北京大学现代农业研究院 Improved genome editing method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101613730A (en) * 2008-06-26 2009-12-30 霍夫曼-拉罗奇有限公司 Improved method for preventing carryover contamination in nucleic acid amplification techniques
CN108070611A (en) * 2016-11-14 2018-05-25 中国科学院遗传与发育生物学研究所 Alkaloid edit methods
WO2019023680A1 (en) * 2017-07-28 2019-01-31 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (pace)

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US6365355B1 (en) * 2000-03-28 2002-04-02 The Regents Of The University Of California Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches
KR100882711B1 (en) * 2007-03-12 2009-02-06 성균관대학교산학협력단 Uracil-DNA Glycosylase from Cyclobacter Spis HB147 Strains and Uses thereof
IL310721B2 (en) * 2015-10-23 2025-11-01 Harvard College Nucleobase editors and their uses
US20210322577A1 (en) * 2017-03-03 2021-10-21 Flagship Pioneering Innovations V, Inc. Methods and systems for modifying dna
US11542496B2 (en) * 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11884947B2 (en) * 2018-02-23 2024-01-30 Shanghaitech University Fusion proteins for base editing

Non-Patent Citations (2)

Title
MATTHEW A. COELHO ET AL.: "BE -FLARE: a fluorescent reporter of base editing activity reveals editing characteristics of APOBEC3A and APOBEC3B", BMC BIOLOGY, vol. 16, no. 1, 28 December 2018 (2018-12-28), pages 1 - 11, XP055751951, ISSN: 1741-7007, DOI: 10.1186/s12915-018-0617-1 *
See also references of EP3966335A4 *

Cited By (6)

Publication number Priority date Publication date Assignee Title
WO2022188816A1 (en) * 2021-03-09 2022-09-15 苏州齐禾生科生物科技有限公司 Improved cg base editing system
CN115261363B (en) * 2021-04-29 2024-01-30 中国科学院分子植物科学卓越创新中心 Method for measuring RNA deaminase activity of APOBEC3A and RNA high-activity APOBEC3A variant
CN114134149A (en) * 2021-11-30 2022-03-04 中国农业科学院深圳农业基因组研究所 gRNA sequence for rapidly increasing anthocyanin content of crops and application thereof
CN114134149B (en) * 2021-11-30 2023-01-10 中国农业科学院深圳农业基因组研究所 The gRNA sequence and its application to rapidly increase the anthocyanin content of crops
CN114214329A (en) * 2021-12-21 2022-03-22 中国农业科学院深圳农业基因组研究所 gRNA sequence for rapidly improving bud resistance on ear and application thereof
CN114214329B (en) * 2021-12-21 2022-12-27 中国农业科学院深圳农业基因组研究所 gRNA sequence for rapidly improving bud resistance on ear and application thereof

Also Published As

Publication number Publication date
CN114008207A (en) 2022-02-01
EP3966335A4 (en) 2023-06-28
CN119932089A (en) 2025-05-06
US20220251580A1 (en) 2022-08-11
CN114008207B (en) 2025-02-18
EP3966335A1 (en) 2022-03-16
AR123675A1 (en) 2023-01-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20802707

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020802707

Country of ref document: EP

Effective date: 20211207

WWG Wipo information: grant in national office

Ref document number: 202080034110.3

Country of ref document: CN