WO2024254224A2

WO2024254224A2 - Analytical methods for adeno-associated virus vector genomes and uses thereof

Info

Publication number: WO2024254224A2
Application number: PCT/US2024/032678
Authority: WO
Inventors: Michael Xavier Doss JESUDOSS; Erno Pungor; Salvatore J. ORLANDO
Original assignee: Biomarin Pharmaceutical Inc
Current assignee: Biomarin Pharmaceutical Inc
Priority date: 2023-06-08
Filing date: 2024-06-06
Publication date: 2024-12-12
Anticipated expiration: 2025-12-08
Also published as: WO2024254224A3

Abstract

Methods of analyzing adeno-associated virus (AAV) genomes and uses thereof are provided herein. The methods and uses include methods of preparing AAV genomes for sequencing, and methods of analyzing and characterizing AAV genomes. For example, the AAV analysis methods include steps of aligning reference nucleotide sequences to sequence reads of AAV genomes; determining alignment scores of each reference nucleotide sequence to the sequence reads and identifying strand types of the AAV genomes based on the alignment scores; determining first and second coordinates within the sequence reads based on alignments with the reference nucleotide sequences used to identify strand type; and analyzing sequences adjacent to the first and second coordinates to identify ITR configurations of the AAV genomes.

Description

ANALYTICAL METHODS FOR ADENO-ASSOCIATED VIRUS VECTOR GENOMES

AND USES THEREOF

FIELD OF THE INVENTION

[001] The present invention relates to a method for analyzing adeno-associated virus (AAV) genomes to identify different parameters of the genomes. For example, the method can identify truncation hotspots, which can negatively affect vector efficacy and production, using double stranded, 3’ITR extended recombinant AAV (rAAV) genome sequences. Analyzing therapeutic gene expression cassettes using the disclosed method is also useful for improving the therapeutic efficiency of drug products. For example, premature truncation of the therapeutic genes can be reduced or eliminated since extensive secondary and/or inverted terminal repeat (ITR)-like palindromic structures can be identified using this method.

BACKGROUND OF THE INVENTION

[002] AAV are non-enveloped viruses with a single-stranded deoxyribonucleic acid (DNA) genome with at least one inverted terminal repeat (ITR) at the termini. For example, the AAV2 serotype can have a single-stranded DNA genome of approximately 4.7 kilobases (kb), with two 145 nucleotide-long inverted terminal repeats (ITRs) at the termini. The virus does not encode a polymerase and therefore relies on cellular polymerases for genome replication. The ITRs flank the two viral genes, rep (replication) and cap (capsid), encoding non-structural and structural proteins, respectively. The Rep gene, through the use of two promoters and alternative splicing, encodes four regulatory proteins that are dubbed Rep78, Rep68, Rep52 and Rep40. These proteins are involved in AAV genome replication and packaging. The Cap gene, through alternative splicing and initiation of translation, gives rise to three capsid proteins, VP1 (virion protein 1), VP2 and VP3. The molecular weights of VP1, VP2, and VP3 for AAV2 are 87, 72 and 62 kDa, respectively. These capsid proteins assemble into a near-spherical protein shell of 60 subunits.

[003] AAV are unable to replicate on their own and require co-infection with a helper virus, typically adenovirus or herpesvirus. When AAV infects a human cell alone, its gene expression program is auto-repressed and latent infection of the cell occurs. However, when a latently infected cell is co-infected with a helper virus, such as adenovirus or herpes simplex virus, AAV gene expression is activated, leading to excision of the provirus DNA from the host cell chromosome, followed by replication and packaging of the viral genome.

[004] Due to their relative safety and long-term gene expression, different serotypes of recombinant AAV (rAAV) are currently used in various gene therapy programs, both at non-clinical and clinical stages. However, robust bioinformatic analytics that aid in optimizing the design and development of AAV and AAV production remain a developing field. For example, prior approaches for determining the presence of truncated genomes and strategies to distinguish flip/flop configurations are lacking and not well defined. To this end, there is a significant need for development of precise analytical methods for characterizing AAV genomes.

SUMMARY OF THE INVENTION

[005] The present invention addresses the limitations in the current state of the art for analyzing adeno-associated virus (AAV) or recombinant adeno-associated virus (rAAV) vector genome(s) (vg). The present invention is directed to methods and processes of analyzing and characterizing the vg of AAV or rAAV as well as methods of preparing AAV vg or rAAV vg for sequencing. These methods and processes address issues with gathering information and profiles of vg of produced AAV or rAAV that provide for development of improved AAV/rAAV vector designs that in turn allow for the production of AAV/rAAV with improved properties (e.g., higher infectivity, improved safety profile, etc.).

[006] Embodiments of methods of preparing an AAV vg or rAAV vg for sequencing are also disclosed.

[007] In various embodiments, methods of preparing a genome from an AAV particle for sequencing are disclosed. The methods comprise the step of isolating a genome from an AAV particle. The genome comprises an inverted terminal repeat (ITR) that is formed into a stem-loop structure with complementary 3’ and 5’ stem arms and a first polynucleotide that is single stranded and linked to a 5’ end of the 5’ stem arm. The methods further comprise the step of incubating a deoxyribonucleic acid (DNA) polymerase with the isolated genome to synthesize a second polynucleotide extending from a 3’ end of the 3’ stem arm. The second polynucleotide is complementary to the first polynucleotide and forms a double stranded polynucleotide with the first polynucleotide. The methods also comprise the step of linking an adapter nucleotide sequence to a 5’ end of the first polynucleotide and a 3’ end of the second polynucleotide.

[008] Embodiments of methods and processes of analyzing and characterizing an AAV vg or rAAV vg are disclosed.

[009] In various embodiments, methods of analyzing AAV genomes are disclosed. The methods comprise the step of aligning reference nucleotide sequences to sequence reads of AAV genomes in which each reference nucleotide sequence comprises predetermined sequences of 5’ and 3’ ITRs. The methods further comprise determining alignment scores of each reference nucleotide sequence to the sequence reads and identifying strand types of the AAV genomes based on the alignment scores. The strand types are selected from at least one of a positive nucleotide strand, negative nucleotide strand, and polynucleotide contaminant. The methods further comprise determining first and second coordinates within the sequence reads based on alignments with the reference nucleotide sequences used to identify strand type. The first coordinate identifies the 5’ end of the AAV genomes and the second coordinate identifies the 3’ end of the AAV genomes. The methods also comprise analyzing sequences adjacent to the first and second coordinates to identify ITR configurations of the AAV genomes.

[0010] Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to one of ordinary skill in the art from this detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The present invention will become more fully understood from the detailed description given below and the accompanying drawings that are given by way of illustration only and are thus not limitative of the present invention.

[0012] Figure 1 is a flowchart demonstrating the general bioinformatics workflow of various embodiments. In the flowchart, a sequence read is obtained and mapped to a reference. Next, the AAV and non-AAV DNA molecules are classified. The AAV molecules are classified as being plus or minus-stranded. Each of the plus or minus-stranded genomes is then identified as having a flip or flop configuration.

[0013] Figure 2A shows the plus and minus-stranded rAAV genomes after the second strand synthesis. As shown, the order of the features between the plus and minus-strands is different.

[0014] Figure 2B depicts a 2x reference sequence length without the 3’ ITR extension.

[0015] Figure 3 depicts generation of a sequence read to map against the plasmid sequence. For a given rAAV vector genome construct, the positive and negative stranded rAAV genomes differ in feature orientation: “Promoter-polyA-polyA-Promoter” versus “polyA-Promoter-Promoter-polyA”.

[0016] Figures 4A and 4B show the reference sequences for a given construct. Figure 4A shows the plus stranded reference sequences. Figure 4B shows the minus stranded reference sequences.

[0017] Figures 5A and 5B show the difference in alignments of reference sequences to plus and minus-stranded rAAV vg. Figure 5A shows the alignments for a plus stranded reference sequence and plus stranded rAAV vg. Figure 5B shows the alignments for a minus stranded reference sequence and minus stranded rAAV vg. When a read is mapped to the wrong-stranded reference genome, its alignment will be split into 2 halves due to the feature-orientation difference and its alignment score will be nearly half that of a read correctly mapped to the respective stranded reference. In this example, the plus-stranded read aligned to the plus-stranded genome reference gives optimal alignment, whereas the plus-stranded read aligned to the minus-stranded genome reference gives 2 halves of alignment metrics. This way, the plus and minus stranded information of a given AAV DNA can be verified.

[0018] Figures 6A and 6B show the difference between the alignments of plus and minus stranded genome references. In figure 6A, the plus-stranded read aligned to the plus-stranded genome reference generates an exemplary value of a 9275 base pair (bp) fragment. In figure 6B, the plus-stranded read aligned to the minus-stranded genome reference gives 2 halves of alignment metrics and generates exemplary values of a 4700 bp fragment and a 4699 bp fragment.
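The strand assignment described for Figures 5A through 6B can be illustrated with a minimal Python sketch. It assumes that the best alignment score of each read against the plus-stranded and minus-stranded references has already been computed by an aligner, and that a match scores 1 so that the score approximates the aligned length. The names `ReadScores`, `classify_strand`, and the `min_score` cutoff for calling a read non-AAV are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ReadScores:
    """Best alignment scores of one read (hypothetical container)."""
    plus_score: int   # score against the plus-stranded reference
    minus_score: int  # score against the minus-stranded reference


def classify_strand(scores: ReadScores, min_score: int = 1000) -> str:
    """Assign a strand type from the two reference alignment scores.

    min_score is an assumed cutoff: reads that align poorly to both
    references are treated as non-AAV (contaminant) DNA.
    """
    best = max(scores.plus_score, scores.minus_score)
    if best < min_score:
        return "non-AAV"
    if scores.plus_score == scores.minus_score:
        return "ambiguous"
    # A read mapped to the wrong-stranded reference splits into two
    # halves, so its best score is roughly half of the correct one.
    return "plus" if scores.plus_score > scores.minus_score else "minus"


# Exemplary values from Figures 6A and 6B: 9275 bp versus a 4700 bp half.
print(classify_strand(ReadScores(plus_score=9275, minus_score=4700)))
```

In this sketch the comparison is a simple maximum; a production workflow might additionally require a minimum ratio between the two scores before committing to a strand call.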

[0019] Figure 7 highlights the differences in the sequence alignments of the 5’ and 3’ ITRs in the flip and flop configurations. Flip/flop configurations were determined directly from the mapped ITR sequence information of the respective plus/minus mapped read at the flip/flop defining base positions. The hyphens between bases indicate the base mismatch and these mismatched base positions are the flip/flop defining regions in the bioinformatic analysis of various embodiments. A mismatched base is given a penalty of 0, while a matching base is given a score of 1. The score having a higher numerical value is assigned to the read of interest.

[0020] Figures 8A, 8B, and 8C depict an exemplary analysis of an rAAV production using the bioinformatic analysis of various embodiments. Figure 8A shows the metrics of AAV DNA and residual DNA including the percentage of AAV DNA and percentage of total non-AAV DNA. Figure 8B shows exemplary metrics of the plus/minus and flip/flop configuration of AAV genomes. Figure 8C shows exemplary descriptive statistical analysis of AAV genomes by strand including the mean and standard deviation.
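The flip/flop scoring rule described for Figure 7 (a matching base scores 1, a mismatched base scores 0, and the configuration with the higher score is assigned) can be sketched as follows. This is an illustration only: the sequences are short toy strings rather than real ITR sequences, the read segment is assumed to be already aligned position by position to both references, and the function names are hypothetical.

```python
def defining_positions(flip_ref: str, flop_ref: str) -> list:
    """Positions where the flip and flop references differ."""
    return [i for i, (a, b) in enumerate(zip(flip_ref, flop_ref)) if a != b]


def score_at(read_itr: str, ref: str, positions: list) -> int:
    """Sum of per-base scores: 1 for a match, 0 for a mismatch."""
    return sum(
        1 for i in positions if i < len(read_itr) and read_itr[i] == ref[i]
    )


def call_itr(read_itr: str, flip_ref: str, flop_ref: str) -> str:
    """Assign the configuration with the higher score to the read."""
    positions = defining_positions(flip_ref, flop_ref)
    flip_score = score_at(read_itr, flip_ref, positions)
    flop_score = score_at(read_itr, flop_ref, positions)
    if flip_score == flop_score:
        return "undetermined"  # assumed handling of ties
    return "flip" if flip_score > flop_score else "flop"


# Toy references that differ only at positions 3-5 (not real ITRs).
FLIP = "ACGTTTGACC"
FLOP = "ACGAAAGACC"
print(call_itr("ACGAATGACC", FLIP, FLOP))  # flop scores 2, flip scores 1
```

Restricting the score to the defining positions keeps sequencing errors elsewhere in the ITR from influencing the call; whether the disclosed analysis scores only those positions is an assumption of this sketch.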

[0021] Figure 9 shows exemplary truncation hotspots in rAAV vg. Each AAV genome is plotted in a single dot plot with the density plot, either in a strand-specific manner or including both the plus and minus-stranded AAV genomes, to determine the major truncation hotspots and the respective genome coordinates/sequence motif associated with the truncation event(s).
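One simple way to locate truncation hotspots of the kind plotted in Figure 9 is to bin the 3’ end coordinates of the mapped genomes and report bins that hold an unusually large share of reads. The sketch below is a hedged illustration, not the disclosed procedure: the bin size, the minimum fraction, the tolerance for treating a genome as full length, and the function name are all assumptions.

```python
from collections import Counter


def truncation_hotspots(end_coords, full_length, bin_size=50,
                        min_fraction=0.05, end_tolerance=50):
    """Return (bin_start, bin_end, count) for over-represented end bins.

    end_coords: 3' end coordinate of each genome on the reference.
    Genomes ending within end_tolerance of full_length are treated as
    full length and ignored (assumed convention).
    """
    truncated = [c for c in end_coords if c < full_length - end_tolerance]
    if not truncated:
        return []
    bins = Counter((c // bin_size) * bin_size for c in truncated)
    total = len(end_coords)
    return sorted(
        (start, start + bin_size, n)
        for start, n in bins.items()
        if n / total >= min_fraction
    )


# Toy data: 80 full-length genomes, a cluster of ends near 2500, noise.
ends = [4700] * 80 + [2505, 2510, 2520, 2530, 2540] * 3
ends += [100, 900, 1800, 3300, 4000]
print(truncation_hotspots(ends, full_length=4700))
```

The reported bin boundaries can then be used to extract the reference sequence around each hotspot for motif or secondary structure analysis.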

[0022] Figures 10A, 10B, 10C, 10D, 10E, and 10F depict different in silico secondary structure predictions of sequences that have been truncated from rAAV vg and were identified using the bioinformatic analysis of various embodiments.

DETAILED DESCRIPTION OF THE INVENTION

[0023] As required, detailed embodiments of the present disclosure are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary and may be embodied in various and alternative forms.

[0024] Except in the examples, or where otherwise expressly indicated, all numerical quantities in this description indicating amounts of material or conditions of reaction and/or use are to be understood as modified by the word “about”. For example, a description referring to “about X” includes a description of “X.” In one example, the term “about” is understood as within a range of normal tolerance in the art, for example within 2 standard deviations of the mean. In different examples, “about” refers to a variability of ±0.0001%, ±0.0005%, ±0.001%, ±0.005%, ±0.01%, ±0.05%, ±0.1%, ±0.5%, ±1%, ±5%, or ±10%. In further examples, “about” can be understood as within ±9%, ±8%, ±7%, ±6%, ±5%, ±4%, ±3%, or ±2%. Also as used herein, a range includes end values.

[0025] The first definition of an acronym or other abbreviation applies to all subsequent uses herein of the same abbreviation and applies mutatis mutandis to normal grammatical variations of the initially defined abbreviation; and, unless expressly stated to the contrary, measurement of a property is determined by the same technique as previously or later referenced for the same property.

[0026] Unless indicated otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present disclosure belongs.

[0027] It is also to be understood that this disclosure is not limited to the specific embodiments and methods described below, as specific components and/or conditions may, of course, vary. Furthermore, the terminology used herein is used only for describing particular embodiments and is not intended to be limiting in any way.

[0028] It must also be noted that, as used in the specification and the appended claims, the singular forms "a," "an," and "the" comprise plural referents unless the context clearly indicates otherwise. For example, reference to a component in the singular is intended to comprise a plurality of components.

[0029] The terms “or” and “and” can be used interchangeably and can be understood to mean “and/or”.

[0030] The term “comprising” is synonymous with “with”, “including,” “having,” “containing,” or “characterized by.” These terms are inclusive and open-ended and do not exclude additional, unrecited elements or method steps.

[0031] The phrase “consisting of” excludes any element, step, or ingredient not specified in the claim. When this phrase appears in a clause of the body of a claim, rather than immediately following the preamble, it limits only the element set forth in that clause; other elements are not excluded from the claim as a whole.

[0032] The phrase “consisting essentially of” limits the scope of a claim to the specified materials or steps, plus those that do not materially affect the basic and novel characteristic(s) of the claimed subject matter.

[0033] Throughout this application, where publications are referenced, the disclosures of these publications in their entireties are hereby incorporated by reference into this application to more fully describe the state of the art to which this invention pertains.

[0034] As previously noted, embodiments of methods of preparing an AAV vg or rAAV vg for sequencing are disclosed.

[0035] In various embodiments, methods of preparing a genome from an AAV particle for sequencing are disclosed. The methods comprise the step of isolating a genome from an AAV particle. The genome comprises an inverted terminal repeat (ITR) that is formed into a stem-loop structure with complementary 3’ and 5’ stem arms and a first polynucleotide that is single stranded and linked to a 5’ end of the 5’ stem arm. The methods further comprise the step of incubating a deoxyribonucleic acid (DNA) polymerase with the isolated genome to synthesize a second polynucleotide extending from a 3’ end of the 3’ stem arm. The second polynucleotide is complementary to the first polynucleotide and forms a double stranded polynucleotide with the first polynucleotide. The methods also comprise the step of linking an adapter nucleotide sequence to a 5’ end of the first polynucleotide and a 3’ end of the second polynucleotide.

[0036] In various embodiments, the first polynucleotide of any embodiment further comprises a second ITR formed into a stem-loop structure with complementary 3’ and 5’ stem arms.

[0037] In various embodiments, the incubating step of any embodiment further comprises the DNA polymerase denaturing the stem-loop structure of the second ITR.

[0038] In various embodiments, the second polynucleotide of any embodiment further comprises a sequence complementary to the sequence of the second ITR.
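The product of the second strand synthesis described in the preceding embodiments can be modelled in silico as a single continuous strand: the first polynucleotide, the ITR hairpin, and then the reverse complement of the first polynucleotide synthesized from the 3’ end of the 3’ stem arm. The Python sketch below is a toy model under that assumption; the sequences are illustrative strings, and `extend_from_3prime_itr` is a hypothetical helper name.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_from_3prime_itr(first_polynucleotide: str, itr_hairpin: str) -> str:
    """Model the self-primed, 3' ITR extended molecule (5' to 3').

    Input molecule:  first_polynucleotide + itr_hairpin
    The polymerase extends the 3' end of the hairpin and copies the
    first polynucleotide, so the second polynucleotide is its
    reverse complement.
    """
    second_polynucleotide = reverse_complement(first_polynucleotide)
    return first_polynucleotide + itr_hairpin + second_polynucleotide


# Toy example: a 6-nt payload and a hairpin with GG/CC stem arms.
product = extend_from_3prime_itr("ATGGCC", "GGAAACC")
print(product)
```

Because the second polynucleotide mirrors the first, the modelled product is roughly twice the length of the packaged genome, which is consistent with the 2x reference sequence length described for Figure 2B.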

[0039] As previously noted, embodiments of methods and processes of analyzing and characterizing an AAV vg or rAAV vg are disclosed.

[0040] In various embodiments, methods of analyzing AAV genomes are disclosed. The methods comprise the step of aligning reference nucleotide sequences to sequence reads of AAV genomes in which each reference nucleotide sequence comprises predetermined sequences of 5’ and 3’ ITRs. The methods further comprise determining alignment scores of each reference nucleotide sequence to the sequence reads and identifying strand types of the AAV genomes based on the alignment scores. The strand types are selected from at least one of a positive nucleotide strand, negative nucleotide strand, and polynucleotide contaminant. The methods further comprise determining first and second coordinates within the sequence reads based on alignments with the reference nucleotide sequences used to identify strand type. The first coordinate identifies the 5’ end of the AAV genomes and the second coordinate identifies the 3’ end of the AAV genomes. The methods also comprise analyzing sequences adjacent to the first and second coordinates to identify ITR configurations of the AAV genomes.

[0041] In various embodiments, the difference between a sequence length of the upper strand preceding the ITR region and the sequence length of the bottom strand succeeding the ITR region is between -45 and 45.
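The length criterion of the preceding embodiment can be expressed as a simple check. The sketch below assumes that the ITR region has been located within the read as a 0-based, end-exclusive interval, that lengths are counted in nucleotides, and that the range includes its end values as stated earlier in this description. The function names and coordinate convention are hypothetical.

```python
def arm_length_difference(read_length: int, itr_start: int, itr_end: int) -> int:
    """Upper-strand length before the ITR minus bottom-strand length after it."""
    upper_length = itr_start                 # bases preceding the ITR region
    bottom_length = read_length - itr_end    # bases succeeding the ITR region
    return upper_length - bottom_length


def within_tolerance(read_length: int, itr_start: int, itr_end: int,
                     tolerance: int = 45) -> bool:
    """True when the difference lies between -tolerance and +tolerance."""
    diff = arm_length_difference(read_length, itr_start, itr_end)
    return -tolerance <= diff <= tolerance


# A 9275-nt read with the ITR region at 4565-4710 has equal arms.
print(within_tolerance(9275, 4565, 4710))
```

A read failing this check has one arm markedly shorter than the other, which may indicate incomplete second strand synthesis or a chimeric read rather than a fully extended genome.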

[0042] In various embodiments, the ITR configurations for the positive nucleotide strand (5’ ITR/3’ ITR) are flip/flip, flip/flop, flop/flip, and flop/flop.

[0043] In various embodiments, the ITR configurations for the negative nucleotide strand (5’ ITR/3’ ITR) are flip/flip, flip/flop, flop/flip, and flop/flop.
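The four ITR configurations per strand give eight strand/configuration categories in total. A short Python sketch for enumerating these categories and tallying per-read calls is given below; the input format (a list of strand and configuration pairs) and the function name are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

STRANDS = ("plus", "minus")
ITR_STATES = ("flip", "flop")

# (strand, "5' ITR state/3' ITR state") for every combination.
CATEGORIES = [
    (strand, f"{five}/{three}")
    for strand, five, three in product(STRANDS, ITR_STATES, ITR_STATES)
]


def tally_configurations(calls):
    """Count reads per category, reporting zero for unseen categories."""
    counts = Counter(calls)
    return {category: counts.get(category, 0) for category in CATEGORIES}


calls = [("plus", "flip/flop"), ("plus", "flip/flop"), ("minus", "flop/flop")]
for category, n in tally_configurations(calls).items():
    print(category, n)
```

Reporting zero counts explicitly makes it easier to compare productions in which a configuration is absent.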

[0044] In various embodiments, the AAV genome is a double stranded 3’ITR extended recombinant AAV genome.

[0045] In various embodiments, the alignment scores of each reference nucleotide sequence to the sequence reads range from 0 to a positive integer value, where a match between the reference nucleotide sequence and the sequence read has a score of 1.

[0046] In various embodiments, methods of analyzing AAV genomes of various embodiments further comprise the step of identifying one or more metrics of the AAV genomes. Examples of metrics include AAV polynucleotide sequence concentrations, residual polynucleotide sequence concentrations, plus strand concentrations, minus strand concentrations, concentrations of the AAV genomes with ITR sequences, concentrations of the AAV genomes without ITR sequences, or lengths of the AAV genomes.
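Several of these metrics can be derived directly from per-read strand calls and genome lengths. The following sketch computes percentages rather than absolute concentrations, together with the mean and standard deviation of genome length by strand, in the manner of the exemplary statistics of Figures 8A and 8C. The input format and the function name `summarize_reads` are hypothetical.

```python
from statistics import mean, stdev


def summarize_reads(reads):
    """reads: list of (strand, length); strand is 'plus', 'minus' or 'non-AAV'."""
    total = len(reads)
    if total == 0:
        raise ValueError("no reads to summarize")
    aav = [(s, n) for s, n in reads if s in ("plus", "minus")]
    summary = {
        "pct_aav": 100 * len(aav) / total,
        "pct_non_aav": 100 * (total - len(aav)) / total,
    }
    for strand in ("plus", "minus"):
        lengths = [n for s, n in aav if s == strand]
        summary[strand] = {
            "pct_of_aav": 100 * len(lengths) / len(aav) if aav else 0.0,
            "mean_length": mean(lengths) if lengths else None,
            # The sample standard deviation needs at least two values.
            "sd_length": stdev(lengths) if len(lengths) > 1 else None,
        }
    return summary


reads = [("plus", 4700), ("plus", 4600), ("minus", 4700), ("non-AAV", 1200)]
print(summarize_reads(reads))
```

Converting these read-level percentages to concentrations would require an external quantity such as a measured DNA mass or titer, which is outside the scope of this sketch.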

[0047] In various embodiments, methods of analyzing AAV genomes of various embodiments further comprise the step of modifying production of recombinant AAV, where the modifying alters the one or more metrics. In other embodiments, methods of analyzing AAV genomes of various embodiments further comprise the step of modifying one or more vectors for rAAV production in a host cell, where the modifying alters the one or more metrics. In further embodiments, methods of analyzing AAV genomes of various embodiments further comprise the step of modifying one or more nucleic acid molecules from which the AAV genomes are generated, where the modifying alters the one or more metrics.

[0048] In various embodiments, methods of analyzing AAV genomes of various embodiments further comprise the step of identifying a deletion in the AAV genomes.

[0049] In various embodiments, methods of analyzing AAV genomes of various embodiments further comprise the steps of determining a nucleotide sequence of the deletion and modelling secondary structure formation of the deleted nucleotide sequences under an environmental parameter. Examples of environmental parameters include Rep protein activity, polymerase activity, Rep polymerase activity, temperature, adenosine triphosphate concentration, cell culture medium composition, salt concentration of a cell culture medium, or a cell line parameter.
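As a rough illustration of secondary structure modelling for a deleted sequence, the sketch below uses the classic Nussinov dynamic program, which counts the maximum number of nested Watson-Crick base pairs a single strand can form. This is a stand-in chosen for brevity: it is not a thermodynamic folding model, it does not account for any of the environmental parameters listed above, and the disclosure does not state which prediction tool is used. The function name and the minimum loop length are assumptions.

```python
def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested A-T / G-C pairs (Nussinov algorithm).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1] if n else 0


# A toy hairpin: GGG stem, AAA loop, CCC stem.
print(max_base_pairs("GGGAAACCC"))
```

A high pair count relative to sequence length suggests a palindromic, hairpin-prone region of the kind associated with truncation; a dedicated folding package would be needed for energy estimates at a given temperature or salt concentration.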

[0050] In various embodiments, methods of analyzing AAV genomes of various embodiments further comprise the step of modifying one or more nucleic acid molecules from which the AAV genomes are generated to remove one or more nucleotides corresponding to the deletion.

[0051] Definitions

[0052] The term “heterologous” refers to a polynucleotide sequence that is non-native to AAV or a cell, or is native to AAV or a cell but is not located in its native location or position within the viral genome or host cell’s genome.

[0053] “Encodes,” “encoded” and “encoding” refer to the inherent property of specific sequences of nucleotides in a polynucleotide, such as a gene, a copy DNA/complementary DNA (cDNA), or a messenger ribonucleic acid (mRNA), to serve as templates for synthesis of other polymers and macromolecules in biological processes. Thus, a gene encodes a protein if transcription and translation of mRNA produced by that gene produces the protein in a cell or other biological system. Both the coding strand, the nucleotide sequence of which is identical to the mRNA sequence and is usually provided in sequence listings, and non-coding strand, used as the template for transcription, of a gene or cDNA can be referred to as encoding the protein or other product of that gene or cDNA.

[0054] The term “expression control element” refers to a nucleic acid sequence in a polynucleotide that is capable of regulating the expression of a nucleotide sequence to which it is operably linked. “Operably linked” refers to a functional relationship between two parts in which the activity of one part (e.g., the ability to regulate transcription) results in an action on the other part (e.g., transcription of the sequence). An expression control element is “operably linked” to a nucleotide sequence when the element controls or regulates the transcription or the translation of the nucleotide sequence. Examples of expression control elements include sequences of promoters (e.g., inducible or constitutive), enhancers, transcription terminators, a start codon (e.g., ATG), splicing signals for introns, stop codons, internal ribosome entry sites, homology region elements (e.g., homology region 2 from Autographa californica multicapsid nucleopolyhedrovirus (AcMNPV)), AAV regulatory elements (e.g., Rep binding element), etc.

[0055] The term “promoter” or “promoter polynucleotide” is understood to mean a regulatory sequence/element or control sequence/element that is capable of binding/recruiting an RNA polymerase and initiating transcription of a sequence downstream or in a 3' direction from the promoter. A promoter can be, for example, constitutively active (always on) or inducible, in which case the promoter is active or inactive in the presence of an external stimulus. The promoter is capable of expressing proteins at high concentration. For example, the transcript level of the promoter is about or is at least about 1.5-fold, 2-fold, 2.5-fold, 3-fold, 3.5-fold, 4-fold, 4.5-fold, 5-fold, 5.5-fold, 6-fold, 6.5-fold, 7-fold, 7.5-fold, 8-fold, 8.5-fold, 9-fold, 9.5-fold, 10-fold, 10.5-fold, 11-fold, 11.5-fold, 12-fold, 12.5-fold, 13-fold, 13.5-fold, 14-fold, 14.5-fold, 15-fold, 15.5-fold, 16-fold, 16.5-fold, 17-fold, 17.5-fold, 18-fold, 18.5-fold, 19-fold, 19.5-fold, 20-fold, 50-fold, 100-fold, 250-fold, 500-fold, 1000-fold, 2000-fold, 2500-fold, 3000-fold, 3500-fold, 4000-fold, 4500-fold, 5000-fold, 5500-fold, 6000-fold, 6500-fold, 7000-fold, 7500-fold, 8000-fold, 8500-fold, 9000-fold, 9500-fold, or 10000-fold higher than a transcript level of a native promoter for an operon encoding the regulatory protein. In different examples, the transcript level of the constitutive promoter polynucleotide is a range between any two levels listed above. The promoter can also be positioned relative to other expression control element(s) to control transcript expression. For example, an expression cassette with a promoter, homology region element, and/or AAV regulatory element can be stably incorporated into the genome of an insect cell such that baculovirus infection of an insect cell induces transcript expression from the expression cassette (see US2012/0100606).

[0056] The term “sequencing” refers to the laboratory techniques of determining the specific order of nucleic acid residues in a biological sample, including “next generation” sequencing methods. In general, a double stranded nucleic acid is linked using oligonucleotides to connect the sense and antisense strands. Once the repetitive sequences of the same DNA species have been obtained, the sequence read of the double stranded nucleic acid is determined by consensus agreement between the repetitive DNA sequences originating from the same DNA species. Sequencing may also be carried out via nanopore sequencing in which the molecule to be sequenced is translocated through a nanopore having an applied electric field. The sequence of the molecule is determined based on the interactions of the molecule as it traverses the pore.
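The consensus step described above, in which repeated sequences of the same DNA species are reconciled into one sequence read, can be illustrated with a per-position majority vote. The sketch assumes that the repeated passes have already been aligned to one another and trimmed to equal length; real consensus callers also handle insertions, deletions, and base qualities. The function name is hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Per-position majority vote over equal-length, pre-aligned passes.

    Ties are resolved in favour of the base seen first at that position
    (the order in which Counter records first occurrences).
    """
    if not passes:
        return ""
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to equal length")
    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three passes over the same molecule; the second has an error at the end.
print(majority_consensus(["ACGT", "ACGA", "ACGT"]))
```

Increasing the number of passes reduces the chance that a random error at a given position outvotes the true base.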

[0057] The term “adapter nucleotide sequence” refers to heterologous polynucleotide sequences that are ligated to double stranded polynucleotide sequences or fragments. Adapter nucleotide sequences are used in sequencing and have elements that support the sequencing process. The size of the adapter nucleotide sequences can vary, and example sizes of these adapter nucleotide sequences can be ~40 base pairs (bp), ~50 bp, ~60 bp, ~70 bp, ~80 bp, ~90 bp, ~100 bp, ~110 bp, ~120 bp, ~130 bp, ~140 bp, ~150 bp, ~160 bp, ~170 bp, ~180 bp, ~190 bp, ~200 bp, ~210 bp, ~220 bp, ~230 bp, ~240 bp, ~250 bp, ~260 bp, ~270 bp, ~280 bp, ~290 bp, ~300 bp, ~310 bp, ~320 bp, ~330 bp, ~340 bp, ~350 bp, ~360 bp, ~370 bp, ~380 bp, ~390 bp, ~400 bp, ~410 bp, ~420 bp, ~430 bp, ~440 bp, ~450 bp, ~460 bp, ~470 bp, ~480 bp, ~490 bp, or ~500 bp. Adapter nucleotide sequences may also: include sequences with secondary structures such as hairpin structures; be either single stranded or double stranded; include various nucleotide types such as deoxyuracil, deoxyinosine, phosphorothioates, or polynucleotides with phosphate backbones or phosphorothioate modified oligonucleotides with phosphorothioate bonds; include various modifications such as being biotinylated, amidated, phosphorylated, thiolated or modified to include an aldehyde; or be modified to include various fluorescent dyes (e.g., fluorescein) or molecular probes (e.g., ALEXA FLUOR™ probes). Adapter nucleotide sequences may also include different elements that are important to the sequencing process and workflow. For example, the adapter nucleotide sequences include a flow cell binding sequence that aids in binding the polynucleotide sequence to which the adapter nucleotide sequence is ligated to a flow cell in a sequencing platform. In another example, the adapter nucleotide sequences include a sequencing primer binding site that allows for the binding of a sequencing primer nucleotide sequence, which enables the recruitment of DNA polymerases to bind and extend. In a further example, the adapter nucleotide sequences include tags such as barcodes, where the tag sequence or index allows combining of samples for processing and identifying samples in a single sequence run.

[0058] The term “sequence read” refers to the resulting alignment of multiple nucleic acid fragment sequences, or sequence obtained via sequencing. A sequence read may be circular or a long-read and includes a sequence of a given set of nucleotides. Sequence reads are also identified through alignments of multiple nucleotide sequences on vector genomes from an rAAV production or a sample containing rAAV particles.
The alignments on the vector genomes are carried out at least one time or multiple times. This can be done, for example, to improve the accuracy of the alignments and the sequence read. For example, the alignments are repeated at least 1 time (x), at least 2x, at least 3x, at least 4x, at least 5x, at least 6x, at least 7x, at least 8x, at least 9x, at least 10x, at least 20x, at least 30x, at least 40x, at least 50x, at least 60x, at least 70x, at least 80x, at least 90x, at least 100x, at least 200x, at least 300x, at least 400x, at least 500x, at least 600x, at least 700x, at least 800x, at least 900x, at least 1000x, at least 2000x, at least 3000x, at least 4000x, at least 5000x, at least 6000x, at least 7000x, at least 8000x, at least 9000x, at least 10000x, at least 20000x, at least 30000x, at least 40000x, at least 50000x, at least 60000x, at least 70000x, at least 80000x, at least 90000x, at least 100000x, at least 200000x, at least 300000x, at least 400000x, at least 500000x, at least 600000x, at least 700000x, at least 800000x, at least 900000x, at least 1000000x, at least 2000000x, at least 3000000x, at least 4000000x, at least 5000000x, at least 6000000x, at least 7000000x, at least 8000000x, at least 9000000x, at least 10000000x, at least 20000000x, at least 30000000x, at least 40000000x, at least 50000000x, at least 60000000x, at least 70000000x, at least 80000000x, at least 90000000x, at least 100000000x, at least 200000000x, at least 300000000x, at least 400000000x, at least 500000000x, at least 600000000x, at least 700000000x, at least 800000000x, at least 900000000x, at least 1000000000x, at least 2000000000x, at least 3000000000x, at least 4000000000x, at least 5000000000x, at least 6000000000x, at least 7000000000x, at least 8000000000x, at least 9000000000x, or at least 10000000000x.
In other examples, the alignments are repeated 1 time (x), 2x, 3x, 4x, 5x, 6x, 7x, 8x, 9x, 10x, 20x, 30x, 40x, 50x, 60x, 70x, 80x, 90x, 100x, 200x, 300x, 400x, 500x, 600x, 700x, 800x, 900x, 1000x, 2000x, 3000x, 4000x, 5000x, 6000x, 7000x, 8000x, 9000x, 10000x, 20000x, 30000x, 40000x, 50000x, 60000x, 70000x, 80000x, 90000x, 100000x, 200000x, 300000x, 400000x, 500000x, 600000x, 700000x, 800000x, 900000x, 1000000x, 2000000x, 3000000x, 4000000x, 5000000x, 6000000x, 7000000x, 8000000x, 9000000x, 10000000x, 20000000x, 30000000x, 40000000x, 50000000x, 60000000x, 70000000x, 80000000x, 90000000x, 100000000x, 200000000x, 300000000x, 400000000x, 500000000x, 600000000x, 700000000x, 800000000x, 900000000x, 1000000000x, 2000000000x, 3000000000x, 4000000000x, 5000000000x, 6000000000x, 7000000000x, 8000000000x, 9000000000x, or 10000000000x. In various examples, the number of times the alignments are repeated is a range between any two numbers of times provided above.

[0059] Recombinant Viruses used for Gene Therapy and Adeno-Associated Virus

[0060] The term “gene therapy” refers to the correction of defective genes by introducing normal genes or therapeutic genes into target cells that require gene therapy, or the prevention or treatment of genetic defects through genetic modification of cells by adding new functions to the cells. To this end, viruses such as retrovirus, lentivirus, adenovirus, adeno-associated virus, and herpes simplex virus can be used as vehicles for gene therapy.

[0061] “AAV” is a standard abbreviation for adeno-associated virus. Adeno-associated virus is a single-stranded DNA parvovirus having a genome encapsidated by a capsid. There are currently thirteen serotypes of AAV that have been characterized. General information and reviews of AAV can be found in, for example, Carter, 1989, Handbook of Parvoviruses, Vol. 1, pp. 169-228; and Berns, 1990, Virology, pp. 1743-1764, Raven Press, (New York). However, it is fully expected that these same principles will be applicable to additional AAV serotypes since it is well known that the various serotypes are quite closely related, both structurally and functionally, even at the genetic level. (See, e.g., Blacklowe, 1988, pp. 165-174 of Parvoviruses and Human Disease, J. R. Pattison, ed.; and Rose, Comprehensive Virology 3:1-61 (1974)). For example, all AAV serotypes apparently exhibit very similar replication properties mediated by homologous rep genes; and all bear three related capsid proteins. The degree of relatedness is further suggested by heteroduplex analysis which reveals extensive cross-hybridization between serotypes along the length of the genome; and the presence of analogous self-annealing segments at the termini that correspond to inverted terminal repeats (ITRs). The similar infectivity patterns also suggest that the replication functions in each serotype are under similar regulatory control.

[0062] An “AAV viral particle” as used herein refers to an infectious viral particle composed of at least one AAV capsid protein and an encapsidated AAV genome. “Recombinant AAV” or “rAAV”, “rAAV virion” or “rAAV viral particle” refers to a viral particle composed of at least one capsid or Cap protein and an encapsidated rAAV vector genome as described herein. Thus, production of rAAV particles includes production of an rAAV vector genome.

[0063] “Capsid” refers to the structure in which the rAAV vector genome is packaged. The capsid includes VP1 proteins or VP3 proteins, but more typically, all three of VP1, VP2, and VP3 proteins, as found in native AAV. The sequence of the capsid proteins determines the serotype of the rAAV virions. rAAV virions include those derived from a number of AAV serotypes, including AAV1, AAV2, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, Bba21, Bba26, Bba27, Bba29, Bba30, Bba31, Bba32, Bba33, Bba34, Bba35, Bba36, Bba37, Bba38, Bba41, Bba42, Bba43, Bba44, Bce14, Bce15, Bce16, Bce17, Bce18, Bce20, Bce35, Bce36, Bce39, Bce40, Bce41, Bce42, Bce43, Bce44, Bce45, Bce46, Bey20, Bey22, Bey23, Bma42, Bma43, Bpo1, Bpo2, Bpo3, Bpo4, Bpo6, Bpo8, Bpo13, Bpo18, Bpo20, Bpo23, Bpo24, Bpo27, Bpo28, Bpo29, Bpo33, Bpo35, Bpo36, Bpo37, Brh26, Brh27, Brh28, Brh29, Brh30, Brh31, Brh32, Brh33, Bfm17, Bfm18, Bfm20, Bfm21, Bfm24, Bfm25, Bfm27, Bfm32, Bfm33, Bfm34, Bfm35, AAV-rh10, AAV-rh39, AAV-rh43, AAVanc80L65, or any variants thereof (see, e.g., U.S. Patent No. 8,318,480 for its disclosure of non-natural mixed serotypes). Exemplary capsids are also provided in International Application No. WO 2018/022608 and WO 2019/222136, which are incorporated herein in their entirety. The capsid proteins can also be variants of natural VP1, VP2 and VP3, including mutated, chimeric or shuffled proteins.
The capsid proteins can be those of rh.10 or other subtype within the various clades of AAV; various clades and subtypes are disclosed, for example, in U.S. Patent No. 7,906,111. In various embodiments, the capsid of the AAV viral particle has an acetylated or unacetylated VP1, VP2, or VP3 protein with an amino acid sequence that is at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% identical to a portion of an amino acid sequence from AAV-1 (Genbank Accession No. AAD27757.1), AAV-2 (NCBI Reference Sequence No. YP_680426.1), AAV-3 (NCBI Reference Sequence No. NP_043941.1), AAV-3B (Genbank Accession No. AAB95452.1), AAV-4 (NCBI Reference Sequence No. NP_044927.1), AAV-5 (NCBI Reference Sequence No. YP_068409.1), AAV-6 (Genbank Accession No. AAB95450.1), AAV-7 (NCBI Reference Sequence No. YP_077178.1), AAV-8 (NCBI Reference Sequence No. YP_077179.1), AAV-9 (Genbank Accession No. AAS99264.1), AAV-10 (Genbank Accession No. AAT46337.1), AAV-11 (Genbank Accession No. AAT46339.1), AAV-12 (Genbank Accession No. ABI16639.1), AAV-13 (Genbank Accession No. ABZ10812.1), or any amino acid sequence disclosed in WO 2018/022608 and WO 2019/222136. Construction and use of AAV proteins of different serotypes are discussed in Chao et al., Mol. Ther. 2:619-623, 2000; Davidson et al., PNAS 97:3428-3432, 2000; Xiao et al., J. Virol. 72:2224-2232, 1998; Halbert et al., J. Virol. 74: 1524-1532, 2000; Halbert et al., J. Virol. 75:6615-6624, 2001; and Auricchio et al., Hum. Molec. Genet. 10:3075-3081, 2001. In alternative embodiments or examples, the capsid can be replaced with a lipid nanoparticle in which the AAV/rAAV vector genome is positioned within the lipid nanoparticle. In this context, AAV, rAAV, AAV particle, or rAAV particle can refer to a non-virus particle with lipid nanoparticle instead of a capsid. 
The term “lipid nanoparticle” refers to nanoparticles comprising lipids that can be used as a vehicle for delivering the AAV/rAAV vector genome to a cell. Lipid nanoparticles can include different lipids such as phospholipids, sterols, sphingolipids, glycerolipids, pegylated/polyethylene glycol, anionic lipids, or cationic lipids.

[0064] As used herein, an “AAV genome”, “rAAV genome”, “AAV vector genome”, “vector genome”, or “rAAV vector genome” refers to single-stranded nucleic acids. An rAAV viral particle has an rAAV vector genome encapsidated within a capsid. The rAAV vector genome has an AAV 5' inverted terminal repeat (ITR) sequence and an AAV 3' ITR flanking a protein-coding sequence (for example, a functional therapeutic protein-encoding sequence; e.g., FVIII, FIX, and PAH) operably linked to transcription regulatory elements that are heterologous to the AAV viral genome, i.e., one or more promoters and/or enhancers and, optionally, a polyadenylation sequence and/or one or more introns inserted in the regulatory elements or between the regulatory elements and the protein-coding sequence or between exons of the protein-coding sequence. rAAV vector genome refers to nucleic acids that are present in the rAAV virus particle and can be either the sense strand or the anti-sense strand of the nucleic acid sequences disclosed herein. The size of such single-stranded nucleic acids is provided in bases. The terms “inverted terminal repeat” and “ITR” as used herein refer to the art-recognized regions found at the 5' and 3' termini of the rAAV genome which function in cis as origins of viral DNA replication and as packaging signals for the viral genome. AAV ITRs, together with the Rep proteins, provide for efficient excision and rescue from, and integration of a nucleotide sequence interposed between two flanking ITRs into a host cell genome. Sequences of certain AAV-associated ITRs are disclosed by Yan et al., J. Virol. 79(1):364-379 (2005). ITRs are also found in a “flip” or “flop” configuration in which the sequence between the AA’ inverted repeats (that form the arms of the hairpin) is present in the reverse complement (Wilmott, Patrick, et al. Human gene therapy methods 30.6 (2019): 206-213).
Construction and use of AAV vector genomes of different serotypes are discussed in Chao et al., Mol. Ther. 2:619-623, 2000; Davidson et al., PNAS 97:3428-3432, 2000; Xiao et al., J. Virol. 72:2224-2232, 1998; Halbert et al., J. Virol. 74:1524-1532, 2000; Halbert et al., J. Virol. 75:6615-6624, 2001; and Auricchio et al., Hum. Molec. Genet. 10:3075-3081, 2001. Because of wide construct availability and extensive characterization, illustrative AAV vector genomes disclosed below are derived from serotype 2.
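
The flip and flop relationship described in paragraph [0064] can be illustrated with a short sketch. The example below is a minimal illustration in Python that assumes toy sequences, which are not real AAV ITR sequences. The function names (reverse_complement, build_hairpin, alternate_configuration) and the assignment of the labels "flip" and "flop" to the two toy outputs are hypothetical and chosen only for illustration; the sketch is not the analysis method disclosed herein.

```python
# Toy illustration of flip/flop ITR configurations.
# All sequences and function names are hypothetical, not real AAV ITR data.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_hairpin(a_arm: str, interior: str) -> str:
    """Assemble A + interior + A', where A' is the reverse complement of A."""
    return a_arm + interior + reverse_complement(a_arm)


def alternate_configuration(a_arm: str, interior: str) -> str:
    """Assemble the other configuration: same A/A' arms, with the
    sequence between them replaced by its reverse complement."""
    return build_hairpin(a_arm, reverse_complement(interior))


if __name__ == "__main__":
    a_arm = "GGCCA"       # toy A arm (assumption)
    interior = "AAACGT"   # toy sequence between A and A' (assumption)
    flip = build_hairpin(a_arm, interior)
    flop = alternate_configuration(a_arm, interior)
    print("flip:", flip)
    print("flop:", flop)
```

In this toy model the two configurations are reverse complements of each other as whole sequences, because the A and A’ arms are themselves reverse complements of one another.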

[0065] A therapeutically effective rAAV particle or therapeutic rAAV is capable of infecting cells such that the infected cells express (e.g., by transcription and/or by translation) an element (e.g., nucleotide sequence, protein, etc.) of interest. To this extent, the therapeutically effective rAAV particles can include AAV particles having capsids or vector genomes (vgs) with different properties. For example, the therapeutically effective AAV particles can have capsids with different posttranslational modifications. In other examples, the therapeutically effective AAV particles can contain vgs with differing sizes/lengths, plus or minus strand sequences, different flip/flop ITR configurations (flip/flop, flop/flip, flip/flip, flop/flop, etc.), different numbers of ITRs (1, 2, 3, etc.), or truncations. For example, annealing/complementation of overlapping truncated plus and minus genomes occurs in AAV infected cells such that a "complete" nucleic acid encoding the large protein is generated, thereby reconstructing a functional, full-length gene. Therapeutically effective AAV particles are also referred to as “heavy” or “full” capsids.

[0066] As an example, a “therapeutic rAAV”, which refers to an rAAV virion, rAAV viral particle, rAAV vector particle, or rAAV that comprises a heterologous polynucleotide that encodes a therapeutic protein, can be used to replace or supplement the protein in vivo. The “therapeutic protein” is a polypeptide that has a biological activity that replaces or compensates for the loss or reduction of activity of a corresponding endogenous protein. For example, a functional phenylalanine hydroxylase (PAH) is a therapeutic protein for phenylketonuria (PKU). Thus, for example, a recombinant AAV PAH virus can be used for a medicament for the treatment of a subject suffering from PKU. The medicament may be administered by intravenous (IV) administration and the administration of the medicament results in expression of PAH protein in the subject at levels sufficient to alter the neurotransmitter metabolite or neurotransmitter levels in the subject. Optionally, the medicament may also comprise a prophylactic and/or therapeutic corticosteroid for the prevention and/or treatment of any hepatotoxicity associated with administration of the rAAV encoding PAH. The medicament comprising a prophylactic or therapeutic corticosteroid treatment may comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, or more mg/day of the corticosteroid. The medicament comprising a prophylactic or therapeutic corticosteroid may be administered over a continuous period of at least about 3, 4, 5, 6, 7, 8, 9, 10 weeks, or more. The PKU therapy may optionally also include tyrosine supplements.

[0067] The transgene incorporated into the AAV capsid is not limited and may be any heterologous gene of therapeutic interest. The transgene is a nucleic acid sequence, heterologous to the AAV ITR sequences flanking the transgene, which encodes a polypeptide, protein, or other product, of interest. The nucleic acid coding sequence is operatively linked to regulatory components in a manner which permits transgene transcription, translation, and/or expression in a host cell.

[0068] The composition of the transgene sequence will depend upon the use to which the resulting virus will be put. For example, one type of transgene sequence includes a reporter sequence, which upon expression produces a detectable signal. Such reporter sequences include, without limitation, DNA sequences encoding β-lactamase, β-galactosidase (LacZ), alkaline phosphatase, thymidine kinase, green fluorescent protein (GFP), chloramphenicol acetyltransferase (CAT), luciferase, membrane bound proteins including, for example, CD2, CD4, CD8, the influenza hemagglutinin protein, and others well known in the art, to which high affinity antibodies directed thereto exist or can be produced by conventional means, and fusion proteins comprising a membrane bound protein appropriately fused to an antigen tag domain from, among others, hemagglutinin or Myc.

[0069] These coding sequences, when associated with regulatory elements which drive their expression, provide signals detectable by conventional means, including enzymatic, radiographic, colorimetric, fluorescence or other spectrographic assays, fluorescent activating cell sorting assays and immunological assays, including enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA) and immunohistochemistry. For example, where the marker sequence is the LacZ gene, the presence of cells infected by rAAV encoding the signal is detected by assays for beta-galactosidase activity. Where the transgene is green fluorescent protein or luciferase, the rAAV encoding the signal may be detected by instruments measuring fluorescence or luminescence.

[0070] However, the transgene is typically a non-marker sequence encoding a product which is useful in biology and medicine, such as proteins, peptides, RNA, enzymes, dominant negative mutants, or catalytic RNAs. Desirable RNA molecules include tRNA, dsRNA, ribosomal RNA, catalytic RNAs, siRNA, small hairpin RNA, trans-splicing RNA, and antisense RNAs. One example of a useful RNA sequence is a sequence which inhibits or extinguishes expression of a targeted nucleic acid sequence in the treated subject. Typically, suitable target sequences include oncologic targets and viral diseases. See, for examples of such targets the oncologic targets and viruses identified below in the section relating to immunogens.

[0071] The transgene may be used to correct or ameliorate gene deficiencies, which may include deficiencies in which normal genes are expressed at less than normal levels or deficiencies in which the functional gene product is not expressed. An exemplary type of transgene sequence encodes a therapeutic protein or polypeptide which is expressed in an infected cell. The vector genome may further include multiple transgenes, e.g., to correct or ameliorate a gene defect caused by a multi-subunit protein. In certain situations, a different transgene may be used to encode each subunit of a protein, or to encode different peptides or proteins. This is desirable when the size of the DNA encoding the protein subunit is large, e.g., for an immunoglobulin, the platelet-derived growth factor, or a dystrophin protein. In order for the cell to produce the multi-subunit protein, a cell is infected with the recombinant virus containing each of the different subunits. Alternatively, different subunits of a protein may be encoded by the same transgene. In this case, a single transgene includes the DNA encoding each of the subunits, with the DNA for each subunit separated by an internal ribosome entry site (IRES). This is desirable when the size of the DNA encoding each of the subunits is small, e.g., the total size of the DNA encoding the subunits and the IRES is less than five kilobases. As an alternative to an IRES, the coding sequences may be separated by sequences encoding a 2A peptide, which self-cleaves in a post-translational event. See, e.g., Donnelly et al., J. Gen. Virol., 78(Pt 1):13-21 (January 1997); Furler et al., Gene Ther., 8(11):864-873 (June 2001); Klump et al., Gene Ther., 8(10):811-817 (May 2001). This 2A peptide is significantly smaller than an IRES, making it well suited for use when space is a limiting factor.
More often, when the transgene is large, consists of multi-subunits, or two transgenes are co-delivered, rAAV carrying the desired transgene(s) or subunits are co-administered to allow them to concatamerize in vivo to form a single vector genome. In such an embodiment, a first AAV may carry an expression cassette which expresses a single transgene and a second AAV may carry an expression cassette which expresses a different transgene for co-expression in the host cell. However, the selected transgene may encode any biologically active product or other product, e.g., a product desirable for study.

[0072] Suitable transgenes may be readily selected by one of skill in the art. The selection of the transgene is not considered to be a limitation of this invention. The transgene may be a heterologous protein, and this heterologous protein may be a therapeutic protein. Exemplary therapeutic proteins include, but are not limited to, blood factors, such as β-globin, hemoglobin, tissue plasminogen activator, and coagulation factors; colony stimulating factors (CSF); interleukins, such as IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, etc.; growth factors, such as keratinocyte growth factor (KGF), stem cell factor (SCF), fibroblast growth factor (FGF, such as basic FGF and acidic FGF), hepatocyte growth factor (HGF), insulin-like growth factors (IGFs), bone morphogenetic protein (BMP), epidermal growth factor (EGF), growth differentiation factor-9 (GDF-9), hepatoma derived growth factor (HDGF), myostatin (GDF-8), nerve growth factor (NGF), neurotrophins, platelet-derived growth factor (PDGF), thrombopoietin (TPO), transforming growth factor alpha (TGF-α), transforming growth factor beta (TGF-β), and the like; soluble receptors, such as soluble TNF-α
receptors, soluble VEGF receptors, soluble interleukin receptors (e.g., soluble IL-1 receptors and soluble type II IL-1 receptors), soluble γ/δ T cell receptors, ligand-binding fragments of a soluble receptor, and the like; enzymes, such as α-glucosidase, imiglucerase, β-glucocerebrosidase, and alglucerase; enzyme activators, such as tissue plasminogen activator; chemokines, such as IP-10, monokine induced by interferon-gamma (Mig), Groα/IL-8, RANTES, MIP-1α, MIP-1β, MCP-1, PF-4, and the like; angiogenic agents, such as vascular endothelial growth factors (VEGFs, e.g., VEGF121, VEGF165, VEGF-C, VEGF-2), glioma-derived growth factor, angiogenin, angiogenin-2; and the like; anti-angiogenic agents, such as a soluble VEGF receptor; protein vaccine; neuroactive peptides, such as nerve growth factor (NGF), bradykinin, cholecystokinin, gastrin, secretin, oxytocin, gonadotropin-releasing hormone, beta-endorphin, enkephalin, substance P, somatostatin, prolactin, galanin, growth hormone-releasing hormone, bombesin, dynorphin, warfarin, neurotensin, motilin, thyrotropin, neuropeptide Y, luteinizing hormone, calcitonin, insulin, glucagons, vasopressin, angiotensin II, thyrotropin-releasing hormone, vasoactive intestinal peptide, a sleep peptide, and the like; thrombolytic agents; atrial natriuretic peptide; relaxin; glial fibrillary acidic protein; follicle stimulating hormone (FSH); human alpha-1 antitrypsin; leukemia inhibitory factor (LIF); tissue factors, luteinizing hormone; macrophage activating factors; tumor necrosis factor (TNF); neutrophil chemotactic factor (NCF); tissue inhibitors of metalloproteinases; vasoactive intestinal peptide; angiogenin; angiotropin; fibrin; hirudin; IL-1 receptor antagonists; cardiac factors such as cardiac myosin binding protein C; and the like.
Some other non-limiting examples of proteins of interest include ciliary neurotrophic factor (CNTF); brain-derived neurotrophic factor (BDNF); neurotrophins 3 and 4/5 (NT-3 and 4/5); glial cell derived neurotrophic factor (GDNF); aromatic amino acid decarboxylase (AADC); hemophilia related clotting proteins, such as Factor VIII, Factor IX, Factor X; dystrophin, mini-dystrophin, or microdystrophin; lysosomal acid lipase; phenylalanine hydroxylase (PAH); glycogen storage disease-related enzymes, such as glucose-6-phosphatase, acid maltase, glycogen debranching enzyme, muscle glycogen phosphorylase, liver glycogen phosphorylase, muscle phosphofructokinase, phosphorylase kinase (e.g., PHKA2), glucose transporter (e.g., GLUT2), aldolase A, β-enolase, and glycogen synthase; lysosomal enzymes (e.g., beta-N-acetylhexosaminidase A); and any variants thereof. The AAV vector genome also includes conventional control elements or sequences which are operably linked to the transgene in a manner which permits its transcription, translation and/or expression in a cell transfected with the vector or infected with the virus. As used herein, "operably linked" sequences include both expression control sequences that are contiguous with the gene of interest and expression control sequences that act in trans or at a distance to control the gene of interest. Suitable genes include those genes discussed in Anguela et al., “Entering the Modern Era of Gene Therapy”, Annual Rev. of Med. Vol. 70, pages 272-288 (2019) and Dunbar et al., “Gene therapy comes of age”, Science, Vol. 359, Issue 6372, eaan4672 (2018).

[0073] Expression control sequences can be linked to the transgenes. Examples of expression control sequences include appropriate transcription initiation, termination, promoter and enhancer sequences; efficient RNA processing signals such as splicing and polyadenylation (poly A) signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (i.e., Kozak consensus sequence); sequences that enhance protein stability; and when desired, sequences that enhance secretion of the encoded product. A great number of expression control sequences, including promoters which are native, constitutive, inducible and/or tissue-specific, are known in the art and may be utilized. Examples of constitutive promoters include, without limitation, the retroviral Rous sarcoma virus (RSV) LTR promoter (optionally with the RSV enhancer), the cytomegalovirus (CMV) promoter (optionally with the CMV enhancer) (see, e.g., Boshart et al., Cell, 41:521-530 (1985)), the SV40 promoter, the dihydrofolate reductase promoter, the β-actin promoter, the phosphoglycerol kinase (PGK) promoter, and the EF1 promoter. Inducible promoters allow regulation of gene expression and can be regulated by exogenously supplied compounds, environmental factors such as temperature, or the presence of a specific physiological state, e.g., acute phase, a particular differentiation state of the cell, or in replicating cells only. Inducible promoters and inducible systems are available from a variety of commercial sources, including, without limitation, Invitrogen, Clontech and Ariad. Many other systems have been described and can be readily selected by one of skill in the art. Examples of inducible promoters regulated by exogenously supplied compounds include the zinc-inducible sheep metallothionine (MT) promoter, the dexamethasone (Dex)-inducible mouse mammary tumor virus (MMTV) promoter, the T7 polymerase promoter system [WO 98/10088]; the ecdysone insect promoter [No et al., Proc.
Natl. Acad. Sci. USA, 93:3346-3351 (1996)], the tetracycline-repressible system [Gossen et al., Proc. Natl. Acad. Sci. USA, 89:5547-5551 (1992)], the tetracycline-inducible system [Gossen et al., Science, 268:1766-1769 (1995), see also Harvey et al., Curr. Opin. Chem. Biol., 2:512-518 (1998)], the RU486-inducible system [Wang et al., Nat. Biotech., 15:239-243 (1997) and Wang et al., Gene Ther., 4:432-441 (1997)] and the rapamycin-inducible system [Magari et al., J. Clin. Invest., 100:2865-2872 (1997)]. Other types of inducible promoters, which may be useful in this context, are those which are regulated by a specific physiological state, e.g., temperature, acute phase, a particular differentiation state of the cell, or in replicating cells only.

[0074] Optionally, the native promoter for the transgene may be used. The native promoter may be used when it is desired that expression of the transgene should mimic the native expression. The native promoter may be used when expression of the transgene must be regulated temporally or developmentally, or in a tissue-specific manner, or in response to specific transcriptional stimuli. In a further embodiment, other native expression control elements, such as enhancer elements, polyadenylation sites or Kozak consensus sequences may also be used to mimic the native expression.

[0075] The transgene may also include a gene operably linked to a tissue-specific promoter. For instance, if expression in skeletal muscle is desired, a promoter active in muscle should be used. These include the promoters from genes encoding skeletal β-actin, myosin light chain 2A, dystrophin, muscle creatine kinase, as well as synthetic muscle promoters with activities higher than naturally-occurring promoters (see Li et al., Nat. Biotech., 17:241-245 (1999)). Examples of promoters that are tissue-specific are known for liver (albumin, Miyatake et al., J. Virol., 71:5124-32 (1997); hepatitis B virus core promoter, Sandig et al., Gene Ther., 3:1002-9 (1996); alpha-fetoprotein (AFP), Arbuthnot et al., Hum. Gene Ther., 7:1503-14 (1996)), bone osteocalcin (Stein et al., Mol. Biol. Rep., 24:185-96 (1997)); bone sialoprotein (Chen et al., J. Bone Miner. Res., 11:654-64 (1996)), lymphocytes (CD2, Hansal et al., J. Immunol., 161:1063-8 (1998); immunoglobulin heavy chain; T cell receptor chain), neuronal such as neuron-specific enolase (NSE) promoter (Andersen et al., Cell. Mol. Neurobiol., 13:503-15 (1993)), neurofilament light-chain gene (Piccioli et al., Proc. Natl. Acad. Sci. USA, 88:5611-5 (1991)), and the neuron-specific vgf gene (Piccioli et al., Neuron, 15:373-84 (1995)), among others.

[0076] The recombinant AAV can be used to produce a protein of interest in vitro, for example, in a cell culture. For example, the AAV can be used in a method for producing a protein of interest in vitro, where the method includes providing a recombinant AAV comprising a nucleotide sequence encoding the heterologous protein; and contacting the recombinant AAV with a cell in a cell culture, whereby the recombinant AAV expresses the protein of interest in the cell. The size of the nucleotide sequence encoding the protein of interest can vary. For example, the nucleotide sequence can be at least about 0.1 kilobases (kb), at least about 0.2 kb, at least about 0.3 kb, at least about 0.4 kb, at least about 0.5 kb, at least about 0.6 kb, at least about 0.7 kb, at least about 0.8 kb, at least about 0.9 kb, at least about 1 kb, at least about 1.1 kb, at least about 1.2 kb, at least about 1.3 kb, at least about 1.4 kb, at least about 1.5 kb, at least about 1.6 kb, at least about 1.7 kb, at least about 1.8 kb, at least about 2.0 kb, at least about 2.2 kb, at least about 2.4 kb, at least about 2.6 kb, at least about 2.8 kb, at least about 3.0 kb, at least about 3.2 kb, at least about 3.4 kb, at least about 3.5 kb in length, at least about 4.0 kb in length, at least about 5.0 kb in length, at least about 6.0 kb in length, at least about 7.0 kb in length, at least about 8.0 kb in length, at least about 9.0 kb in length, or at least about 10.0 kb in length. In some embodiments, the nucleotide is at least about 1.4 kb in length.

[0077] The recombinant AAV can also be used to produce a protein of interest in vivo, for example in an animal such as a mammal. Some embodiments provide a method for producing a protein of interest in vivo, where the method includes providing a recombinant AAV comprising a nucleotide sequence encoding the protein of interest; and administering the recombinant AAV to the subject, whereby the recombinant AAV expresses the protein of interest in the subject. The subject can be, in some embodiments, a non-human mammal, for example, a monkey, a dog, a cat, a mouse, or a cow. The size of the nucleotide sequence encoding the protein of interest can vary. For example, the nucleotide sequence can be at least about 0.1 kb, at least about 0.2 kb, at least about 0.3 kb, at least about 0.4 kb, at least about 0.5 kb, at least about 0.6 kb, at least about 0.7 kb, at least about 0.8 kb, at least about 0.9 kb, at least about 1 kb, at least about 1.1 kb, at least about 1.2 kb, at least about 1.3 kb, at least about 1.4 kb, at least about 1.5 kb, at least about 1.6 kb, at least about 1.7 kb, at least about 1.8 kb, at least about 2.0 kb, at least about 2.2 kb, at least about 2.4 kb, at least about 2.6 kb, at least about 2.8 kb, at least about 3.0 kb, at least about 3.2 kb, at least about 3.4 kb, at least about 3.5 kb in length, at least about 4.0 kb in length, at least about 5.0 kb in length, at least about 6.0 kb in length, at least about 7.0 kb in length, at least about 8.0 kb in length, at least about 9.0 kb in length, or at least about 10.0 kb in length. In some embodiments, the nucleotide is at least about 1.4 kb in length.

[0078] Of particular interest is the use of recombinant AAV to express one or more therapeutic proteins to treat various diseases or disorders. Non-limiting examples of the diseases include cancer such as carcinoma, sarcoma, leukemia, lymphoma; and autoimmune diseases such as multiple sclerosis. Non-limiting examples of carcinomas include esophageal carcinoma; hepatocellular carcinoma; basal cell carcinoma, squamous cell carcinoma (various tissues); bladder carcinoma, including transitional cell carcinoma; bronchogenic carcinoma; colon carcinoma; colorectal carcinoma; gastric carcinoma; lung carcinoma, including small cell carcinoma and non-small cell carcinoma of the lung; adrenocortical carcinoma; thyroid carcinoma; pancreatic carcinoma; breast carcinoma; ovarian carcinoma; prostate carcinoma; adenocarcinoma; sweat gland carcinoma; sebaceous gland carcinoma; papillary carcinoma; papillary adenocarcinoma; cystadenocarcinoma; medullary carcinoma; renal cell carcinoma; ductal carcinoma in situ or bile duct carcinoma; choriocarcinoma; seminoma; embryonal carcinoma; Wilms' tumor; cervical carcinoma; uterine carcinoma; testicular carcinoma; osteogenic carcinoma; epithelial carcinoma; and nasopharyngeal carcinoma. Non-limiting examples of sarcomas include fibrosarcoma, myxosarcoma, liposarcoma, chondrosarcoma, chordoma, osteogenic sarcoma, osteosarcoma, angiosarcoma, endotheliosarcoma, lymphangiosarcoma, lymphangioendotheliosarcoma, synovioma, mesothelioma, Ewing's sarcoma, leiomyosarcoma, rhabdomyosarcoma, and other soft tissue sarcomas. Non-limiting examples of solid tumors include glioma, astrocytoma, medulloblastoma, craniopharyngioma, ependymoma, pinealoma, hemangioblastoma, acoustic neuroma, oligodendroglioma, meningioma, melanoma, neuroblastoma, and retinoblastoma.
Non-limiting examples of leukemias include chronic myeloproliferative syndromes; acute myelogenous leukemias; chronic lymphocytic leukemias, including B-cell CLL, T-cell CLL prolymphocytic leukemia, and hairy cell leukemia; and acute lymphoblastic leukemias. Examples of lymphomas include, but are not limited to, B-cell lymphomas, such as Burkitt's lymphoma; Hodgkin's lymphoma; and the like.

[0079] Other non-limiting examples of the diseases that can be treated using rAAV and methods disclosed herein include genetic disorders including sickle cell anemia, cystic fibrosis, lysosomal acid lipase (LAL) deficiency 1, Tay-Sachs disease, Phenylketonuria, Mucopolysaccharidoses, Glycogen storage diseases (GSD, e.g., GSD types I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, and XIV), Galactosemia, muscular dystrophy (e.g., Duchenne muscular dystrophy), and hemophilia such as hemophilia A (classic hemophilia) and hemophilia B (Christmas Disease), Wilson’s disease, Fabry Disease, Gaucher Disease, hereditary angioedema (HAE), alpha 1 antitrypsin deficiency, and hypertrophic cardiomyopathy. In addition, the rAAV and methods disclosed herein can be used to treat other disorders that can be treated by local expression of a transgene in the liver or heart or by expression of a secreted protein from the liver, heart, a hepatocyte, or a cardiomyocyte.

[0080] The amount of the heterologous protein expressed in the subject (e.g., the serum of the subject) can vary. For example, in some embodiments the protein can be expressed in the serum of the subject in the amount of at least about 9 milligram (mg)/milliliter (mL), at least about 10 mg/mL, at least about 11 mg/mL, at least about 12 mg/mL, at least about 13 mg/mL, at least about 14 mg/mL, at least about 15 mg/mL, at least about 16 mg/mL, at least about 17 mg/mL, at least about 18 mg/mL, at least about 19 mg/mL, at least about 20 mg/mL, at least about 21 mg/mL, at least about 22 mg/mL, at least about 23 mg/mL, at least about 24 mg/mL, at least about 25 mg/mL, at least about 26 mg/mL, at least about 27 mg/mL, at least about 28 mg/mL, at least about 29 mg/mL, at least about 30 mg/mL, at least about 31 mg/mL, at least about 32 mg/mL, at least about 33 mg/mL, at least about 34 mg/mL, at least about 35 mg/mL, at least about 36 mg/mL, at least about 37 mg/mL, at least about 38 mg/mL, at least about 39 mg/mL, at least about 40 mg/mL, at least about 41 mg/mL, at least about 42 mg/mL, at least about 43 mg/mL, at least about 44 mg/mL, at least about 45 mg/mL, at least about 46 mg/mL, at least about 47 mg/mL, at least about 48 mg/mL, at least about 49 mg/mL, or at least about 50 mg/mL. The protein of interest may be expressed in the serum of the subject in the amount of about 9 picograms (pg)/mL, about 10 pg/mL, about 50 pg/mL, about 100 pg/mL, about 200 pg/mL, about 300 pg/mL, about 400 pg/mL, about 500 pg/mL, about 600 pg/mL, about 700 pg/mL, about 800 pg/mL, about 900 pg/mL, about 1000 pg/mL, about 1500 pg/mL, about 2000 pg/mL, about 2500 pg/mL, or a range between any two of these values. 
A skilled artisan will understand that the expression level in which a protein of interest is needed for therapeutic efficacy can vary depending on non-limiting factors, such as the particular protein of interest and the subject receiving the treatment, and an effective amount of the protein can be readily determined by a skilled artisan using conventional methods known in the art without undue experimentation.

[0081] Methods of Producing Adeno-Associated Virus

[0082] There are different methods for generating AAV viral particles, including, but not limited to, transfection using vector and AAV helper sequences in conjunction with coinfection with one of the AAV helper viruses (e.g., adenovirus, herpesvirus, or vaccinia virus), or transfection with a recombinant AAV vector, an AAV helper vector, and an accessory function vector. Methods of making AAV viral particles are described in, e.g., U.S. Patent Nos. US6204059, US5756283, US6258595, US6261551, US6270996, US6281010, US6365394, US6475769, US6482634, US6485966, US6943019, US6953690, US7022519, US7238526, US7291498, US7491508, US5064764, US6194191, US6566118, and US8137948; International Publication Nos. WO1996039530, WO1998010088, WO1999014354, WO1999015685, WO1999047691, WO2000055342, WO2000075353, WO2001023597, WO2015191508, WO2019217513, WO2018022608, WO2019222136, WO2020232044, and WO2019222132; Methods In Molecular Biology, ed. Richard, Humana Press, NJ (1995); O'Reilly et al., Baculovirus Expression Vectors, A Laboratory Manual, Oxford Univ. Press (1994); Samulski et al., J. Vir. 63:3822-8 (1989); Kajigaya et al., Proc. Nat'l. Acad. Sci. USA 88:4646-50 (1991); Ruffing et al., J. Vir. 66:6922-30 (1992); Kirnbauer et al., Vir. 219:37-44 (1996); Zhao et al., Vir. 272:382-93 (2000); the contents of each of which are herein incorporated by reference in their entirety. For detailed descriptions of methods for generating AAV viral particles see, for example, U.S. Pat. Nos. 6,001,650, 6,004,797, and 9,504,762, each herein incorporated by reference in its entirety. In one embodiment, a triple transfection method (see, e.g., U.S. Pat. No. 6,001,650, herein incorporated by reference in its entirety) is used to produce AAV viral particles. This method does not require the use of an infectious helper virus, enabling AAV viral particles to be produced without any detectable helper virus present. This is accomplished by use of three vectors for AAV viral particle production, namely an AAV helper function vector, an accessory function vector, and an AAV viral particle expression vector. One of skill in the art will appreciate, however, that the nucleic acid sequences encoded by these vectors can be provided on two or more vectors in various combinations. In other embodiments, the host cell can be transfected with the helper plasmid or helper virus, the viral construct and the plasmid encoding the AAV cap genes; and the AAV viral particles can be collected at various time points after co-transfection.

[0083] Examples of host cells include mammalian cells (e.g., human cell lines, HEK293, HeLa, CHO, NS0, SP2/0, PER.C6, Vero, RD, BHK, HT1080, A549, Cos-7, ARPE-19 or MRC-5 cells). Other examples of host cells include insect cells. For example, the insect cell is from Spodoptera frugiperda, such as Sf9, Sf21, Sf900+, drosophila cell lines, mosquito cell lines, for example, Aedes albopictus derived cell lines, domestic silkworm cell lines, for example, Bombyx mori cell lines, Trichoplusia ni cell lines such as High Five cells or Lepidoptera cell lines such as Ascalapha odorata cell lines. Insect cells are cells from the insect species which are susceptible to baculovirus infection, including High Five, Sf9, Se301, SeIZD2109, SeUCR1, Sf900+, Sf21, BTI-TN-5B1-4, MG-1, Tn368, HzAm1, BM-N, Ha2302, Hz2E5 and Ao38.

[0084] For example, wild-type AAV and helper viruses may be used to provide the necessary replicative functions for producing AAV viral particles (see, e.g., U.S. Pat. No. 5,139,941, herein incorporated by reference in its entirety). Alternatively, a plasmid containing helper function genes, in combination with infection by one of the well-known helper viruses, can be used as the source of replicative functions (see, e.g., U.S. Pat. No. 5,622,856 and U.S. Pat. No. 5,139,941, both herein incorporated by reference in their entireties). Similarly, a plasmid containing accessory function genes can be used in combination with infection by wild-type AAV to provide the necessary replicative functions. Other approaches, described herein and/or well known in the art, can also be employed by the skilled artisan to produce AAV viral particles.

[0085] The term “vector” is understood to refer to any genetic element, such as a plasmid, phage, transposon, cosmid, bacmid, mini-plasmid (e.g., plasmid devoid of bacterial elements), Doggybone DNA (e.g., minimal, closed-linear constructs), chromosome, virus, virion (e.g., baculovirus), etc., which is capable of replication when associated with the proper control elements and which can transfer gene sequences between cells. A "mammalian cell-compatible vector" or "vector" as used herein refers to a nucleic acid molecule capable of productive transformation or transfection of a mammal or mammalian cell. An "insect cell-compatible vector" or "vector" as used herein refers to a nucleic acid molecule capable of productive transformation or transfection of an insect or insect cell. Exemplary biological vectors include plasmids, linear nucleic acid molecules, and recombinant viruses. Any vector can be employed such that it is insect cell compatible. The vector may integrate into the insect cells genome but the presence of the vector in the insect cell need not be permanent and transient episomal vectors are also included. The vectors can be introduced by any means known, for example by chemical treatment of the cells, electroporation, or infection. Vectors and methods for their use are described in the above cited references on molecular engineering of cells.

[0086] The vector from which the cell generates an rAAV vector genome may contain a promoter and a restriction site downstream of the promoter to allow insertion of a polynucleotide encoding one or more proteins of interest, wherein the promoter and the restriction site are located downstream of the 5' AAV ITR and upstream of the 3' AAV ITR. The vector may also contain a posttranscriptional regulatory element downstream of the restriction site and upstream of the 3' AAV ITR. The viral construct may further comprise a polynucleotide inserted at the restriction site and operably linked with the promoter, where the polynucleotide comprises the coding region of a protein of interest. In some embodiments, the viral construct further includes a promoter and a restriction site downstream of the promoter to allow insertion of a polynucleotide encoding one or more proteins of interest, wherein the promoter and the restriction site are located downstream of the 5' AAV ITR and upstream of the 3' AAV ITR. In some embodiments, the viral construct further includes a posttranscriptional regulatory element downstream of the restriction site and upstream of the 3' AAV ITR. In some embodiments, the viral construct further includes a polynucleotide inserted at the restriction site and operably linked with the promoter, where the polynucleotide includes the coding region of a protein of interest. As a skilled artisan will appreciate, any one of the AAV vectors disclosed in the present application can be used in the method as the viral construct to produce the rAAV virions.

[0087] The term “AAV helper” refers to AAV-derived coding sequences which can be expressed to provide AAV gene products that, in turn, function in trans for productive AAV replication. Thus, AAV helper functions include both of the major AAV open reading frames (ORFs), rep and cap. The Rep expression products have been shown to possess many functions, including, among others: recognition, binding and nicking of the AAV origin of DNA replication; DNA helicase activity; and modulation of transcription from AAV (or other heterologous) promoters. The capsid (Cap) expression products supply necessary packaging functions. AAV helper functions are used herein to complement AAV functions in trans that are missing from AAV vector genomes.

[0088] For production, cells with AAV helper functions produce recombinant capsid proteins sufficient to form a capsid. This includes at least VP1 and VP3 proteins, but more typically, all three of VP1, VP2, and VP3 proteins, as found in native AAV. The sequence of the capsid proteins determines the serotype of the AAV virions produced by the host cell. Capsids useful in the invention include those derived from a number of AAV serotypes, including 1, 2, 3, 3B, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 or mixed serotypes (see, e.g., US Patent No. 8,318,480 for its disclosure of non-natural mixed serotypes). The capsid proteins can also be variants of natural VP1, VP2 and VP3, including mutated, chimeric or shuffled proteins. The capsid proteins can be those of rh.10 or other subtype within the various clades of AAV; various clades and subtypes are disclosed, for example, in U.S. Patent No. 7,906,111. Because of wide construct availability and extensive characterization, illustrative AAV vectors disclosed below are derived from serotype 2. Construction and use of AAV vectors and AAV proteins of different serotypes are discussed in Chao et al., Mol. Ther. 2:619-623, 2000; Davidson et al., PNAS 97:3428-3432, 2000; Xiao et al., J. Virol. 72:2224-2232, 1998; Halbert et al., J. Virol. 74:1524-1532, 2000; Halbert et al., J. Virol. 75:6615-6624, 2001; and Auricchio et al., Hum. Molec. Genet. 10:3075-3081, 2001.

[0089] For production, cells with AAV helper functions produce Rep proteins to promote production of rAAV. It has been found that infectious particles can be produced when at least one large Rep protein (Rep78 or Rep68) and at least one small Rep protein (Rep52 or Rep40) are expressed in cells. In a specific embodiment all four of Rep78, Rep68, Rep52 and Rep40 are expressed. Alternately, Rep78 and Rep52, Rep78 and Rep40, Rep68 and Rep52, or Rep68 and Rep40 are expressed. Examples below demonstrate the use of the Rep78/Rep52 combination. Rep proteins can be derived from AAV-2 or other serotypes.

[0090] Cells with AAV helper functions can also produce assembly-activating proteins (AAP), which help assemble capsids. In various embodiments, nucleotide sequences encoding AAP can be operably linked to a suitable expression control sequence. For example, the nucleotide sequences can be operably linked to eukaryotic promoters. In other examples, the nucleotide sequences can be operably linked to baculoviral promoters such as the polyhedrin (Polh) promoter, ΔIE1 promoter, p5 promoter, p10 promoter, p19 promoter, the p40 promoter, metallothionein promoter, 39K promoter, p6.9 promoter, and orf46 promoter.

[0091] The term “non-AAV helper function” refers to non-AAV derived viral and/or cellular functions upon which AAV is dependent for its replication. Thus, the term captures proteins and RNAs that are required in AAV replication, including those moieties involved in activation of AAV gene transcription, stage specific AAV mRNA splicing, AAV DNA replication, synthesis of Cap expression products and AAV capsid assembly. Viral-based accessory functions can be derived from any of the known helper viruses such as adenovirus, herpesvirus (other than herpes simplex virus type-1) and vaccinia virus.

[0092] The term “non-AAV helper function vector” refers generally to a nucleic acid molecule that includes nucleotide sequences providing accessory functions. An accessory function vector can be transfected into a suitable host cell, wherein the vector is then capable of supporting AAV virion production in the host cell. Expressly excluded from the term are infectious viral particles as they exist in nature, such as adenovirus, herpesvirus or vaccinia virus particles. Thus, accessory function vectors can be in the form of a plasmid, phage, transposon or cosmid. In particular, it has been demonstrated that the full complement of adenovirus genes is not required for accessory helper functions. For example, adenovirus mutants incapable of DNA replication and late gene synthesis have been shown to be permissive for AAV replication. Ito et al., (1970) J. Gen. Virol. 9:243; Ishibashi et al., (1971) Virology 45:317. Similarly, mutants within the E2B and E3 regions have been shown to support AAV replication, indicating that the E2B and E3 regions are probably not involved in providing accessory functions. Carter et al., (1983) Virology 126:505. However, adenoviruses defective in the E1 region, or having a deleted E4 region, are unable to support AAV replication. Thus, E1A and E4 regions are likely required for AAV replication, either directly or indirectly. Laughlin et al., (1982) J. Virol. 41:868; Janik et al., (1981) Proc. Natl. Acad. Sci. USA 78:1925; Carter et al., (1983) Virology 126:505. Other characterized Ad mutants include: E1B (Laughlin et al. (1982), supra; Janik et al. (1981), supra; Ostrove et al., (1980) Virology 104:502); E2A (Handa et al., (1975) J. Gen. Virol. 29:239; Strauss et al., (1976) J. Virol. 17:140; Myers et al., (1980) J. Virol. 35:665; Jay et al., (1981) Proc. Natl. Acad. Sci. USA 78:2927; Myers et al., (1981) J. Biol. Chem. 256:567); E2B (Carter, Adeno-Associated Virus Helper Functions, in I CRC Handbook of Parvoviruses (P. Tijssen ed., 1990)); E3 (Carter et al. (1983), supra); and E4 (Carter et al. (1983), supra; Carter (1995)). Although studies of the accessory functions provided by adenoviruses having mutations in the E1B coding region have produced conflicting results, Samulski et al., (1988) J. Virol. 62:206-210, recently reported that E1B55k is required for AAV virion production, while E1B19k is not. In addition, International Publication WO 97/17458 and Matsushita et al., (1998) Gene Therapy 5:938-945, describe accessory function vectors encoding various Ad genes. Exemplary accessory function vectors comprise an adenovirus VA RNA coding region, an adenovirus E4 ORF6 coding region, an adenovirus E2A 72 kD coding region, an adenovirus E1A coding region, and an adenovirus E1B region lacking an intact E1B55k coding region. Such vectors are described in International Publication No. WO 01/83797.

[0093] Use of insect cells for expression of heterologous proteins is well documented, as are methods of introducing nucleic acids, such as vectors, e.g., insect-cell compatible vectors, into such cells and methods of maintaining such cells in culture. (See, e.g., METHODS IN MOLECULAR BIOLOGY, ed. Richard, Humana Press, NJ (1995); O'Reilly et al., BACULOVIRUS EXPRESSION VECTORS, A LABORATORY MANUAL, Oxford Univ. Press (1994); Samulski et al., J. Vir. (1989) vol. 63, pp. 3822-3828; Kajigaya et al., Proc. Nat'l. Acad. Sci. USA (1991) vol. 88, pp. 4646-4650; Ruffing et al., J. Vir. (1992) vol. 66, pp. 6922-6930; Kirnbauer et al., Vir. (1996) vol. 219, pp. 37-44; Zhao et al., Vir. (2000) vol. 272, pp. 382-393; and U.S. Pat. No. 6,204,059). In some embodiments, the nucleic acid construct encoding AAV in insect cells is an insect cell-compatible vector. “Expression vector” refers to a vector including a recombinant polynucleotide including expression control sequences operatively linked to a nucleotide sequence to be expressed. An expression vector includes sufficient cis-acting elements for expression; other elements for expression can be supplied by the host cell or in vitro expression system. Expression vectors include all those known in the art, such as cosmids, plasmids (e.g., naked or contained in liposomes), artificial chromosomes, and viruses that incorporate the recombinant polynucleotide. An "insect cell-compatible vector" or "vector" as used herein refers to a nucleic acid molecule capable of productive transformation or transfection of an insect or insect cell. Exemplary biological vectors include plasmids, linear nucleic acid molecules, and recombinant viruses. Any vector can be employed such that it is insect cell compatible. The vector may integrate into the insect cells genome but the presence of the vector in the insect cell need not be permanent and transient episomal vectors are also included. The vectors can be introduced by any means known, for example by chemical treatment of the cells, electroporation, or infection. In some embodiments, the vector is a baculovirus, a viral vector, or a plasmid. In other embodiments, the vector is a baculovirus, i.e., the construct is a baculoviral vector. Baculoviral vectors and methods for their use are described in the above cited references on molecular engineering of insect cells.

[0094] EXAMPLES

[0095] The invention being thus described, one skilled in the art would readily understand that the same may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the following claims.

[0096] The present invention is directed to a method of strand-specific genomic characterization of rAAV genomes from long read next generation sequencing data. This bioinformatics approach allows for improving the therapeutic efficiency of a drug product by eliminating premature truncation of therapeutic genes due to their potential secondary structure and/or ITR-like palindromic structures. To this end, the bioinformatic approach allows design and production of therapeutically effective rAAV. The bioinformatics method of the invention is applied to AAV genomes that have been extended with a 3’ITR to form double stranded DNA. The double stranded DNA is analyzed to categorize the AAV and non-AAV genomes based on the presence of a 145 base pair (bp) ITR and a 20 bp D region of ITR in the inner loop of the extended AAV-derived DNA. The difference between the sequence length of the upper strand of the ITR and D region and the sequence length of the bottom strand of the ITR and D region can be 0 or between about -45 and 45.

[0097] This bioinformatics method also produces alignments from mapping reads to the eight (8) different possible AAV reference genome sequences (i.e., plus- or minus-stranded, each in a flip/flip, flip/flop, flop/flip, or flop/flop configuration), which allow for distinguishing plus-stranded AAV genomes from minus-stranded AAV genomes. This feature-orientation-specific difference may be used to discriminate the plus- or minus-stranded form of packaged rAAV DNA. The flip/flop configurations of the index DNA sequence may be determined from the metrics of the score based on the presence of the appropriate flip/flop defining bases/base combinations in the respective 5’ and 3’ ITR sequence regions.

[0098] This method can produce a hard clipped DNA sequence that maps optimally to the respective reference AAV genome sequence among the 8 different configurations for further tertiary analyses and secondary structure predictions.

[0099] Every AAV genome may be plotted in a single jitter/dot plot along with a density plot, either in a strand specific manner or including both plus- and minus-stranded AAV genomes, to empirically determine the major truncation hotspots and their respective reference genome coordinates or the sequence motif that is associated with truncation events.
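The hotspot determination described above is made empirically from jitter and density plots. As a rough numerical counterpart, the following minimal sketch bins the reference coordinates at which reads end and reports the most populated bins. The function name, the bin size, and the example coordinates are illustrative assumptions and do not come from the specification.

```python
from collections import Counter

def truncation_hotspots(end_coords, bin_size=50, top_n=3):
    """Bin read end coordinates and return the most populated bins.

    end_coords: reference coordinates at which mapped reads terminate
    (hypothetical input; the specification reads these from plots).
    Returns a list of (bin_start, read_count) pairs, most populated first.
    """
    bins = Counter((coord // bin_size) * bin_size for coord in end_coords)
    return bins.most_common(top_n)

# Illustrative coordinates only; not data from the specification.
ends = [4800, 4795, 2110, 2125, 2140, 2101, 4810, 950]
hotspots = truncation_hotspots(ends, bin_size=50, top_n=2)
print(hotspots)
```

Bins with unusually high counts away from the expected full-length terminus would be candidates for the motif inspection described in the following paragraphs.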

[00100] The major truncation hot-spot-spanning reference sequence motif may be further investigated for the possible secondary structures at, for example, 28 degrees Celsius (°C) for insect cell culture condition, 37°C for mammalian cell culture condition, and 95°C for melting characteristics.

[00101] When there are potential truncation hot spot specific sequence motifs identified in the expression cassette, that particular motif could be removed, if possible, or could be codon optimized to minimize or eliminate the truncation altogether.

[00102] AAV DNA Preparation for Sequencing

[00103] The AAV genome was extracted by mixing AAV, PBS, Proteinase K, and buffer. The mixture was incubated at an elevated temperature (e.g., 15°C, 16°C, 17°C, 18°C, 19°C, 20°C, 21°C, 22°C, 23°C, 24°C, 25°C, 26°C, 27°C, 28°C, 29°C, 30°C, 31°C, 32°C, 33°C, 34°C, 35°C, 36°C, 37°C, 38°C, 39°C, 40°C, 41°C, 42°C, 43°C, 44°C, 45°C, 46°C, 47°C, 48°C, 49°C, or 50°C) for a predetermined time (e.g., 30 minutes, 31 minutes, 32 minutes, 33 minutes, 34 minutes, 35 minutes, 36 minutes, 37 minutes, 38 minutes, 39 minutes, 40 minutes, 41 minutes, 42 minutes, 43 minutes, 44 minutes, 45 minutes, 46 minutes, 47 minutes, 48 minutes, 49 minutes, 50 minutes, 51 minutes, 52 minutes, 53 minutes, 54 minutes, 55 minutes, 56 minutes, 57 minutes, 58 minutes, 59 minutes, 60 minutes, 61 minutes, 62 minutes, 63 minutes, 64 minutes, 65 minutes, 66 minutes, 67 minutes, 68 minutes, 69 minutes, 70 minutes, 71 minutes, 72 minutes, 73 minutes, 74 minutes, 75 minutes, 76 minutes, 77 minutes, 78 minutes, 79 minutes, 80 minutes, 81 minutes, 82 minutes, 83 minutes, 84 minutes, 85 minutes, 86 minutes, 87 minutes, 88 minutes, 89 minutes, or 90 minutes), and then 100% ethanol (EtOH) was mixed in. The sample mixture was applied to a DNA isolation column and centrifuged for a predetermined time. Then, a first wash buffer was added to the column and centrifuged for a predetermined time. Subsequently, a second wash buffer was added to the column and centrifuged for a predetermined time. Next, an elution buffer was added and centrifuged for a predetermined time. The sample was quantitatively analyzed using a quantitation platform.

[00104] Next, the extracted AAV genome was purified and incubated in a thermal cycler at an elevated temperature for a predetermined time and subsequently cooled. Then, a volume of paramagnetic beads was added to the sample for removing contaminants. The bead/DNA solution was mixed, spun, and the supernatant containing DNA was collected. The beads were washed with 70% ethanol.

[00105] Next, the DNA was quantified using a fluorometric quantification device. A solution of the DNA and buffer for quantification was prepared. Standards were prepared. The DNA concentration was measured using a fluorometer.

[00106] 3’ ITR Extension Second Strand Synthesis

[00107] Next, the second strand of each of the isolated AAV genomes was synthesized. The AAV genomes were combined with a DNA polymerase (e.g., Taq DNA polymerase) and dideoxynucleotides (ddNTPs) including ddGTP, ddATP, ddTTP and ddCTP. In this process, the end of the stem arm of the 3’ ITR is used as a “self-primer” to fill in the complementary sequence of the AAV genome and eventually denature the hairpin structure of the 5’ ITR as the DNA polymerase continues to generate the complementary nucleotide sequence to the 5’ ITR. The AAV genomes are processed using a polymerase chain reaction (PCR) in a thermocycler where the AAV genomes are heated to a predetermined temperature (e.g., 80°C, 81°C, 82°C, 83°C, 84°C, 85°C, 86°C, 87°C, 88°C, 89°C, 90°C, 91°C, 92°C, 93°C, 94°C, 95°C, 96°C, 97°C, 98°C, 99°C, 100°C) for a predetermined time (e.g., 1 minute, 2 minutes, 3 minutes, 4 minutes, 5 minutes, 6 minutes, 7 minutes, 8 minutes, 9 minutes, 10 minutes, 11 minutes, 12 minutes, 13 minutes, 14 minutes, or 15 minutes), subsequently cooled to a predetermined temperature (e.g., 60°C, 61°C, 62°C, 63°C, 64°C, 65°C, 66°C, 67°C, 68°C, 69°C, 70°C, 71°C, 72°C, 73°C, 74°C, 75°C, 76°C, 77°C, 78°C, 79°C, 80°C) for a predetermined time (e.g., 5 minutes, 6 minutes, 7 minutes, 8 minutes, 9 minutes, 10 minutes, 11 minutes, 12 minutes, 13 minutes, 14 minutes, 15 minutes, 16 minutes, 17 minutes, 18 minutes, 19 minutes, 20 minutes, 21 minutes, 22 minutes, 23 minutes, 24 minutes, 25 minutes, 26 minutes, 27 minutes, 28 minutes, 29 minutes, or 30 minutes) such that the DNA polymerase synthesizes a second complementary strand on the AAV genomes, and cooled to a predetermined temperature (e.g., 5°C, 6°C, 7°C, 8°C, 9°C, 10°C, 11°C, 12°C, 13°C, 14°C, 15°C, 16°C, 17°C, 18°C, 19°C, or 20°C) for a predetermined time (e.g., 5 minutes, 10 minutes, 30 minutes, 60 minutes, 120 minutes, or indefinitely). The AAV genomes are subsequently purified, where a volume of paramagnetic beads was added to the sample to remove contaminants. The bead/DNA solution was mixed, and the beads were immobilized using a magnet. The supernatant was discarded. The beads were then washed with 70% ethanol twice and allowed to dry. The dry beads were then resuspended in an appropriate amount of elution buffer and mixed. The beads were immobilized on a magnet and the supernatant containing DNA was collected. The DNA was quantified using a fluorometric quantification device. A solution of the DNA and buffer for quantification was prepared. Standards were prepared. The DNA concentration was measured using a fluorometer.

[00108] Next, adapter nucleotide sequences were ligated to the AAV genomes. In this step, the AAV genomes are combined with the adapter nucleotide sequences and ligase. The samples are then incubated at a predetermined temperature (e.g., 10°C, 11°C, 12°C, 13°C, 14°C, 15°C, 16°C, 17°C, 18°C, 19°C, 20°C, 21°C, 22°C, 23°C, 24°C, 25°C, 26°C, 27°C, 28°C, 29°C, 30°C, 31°C, 32°C, 33°C, 34°C, 35°C, 36°C, 37°C, 38°C, 39°C, or 40°C) for a predetermined time (e.g., 5 minutes, 6 minutes, 7 minutes, 8 minutes, 9 minutes, 10 minutes, 11 minutes, 12 minutes, 13 minutes, 14 minutes, 15 minutes, 16 minutes, 17 minutes, 18 minutes, 19 minutes, 20 minutes, 21 minutes, 22 minutes, 23 minutes, 24 minutes, 25 minutes, 26 minutes, 27 minutes, 28 minutes, 29 minutes, or 30 minutes) in order for ligase activity to occur and ligate the adapters to the AAV genomes. After the ligation, the AAV genomes are subsequently purified, where a volume of paramagnetic beads was added to the sample to remove contaminants. The bead/DNA solution was mixed and spun, and the supernatant containing DNA was collected. The beads were washed with 70% ethanol. The DNA was quantified using a fluorometric quantification device. A solution of the DNA and buffer for quantification was prepared. Standards were prepared. The DNA concentration was measured using a fluorometer.

[00109] In various embodiments, the temperatures and time are ranges between any two temperatures or times provided above.

[00110] The samples were then analyzed using a sequencer.

[00111] Bioinformatics Workflow

[00112] As shown in figure 1, the general bioinformatics workflow 10 includes the steps of obtaining a sequence read 100, mapping the sequence read to a reference 200, classifying the AAV and non-AAV DNA molecules 300, classifying the sequence read as being plus- or minus-stranded 400, and determining whether the plus- or minus-stranded genomes have a flip or flop configuration 500.

[00113] In step 100 of figure 1, multiple reads for each DNA fragment were collapsed into a sequence read for each DNA fragment profiled. The number of passes required for each sequence read generation was set to >=3 on the sequencer. Once the sequence read was generated, a bioinformatic workflow was performed using Python and R programming in a High Performance Cluster (HPC) environment.

[00114] The sequencer generated reads in multiple copies, called subreads, for each DNA fragment, which were collapsed into a sequence read for each DNA fragment profiled.
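In the text above, the pass threshold is set on the sequencer itself. The minimal sketch below applies the same threshold as a downstream sanity check on already collapsed reads. The `ConsensusRead` record, its field names, and the example values are hypothetical and serve only as illustration; how the pass count is actually exposed depends on the sequencing platform.

```python
from dataclasses import dataclass

MIN_PASSES = 3  # ">=3" passes per sequence read, as stated in the text

@dataclass
class ConsensusRead:
    """Hypothetical record for one collapsed sequence read."""
    name: str
    sequence: str
    num_passes: int  # number of subread passes that were collapsed

def filter_by_passes(reads, min_passes=MIN_PASSES):
    """Keep only reads supported by at least min_passes subread passes."""
    return [read for read in reads if read.num_passes >= min_passes]

# Illustrative records only.
reads = [
    ConsensusRead("read_1", "ACGT", 5),
    ConsensusRead("read_2", "ACGA", 2),
    ConsensusRead("read_3", "ACGG", 3),
]
kept = filter_by_passes(reads)
print([read.name for read in kept])
```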

[00115] In step 200 of figure 1, it is noted that there are eight different references with plus/minus and flip/flop configurations. The bioinformatic workflow of the present invention differs from the others in that this workflow uses the double stranded, 3’ITR extended rAAV genome sequence rather than a single-stranded rAAV genomic sequence. Use of this double stranded plus/minus stranded rAAV sequence information provides strand information of the packaged rAAV DNA that contains a self-priming 3’ITR.

[00116] To map the sample genomes in step 300 of figure 1, reference sequences are generated for plus and minus stranded rAAV genomes. Since the rAAVs can package either plus or minus stranded genomes, plus stranded genomes after the 3’ ITR extension will differ from the minus stranded genomes in the orientation of the features in the 3’ extended double stranded rAAV genomes. Figure 2A shows the plus and minus-stranded rAAV genomes after the second strand synthesis, and figure 2B depicts a 2x reference sequence length without the 3’ ITR extension. As shown in figures 2A and 2B, the plus stranded genome with (5’) Promoter > cDNA of interest > polyA (3’) will become (5’) Promoter > cDNA of interest > polyA > polyA’ > cDNA of interest’ > Promoter’ (3’) upon 3’ ITR extension, whereas the minus stranded genome with (5’) polyA -> cDNA of interest <- Promoter (3’) will become (5’) polyA -> cDNA of interest -> Promoter -> cDNA of interest -> polyA (3’). Thus, this is an orientation-specific workflow.
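The feature order that results from self-primed extension can be sketched at the feature level as follows. This is a simplified illustration: the function names are hypothetical, a trailing apostrophe marks the reverse-complement copy of a feature, and the ITR hairpin between the two halves is deliberately ignored.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def extend_features(features):
    """Feature order after 3' ITR self-primed second-strand synthesis.

    The packaged strand is followed by its reverse-complement copy, so
    the features reappear in reverse order (marked with an apostrophe).
    Assumption: the ITR hairpin joining the two halves is not modeled.
    """
    return list(features) + [name + "'" for name in reversed(features)]

plus = extend_features(["Promoter", "cDNA", "polyA"])
minus = extend_features(["polyA", "cDNA", "Promoter"])
print(" > ".join(plus))
print(" > ".join(minus))
```

Under these assumptions the plus-stranded molecule has polyA at its center and the minus-stranded molecule has Promoter at its center, which is the orientation difference the workflow relies on.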

[00117] This feature of orientation-specific difference was used to discriminate the plus or minus stranded form of the packaged rAAV DNA in step 400 of figure 1. On mapping the sequence reads to each of the plus and minus stranded genomes, a read that mapped to one of the stranded references with the highest alignment score was assigned that stranded reference’s configuration (i.e., either plus or minus). When the read was mapped to the opposite strand, its alignment was split into two halves due to the feature-orientation difference, and the alignment score was nearly half that of reads correctly mapped to the respective stranded reference. This is demonstrated graphically in figure 3, which depicts generation of a sequence read to map against the plasmid sequence. The positive and negative stranded rAAV genomes differ in the “Promoter - polyA - polyA - Promoter” versus “polyA - Promoter - Promoter - polyA” orientations for a given rAAV vector genome construct.
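The strand assignment described above reduces to choosing the reference with the higher alignment score. A minimal sketch follows, assuming the two scores have already been computed by an aligner; the function names and the example scores are illustrative and are not values from the specification.

```python
def assign_strand(scores):
    """Return the strand label whose reference gave the highest score.

    scores: dict mapping a strand label ('plus' or 'minus') to the
    alignment score of one read against that stranded reference.
    """
    return max(scores, key=scores.get)

def score_ratio(scores):
    """Ratio of the lower score to the higher score for a two-way comparison.

    A value near 0.5 is what the text describes when one of the two
    references has the opposite feature orientation.
    """
    low, high = sorted(scores.values())
    return low / high

# Illustrative scores only.
example = {"plus": 9400, "minus": 4750}
strand = assign_strand(example)
ratio = score_ratio(example)
print(strand, round(ratio, 3))
```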

[00118] In step 500 of figure 1, the reference sequences were generated for the flip/flop configurations. Each stranded reference genome can exist in either the flip or the flop configuration at the 5’ITR, as well as at the 3’ITR. Therefore, a full length positive stranded vector genome will exist in one of flip/flip (5’ITR/3’ITR), flip/flop, flop/flip, and flop/flop. The negative stranded vector genome also can exist in one of the four flip/flop configurations. Thus, for a given full length vector genome with self-priming 3’ITR, one of the following eight configurations will occur:

1. Plus Stranded: flip/flip

2. Plus Stranded: flip/flop

3. Plus Stranded: flop/flip

4. Plus Stranded: flop/flop

5. Minus Stranded: flip/flip

6. Minus Stranded: flip/flop

7. Minus Stranded: flop/flip

8. Minus Stranded: flop/flop
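The eight configurations listed above are the product of two strands and two ITR states at each end. A minimal sketch that enumerates them in the same order as the list is shown below; the label strings are illustrative and could, for instance, be used to name the eight reference sequences.

```python
from itertools import product

STRANDS = ("plus", "minus")
ITR_STATES = ("flip", "flop")

# Each entry is (strand, 5' ITR state, 3' ITR state).
configurations = list(product(STRANDS, ITR_STATES, ITR_STATES))

for strand, five_prime, three_prime in configurations:
    print(f"{strand.capitalize()} Stranded: {five_prime}/{three_prime}")
```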

[00119] The reads were mapped to each of the above eight different reference sequences and assigned either a plus or minus stranded AAV genome based on the highest alignment score after mapping to each of the eight reference sequences. The following parameters were used for calculating the alignment score:

1. Matching Score: 2

2. Mismatch penalty: 4

3. Gap open penalty: 4

4. Gap extension penalty: 2
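As a worked illustration of these parameters, the sketch below scores an alignment given as a list of operations. Two points are assumptions and are not stated in the text: the alignment is represented with extended CIGAR-style operations ('=' match, 'X' mismatch, 'I' insertion, 'D' deletion), and a gap of length k is charged the gap open penalty plus k times the gap extension penalty. Aligners differ on the second convention.

```python
MATCH_SCORE = 2
MISMATCH_PENALTY = 4
GAP_OPEN_PENALTY = 4
GAP_EXTENSION_PENALTY = 2

def alignment_score(operations):
    """Score an alignment from (operation, length) pairs.

    Assumed convention: a gap of length k costs
    GAP_OPEN_PENALTY + k * GAP_EXTENSION_PENALTY.
    """
    score = 0
    for op, length in operations:
        if op == "=":
            score += MATCH_SCORE * length
        elif op == "X":
            score -= MISMATCH_PENALTY * length
        elif op in ("I", "D"):
            score -= GAP_OPEN_PENALTY + GAP_EXTENSION_PENALTY * length
        else:
            raise ValueError(f"unsupported operation: {op!r}")
    return score

# 100 matches, 1 mismatch, 50 matches, a 3-base deletion, 20 matches:
# 200 - 4 + 100 - (4 + 3 * 2) + 40 = 326
example_score = alignment_score(
    [("=", 100), ("X", 1), ("=", 50), ("D", 3), ("=", 20)]
)
print(example_score)
```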

[00120] The flip/flop configurations were identified from the mapped ITR sequence information of the respective plus/minus mapped read.

[00121] Figure 7 highlights the differences in the sequence alignments of the 5’ and 3’ ITRs in the flip and flop configurations. Flip/flop configurations were determined directly from the mapped ITR sequence information of the respective plus/minus mapped read at the flip/flop defining base positions. The hyphens between bases indicate the base mismatch and these mismatched base positions are the flip/flop defining regions in the bioinformatic analysis of various embodiments. A mismatched base is given a penalty of 0, while a matching base is given a score of 1. The configuration having the higher score is assigned to the read of interest.
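The scoring rule above (1 for a match and 0 for a mismatch at each flip/flop defining position, with the higher score winning) can be sketched as follows. The sequences and positions are toy values and are not real ITR sequences; the function names and the "undetermined" result for tied scores are assumptions added for illustration.

```python
def config_score(itr_read, itr_reference, defining_positions):
    """Count defining positions where the read matches the reference."""
    return sum(
        1 for pos in defining_positions if itr_read[pos] == itr_reference[pos]
    )

def call_flip_flop(itr_read, flip_ref, flop_ref, defining_positions):
    """Assign 'flip' or 'flop' by the higher score at the defining positions.

    Returning 'undetermined' on a tie is an assumption of this sketch.
    """
    flip = config_score(itr_read, flip_ref, defining_positions)
    flop = config_score(itr_read, flop_ref, defining_positions)
    if flip == flop:
        return "undetermined"
    return "flip" if flip > flop else "flop"

# Toy 10-base sequences; the references differ at indices 2, 5 and 9.
flip_ref = "ACGTACGTAC"
flop_ref = "ACCTAGGTAA"
defining = [2, 5, 9]
read_itr = "ACCTAGGTAC"  # matches flop at 2 and 5, flip at 9

call = call_flip_flop(read_itr, flip_ref, flop_ref, defining)
print(call)
```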

[00122] Upon mapping each sequence read to each of the above eight different reference genomes, the read was assigned the strand information, either positive or negative, of the reference genome to which it aligned with the highest alignment score. For the flip/flop assignment, the 145 bp ITR sequence was sliced from each of the reads, and the flip/flop configuration was assigned based on the mapping scores at the flip/flop defining sequence positions as shown in figures 4A and 4B, which show the plus stranded and minus stranded reference sequences for a given construct.
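Selecting the strand from the best-scoring reference can be sketched as below. The scores are invented for illustration and are not measured values; the function name `assign_strand` and the dictionary layout are assumptions. When two references tie, this sketch keeps the first one encountered.

```python
def assign_strand(scores_by_reference):
    """Pick the reference with the highest alignment score.

    scores_by_reference maps (strand, 5' ITR state, 3' ITR state) to a score.
    Returns (strand, best reference key, best score).
    """
    if not scores_by_reference:
        raise ValueError("no alignment scores supplied")
    best_reference = max(scores_by_reference, key=scores_by_reference.get)
    return best_reference[0], best_reference, scores_by_reference[best_reference]


# Hypothetical scores: the opposite-strand references score about half.
example_scores = {
    ("plus", "flip", "flip"): 1790,
    ("plus", "flip", "flop"): 1800,
    ("plus", "flop", "flip"): 1785,
    ("plus", "flop", "flop"): 1795,
    ("minus", "flip", "flip"): 900,
    ("minus", "flip", "flop"): 905,
    ("minus", "flop", "flip"): 895,
    ("minus", "flop", "flop"): 898,
}

strand, best_reference, best_score = assign_strand(example_scores)
```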

[00123] Next, the mapped sequence reads were categorized as AAV-derived or residual DNA. Residual DNA is non-AAV-derived. If there is a double stranded rAAV DNA packaged in rAAV, it falls in this residual DNA category. The categorization was based on the following two criteria, which are also shown graphically in figures 5A and 5B: 1) the presence of the 145 bp ITR + 20 bp D region of ITR in the inner loop (i.e., a total of 165 bp signature sequence) of the extended AAV-derived DNA; and 2) the delta between the sequence length of the upper strand preceding the ITR+D region and the sequence length of the bottom strand succeeding the ITR+D region should preferably be 0, or at least between -45 and 45. Figures 5A and 5B show the difference in alignments of reference sequences to plus- and minus-stranded rAAV vector genomes. When the read is mapped to the wrong-stranded reference genome, its alignment will be split into 2 halves due to the feature-orientation difference, and its alignment score will be nearly half that of a read correctly mapped to the respective stranded reference. In this example, the plus-stranded read aligned to the plus-stranded genome reference gives optimal alignment, whereas the plus-stranded read aligned to the minus-stranded genome reference gives 2 halves of alignment metrics. In this way, the plus and minus stranded information of a given AAV DNA can be verified.

[00124] Next, the positive and negative stranded rAAV genomes and flip/flop configurations were determined. Upon mapping each sequence read to each of the eight different configurations of the vector genome reference, the read aligned to one of the eight reference sequences was given the strand information as either positive or negative stranded. The flip/flop configurations of the 5’ITR/3’ITR were determined based on the presence of the appropriate flip/flop defining bases/base combinations in the respective 5’ and 3’ ITR sequence regions, after slicing the 145 bp ITR regions on either end of the reads, if the reads have intact ITR sequences. As shown in figures 6A and 6B, the plus-stranded read aligned to the plus-stranded genome reference gives a single alignment (i.e., 9275), whereas the plus-stranded read aligned to the minus-stranded genome reference gives 2 halves of alignment metrics (i.e., 4700 fragment and 4699 fragment halves).
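The two categorization criteria described in paragraph [00123] can be sketched as follows. This is a simplification under stated assumptions: the signature is located by exact substring search, whereas real reads contain sequencing errors and would need an alignment-based search; the bounds -45 and 45 are treated as inclusive; and the short signature used in the example is a toy stand-in for the 165 bp ITR+D sequence. The function name `categorize_read` is hypothetical.

```python
def categorize_read(read, signature, max_delta=45):
    """Classify a read as 'AAV-derived' or 'residual'.

    Criterion 1: the signature (ITR + D region) is present in the read.
    Criterion 2: the lengths flanking the signature differ by at most max_delta.
    Returns (category, delta); delta is None when the signature is absent.
    """
    start = read.find(signature)
    if start == -1:
        return "residual", None
    upper_length = start                                   # bases before signature
    lower_length = len(read) - (start + len(signature))   # bases after signature
    delta = upper_length - lower_length
    category = "AAV-derived" if abs(delta) <= max_delta else "residual"
    return category, delta


TOY_SIGNATURE = "ACGTTGCAACGTTGCA"  # toy stand-in, not a real ITR+D sequence

balanced = categorize_read("A" * 100 + TOY_SIGNATURE + "T" * 100, TOY_SIGNATURE)
unbalanced = categorize_read("A" * 100 + TOY_SIGNATURE + "T" * 10, TOY_SIGNATURE)
no_signature = categorize_read("A" * 50, TOY_SIGNATURE)
```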

[00125] From the bioinformatics analysis 10, a profile of a sample containing rAAV particles can be generated, such as the one shown in figures 8A, 8B, and 8C. Figures 8A, 8B, and 8C depict an exemplary analysis of an rAAV production using the bioinformatic analysis of various embodiments. Figure 8A shows the metrics of AAV DNA and residual DNA, including the percentage of AAV DNA and the percentage of total non-AAV DNA. Figure 8B shows exemplary metrics of the plus/minus and flip/flop configurations of AAV genomes. Figure 8C shows an exemplary descriptive statistical analysis of AAV genomes by strand, including the mean and standard deviation. Thus, metrics such as AAV polynucleotide sequence concentrations, residual polynucleotide sequence (e.g., from host cell genome or vector for rAAV production not intended to be encapsulated within produced AAV) concentrations, plus strand concentrations, minus strand concentrations, concentrations of the AAV genomes with ITR sequences, concentrations of the AAV genomes without ITR sequences, or genome lengths of the AAV genomes can be determined.
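A profile of this kind can be approximated from per-read annotations. The sketch below computes the percentage of AAV and residual reads and the mean and sample standard deviation of genome length by strand. The record layout (`category`, `strand`, `length`) and the example values are assumptions made for illustration; they are not the data of figures 8A to 8C.

```python
from statistics import mean, stdev


def summarize_reads(reads):
    """Summarize reads given as dicts with 'category', 'strand', and 'length'."""
    if not reads:
        raise ValueError("no reads supplied")
    aav_reads = [r for r in reads if r["category"] == "AAV"]
    percent_aav = 100.0 * len(aav_reads) / len(reads)
    summary = {
        "percent_aav": percent_aav,
        "percent_residual": 100.0 - percent_aav,
        "by_strand": {},
    }
    for strand in ("plus", "minus"):
        lengths = [r["length"] for r in aav_reads if r["strand"] == strand]
        summary["by_strand"][strand] = {
            "count": len(lengths),
            "mean": mean(lengths) if lengths else None,
            # Sample standard deviation needs at least two values.
            "sd": stdev(lengths) if len(lengths) > 1 else None,
        }
    return summary


# Hypothetical reads for illustration only.
example_reads = [
    {"category": "AAV", "strand": "plus", "length": 4700},
    {"category": "AAV", "strand": "plus", "length": 4600},
    {"category": "AAV", "strand": "minus", "length": 4700},
    {"category": "residual", "strand": None, "length": 1200},
]
profile = summarize_reads(example_reads)
```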

[00126] Based on this analysis, the identified metrics of the AAV genomes can be used to optimize the design of the rAAV production. For example, one can modify the rAAV production process to alter a metric identified from a previous bioinformatic analysis of AAV genomes. Such modification(s) can include modifications in the production, purification, or formulation processes. In production, one or more vectors for rAAV production in a host cell can be modified. Example vectors include vectors that provide for the generation of the AAV vector genomes, AAV helper functions, or AAV non-helper functions. Such modifications can include altering, deleting, or replacing polynucleotide sequences or different elements such as regulatory elements. To this end, the modifications serve to alter one or more of the metrics identified from a prior bioinformatics analysis and can aid in optimizing the design of an rAAV or production of the rAAV.

[00127] Next, the truncation hotspots were identified. Every AAV genome length or reference start coordinate was plotted in a single jitter/dot plot along with a density plot, as shown in figure 9. The major truncation hotspots, together with their respective reference genome coordinates and the sequence motifs associated with the truncation events, if any, were then identified in a strand-specific manner. Figure 9 shows exemplary truncation hotspots in an rAAV vg. Each AAV genome is plotted in a single dot plot with the density plot, either in a strand-specific manner or including both the plus- and minus-stranded AAV genomes, to determine the major truncation hotspots and the respective genome coordinates/sequence motifs associated with the truncation event(s).
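A density plot is the approach described; as a rough numerical stand-in, the following sketch bins genome lengths and reports bins that hold an unusually large share of truncated genomes. The bin size, the 5% threshold, the tolerance around full length, and the example lengths are all assumptions chosen for illustration, and `find_hotspots` is a hypothetical name.

```python
from collections import Counter


def find_hotspots(lengths, full_length, bin_size=50, min_fraction=0.05, tolerance=50):
    """Return sorted (bin start, count) pairs for over-represented truncated lengths.

    A genome counts as truncated if it is shorter than full_length - tolerance.
    A bin is reported if it holds at least min_fraction of all genomes.
    """
    if not lengths:
        return []
    truncated = [n for n in lengths if n < full_length - tolerance]
    counts = Counter((n // bin_size) * bin_size for n in truncated)
    total = len(lengths)
    return sorted(
        (bin_start, count)
        for bin_start, count in counts.items()
        if count / total >= min_fraction
    )


# Hypothetical lengths: mostly full length, with a cluster near 2500 nt.
example_lengths = [4700] * 80 + [2510, 2520, 2530] * 5 + [100, 1000, 3000, 3900, 4000]
hotspots = find_hotspots(example_lengths, full_length=4700)
```

Running the same function separately on plus- and minus-stranded genomes would give a strand-specific view.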

[00128] In addition, the impact of the identified truncation hotspot(s) was investigated by validation with orthogonal methods. The major truncation hotspots identified with this bioinformatic workflow were validated with several orthogonal methods, both in silico and by molecular biology methods. The major truncation hotspot-spanning reference sequence motif was further investigated for possible secondary structures at 28°C (insect cell culture conditions) and 37°C (mammalian culture conditions) to understand the production platform dependent impacts on the rAAV genome packaging. The in silico melting characteristics of the truncation hotspot specific sequence at 95°C were assessed to determine the extent of complicated secondary structures in terms of percentage of GC content, melting temperature kinetics, and ITR-like characteristics, such as higher melting temperature, hairpin loop/palindromic structures, and high GC content, to further validate the molecular basis of the truncation mechanism by the identified truncation hotspot specific sequences. For example, figures 10A, 10B, 10C, 10D, 10E, and 10F depict different in silico secondary structure predictions of sequences that have been truncated from rAAV vg and were identified using the bioinformatic analysis of various embodiments. In the event that truncation hotspot specific sequence motifs potentially occur in the expression cassette, the particular motif could be eliminated if possible or could be codon optimized to minimize or eliminate the truncation phenomenon. Based on the analysis, a vector used for rAAV production and having a nucleotide molecule from which the AAV genomes are generated in a host cell can be modified to remove the nucleotides corresponding to deletions or deleted polynucleotide sequences. 
For example, the nucleotide removal may include replacing the nucleotides with other nucleotides such that there are no deletions or deleted polynucleotide sequences in the produced AAV genomes. For example, the nucleotide sequences corresponding to the secondary structures shown in figures 10A, 10B, 10C, 10D, 10E, and 10F could be altered in the corresponding sequences from a nucleic acid molecule that generates an AAV genome in a host cell such that the secondary structures do not form, or the size of the secondary structures is reduced such that there are no deletions or truncations in the produced AAV genomes.
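Two of the ITR-like characteristics mentioned above, GC content and palindromic (inverted repeat) structure, can be screened with simple string operations. The sketch below is only a crude proxy: it does not predict secondary structure or melting behavior at a given temperature, which would require a dedicated folding tool. The stem length, minimum loop size, example sequence, and function names are assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_percent(sequence):
    """Percentage of G and C bases in the sequence (0.0 for an empty sequence)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return 100.0 * sum(base in "GC" for base in sequence) / len(sequence)


def reverse_complement(sequence):
    """Reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def find_inverted_repeats(sequence, stem_length=6, min_loop=3):
    """Return (stem start, partner start) pairs that could pair as a hairpin stem."""
    sequence = sequence.upper()
    hits = []
    for i in range(len(sequence) - stem_length + 1):
        stem = sequence[i:i + stem_length]
        # The partner must start at least min_loop bases after the stem ends.
        j = sequence.find(reverse_complement(stem), i + stem_length + min_loop)
        if j != -1:
            hits.append((i, j))
    return hits


TOY_MOTIF = "ACGGTCAAAAGACCGT"  # toy hairpin-forming sequence, not from the figures
toy_gc = gc_percent(TOY_MOTIF)
toy_repeats = find_inverted_repeats(TOY_MOTIF)
```

A motif with a high GC percentage and several inverted repeat hits would be a candidate for the kind of sequence alteration or codon optimization described above.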

[00129] While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. Additionally, the features of various implementing embodiments may be combined to form further embodiments of the invention.

Claims

CLAIMS:

1. A method of preparing a genome from an adeno associated virus (AAV) particle for sequencing, the method comprising the steps of: isolating a genome from an AAV particle, the genome comprising an inverted terminal repeat (ITR) that is formed into a stem-loop structure with complementary 3’ and 5’ stem arms and a first polynucleotide that is single stranded and linked to a 5’ end of the 5’ stem arm; incubating a deoxyribonucleic acid (DNA) polymerase with the isolated genome to synthesize a second polynucleotide extending from a 3’ end of the 3’ stem arm, where the second polynucleotide is complementary to the first polynucleotide and forms a double stranded polynucleotide with the first polynucleotide; and linking an adapter nucleotide sequence to a 5’ end of the first polynucleotide and a 3’ end of the second polynucleotide.

2. The method of claim 1, wherein the first polynucleotide further comprises a second ITR formed into a stem-loop structure with complementary 3’ and 5’ stem arms, the incubating step further comprises the DNA polymerase denaturing the stem-loop structure of the second ITR, and the second polynucleotide further comprises a sequence complementary to the sequence of the second ITR.

3. A method of analyzing AAV genomes, the method comprising the steps of: aligning reference nucleotide sequences to sequence reads of AAV genomes, each reference nucleotide sequence comprising predetermined sequences of 5’ and 3’ ITRs; determining alignment scores of each reference nucleotide sequence to the sequence reads and identifying strand types of the AAV genomes based on the alignment scores, where the strand types are selected from at least one of a positive nucleotide strand, negative nucleotide strand, and polynucleotide contaminate; determining first and second coordinates within the sequence reads based on alignments with the reference nucleotide sequences used to identify strand type, where the first coordinate identifies the 5’ end of the AAV genomes and the second coordinate identifies the 3’ end of the AAV genomes; and analyzing sequences adjacent to the first and second coordinates to identify ITR configurations of the AAV genomes.

4. The method of claim 3, wherein the difference between a sequence length of the upper strand preceding the ITR region and the sequence length of the bottom strand succeeding the ITR region is between -45 and 45.

5. The method of claim 3 or 4, wherein the ITR configurations for the positive nucleotide strand (5’ITR/3’ITR) are flip/flip, flip/flop, flop/flip, and flop/flop.

6. The method of claim 3 or 4, wherein the ITR configurations for the negative nucleotide strand (5’ITR/3’ITR) are flip/flip, flip/flop, flop/flip, and flop/flop.

7. The method of any one of claims 3-6, wherein the alignment scores of each reference nucleotide sequence to the sequence reads range from 0 to a positive integer value, wherein a match between the reference nucleotide sequence and the sequence read has a score of 1.

8. The method of any one of claims 3-7, wherein the AAV genomes comprise a double stranded 3’ITR extended recombinant AAV genome.

9. The method of any one of claims 3-8 further comprising the step of identifying one or more metrics of the AAV genomes from the analysis step.

10. The method of claim 9, wherein the one or more metrics comprises one or more of an AAV polynucleotide sequence concentration, a residual polynucleotide sequence concentration, a plus strand concentration, a minus strand concentration, a concentration of the AAV genomes with ITR sequences, a concentration of the AAV genomes without ITR sequences, or lengths of the AAV genomes.

11. The method of claim 9 or 10 further comprising the step of modifying production of recombinant AAV, where the modifying alters the one or more metrics.

12. The method of any one of claims 9-11 further comprising the step of modifying one or more vectors for rAAV production in a host cell, where the modifying alters the one or more metrics.

13. The method of any one of claims 9-12 further comprising the step of modifying one or more nucleic acid molecules from which the AAV genomes are generated, where the modifying alters the one or more metrics.

14. The method of claim 3 further comprising the step of identifying a deletion in the AAV genomes.

15. The method of any one of claims 9-13 further comprising the steps of determining a nucleotide sequence of the deletion and modelling secondary structure formation of the deleted nucleotide sequences during an environmental parameter.

16. The method of any one of claims 9-13 and 15, wherein the environmental parameter is one or more of Rep protein activity, polymerase activity, Rep polymerase activity, temperature, adenosine triphosphate concentration, cell culture medium composition, salt concentration of a cell culture medium, and a cell line parameter.

17. The method of any one of claims 9-13 and 15-16 further comprising the step of modifying one or more nucleic acid molecules from which the AAV genomes are generated to remove one or more nucleotides corresponding to the deletion.