WO2023172923A2 - Systems and methods relating to bioinformatics - Google Patents
- Publication number: WO2023172923A2 (application PCT/US2023/063877)
- Authority: WO, WIPO (PCT)
- Prior art keywords: data, instances, user, systems, information
- Legal status: Ceased (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
Definitions
- High-throughput genetic sequencing methods have provided large data sets of genetic information (such as single cell whole-genome sequencing data) for applications such as genetic engineering, agriculture optimization, and diagnosis or treatment of human disease.
- existing methods frequently suffer from slow processing or visualization speeds which may hinder analysis.
- systems for visualizing biological data comprising: a device comprising at least one processor and instructions executable by the at least one processor to provide a first application configured to perform operations comprising: i. accessing one or more datasets comprising biological data; ii. generating a visual representation of the one or more datasets, wherein the data is generated rapidly enough for live user interpretation. Further provided herein are systems wherein the visual representation is generated in no more than a minute. Further provided herein are systems wherein the first application comprises one or more modules. Further provided herein are systems wherein the one or more modules comprise one or more of plot, tree, cells, counting, or sequencing metrics modules. Further provided herein are systems wherein the sequencing metrics module comprises quality metrics.
- the biological data comprises molecular data.
- the molecular data comprises one or more of genomic data, proteomics data, transcriptomics data, and methylomics data.
- the molecular data comprises two or more of genomic data, proteomics data, transcriptomics data, and methylomics data.
- the genomic data comprises one or more genomic variants.
- the genomic variants comprise one or more of somatic variants, germline variants, SNPs, or indels.
- one or more datasets corresponds to a project.
- a project comprises a virtual collection of samples
- the system comprises at least 10 datasets.
- the system comprises at least 1000 datasets.
- the system comprises at least 100,000 datasets.
- the one or more datasets are each representative of a single cell.
- the one or more datasets represent at least 30 cells.
- the one or more datasets represent at least 100 cells.
- the one or more datasets represent at least 1000 cells. Further provided herein are systems wherein the one or more datasets represent at least 100,000 cells. Further provided herein are systems wherein the one or more datasets comprise at least 1 biomarker. Further provided herein are systems wherein the one or more datasets comprise at least 100 biomarkers. Further provided herein are systems wherein the system comprises no more than four computer processors. Further provided herein are systems wherein the system comprises at least four processors. Further provided herein are systems wherein the system comprises at least 16 processors.
- the system comprises a platform for data security comprising: access control for one or more users; a security framework; and molecular data from an individual or biological species
- back end module is configured to perform user authentication and registration.
- access to genomic data is limited to authorized users who are a part of authorized organizations.
- system is accessible by at least 2 users concurrently.
- system is accessible by at least 20 users concurrently.
- system is accessible by at least 200 users concurrently.
- the security framework comprises the NIST Cybersecurity Framework.
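The access rule above, under which genomic data is visible only to authorized users who belong to authorized organizations, can be sketched as a single predicate. This is a minimal illustration that assumes a hypothetical `User` record and an administrator-maintained set of authorized organizations; it is not the platform's actual authorization code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical user record, e.g., created during registration."""
    name: str
    organization: str
    authorized: bool  # assumed to be set by a platform administrator


def can_access_genomic_data(user: User, authorized_orgs: frozenset) -> bool:
    """Grant access only when the user is authorized AND belongs to an
    organization that is itself authorized."""
    return user.authorized and user.organization in authorized_orgs


# Usage: only the authorized user from an authorized organization passes.
orgs = frozenset({"lab-a"})
print(can_access_genomic_data(User("ana", "lab-a", True), orgs))   # True
print(can_access_genomic_data(User("bob", "lab-b", True), orgs))   # False
print(can_access_genomic_data(User("cy", "lab-a", False), orgs))   # False
```

Requiring both conditions means that revoking either the user's authorization or the organization's authorization is enough to cut off access.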
- systems wherein the platform complies with HIPAA standards relating to the individual. Further provided herein are systems wherein the system is configured for a user to define one or more analytical workflows. Further provided herein are systems wherein the one or more analytical workflows operate on molecular data defined in a project. Further provided herein are systems wherein the one or more analytical workflows are executed by one or more pipeline modules. Further provided herein are systems wherein the pipeline module is configured to perform complex genomic analytical transformations. Further provided herein are systems wherein the pipeline module is configured to run genomics analysis tools to extract biomarkers from one or more sequencing data files. Further provided herein are systems wherein the system is configured for a user to queue and/or order pipelines to be run on a dataset.
- systems wherein the system is configured for a user to queue and/or order pipelines to be run on a dataset without requiring manual intervention.
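Running queued pipelines without manual intervention amounts to executing steps in an order that respects their dependencies. The sketch below uses Python's standard `graphlib.TopologicalSorter`; the step names loosely follow the WGS sub-pipelines listed later in this document, but the dependency graph itself is a hypothetical assumption.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies between WGS sub-pipeline steps: each step runs
# only after every step in its set has finished.
WGS_STEPS = {
    "alignment": set(),
    "haplotype_caller": {"alignment"},
    "joint_genotyping": {"haplotype_caller"},
    "cnv": {"alignment"},
    "statistics": {"joint_genotyping", "cnv"},
}


def execution_order(steps: dict) -> list:
    """Return an order that satisfies every dependency, so a queue of
    pipelines can run end to end without manual intervention."""
    # static_order() raises graphlib.CycleError if the dependencies loop.
    return list(TopologicalSorter(steps).static_order())


order = execution_order(WGS_STEPS)
print(order[0])  # alignment (the only step with no prerequisites)
```

A topological order is not unique (here `cnv` may run before or after `haplotype_caller`), which is what lets independent steps be scheduled in parallel.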
- the system comprises a pipeline module configured to identify biomarkers present in the genome.
- biomarkers comprise one or more of single nucleotide variants, insertions and deletions.
- variants across every sample of a project are aggregated into a single table.
- the aggregation comprises joint genotyping.
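In simplified form, aggregating variants across every sample into a single table is a merge keyed by (chromosome, position, reference, alternate). The sketch below assumes that per-sample genotype calls are already available as dictionaries. It only merges them and does not perform joint genotyping itself, which (for example, in GATK's GenotypeGVCFs) re-evaluates genotypes across all samples together.

```python
from collections import defaultdict

NO_CALL = "./."  # VCF notation for a missing genotype


def aggregate_calls(per_sample: dict) -> dict:
    """Merge per-sample calls {sample: {(chrom, pos, ref, alt): genotype}}
    into one table {(chrom, pos, ref, alt): {sample: genotype}}."""
    samples = sorted(per_sample)
    table = defaultdict(dict)
    for sample, calls in per_sample.items():
        for key, genotype in calls.items():
            table[key][sample] = genotype
    # One row per variant and one column per sample; absent calls become no-calls.
    return {
        key: {s: row.get(s, NO_CALL) for s in samples}
        for key, row in sorted(table.items())
    }


calls = {
    "cell1": {("chr1", 100, "A", "G"): "0/1"},
    "cell2": {("chr1", 100, "A", "G"): "1/1", ("chr2", 5, "C", "T"): "0/1"},
}
print(aggregate_calls(calls)[("chr2", 5, "C", "T")])  # {'cell1': './.', 'cell2': '0/1'}
```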
- the pipeline module is configured to extract annotations and variants from an input file into a database.
- the database comprises COSMIC, ClinVar, Ensembl, or other genomic annotation databases.
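Loading variants and their annotations into a database, then joining them, can be sketched with the standard `sqlite3` module. Here SQLite stands in for the PostgreSQL database described later in this document. The schema, the table names, and the `example-label` annotation are hypothetical; none of it is taken from COSMIC or ClinVar.

```python
import sqlite3


def load(conn: sqlite3.Connection, variants, annotations) -> None:
    """Create tables and insert variant keys plus annotation rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variant ("
        "chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, "
        "PRIMARY KEY (chrom, pos, ref, alt))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation ("
        "chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, source TEXT, label TEXT)"
    )
    # INSERT OR IGNORE skips variants that are already loaded.
    conn.executemany("INSERT OR IGNORE INTO variant VALUES (?, ?, ?, ?)", variants)
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?, ?)", annotations)
    conn.commit()


def annotated_variants(conn: sqlite3.Connection) -> list:
    """Join variants to annotations on the full (chrom, pos, ref, alt) key."""
    return conn.execute(
        "SELECT v.chrom, v.pos, v.ref, v.alt, a.source, a.label "
        "FROM variant AS v JOIN annotation AS a "
        "ON a.chrom = v.chrom AND a.pos = v.pos AND a.ref = v.ref AND a.alt = v.alt "
        "ORDER BY v.chrom, v.pos"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(
    conn,
    variants=[("chr1", 18903, "A", "G"), ("chr2", 500, "C", "T")],
    annotations=[("chr1", 18903, "A", "G", "ClinVar", "example-label")],
)
print(annotated_variants(conn))  # [('chr1', 18903, 'A', 'G', 'ClinVar', 'example-label')]
```

The unannotated `chr2` variant is excluded by the inner join. A left join would keep it with empty annotation fields.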
- the pipeline module is configured to identify structural biomarkers present in a genome.
- structural biomarkers comprise variations in chromosomal copy number.
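Variations in chromosomal copy number are commonly estimated from read depth in fixed genomic bins. The following is a deliberately simplified sketch, assuming hypothetical binned depths and a diploid baseline represented by the median bin. Production callers also correct for GC content and mappability and segment adjacent bins, none of which is shown here.

```python
import statistics


def copy_number_per_bin(depths: list, ploidy: int = 2) -> list:
    """Estimate integer copy number for each genomic bin by scaling read
    depth to the median bin, which is assumed to carry `ploidy` copies."""
    median = statistics.median(depths)
    if median <= 0:
        raise ValueError("median depth must be positive")
    return [round(ploidy * d / median) for d in depths]


def classify(copy_numbers: list, ploidy: int = 2) -> list:
    """Label each bin as a gain, loss, or neutral relative to the ploidy."""
    return ["gain" if c > ploidy else "loss" if c < ploidy else "neutral"
            for c in copy_numbers]


cn = copy_number_per_bin([30, 30, 30, 45, 15, 0])
print(cn)            # [2, 2, 2, 3, 1, 0]
print(classify(cn))  # ['neutral', 'neutral', 'neutral', 'gain', 'loss', 'loss']
```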
- structural biomarkers comprise one or more of transversions, inversions, tandem repeats, translocations, and element insertions.
- the pipeline module is configured to identify genomic arrangement information specific to classes of genes.
- the genomic arrangement information comprises one or more of major histocompatibility complexes, microsatellite instability, tumor mutational burden and immune repertoire.
- transcriptome analysis tools are configured to extract counts of isoforms from one or more sequencing files. Further provided herein are systems wherein transcriptome analysis tools are configured to identify fusion events from one or more sequencing files. Further provided herein are systems wherein transcriptome analysis tools are configured to identify nucleotide variation in expressed genes from one or more sequencing files. Further provided herein are systems wherein the pipeline module is configured to run methylation analysis tools on one or more sequencing files. Further provided herein are systems wherein the methylation analysis tools are configured to extract methylation status from one or more sequencing files.
- methylation status comprises one or more of regions of methylated cytosine, such as mC, hmC and hhmC from one or more sequencing files.
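A common per-site summary of methylation status is the fraction of reads supporting a methylated cytosine. The sketch assumes hypothetical per-site counts of methylated and unmethylated reads, keyed by illustrative `chrom:pos` strings. Standard bisulfite reads do not distinguish mC from hmC, so separating those marks requires additional chemistry that is not modeled here.

```python
def methylation_levels(counts: dict, min_depth: int = 1) -> dict:
    """Map each site to methylated / (methylated + unmethylated) reads,
    skipping sites with fewer than `min_depth` informative reads."""
    return {
        site: m / (m + u)
        for site, (m, u) in counts.items()
        if m + u >= max(min_depth, 1)  # never divide by zero
    }


print(methylation_levels({"chr1:100": (8, 2), "chr1:150": (0, 0), "chr1:200": (1, 3)}))
# {'chr1:100': 0.8, 'chr1:200': 0.25}
```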
- the pipeline module is configured to run proteomics analysis tools from one or more sequencing files.
- the proteomics analysis tools provide proteomic data.
- the proteomic data comprises sequence counts of identified native conformationally altered proteins.
- systems comprising a platform for visualizing molecular data comprising: a device comprising at least one processor and instructions executable by the at least one processor, wherein the instructions comprise: a front end module comprising a graphical user interface; a security layer configured to reduce unauthorized access; a back end module; and one or more pipeline modules configured to generate a visual representation.
- the visual representation comprises at least one plot and genomics scale.
- the visual representation displays one or more biomarkers.
- the one or more biomarkers comprise specific molecular states associated with biological states of a sample.
- the system is configured for a user to select one or more applications.
- the one or more applications are configured for visually representing specific types of molecular data.
- the molecular data comprises one or more of genome, transcriptome, methylome and proteome.
- the graphical user interface is configured for a user to select one or more samples processed through a pipeline on the platform to be represented in a visualization application.
- the back end module is configured to encode data for one or more visualizations.
- visualization modules comprise a query engine.
- the query engine comprises one or more of Vaex and AG Grid.
- the back end module is configured to perform one or more calculations or filtering commands.
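Filtering commands of the kind the back end performs can be modeled as composable predicates that are combined with AND. The sketch below is plain Python: the field names `gene` and `qual` are hypothetical, and it does not use Vaex's API. A columnar engine such as Vaex would evaluate equivalent conditions as expressions over whole columns instead of looping over rows.

```python
from typing import Callable, Iterable

Row = dict
Predicate = Callable[[Row], bool]


def min_quality(threshold: float) -> Predicate:
    return lambda row: row["qual"] >= threshold


def in_genes(genes: set) -> Predicate:
    return lambda row: row["gene"] in genes


def apply_filters(rows: Iterable, filters: list) -> list:
    """Keep rows that satisfy every filter (filters combine with AND)."""
    return [row for row in rows if all(f(row) for f in filters)]


rows = [
    {"gene": "TP53", "qual": 60.0},
    {"gene": "TP53", "qual": 12.0},
    {"gene": "BRCA1", "qual": 90.0},
]
print(apply_filters(rows, [min_quality(30), in_genes({"TP53"})]))
# [{'gene': 'TP53', 'qual': 60.0}]
```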
- a visualization application receives (as input) an output generated from one or more pipeline modules performed on the system.
- a visualization application is configured for a user to view biomarkers.
- the biomarkers are identified through the platform pipelines against one or more of an annotation and genomic reference tables.
- a visualization application comprises one or more modules which represent one or more characteristics of a biological data type.
- user is able to directly manipulate visual representations in a visualization application.
- the visual representation comprises one or more of feature selection, feature filtering, importing gene lists, and external biomarker results.
- configurations or manipulations of a visual application are exported or imported.
- configurations or manipulations control a layout of other visualization applications within the platform.
- chromosomal coordinates are leveraged to highlight various information across multiple visualization applications.
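Using chromosomal coordinates to highlight the same region across applications requires a fast "which variants fall in this region?" lookup. The sketch keeps sorted positions per chromosome and uses binary search from the standard `bisect` module. The class name and variant labels are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
from bisect import bisect_left, bisect_right


class VariantIndex:
    """Per-chromosome sorted positions for fast region lookups."""

    def __init__(self, variants):
        # variants: iterable of (chrom, pos, label); pos is 1-based.
        self._by_chrom = {}
        for chrom, pos, label in sorted(variants):
            positions, labels = self._by_chrom.setdefault(chrom, ([], []))
            positions.append(pos)
            labels.append(label)

    def query(self, chrom: str, start: int, end: int) -> list:
        """Return labels of variants with start <= pos <= end."""
        positions, labels = self._by_chrom.get(chrom, ([], []))
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        return labels[lo:hi]


idx = VariantIndex([("chr1", 20000, "v3"), ("chr1", 18903, "v1"),
                    ("chr2", 10, "v2"), ("chr1", 500, "v0")])
print(idx.query("chr1", 1000, 19000))  # ['v1']
print(idx.query("chr1", 1, 20000))     # ['v0', 'v1', 'v3']
```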
- biological groupings label or group samples within a visualization module of an application. Further provided herein are systems wherein the biological groupings comprise cellular phenotypes.
- systems wherein a user is able to view biomarker information within genomic context, leveraging one or more genomic loci, along with associated annotation.
- the system depicts one or more modules within a visualization application for transcription.
- a visualization application depicts one or more of heatmaps, principal components analysis, differential expression, and alternative transcriptional mapping.
- a visualization application is configured for a user to view changes in copy number or structural changes within genome scale.
- a visualization application is configured for a user to view annotated information of changes in copy number or structural changes within the genome scale.
- systems wherein the system is configured for syncopation of molecular data.
- syncopation of molecular data comprises managing and querying for one or more analysis tools.
- molecular data comprises data objects derived from the same cell.
- syncopated molecular information is visualized together on the same application within the platform.
- syncopated molecular information is visualized together using one or more of Circos plots, genome viewers, and graph networks.
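A histogram track on a Circos plot is typically drawn from per-bin counts. The sketch assumes a hypothetical 1 Mb bin size and 1-based positions, and counts variants per (chromosome, bin). Each resulting entry would become one segment of the track.

```python
from collections import Counter


def bin_counts(positions, bin_size: int = 1_000_000) -> dict:
    """Count variants per (chromosome, bin) for one Circos track.
    Positions are 1-based, so bin i covers i*bin_size+1 .. (i+1)*bin_size."""
    counts = Counter((chrom, (pos - 1) // bin_size) for chrom, pos in positions)
    return dict(sorted(counts.items()))


print(bin_counts([("chr1", 1), ("chr1", 1_000_000), ("chr1", 1_000_001), ("chr3", 2_500_000)]))
# {('chr1', 0): 2, ('chr1', 1): 1, ('chr3', 2): 1}
```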
- the system is configured to associate molecular information with biological groupings.
- the system is configured to associate molecular information with biological groupings using one or more statistical methods.
- the one or more statistical methods comprise differential expression (transcriptomic/proteomic), genome-wide association studies (genomic), and quantitative trait loci calculations using molecular information.
- the system is configured to export information in the system contained in one or more objects.
- the one or more objects comprise files, images, pipeline and visualization configurations, biomarker lists or projects.
- information is exported as one or more files within a project through a file explorer embedded directly on the platform.
- configurations of visualization applications are exported.
- configurations of visualization applications are exported in a format readable by the platform’s visualization applications.
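Exporting a configuration in a form the platform can read back is essentially a versioned serialization round trip. The sketch uses JSON with a hypothetical `schema_version` tag and a hypothetical `VisualizationConfig` shape. The actual export format used by the platform is not specified in this document.

```python
import json
from dataclasses import asdict, dataclass, field

SCHEMA_VERSION = 1  # hypothetical format version


@dataclass
class VisualizationConfig:
    application: str
    samples: list
    chromosomes: list
    filters: dict = field(default_factory=dict)


def export_config(cfg: VisualizationConfig) -> str:
    """Serialize a configuration with a version tag for later import."""
    return json.dumps({"schema_version": SCHEMA_VERSION, "config": asdict(cfg)},
                      indent=2, sort_keys=True)


def import_config(text: str) -> VisualizationConfig:
    """Rebuild a configuration, rejecting versions this code does not know."""
    payload = json.loads(text)
    if payload.get("schema_version") != SCHEMA_VERSION:
        raise ValueError("unsupported configuration version")
    return VisualizationConfig(**payload["config"])


cfg = VisualizationConfig("circos", ["cell1", "cell2"], ["chr3"], {"min_qual": 30.0})
assert import_config(export_config(cfg)) == cfg
```

The version tag lets a later release detect and migrate older exported configurations instead of silently misreading them.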
- systems wherein the system is configured for export of images of visual modules or applications which preserve manipulations performed by the user. Further provided herein are systems wherein the system is configured for specific output directories. Further provided herein are systems wherein a user has rights and access to place information in the output directory. Further provided herein are systems wherein the system is configured for export of a project or pipeline to specified output locations. Provided herein are methods for performing the functions of a system described herein.
- FIG. 1A depicts a workflow schematic of a system for input, processing, and visualization of genomic data.
- FIG. 1B depicts an architectural schematic of a system for processing and visualization of genomic data.
- FIG. 1C depicts a non-limiting example of a user interface for viewing project summaries. Headers (left to right) include Project [name], Project Size, Number of Cells, Analysis Status, Pipeline(s) performed, BioSkryb Product Lot ID, Initiating User, Date. The horizontal Ellipsis is used as a menu button for several project-related activities.
- FIG. 2A depicts a browser for visualizing genomic data with a full set of variants in a selected region of a genome.
- FIG. 2B depicts a browser for visualizing the genomic variation information that can be pulled from multiple visualization applications that report back genomic coordinates.
- FIG. 2C depicts an interface exploring selected variants, including annotations. From top to bottom: Features (gene name, gene Id, gene type, strand, Tdl, and Hgnc Id); Predictions (Left: SIFT, FATHMM, PROVEAN, MetaSVM, MetaLR), (Middle: Phylo 100-way Vertebrate) (Right: Phylo 30-way mammal); Evidence (Cosmic Genomic ID, Primary site, primary histology); and Population.
- FIG. 2D depicts a browser for visualizing genomic data with a full set of variants in a selected region of a genome.
- FIG. 2E depicts a browser for visualizing the genomic variation information that can be pulled from multiple visualization applications that report back genomic coordinates.
- FIG. 3A depicts an interface for visualization of somatic mutations in the context of lineage (“tree” module).
- the interface provides rapid switching from plots, tree, cells, quality metrics, and more metrics visualization modules.
- FIG. 3B depicts an interface for visualization of filtered somatic mutations (“tree” module) that occurred ancestral to one or more other data sets.
- FIG. 4A depicts an interface for visualization of a cell population (“cells” module), including a two-dimensional plot of cell similarity (left) and cell statistics for each individual cell (right).
- Statistic fields include name, number of variations by type (somatic, germline, SNPs, indels, high, medium, and low confidence). Statistics for 10 cells are shown for example only; any number of cells may be visualized.
- FIG. 4B depicts an interface for visualization of sequencing quality control metrics (“quality metrics” module). Smaller plots shown (left to right) include chromosome M population, percent pass/fail reads aligned, WGS mean coverage, WGS PCT EXC DUPE; bottom graph: WGS coverage by sample.
- FIG. 5A depicts a Circos plot of variations in human genomic data. 6,416,550 somatic, 7,398,087 germline, 13,814,637 SNP, and 0 indel variations are shown.
- FIG. 5B depicts a circos plot with a region of chromosome 3 enlarged in the inset.
- FIG. 6 depicts a plot comparing existing and Vaex methods for generating Circos plots; the y-axis (time) runs from 0 to 12,000 milliseconds at 1,000-millisecond intervals.
- FIG. 7 depicts an interface showing a circos plot generated to simultaneously visualize and compare genomic data from multiple cells (groups 1 and group 2). Filters may be applied such as threshold number of variants, viewing of specific chromosomes or cells, as well as filters regarding variant type (SNP, Indel, Germline, Somatic, Cosmic, Coding change, Clinvar) and features (gene, gene type, prediction).
- FIG. 8 depicts an expandable tree structure which enables the visualization of variable parameters (e.g., CNV, SNV, expressed gene, methylation site) that is compatible with filters included in the bioinformatic analysis system.
- FIG. 9 depicts a plot of run time versus number of cells filtered (without annotations). 31 cells of WGS data were processed with 4 virtual CPUs. The y-axis shows runtime from 0 to 3,000 milliseconds; the x-axis shows the number of cells.
- FIG. 10 depicts a plot of run time versus number of cells filtered (with annotations). 31 cells of WGS data were processed with 4 virtual CPUs. The y-axis shows runtime from 0 to 3,000 milliseconds; the x-axis shows the number of cells.
- All single-cell WGS data, approximately 108 million rows, were rendered and visualized in approximately 1.2 seconds, allowing scale-up to much greater numbers of cells using parallel processes.
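Scaling filtering across many cells with parallel processes can be sketched by mapping a per-cell filter over a worker pool. The filter rule (keep `PASS` variants) is a hypothetical example, and `workers=4` merely mirrors the 4 virtual CPUs mentioned above. A thread pool keeps the example simple, but in CPython it does not speed up CPU-bound pure-Python work because of the global interpreter lock. Swapping in `ProcessPoolExecutor`, which has the same `map` interface, is the usual choice for that case.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_cell(rows: list) -> list:
    """Hypothetical per-cell filter: keep only variants that passed filters."""
    return [r for r in rows if r["filter"] == "PASS"]


def filter_cells_parallel(cells: dict, workers: int = 4) -> dict:
    """Apply filter_cell to every cell concurrently; results keep cell order."""
    names = list(cells)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(filter_cell, (cells[n] for n in names)))
    return dict(zip(names, results))


cells = {
    "cell1": [{"filter": "PASS"}, {"filter": "LowQual"}],
    "cell2": [{"filter": "PASS"}],
}
print(filter_cells_parallel(cells))
# {'cell1': [{'filter': 'PASS'}], 'cell2': [{'filter': 'PASS'}]}
```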
- FIG. 11 depicts a non-limiting example of a computing device; in this case, a device with one or more processors, memory, storage, and a network interface.
- FIG. 12 depicts a non-limiting example of a web/mobile application provision system; in this case, a system providing browser-based and/or native mobile user interfaces.
- FIG. 13 depicts a non-limiting example of a cloud-based web/mobile application provision system; in this case, a system comprising elastically load-balanced, auto-scaling web server and application server resources as well as synchronously replicated databases.
- FIGS. 14A-14B show the initial user registration forms any new user will need to complete prior to gaining access to the system.
- FIG. 14A depicts a sign in or create account form (create account tab shown).
- FIG. 14B depicts a terms and conditions form.
- FIGS. 15A-15C show the administration console, which enables the platform administrator to onboard new users, organizations and workspaces.
- Company-specific details can be supplied to enable the proper tracking needed by the company.
- Data ingress and egress details can be provided for a workspace to ensure proper security and data transfer information.
- User access within and across groups is also able to be set through the console.
- the organizations tab is shown in FIG. 15A.
- the edit workspace interface is shown in FIG. 15B.
- the change user group interface is shown in FIG. 15C.
- FIGS. 16A-16D depict how a user is able to create projects and select samples to include from the shared access point provided through the administration console.
- FIG. 16A depicts an interface for starting new project types, such as genomic (“Resolve DNA”) or multiomic (“ResolveOME”) projects.
- FIG. 16B depicts an interface for creating a new project including project name, product lot ID, genome, sequencing library preparation, and BSSH/shared data options.
- FIG. 16C depicts an interface for adding samples to a new project, including a search function for BaseSpace Sequence Hub project name.
- FIG. 16D depicts another example of an interface for adding samples to a new project, including a biosample name search query tool.
- FIGS. 17A-17D depict how pipelines are customized for samples within a project and how a user selects configurations.
- FIG. 17A depicts an interface prompting a user to launch a pipeline immediately after FASTQ files are downloaded.
- FIG. 17B depicts a menu of pipelines for user selection.
- FIG. 17C depicts pipeline parameters and modules for user selection.
- FIG. 17D depicts an interface displaying pipeline parameters.
- FIGS. 18A-18C depict how a user selects secondary or tertiary analyses within a project for which they have administrative privileges. Details and tracker for the steps needed to launch a tertiary analysis are provided.
- FIG. 18A depicts an interface of samples (biosamples) with FASTQ validation status, size, total number of reads, read length, upload date, and lot ID.
- FIG. 18B depicts an interface for selection of secondary pipelines including name, version, number of biosamples, status, size, duration, initiator, and launch date.
- FIG. 18C depicts an interface for tertiary pipelines available for user selection including name, version, and a description of various pipelines.
- FIGS. 19A-19D depict one of the visualization applications, showing how variants are identified through pipelines used on a project. Master variant tables, highlighting all alternate alleles in a project, can be filtered based on numerous characteristics and downloaded.
- FIG. 19A depicts an interface for hiding filters, applying filters, and resetting filters. Parameters displayed (left side) include gene names, amino acids, chromosome position, variant quality thresholds, mapping quality, and reference datasets (ClinVar, cancer hallmarks, COSMIC, any state). The right side depicts a list of variants obtained after filtering, and displays chromosome, position, dbSNP ID, reference, alternate, mapping quality, ClinVar, and gnomAD values.
- Genotypes may also be displayed per sample, genotype stats, or empty columns may be removed.
- FIG. 19B depicts the entire interface shown on the left of FIG. 19A, including gene names, amino acids, chromosome position, variant quality thresholds, mapping quality, reference datasets (ClinVar, cancer hallmarks, COSMIC, any state), protein feature types (gene variant, intergenic region, transcript), polymorphic thresholds, and protein projection.
- FIG. 19C depicts further columns from the right side of FIG. 19A, including variant prevalence, genotype stats, and genotype per sample.
- FIG. 19D depicts an interface comprising a command line providing the current filter set and a history interface for storing previously used filter(s).
- FIG. 20 depicts the copy number visualization application showing multiple samples’ profiles with ability to highlight genomic region of interest. These regions are represented in gene-level tables that can be filtered and downloaded.
- FIGS. 21A-21E depict initial expression visualization application modules. Users can view and add sample phenotypes that can be used to label other modules within the application. Samples can then be identified as outliers and removed from other visualization modules presented in the application. Differential expression can be performed on the application based on groups identified by the platform.
- FIG. 21A depicts an interface for importing or exporting visualization session settings.
- FIG. 21B depicts a visualization for comparing multiple samples.
- FIG. 21C depicts a heat map visualization of gene data, including cell types.
- FIG. 21D depicts an interface for changing visualization parameters and statistical analysis for differential gene expression.
- Parameters include method (DESeq2 or limma), design formula, contrast, minimum number of counts, test (Wald significance test or likelihood ratio test), fit type (parametric, local, mean, glmGamPoi), false discovery rate method (Bonferroni or BH), and contrast categories.
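The two multiple-testing corrections listed above can be illustrated in a few lines. Tools such as DESeq2 and limma perform this adjustment internally, typically via R's `p.adjust`. This standalone Python version is only a sketch of the two methods and is not the platform's code. Bonferroni multiplies each p-value by the number of tests. Benjamini-Hochberg scales by `m / rank` and then enforces monotonicity, starting from the largest p-value.

```python
def bonferroni(pvalues: list) -> list:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: list) -> list:
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(p)])  # [0.02, 0.04, 0.04, 0.02]
```

With the same four p-values, Bonferroni gives 0.04, 0.16, 0.12, and 0.02, which shows how much more conservative it is than BH.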
- FIG. 21E depicts a volcano plot visualization generated using the systems described herein.
- FIGS. 22A-22B depict expression modules that showcase projections of samples’ expression across multiple representations: PCA and UMAP. Users are able to customize facets of the projections, along with labelling samples based on sample phenotypes.
- FIG. 22A depicts a PCA quality control interface and visualization with filters for a user to select: sample-to-sample normalization (TMM or Raw), gene normalization (Raw, log odds, z-score), filters to show/hide sample names and to set size as library size, and color coding of samples (by phase, progenitor, TCGA tumor, tissue, TCGA tissue).
- FIG. 22B depicts a UMAP interface and visualization including filters for sample-to-sample normalization (TMM or Raw), gene normalization (Raw, log odds, z-score), number of neighbors, minimal distance, type of metric function, and an option for 2D or 3D UMAP visualization.
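Of the gene normalization options listed above, the z-score is the simplest to show. Each gene (row) is centered on its mean across samples and divided by its standard deviation. Whether the platform uses the population or the sample standard deviation is not stated, so the population form used here is an assumption. TMM, the edgeR sample-to-sample method, is not shown.

```python
import statistics


def zscore_genes(matrix: list) -> list:
    """Z-score each gene (row) across samples (columns) using the population
    standard deviation; constant rows map to zeros."""
    out = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)
        out.append([0.0 if sd == 0 else (x - mean) / sd for x in row])
    return out


print(zscore_genes([[5, 5, 5]]))  # [[0.0, 0.0, 0.0]]
```

Z-scoring puts highly and lowly expressed genes on a common scale, which is why it is a typical choice before drawing heatmaps or projections.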
- FIGS. 23A-23B depict a representation of pipeline outputs from multiple molecular types through a circos plot. User is able to customize samples, chromosomal regions and molecular data types represented on the plot. Regions on the plot are selectable and important regions can be exported to be used to filter other visualization modules on the platform.
- FIG. 23A depicts an interface for generating a circos plot. Users may apply filters to select specific samples, tracks (variants such as SNP/Indel/CNV), and chromosomes for visualization.
- FIG. 23B depicts an exemplary circos plot generated by the systems described herein.
DETAILED DESCRIPTION
- systems and methods for processing and visualization of biological data, e.g., biomarkers. Further provided herein are systems and methods that result in faster processing and real-time visualization of biological data (such as a genome, transcriptome, or proteome). Further provided herein are computer interfaces for visualizing and manipulating genomic data from cell populations.
- the methods and systems described herein in some instances automate many of the required functions formerly requiring labor-intensive processes as well as dedicated personnel to curate, analyze and interpret complex genomic data.
- a system comprises one or more platforms.
- the data comprises genomic, transcriptomic, proteomic, methylation and epigenomic data.
- computer-implemented systems comprising one or more modules.
- computer-implemented systems comprising: at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions, wherein the computer-executable instructions comprise one or more of a frontend, a backend, and a pipeline module.
- an exemplary arrangement of modules is shown in FIG. 1B.
- modules are accessed from a cloud-based database or interface.
- an exemplary workflow for genomics data visualization is shown in FIG. 2A.
- Methods and systems described herein in some instances comprise one or more steps of accessing a web-based software application; providing or otherwise linking an input file (such as a file comprising whole genome sequencing, RNA, or other biological information); processing the file; applying one or more filters or annotations to the data in the file; querying one or more databases; and displaying a visualization of the filtered and/or annotated data.
- a system comprises a device comprising at least one processor and instructions executable by the at least one processor to provide a first application configured to perform operations comprising: accessing one or more datasets comprising biological data; generating a visual representation of the one or more datasets, wherein the data is generated rapidly enough for live user interpretation.
- Biological data in some instances comprises molecular data.
- systems are configured for the processing and visualization of one or more datasets.
- Systems provided herein may comprise one or more applications.
- a system comprises a first application, second application, third application, or more applications.
- a first application comprises one or more modules.
- a module comprises one or more of a plot, tree, cells, counting, and sequencing metrics modules.
- an application is configured to receive user parameters/data, process data, and/or visualize data.
- Systems may comprise one or more datasets.
- one or more datasets are associated with a project.
- a project comprises a plurality of samples.
- a system comprises at least 1, 2, 5, 10, 20, 25, 50, 75, 100, 150, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, or at least 100,000 datasets.
- a system comprises 10-100,000, 100-100,000, 500-100,000, 1000-100,000, 1000-50,000, 1000-10,000, 5000-100,000, 5000-50,000, 5000-10,000, 10,000-100,000, 10,000-50,000 or 75,000 to 500,000 datasets.
- one or more datasets are each representative of a single cell.
- a dataset represents at least 1, 2, 5, 10, 20, 25, 50, 75, 100, 150, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, or at least 100,000 cells. In some instances, a dataset represents 10-100,000, 100-100,000, 500-100,000, 1000-100,000, 1000-50,000, 1000-10,000, 5000-100,000, 5000-50,000, 5000-10,000, 10,000-100,000, 10,000-50,000 or 75,000 to 500,000 cells. In some instances, a dataset comprises one or more biomarkers. In some instances, a dataset comprises at least 1, 2, 5, 10, 20, 25, 50, 75, 100, 150, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, or at least 100,000 biomarkers.
- a dataset comprises 10-100,000, 100-100,000, 500-100,000, 1000-100,000, 1000-50,000, 1000-10,000, 5000-100,000, 5000-50,000, 5000-10,000, 10,000-100,000, 10,000-50,000 or 75,000 to 500,000 biomarkers.
- the systems and methods described herein may comprise a frontend module.
- the frontend module comprises a Vue.js application that provides the user interface and visualizations for the systems and methods described herein.
- the frontend makes requests to the backend to query data.
- a frontend comprises computer-executable instructions for one or more of: displaying complex visualizations such as the circos plot, phylogenetic tree, etc. (e.g., as navigable tabs); displaying quality metrics; visualizing filters and filtering interactions; and presenting data tables for cell information.
- a web version of IGV is integrated into the frontend.
- the systems and methods described herein may comprise a backend module.
- the backend comprises a Flask framework application and provides one or more backend features for the methods and systems described herein.
- the backend is written in Python.
- a backend comprises computer-executable instructions for one or more of: user authentication and registration; data computations and filtering; access of a Vaex open-source library for speeding up data interactions; interacting with a database and HDF5 files to process data requests; presenting and encoding data for visualizations; and presenting data for IGV.
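One way a backend can encode tabular data for browser visualizations is as columnar JSON, with one array per field, so that field names are not repeated for every row. This is a sketch of that general technique using only the standard `json` module. It is not the platform's documented wire format, and the Flask response handling that would return such a string is omitted.

```python
import json


def encode_columnar(rows: list, columns: list) -> str:
    """Encode row dictionaries as one JSON array per column; field names are
    written once rather than once per row. Missing fields become null."""
    payload = {col: [row.get(col) for row in rows] for col in columns}
    return json.dumps(payload, separators=(",", ":"))


rows = [{"chrom": "chr1", "pos": 100}, {"chrom": "chr2", "pos": 5}]
print(encode_columnar(rows, ["chrom", "pos"]))
# {"chrom":["chr1","chr2"],"pos":[100,5]}
```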
- a user may define one or more analytical workflows for the system to perform. After a user selects one or more analytical workflows, the system in some instances executes the analytical workflow using one or more pipelines.
- the systems and methods described herein may comprise a pipeline module.
- the pipeline comprises a computationally-intensive workflow that runs genomics analysis tools to extract signatures of biomarkers from sequencing files and loads them into a database.
- the methods and systems described herein comprise one or more pipeline modules.
- pipeline modules comprise multi-omics, such as WGS/exome, methylation, proteome, proteome bacterial, or RNA-seq/transcriptome.
- a pipeline comprises one or more sub-modules.
- a pipeline comprises one or more data files.
- a pipeline comprises one or more of sequencing input files, sub-pipeline modules, and summary files.
- Pipelines may be configured for whole genome or exome sequencing data.
- a WGS/exome pipeline is configured to input one or more FASTQ files.
- a WGS/exome pipeline comprises one or more of alignment, haplotype caller, joint genotyping, heterozygous site detector (a pipeline used to analyze cell lines without a priori knowledge of reference heterozygous variant sites), statistics, ADO, and CNV sub-pipelines, which are used to derive insights from sequencing data.
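The sub-pipelines named above depend on one another's outputs (haplotype calling, for instance, needs aligned reads). The Python sketch below orders them with the standard-library `graphlib.TopologicalSorter`, which is also one way a queued set of pipelines could run without manual intervention. The stage names and dependency edges are assumptions inferred from the files each sub-pipeline produces elsewhere in this document; they are not the platform's actual scheduler configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph for the WGS/exome sub-pipelines. Each stage maps to the
# set of stages whose outputs it consumes; the edges are assumptions, not platform config.
WGS_EXOME_STAGES = {
    "alignment": set(),                                  # FASTQ -> .bam + index
    "haplotype_caller": {"alignment"},                   # .bam -> per-sample GVCF
    "joint_genotyping": {"haplotype_caller"},            # GVCFs -> joint-genotyped GVCF
    "heterozygous_site_detector": {"joint_genotyping"},  # prevalent, high-confidence sites
    "statistics": {"alignment"},                         # coverage tables from aligned reads
    "ado": {"alignment", "heterozygous_site_detector"},  # allele frequencies at het sites
    "cnv": {"alignment"},                                # binned copy-number estimates
}


def execution_order(graph):
    """Return one run order in which every stage comes after all of its dependencies."""
    return list(TopologicalSorter(graph).static_order())


def ready_batches(graph):
    """Group stages into successive batches; stages within a batch do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())
        batches.append(batch)
        sorter.done(*batch)  # mark the whole batch finished so dependents become ready
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(ready_batches(WGS_EXOME_STAGES), start=1):
        print(number, batch)
```

Each batch returned by `ready_batches` could be dispatched in parallel once the previous batch finishes. With the assumed graph, alignment runs alone first, followed by the CNV, haplotype caller and statistics stages together.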
- the files contain sequencing information or data.
- files comprise sequence data from the clusters that pass filter on a flow cell.
- the files comprise FASTQ files.
- the database comprises a PostgreSQL database.
- the databases are accessed from a backend module, which comprises computer-executable instructions for one or more of: accepting a sequencing information file (e.g., FASTQ) as input; running joint genotyping to produce a VCF file; and linking variants to COSMIC, ClinVar, or another variant list.
- a VCF file contains the variants called jointly from multiple samples (cells) and represents high-confidence variants distributed across the cells.
- These variants in some instances represent changes in nucleotides observed in a cell in relation to the reference genome.
- these variants are placed along the genome using genomic coordinates (e.g., chr1 base 18903). Giving each variant a specific location in some instances allows information compiled in databases to be associated with that variant.
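Because each variant carries genomic coordinates, annotations can be attached by keying on chromosome, position and alleles. The sketch below parses the fixed CHROM, POS, ID, REF and ALT columns of a tab-delimited VCF data line and looks each allele up in an annotation table. The `Variant` type, the function names and the annotation table are hypothetical, and the reference base shown at chr1:18903 is made up for illustration; a production system would use a dedicated VCF parser and the actual COSMIC or ClinVar release files.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in the VCF POS column
    ref: str
    alt: str


def parse_vcf_line(line: str) -> list[Variant]:
    """Parse CHROM, POS, ID, REF and ALT from one tab-delimited VCF data line.

    A record with several comma-separated ALT alleles yields one Variant per allele.
    """
    chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
    return [Variant(chrom, int(pos), ref, alt) for alt in alts.split(",")]


def annotate(variants, annotation_db):
    """Look up each variant by its (chrom, pos, ref, alt) key; None when unannotated."""
    return {variant: annotation_db.get(variant) for variant in variants}


# Usage with a made-up annotation table (not real database content):
db = {Variant("chr1", 18903, "A", "G"): {"source": "example_db", "note": "illustrative"}}
calls = parse_vcf_line("chr1\t18903\t.\tA\tG,T\t50\tPASS\t.")
print(annotate(calls, db))
```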
- Pipelines may be configured for multi-omics analysis.
- multi-omics comprises two or more types of biological information (or biological data).
- biological data comprises molecular data.
- molecular data comprises two or more of transcript (transcriptome), genomic, proteomic, methylome, or other form of sample analysis.
- molecular data comprises three or more of transcript (transcriptome), genomic, proteomic, methylome, or other form of sample analysis.
- molecular data comprises four or more of transcript (transcriptome), genomic, proteomic, methylome, or other form of sample analysis.
- methods described herein display and/or process multi-omics data. Molecular data in some instances is obtained from a single cell.
- Molecular data in other instances is obtained by evaluation of a population of cells.
- methods described herein display transcript and genomic data.
- methods described herein utilize transcript, genomic data, and proteomics data.
- methods described herein utilize transcript, genomic data, and methylome data.
- a pipeline is configured to identify one or more biomarkers (e.g., those present in multi-omic data).
- Pipeline modules in a system may be configured to perform complex genomic analytical transformations.
- a pipeline module is configured to run genomics analysis tools to extract biomarkers from sequencing data.
- the system is configured for a user to queue and/or order pipelines to be run on a dataset.
- a system is configured for a user to queue and/or order pipelines to be run on a dataset without requiring manual intervention.
- a pipeline module is configured to identify biomarkers present in the genome.
- biomarkers comprise one or more of single nucleotide variants, insertions, and deletions. In some instances, variants across every sample of a project are aggregated into a single table.
- the aggregation comprises joint genotyping.
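A simplified picture of the aggregated output is a single variant-by-cell table. The sketch below only tabulates which cells each already-called variant appears in; it is not joint genotyping, which genotypes all samples together from their per-sample GVCF evidence. The function names and the `(chrom, pos, ref, alt)` tuple layout are illustrative assumptions.

```python
def aggregate_calls(calls_by_cell):
    """Build a variant x cell presence table from per-cell variant sets.

    calls_by_cell maps a cell name to a set of (chrom, pos, ref, alt) tuples.
    Returns (variants, cells, matrix) where matrix[i][j] is 1 if variants[i]
    was called in cells[j], else 0. Chromosome names sort lexicographically
    here (chr10 before chr2); a real table would follow the reference's contig order.
    """
    cells = sorted(calls_by_cell)
    variants = sorted(set().union(*calls_by_cell.values()))
    matrix = [[1 if v in calls_by_cell[c] else 0 for c in cells] for v in variants]
    return variants, cells, matrix


def to_tsv(variants, cells, matrix):
    """Render the aggregated table as tab-separated text with one row per variant."""
    header = "\t".join(["chrom", "pos", "ref", "alt", *cells])
    rows = ["\t".join([*map(str, v), *map(str, row)]) for v, row in zip(variants, matrix)]
    return "\n".join([header, *rows])
```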
- the pipeline module is configured to extract annotations and variants from an input file into a database.
- a database comprises COSMIC, ClinVar, Ensembl, or other genomic annotation databases.
- a pipeline module is configured to identify structural biomarkers present in a genome.
- structural biomarkers comprise variations in chromosomal copy number.
- structural biomarkers comprise one or more of transversions, inversions, tandem repeats, translocations, and element insertions.
- a pipeline module is configured to identify genomic arrangement information specific to classes of genes.
- genomic arrangement information comprises one or more of major histocompatibility complexes, microsatellite instability, tumor mutational burden, and immune repertoire.
- a pipeline module is configured to run one or more transcriptome analysis tools.
- transcriptome analysis tools are configured to extract counts of isoforms from one or more sequencing files.
- transcriptome analysis tools are configured to identify fusion events from one or more sequencing files.
- transcriptome analysis tools are configured to identify nucleotide variation in expressed genes from one or more sequencing files.
- the pipeline module is configured to run methylation analysis tools on one or more sequencing files.
- the methylation analysis tools are configured to extract methylation status from one or more sequencing files.
- methylation status comprises one or more regions of methylated cytosine, such as mC, hmC and hhmC, identified from one or more sequencing files.
- a pipeline module is configured to run proteomics analysis tools on one or more sequencing files.
- proteomics analysis tools provide proteomic data.
- proteomic data comprises sequence counts of identified native conformationally altered proteins.
- an alignment pipeline comprises one or more of: a compressed alignment file (e.g., a .bam file) describing the alignment information of the reads in the project against a given reference (e.g., hg38); and an index file of the alignment file.
- the pipeline comprises a .bam file.
- a haplotype caller pipeline comprises one or more of: a genomic variant call format (GVCF) file containing the detected variants for a given sample; and an index file associated with the GVCF file.
- GVCF genomic variant call format
- a joint-genotyping pipeline comprises one or more of: a genomic variant call format (GVCF) file containing the joint variant calls of multiple samples; and an index file associated with the joint-genotyped GVCF file.
- a heterozygous site detector pipeline comprises one or more of: a genomic variant call format (GVCF) file containing the called variants with a high degree of prevalence across a dataset and high confidence; and an index file associated with the GVCF file.
- a statistics pipeline comprises one or more of: a tab-separated value table describing whole genome sequence (WGS) level statistics estimated from the aligned reads (e.g., 1X, 5X, 10X coverage); and a tab-separated value table showing exome-panel-specific statistics (e.g., on-target, off-target, and near-target events).
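Coverage figures such as 1X, 5X and 10X are breadth-of-coverage fractions: the share of positions covered by at least that many reads. The sketch below computes them from a list of per-position depths and adds a simple on/near/off-target read classifier for exome panels. The 250 bp `near` padding and all function names are assumptions, not values taken from the platform.

```python
def coverage_breadth(depths, thresholds=(1, 5, 10)):
    """Fraction of positions whose read depth is at least each threshold, keyed '1X', '5X', ..."""
    if not depths:
        raise ValueError("no positions supplied")
    n = len(depths)
    return {f"{t}X": sum(d >= t for d in depths) / n for t in thresholds}


def mean_coverage(depths):
    """Mean read depth across the supplied positions."""
    return sum(depths) / len(depths)


def classify_read(start, end, targets, near=250):
    """Label a read [start, end) as ON, NEAR or OFF relative to half-open target intervals.

    `near` (bp of padding that still counts as near-target) is an assumed default.
    """
    if any(start < t_end and end > t_start for t_start, t_end in targets):
        return "ON"
    if any(start < t_end + near and end > t_start - near for t_start, t_end in targets):
        return "NEAR"
    return "OFF"
```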
- WGS whole genome sequence
- an ADO pipeline comprises a tab-separated value table showing the allele frequencies of N queried heterozygous sites. This table is in some instances used to estimate WGS allele balance.
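Reading ADO as allelic dropout (an assumption; the abbreviation is not expanded here), allele balance at a heterozygous site is the fraction of reads supporting the alternate allele, which should sit near 0.5 when both alleles were captured. The sketch below computes that fraction and an estimated dropout rate; the depth and cutoff thresholds are hypothetical defaults.

```python
def allele_fraction(ref_count, alt_count):
    """Alternate-allele fraction at one site, or None when the site has no reads."""
    total = ref_count + alt_count
    return None if total == 0 else alt_count / total


def ado_rate(sites, min_depth=10, dropout_cutoff=0.1):
    """Estimate the allelic-dropout rate over known heterozygous sites.

    `sites` is an iterable of (ref_count, alt_count) pairs. A covered site counts as a
    dropout when one allele is (nearly) absent, i.e. its alt fraction falls below
    `dropout_cutoff` or above 1 - `dropout_cutoff`. Both thresholds are assumptions.
    Sites with fewer than `min_depth` reads (and always sites with zero reads) are skipped.
    """
    evaluated = dropouts = 0
    for ref_count, alt_count in sites:
        if ref_count + alt_count < max(min_depth, 1):
            continue
        evaluated += 1
        fraction = allele_fraction(ref_count, alt_count)
        if fraction < dropout_cutoff or fraction > 1 - dropout_cutoff:
            dropouts += 1
    return dropouts / evaluated if evaluated else float("nan")
```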
- a CNV pipeline comprises one or more of: a tab-separated value table describing, for a sample, the estimated copy number for bins of size N across the whole genome; and a tab-separated value table describing, for a sample, the type of event (insertion, deletion) for all bins of size N across the genome.
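Binned copy number is commonly estimated by averaging read depth over fixed-size bins and scaling each bin against a baseline. The sketch below assumes the median bin represents normal ploidy (2 for a diploid genome) and rounds to the nearest integer copy number. The function names, the median baseline and the "gain"/"loss"/"neutral" labels are this sketch's own choices; many real callers also correct for GC content and mappability before this step.

```python
import statistics


def bin_means(depths, bin_size):
    """Mean read depth for consecutive bins of `bin_size` positions (last bin may be shorter)."""
    return [
        sum(depths[i:i + bin_size]) / len(depths[i:i + bin_size])
        for i in range(0, len(depths), bin_size)
    ]


def copy_number_calls(means, ploidy=2):
    """Estimate an integer copy number per bin, assuming the median bin sits at `ploidy`.

    Returns (copy_number, event) pairs where event is 'gain', 'loss' or 'neutral'.
    """
    median = statistics.median(means)
    if median == 0:
        raise ValueError("median bin depth is zero; cannot normalize")
    calls = []
    for mean in means:
        copy_number = round(ploidy * mean / median)
        if copy_number > ploidy:
            event = "gain"
        elif copy_number < ploidy:
            event = "loss"
        else:
            event = "neutral"
        calls.append((copy_number, event))
    return calls
```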
- Pipelines may be configured for bacterial sequencing data.
- a bacterial pipeline is configured to input a FASTQ file.
- a bacterial pipeline comprises one or more of: compressed FASTQ files containing trimmed and filtered high-quality sequences; a tab-separated value table describing the taxonomic assignment of each read to a given species using a database (such as Kraken’s database); a FASTA file describing the genome assembly, at the level of contigs, constructed from the reads in the dataset; a FASTA file describing the genome assembly, at the level of scaffolds, constructed from the reads in the dataset; and a BAM file describing the alignments of the reads against the assembled genome (e.g., contigs).
- a bacterial pipeline comprises one or more summary files.
- summary files comprise one or more of: a tab-separated value table describing the taxonomic assignment of contigs in an assembly based on the proportion of reads mapped to them; and a tab-separated value table showing the estimated completeness of a given assembly based on a set of phylogenetic marker genes.
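Both bacterial summary tables reduce to simple proportions. The sketch below assigns each contig the taxon holding the largest share of its reads and reports completeness as the percentage of an expected marker-gene set found in the assembly. This unweighted completeness score is a simplification (dedicated tools refine it with lineage-specific marker sets), and all names are hypothetical.

```python
from collections import Counter


def assign_contig_taxa(read_taxa_by_contig):
    """Give each contig the taxon that accounts for the largest share of its mapped reads.

    read_taxa_by_contig maps a contig name to the list of per-read taxon labels
    (for example, labels produced by a read classifier). Returns contig -> (taxon, share).
    """
    assignments = {}
    for contig, taxa in read_taxa_by_contig.items():
        if not taxa:
            assignments[contig] = (None, 0.0)
            continue
        taxon, count = Counter(taxa).most_common(1)[0]
        assignments[contig] = (taxon, count / len(taxa))
    return assignments


def completeness(found_markers, marker_set):
    """Percent of a reference marker-gene set detected in an assembly (simple, unweighted)."""
    expected = set(marker_set)
    return 100.0 * len(set(found_markers) & expected) / len(expected)
```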
- Pipelines may be configured for RNA-seq data.
- an RNA-seq pipeline is configured to accept one or more of: a compressed alignment file describing the alignment information of the reads in the project against a given reference (e.g., hg38); an index file of the compressed alignment file; a compressed alignment file describing the alignment information of the reads in the project against an RNA-seq-specific index for a given reference; and an index file for that alignment file.
- an RNA-seq pipeline comprises one or more summary files.
- summary files comprise one or more of: a tab-separated table describing the matrix of counts of genomic features (e.g., exons in a gene) across samples; a tab-separated table describing the number of unique splice-junction overlaps; a tab-separated table describing overall alignment metrics (e.g., number of genes with counts); and a tab-separated table showing the estimated ratio of exon to non-exon alignment events.
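The RNA-seq counts matrix can be pictured as a feature-by-sample table built from per-read feature assignments. The sketch below assumes reads have already been assigned to features (for example, exons or genes) and only does the tallying, plus the "number of genes with counts" alignment metric. The input layout and function names are illustrative.

```python
from collections import Counter


def count_matrix(assignments):
    """Build a feature x sample count matrix from (sample, feature) read assignments.

    Each pair represents one read assigned to a genomic feature (e.g., an exon or gene).
    Returns (features, samples, rows) with rows[i][j] = reads for features[i] in samples[j].
    """
    counts = Counter(assignments)
    samples = sorted({sample for sample, _ in counts})
    features = sorted({feature for _, feature in counts})
    rows = [[counts[(sample, feature)] for sample in samples] for feature in features]
    return features, samples, rows


def genes_with_counts(rows, samples):
    """Number of features with at least one read in each sample."""
    return {s: sum(1 for row in rows if row[j] > 0) for j, s in enumerate(samples)}
```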
- Systems and methods described herein may comprise filters for visualizing data.
- filters comprise one or more of: Germline mutation, Somatic mutation, Copy number variation, Single nucleotide variation, Insertions and deletions, Tumor Mutation Burden (TMB) Analysis, Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, and Predicted Coding Change.
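Several of these filters are simple predicates over variant records. The sketch below classifies a REF/ALT pair as SNV or indel, filters by origin and annotation flags, and computes TMB as somatic mutations per megabase of sequenced territory, a common way of expressing it. The record field names (`origin`, `cosmic`, `clinvar`, `coding_change`) are hypothetical.

```python
def variant_class(ref, alt):
    """Classify a REF/ALT pair as an SNV, an indel, or a multi-nucleotide variant (MNV)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "MNV"


def apply_filters(variants, classes=None, origins=None, require=()):
    """Keep variants matching every active filter.

    variants: dicts with 'ref', 'alt', 'origin' ('somatic' or 'germline') and optional
    boolean annotation flags such as 'cosmic', 'clinvar' or 'coding_change'
    (hypothetical field names). `require` lists flags that must be True.
    """
    kept = []
    for v in variants:
        if classes and variant_class(v["ref"], v["alt"]) not in classes:
            continue
        if origins and v["origin"] not in origins:
            continue
        if not all(v.get(flag, False) for flag in require):
            continue
        kept.append(v)
    return kept


def tumor_mutation_burden(somatic_mutation_count, territory_bp):
    """TMB expressed as somatic mutations per megabase of sequenced territory."""
    return somatic_mutation_count / (territory_bp / 1_000_000)
```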
- TMB Tumor Mutation Burden
- FIG. 1 Further described herein are computer-implemented systems comprising: at least one memory storing computer-executable instructions; and at least one processor configured to access the at least one memory and execute the computer-executable instructions to perform: receiving a query, wherein the query comprises genomic data from one or more samples; querying a database; wherein the database comprises a plurality of genomic data and a plurality of phenotype data; generating, using at least the genomic data, a genome summary, the genome summary comprising genes and gene variants of the cohort; determining a graphical representation of the genome summary; and sending the graphical representation to a display device.
- GUI graphical user interface
- a GUI comprises a project browser or dashboard such as 100.
- a GUI comprises search menus for one or more of project 102 or LotID, and sorting functions that can rank projects by size 104, initiator 105, or analysis date 106.
- a GUI comprises a list of previous and current projects 102.
- projects can have a menu 107 that enables users to modify name, add sample collections, export to delivery server or remove from platform.
- projects and data are shared among a group of users across workspaces or organizations 108.
- GUI is facilitated by a frontend.
- the front end sends operations or queries to a backend that reflect the actions a user takes to manipulate and transform information.
- Computer-implemented systems may comprise a genome browser.
- a genome browser is configured to display sections of a genome and/or variants (FIGS. 2A-2B).
- a genome browser comprises IGV (Integrative Genomics Viewer).
- IGV Integrative Genomics Viewer
- the bin size is selectable from the entire genome down to the individual base.
- a user can request specific samples to be viewed that were analyzed within a project.
- a user can query positions by providing dbSNP ids.
- individual mutations in some instances are viewed to determine the alternative allele or base change.
- each mutation is selectable, further detailing the nature of the modification and presenting it to the user.
- amino acid information is conveyed so that the user can gather information about the potential impact on the coding sequence.
- Computer-implemented systems may comprise an interface for annotating variants. This is an important step to empower interpretation of downstream coding changes in protein structure and function.
- Variant information in some instances comprises one or more of features (name, gene id, gene type, strand, Tdl, HgncId), predictions (SIFT/sorting intolerant from tolerant, LRT/likelihood ratio test, FATHMM, PROVEAN/protein variation effect analyzer, MetaSVM, MetaLR), conservation among species (e.g., vertebrates, mammals), evidence (pathology-related data from databases such as COSMIC), and biological population (FIG. 2C).
- a variant annotation interface assesses the degree of conservation among 100 vertebrates and 30 mammals. In some instances, this display is helpful in investigating de novo variant alleles that are not annotated by ClinVar, COSMIC, GeneCards or Ensembl.
- the comparison allows determination of the conservation of alleles found in the sample relative to the same allele in alternative species: highly conserved alleles are shifted right, whereas alleles with low conservation are shifted left.
- the allele is highly conserved across all 30 mammals, indicating that the gene is highly conserved and likely to be important for the health of all mammalian species. Having assessed the potential for the mutation to be pathogenic, the user, if the variant is annotated, in some instances navigates to a variety of external databases (e.g., GeneCards, Ensembl, ClinVar and COSMIC) by simply selecting the hyperlink for
- Variants in some instances are annotated as one or more of Germline mutation, Somatic mutation, Copy number variation (CNV), Single nucleotide variation (SNV), Insertions and deletions, Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, and Predicted Coding Change. Additional resources, such as GeneCards, Ensembl, ClinVar and COSMIC, are also accessed in some instances. In some instances, variants comprise complex markers such as those obtained using Tumor Mutation Burden (TMB) analysis.
- Computer-implemented systems may comprise an interface for tracing variant lineages.
- lineages comprise somatic, ancestral, or reference lineages.
- Lineage trees in some instances are generated from specific chromosomes, and graphically display variants in a chart format (FIGS. 3A-3B).
- Computer-implemented systems may comprise an interface for analyzing cells.
- samples comprise one or more cells.
- Cells in some instances are searched, or summary information about each cell is displayed, such as cell name and variants detected (somatic, germline, SNPs, and indels) (FIG. 4A).
- the confidence levels high, medium, and low are used to describe the confidence of variant calls for each cell.
- inter-cell distances are graphed.
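Which inter-cell distance is graphed is not stated. As one plausible choice, the sketch below uses the Jaccard distance between the variant sets of two cells (0 for identical sets, 1 for disjoint ones) and builds a pairwise matrix a plot could consume; the metric and the function names are assumptions.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two variant sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def distance_matrix(calls_by_cell):
    """Pairwise Jaccard distances between cells' variant sets, in sorted cell order."""
    cells = sorted(calls_by_cell)
    return cells, [
        [jaccard_distance(calls_by_cell[x], calls_by_cell[y]) for y in cells] for x in cells
    ]
```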
- Computer-implemented systems may comprise an interface for visualizing sequencing metrics (e.g., Picard metrics) (FIG. 4B).
- Metrics include but are not limited to chromosome M population, percent pass/fail reads aligned, WGS mean coverage, and WGS percent excluded duplicate reads. Each metric in some instances is also displayed on an individual per-cell basis.
- Computer-implemented systems may comprise an interface for visualizing genomic data.
- a platform for visualizing molecular data comprises one or more of a device comprising at least one processor and instructions executable by the at least one processor, wherein the instructions comprise: a front end module comprising a graphical user interface; a security layer configured to reduce unauthorized access; a back end module; and one or more pipeline modules.
- a visualization platform comprises one or more visualization applications.
- one or more visualization applications are configured to execute one or more pipelines (e.g., visualization pipelines).
- visualization applications are controlled by a user.
- a visualization system may be configured to generate a visual representation of a dataset.
- a visual representation comprises at least one plot and genomics scale.
- a visual representation displays one or more biomarkers.
- one or more biomarkers comprise specific molecular states associated with biological states of a sample.
- a system is configured for a user to select one or more applications.
- one or more applications are configured for visually representing specific types of molecular data.
- molecular data comprises one or more of genome, transcriptome, methylome and proteome.
- a graphical user interface is configured for a user to select one or more samples processed through a pipeline on the platform to be represented in a visualization application.
- a back end module is configured to encode data for one or more visualizations.
- visualization modules comprise a query engine.
- a query engine comprises one or more of Vaex and AG Grid.
- a back end module is configured to perform one or more calculations or filtering commands.
- one or more calculations or filtering commands are performed on biomarkers.
- one or more calculations or filtering commands are performed on a project or visualization application.
- a visualization application receives (as input) an output generated from one or more pipeline modules performed on the system.
- a visualization application may be configured for a user to view biomarkers.
- biomarkers are identified through the platform pipelines against one or more annotation and genomic reference tables.
- a visualization application comprises one or more modules that represent one or more characteristics of a biological data type.
- a user is able to directly manipulate visual representations in a visualization application.
- a visual representation comprises one or more of feature selection, feature filtering, importing gene lists, and external biomarker results.
- configurations or manipulations of a visual application are exported or imported.
- configurations or manipulations control a layout of other visualization applications within the platform.
- chromosomal coordinates are leveraged to highlight various information across multiple visualization applications.
- biological groupings comprise cellular phenotypes.
- a user is able to view biomarker information within genomic context, leveraging one or more genomic loci, along with associated annotation.
- a system depicts one or more modules within a visualization application for transcription.
- a visualization application depicts one or more of heatmaps, principal components analysis, differential expression, and alternative transcriptional mapping.
- a visualization application is configured for a user to view changes in copy number or structural changes within genome scale.
- a visualization application is configured for a user to view annotated information of changes in copy number or structural changes within the genome scale.
- a system may be configured for syncopation of molecular data.
- syncopation of molecular data comprises managing and querying for one or more analysis tools.
- molecular data comprises data objects derived from the same cell.
- syncopated molecular information is visualized together on the same application within the platform.
- syncopated molecular information is visualized together using one or more of Circos plots, genome viewers, and graph networks.
- a system may be configured to associate molecular information with biological groupings.
- a system is configured to associate molecular information with biological groupings using one or more statistical methods.
- the one or more statistical methods comprise differential expression (transcriptomic/proteomic), genome-wide association studies (genomic), and quantitative trait loci calculations using molecular information.
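For the differential expression case, a minimal per-feature comparison between two biological groupings is a log2 fold change plus a test statistic. The sketch below uses a pseudocount of 1 and Welch's t statistic. It is illustrative only: count-based models with multiple-testing correction are typically preferred for sequencing data, each group here needs at least two samples with non-identical values, and all names are hypothetical.

```python
import math
import statistics


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of (mean_a + pseudocount) / (mean_b + pseudocount) for one feature."""
    mean_a = statistics.fmean(group_a)
    mean_b = statistics.fmean(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def welch_t(group_a, group_b):
    """Welch's t statistic (unequal variances); each group needs at least two values."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error


def differential_expression(expression, group_a, group_b):
    """Per-feature (log2 fold change, Welch t) for samples split into two groups.

    expression maps feature -> {sample: value}; group_a/group_b are lists of sample names.
    """
    results = {}
    for feature, values in expression.items():
        a = [values[s] for s in group_a]
        b = [values[s] for s in group_b]
        results[feature] = (log2_fold_change(a, b), welch_t(a, b))
    return results
```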
- a system is configured to export information in the system contained in one or more objects.
- one or more objects comprise files, images, pipeline and visualization configurations, biomarker lists or projects.
- Data may be visualized using plots.
- data may be visualized using a circos plot (FIG. 5A)
- Circos plots in some instances comprise additional variant information, such as number of somatic, germline, SNP or indel variants.
- Variants in some instances are visualized at the chromosome level (FIG. 5B).
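One plausible way to feed a circos plot with millions of variants per cell is to pre-aggregate them into fixed-size bins per chromosome, so the plot draws a few thousand bars rather than individual variants. The sketch below counts variants per 1 Mb bin and per variant type; the bin size, input layout and function name are assumptions, not the platform's implementation.

```python
import math
from collections import Counter


def circos_bins(variants, chrom_lengths, bin_size=1_000_000):
    """Count variants per chromosome bin and per variant type for a circos histogram track.

    variants: iterable of (chrom, pos, variant_type) with 1-based positions.
    chrom_lengths: chromosome name -> length in bp. Returns chrom -> list of Counters,
    one Counter per bin. Variants on unknown chromosomes or out of range are skipped.
    """
    tracks = {
        chrom: [Counter() for _ in range(math.ceil(length / bin_size))]
        for chrom, length in chrom_lengths.items()
    }
    for chrom, pos, variant_type in variants:
        if chrom not in tracks or not 1 <= pos <= chrom_lengths[chrom]:
            continue
        tracks[chrom][(pos - 1) // bin_size][variant_type] += 1
    return tracks
```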
- a circos plot comprises a lineage tree (FIG. 7).
- a user interface is configured to apply one or more filters to the circos plot.
- two or more groups of cells or samples are compared (optionally filtered by number of variants).
- views of one or more chromosomes are displayed or hidden.
- variant filters comprise one or more of variant type (SNP, indel), origin (somatic vs. germline), annotation (COSMIC, CLINVAR, coding change), or features.
- features comprise name, gene id, gene type, strand, Tdl, HgncId.
- variant filters comprise predictions (SIFT, FATHMM, PROVEAN, MetaSVM, and MetaLR). Upon selection of a region or chromosome within a cell’s genome, a pop-up window is in some instances presented to the user which includes a genome viewing frame (e.g., IGV) plot.
- This window can be configured in terms of genome window bin size, allowing visualization from the entire chromosome down to the individual bases across that genome; this can be completed in a matter of seconds.
- the window size in some instances is scrollable by simply dragging the window left or right.
- each sample in some instances is interrogated to determine, for example, the specific change which is highlighted by a color change from the parental allele.
- the alternative allele is selected to determine the base change, while the parent allele can be detected to determine a pathogenic risk score based on several public algorithms, as well as the conservation of the allele across several vertebrate and mammalian species.
- This variant annotation further provides links to several databases to provide greater detail of the impact of the genomic alteration.
- the systems and methods described herein may provide a visualization of genomic and multi-omic data having a large number of datasets.
- the genomic data comprises at least 1, 2, 5, 10, 15, 20, 25, 30, 40, 50, 75, 100, 125, 150, 200, 250, 300, 400, 500, 600, 750, 1000, or at least 1500 datasets.
- the genomic data comprises 1-1000, 5-1000, 10-1000, 5-10,000, 100-10,000, 100-1000, 10-500, 10-750, 50-750, or 50-500 datasets.
- each sample dataset corresponds to a single cell.
- each dataset comprises at least 500, 1000, 2000, 5000, 10,000, 50,000, 100,000, 150,000, 250,000, 500,000, 1 million, 2 million, 3 million, 4 million, 5 million, 6 million, 10 million or at least 15 million variants.
- each dataset comprises about 500, 1000, 2000, 5000, 10,000, 50,000, 100,000, 150,000, 250,000, 500,000, 1 million, 2 million, 3 million, 4 million, 5 million, 6 million, 10 million or about 15 million variants.
- each dataset comprises 100-1 million, 100-100,000, 100,000-1 million, 100,000-5 million, 100-500,000, 500-5 million, 1 million-2 million, 2 million to 6 million, 3 million to 10 million, or 4 million to 7 million variants.
- datasets comprise at least 1, 2, 5, 10, 20, 25, 50, 75, 80, 85, 90, 95, 100, 110, 120, 150, 200, or at least 250 million rows of data.
- datasets comprise no more than 1, 2, 5, 10, 20, 25, 50, 75, 80, 85, 90, 95, 100, 110, 120, 150, 200, or no more than 250 million rows of data.
- datasets comprise 1-250, 1-100, 1-50, 1-25, 5-25, 5-50, 10-100, 10-200, 50-200, 50-150, 100-400 or 100-300 million rows of data.
- a system for visualizing genomic data comprises one or more of a device comprising at least one processor and instructions executable by the at least one processor to provide a first application configured to perform operations comprising: i. accessing one or more datasets comprising genomic data; and ii. generating a visual representation of the one or more datasets.
- the visualization comprises a circos plot. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds.
- the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds for dataset having at least 5 cells.
- the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01- 0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds for a dataset having at least 5 cells. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds for dataset having at least 10 cells.
- the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01- 0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds for a dataset having at least 10 cells. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds for dataset having at least 20 cells.
- the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds for a dataset having at least 20 cells. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds for dataset having at least 1 million variants per cell.
- the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1- 1, or 0.1-5 seconds for a dataset having least 1 million variants per cell. In some instances, the circos plot is generated in no more than 10, 5, 4, 3, 2, 1, 0.5, 0.2, 0.1, 0.05, or no more than 0.01 seconds for dataset having at least 4 million variants per cell.
- the circos plot is generated in 0.01-10, 0.05-10, 0.1-50, 0.5-10, 1-10, 2-10, 5-10, 0.01-0.05, 0.01-0.1, 0.01-0.5, 0.1-0.5, 0.1-1, or 0.1-5 seconds for a dataset having least 4 million variants per cell.
- the circos plot is generated using no more than 1, 2, 3, 4, 5, 6, 7, or no more than 8 processors.
- the circos plot is generated using at least 1, 2, 3, 4, 5, 6, 7, or at least 8 processors.
- the circos plot is generated using about 1, 2, 3, 4, 5, 6, 7, or about 8 processors.
- the visualization further comprises a phylogenetic tree.
- the visualization further comprises sequencing quality metrics.
- the visualization further comprises annotated variations.
- the visualization further comprises number of variations.
- the visualization further comprises cell and cell population statistics.
- platforms comprising: a database, in a computer memory, comprising biologic information for members of a population of individuals or samples, the biologic information comprising genome data, the biologic information obtained by analysis of one or more biologic samples from each sample and/or individual, each individual and/or sample having an ID; and a processor configured to provide a biologic information visual synthesis application comprising: a software module presenting an interface allowing a user to query the database by one or more of: inputting a phenotype, inputting a gene name, inputting an individual ID, and inputting a sample ID; a software module generating a genome browser, the genome browser comprising: a whole genome display comprising an icon representing each chromosome, each icon indicating a density of gene variants; and a chromosome display comprising an iconic representation of each chromosome, the representation indicating a density of gene variants located at the relevant portion of the chromosome, wherein selection of a chromosome by a user generates a linear display of
- Information from a system or application may be exported.
- information is exported automatically, or by action of a user.
- visualization data is exported.
- annotated datasets are exported.
- information is exported as one or more files within a project through a file explorer embedded directly in the platform.
- configurations of visualization applications are exported.
- configurations of visualization applications are exported in a format readable by the platform’s visualization applications.
- a system is configured for export of images of visual modules or applications which preserve manipulations performed by the user.
- a system is configured for specific output directories.
- a user has rights and access to place information in the output directory.
- a system is configured for export of a project or pipeline to specified output locations.
- range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
- a sample includes a plurality of samples, including mixtures thereof.
- determining means determining if an element is present or not (for example, detection). These terms can include quantitative, qualitative or quantitative and qualitative determinations. Assessing can be relative or absolute. “Detecting the presence of” can include determining the amount of something present in addition to determining whether it is present or absent depending on the context.
- the term “gene” can refer to a linear sequence of nucleotides along a segment of DNA that provides the coded instructions for synthesis of RNA, which, when translated into protein, leads to the expression of hereditary character.
- nucleic acid molecule can mean DNA, RNA, single-stranded, double-stranded or triple-stranded, and any chemical modifications thereof. Virtually any modification of the nucleic acid is contemplated.
- a “nucleic acid molecule” can be of almost any length, from 10, 20, 30, 40, 50, 60, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000, 10,000, 15,000, 20,000, 30,000, 40,000, 50,000, 75,000, 100,000, 150,000, 200,000, 500,000, 1,000,000, 1,500,000, 2,000,000, 5,000,000 or even more bases in length, including increments therein, up to a full-length chromosomal DNA molecule.
- the nucleic acid isolated from a sample is typically RNA.
- a single-stranded nucleic acid molecule is “complementary” to another single-stranded nucleic acid molecule, in certain embodiments of the subject matter described herein, when it can base-pair (hybridize) with all or a portion of the other nucleic acid molecule to form a double helix (double-stranded nucleic acid molecule), based on the ability of guanine (G) to base pair with cytosine (C) and adenine (A) to base pair with thymine (T) or uridine (U).
- G guanine
- C cytosine
- A adenine
- T thymine
- U uridine
- the nucleotide sequence 5'-TATAC-3' is complementary to the nucleotide sequence 5'-GTATA-3'.
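Because both sequences in that example are written 5' to 3', checking complementarity means comparing one sequence with the reverse complement of the other. The sketch below implements that check for DNA (A-T, G-C) and, optionally, RNA (A-U, G-C); the function names are illustrative.

```python
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement(seq, rna=False):
    """Return the reverse complement of a 5'->3' sequence, also read 5'->3'."""
    table = _RNA_COMPLEMENT if rna else _DNA_COMPLEMENT
    return seq.translate(table)[::-1]


def is_complementary(a, b, rna=False):
    """True when two 5'->3' sequences can base-pair along their full length."""
    return len(a) == len(b) and reverse_complement(a.upper(), rna) == b.upper()
```

For example, `reverse_complement("TATAC")` complements each base to `ATATG` and reverses it to `GTATA`, matching the pair given above.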
- mutation can refer to a change in the genome with respect to the standard wild-type sequence. Mutations can be deletions, insertions, or rearrangements of nucleic acid sequences at a position in the genome, or they can be single base changes at a position in the genome, referred to as “point mutations.” Mutations can be inherited, or they can occur in one or more cells during the lifespan of an individual. In some instances, mutation and variant are used synonymously.
- a “kit” or “research kit” can refer to a collection of products used to perform a biological research reaction, procedure, or synthesis (for example, a detection, assay, separation, or purification), which are typically shipped together to an end user, usually within common packaging.
- Described herein is a cloud-based solution for the storage, query, and analysis of longitudinal data comprising a multiplicity of whole genomes, a large number of public and proprietary annotation sources as well as associated high quality phenotypic data, including microbiome metagenomes and metabolomics profiles.
- the data analyzed by the platforms, systems, media, and methods described herein comprises more than 1,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, more than 100,000, more than 500,000, or more than 1,000,000 whole genomes.
- the data analyzed by the platforms, systems, media, and methods described herein comprises genomic data.
- the genomic data is produced, by way of example, at a next generation sequencing (NGS) lab.
- an AWS analysis pipeline based on Illumina’s HiSeq X and the ISIS Analysis Software are utilized to produce the genomic data.
- Sequencing reads are mapped to the hg38 human reference sequence and the Isaac Variant Caller is used to call single nucleotide variants (SNVs) and insertions and deletions (indels).
- the genomic data comprises a multiplicity of unique SNVs.
- the genomic data comprises over 1 million, over 10 million, over 50 million, over 100 million, over 500 million, or over 1 billion unique SNVs.
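Variant calls such as those described above are commonly exchanged as VCF records carrying a chromosome, a position, a reference allele (REF), and an alternate allele (ALT). As a minimal sketch, assuming biallelic records have already been parsed into that shape (the `Variant` type and both functions are hypothetical, not an interface of the Isaac Variant Caller), SNVs and indels can be separated by allele length, and unique SNVs counted by deduplicating identical records across genomes:

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


def variant_type(v: Variant) -> str:
    """Classify a biallelic record by allele lengths (simplified rule)."""
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    if len(v.ref) != len(v.alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions such as AC>GT


def count_unique_snvs(variants: Iterable[Variant]) -> int:
    """Count distinct (chrom, pos, ref, alt) SNVs pooled across all genomes."""
    return len({v for v in variants if variant_type(v) == "SNV"})
```

With `[Variant("chr1", 100, "A", "G"), Variant("chr1", 100, "A", "G"), Variant("chr2", 50, "AT", "A")]` the count is 1: the repeated SNV is counted once and the deletion is excluded. In practice, records would also need normalization (splitting multiallelic sites, left-aligning indels) before deduplication is reliable.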
- the data analyzed by the platforms, systems, media, and methods described herein comprises metadata.
- the whole genomes are associated with high quality phenotypic information.
- a proprietary phenotype ingestion process enables the cleaning and standardization of phenotype data across disparate data sources.
- the ingestion process includes: data integrity checks; standardization of units; standardization of terms; ontology/vocabulary mapping; and maintenance of the proprietary data dictionary.
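The ingestion steps listed above can be illustrated with a small sketch. Everything in it is hypothetical: the field names, the synonym table, the unit-conversion table, and placeholder ontology codes such as `ONT:0001` stand in for the proprietary data dictionary, which is not described here. The term is standardized before the unit because the canonical unit depends on which field a value belongs to.

```python
from dataclasses import dataclass

# Hypothetical data dictionary: canonical field name -> canonical unit.
DATA_DICTIONARY = {"body_weight": "kg", "height": "cm"}
# Hypothetical synonym table used for term standardization.
TERM_SYNONYMS = {"weight": "body_weight", "wt": "body_weight", "ht": "height"}
# Multiplicative factors from (source unit, canonical unit).
UNIT_FACTORS = {("lb", "kg"): 0.45359237, ("kg", "kg"): 1.0,
                ("in", "cm"): 2.54, ("cm", "cm"): 1.0}
# Placeholder ontology codes, not real vocabulary identifiers.
ONTOLOGY = {"body_weight": "ONT:0001", "height": "ONT:0002"}


@dataclass
class Record:
    field: str
    value: float
    unit: str
    ontology_id: str = ""


def ingest(raw: dict) -> Record:
    # 1. Integrity check: required keys present; float() rejects non-numeric values.
    for key in ("field", "value", "unit"):
        if key not in raw:
            raise ValueError(f"missing {key!r}")
    value = float(raw["value"])
    # 2. Term standardization: map free-text names onto canonical field names.
    name = raw["field"].strip().lower()
    field = TERM_SYNONYMS.get(name, name)
    if field not in DATA_DICTIONARY:
        raise ValueError(f"unknown field {field!r}")
    # 3. Unit standardization: convert into the dictionary's canonical unit.
    target = DATA_DICTIONARY[field]
    factor = UNIT_FACTORS.get((raw["unit"].strip().lower(), target))
    if factor is None:
        raise ValueError(f"cannot convert {raw['unit']!r} to {target!r}")
    # 4. Ontology/vocabulary mapping.
    return Record(field, value * factor, target, ONTOLOGY[field])
```

Maintaining the data dictionary itself (adding fields, synonyms, and conversions) would then amount to editing these tables under version control.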
- the phenotype data comprises more than 1000, more than 5000, more than 10,000, more than 100,000, more than 1,000,000, or more than 10,000,000 phenotype data fields with more than 1 million, more than 5 million, more than 10 million, more than 50 million, more than 100 million, more than 500 million, or more than 1 billion data points.
- Phenotypic data in some instances comprises cellular phenotype data. In some instances, cellular phenotypic data are obtained from microscopy.
- cell phenotypic data comprises one or more observable phenotypic traits such as cell shape or morphology, size, texture, internal structure, patterns of distribution of one or more specific proteins, glycosylated proteins, nucleic acid molecules, lipid molecules, glycosylated lipid molecules, carbohydrate molecules, metabolites, and ions.
- phenotypic data describes populations of cells described herein.
- phenotypic data describes phenotypic traits of an organism such as a human.
- phenotypic data comprises a clinical designation or category, for example, a clinical diagnosis, a clinical parameter name, a clinical parameter value, a laboratory test name, or a laboratory test value.
- a phenotype is associated with an observable disease characteristic.
- the data analyzed by the platforms, systems, media, and methods described herein comprises annotation data.
- Annotation data is also cleaned and standardized through an automated end-to-end solution, which allows: idempotence, immutability, persistence; high quality data; consistency between data sources; and scalability and flexibility.
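One common way to obtain the idempotence and immutability properties listed above is content addressing: each record is keyed by a hash of its canonical serialization, so re-ingesting an identical record changes nothing and stored records are never overwritten. The `AnnotationStore` class below is a hypothetical in-memory sketch of that idea, not the described system's implementation; persistence would additionally require writing records to durable storage.

```python
import hashlib
import json
from types import MappingProxyType


class AnnotationStore:
    """Append-only, content-addressed store for annotation records (sketch)."""

    def __init__(self) -> None:
        self._records: dict[str, MappingProxyType] = {}

    @staticmethod
    def _key(record: dict) -> str:
        # Canonical JSON (sorted keys, fixed separators) so equal records hash equally.
        canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def ingest(self, record: dict) -> str:
        key = self._key(record)
        # Idempotence: ingesting an identical record again is a no-op.
        if key not in self._records:
            # Immutability: keep a read-only (shallow) copy callers cannot modify.
            self._records[key] = MappingProxyType(dict(record))
        return key

    def get(self, key: str) -> MappingProxyType:
        return self._records[key]

    def __len__(self) -> int:
        return len(self._records)
```

Because the key depends only on content, records arriving from different sources in different key orders collapse to a single entry, which also supports consistency between data sources.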
- Samples may represent biologic information obtained from individuals or populations of individuals (e.g., genomic information).
- samples comprise single cells.
- samples comprise 1, 2, 5, 10, 20, 25, 50, 75, 100, 200, 500, or more than 1000 cells from the same or different individual.
- samples comprise 1000, 2000, 5000, 10,000, 20,000, 50,000, 75,000, or at least 100,000 cells from the same or different individual.
- Samples may be obtained from any species, including but not limited to viruses, bacteria, plants, fungi, protozoa, archaea, or animals.
- samples are obtained from vertebrates.
- samples are obtained from mammals.
- samples are obtained from humans.
- Samples in some instances are obtained from any bodily fluid or tissue (blood, semen, spinal fluid, serum, saliva, mucus, or other fluid).
- samples are obtained from diseased tissue such as a tumor.
- the platforms, systems, media, and methods described herein include biologic data pertaining to a population of individuals, or use of the same.
- the population of individuals comprises more than 1,000, more than 5,000, more than 10,000, more than 20,000, more than 50,000, more than 100,000, more than 500,000, more than 1,000,000, more than 10,000,000, more than 50,000,000, or more than 100,000,000 individuals.
- the individuals in the population participated in academic medical research studies using consents allowing for genetic testing of specimens.
- biologic specimens and phenotype data are collected for individuals from pharmaceutical clinical trials, academic research, and health care settings.
- biologic data pertaining to a population of individuals is collected from integrated health records for individuals representing a spectrum of diseases with unmet medical needs.
- biologic information comprises genetic information.
- the biologic information comprises whole human genome sequencing information.
- the biologic information comprises human transcriptome sequencing information.
- biologic information comprises genetic information from humans, non-human primates, animals, plants, fungi, protozoa, archaea, or bacteria.
- biologic information comprises genetic information from the microbiome.
- the biologic information may comprise genomic information.
- genomic information refers to genetic information found within a biological sample arising from the genome (or DNA - nuclear, mitochondrial or otherwise).
- genomic information comprises nucleic acid sequence copy number, location, and sequence.
- the genomic information is not limited to protein-coding sequence; it may also refer to intronic and intergenic sequence, each known to harbor multiple functional elements at which DNA changes may be consequential in normal development and disease.
- genomic information comprises epigenetic modifications such as DNA methylation.
- genomic information is found within a chromosome, plasmid, or other medium comprising nucleic acids.
- the biologic information may comprise transcript information.
- transcript information refers to information obtained from a transcriptome within a biological sample.
- transcript information comprises expression levels of genes and sequence of corresponding nucleic acids expressed from genes.
- the biologic information may comprise microbiome information.
- microbiome refers to the bacteria and other microorganisms that live in and on the human body.
- the microbiome information comprises metagenomic microbiome characterization.
- the microbiome information comprises one or more of: microflora genus and/or species information, microflora relative abundance information, and microflora gene and/or gene variant information.
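Relative abundance is typically the share of classified reads (or other counts) attributed to each taxon. A minimal sketch, assuming each read has already been assigned to a genus by an upstream classifier (the function name is hypothetical):

```python
from collections import Counter
from typing import Iterable


def relative_abundance(genus_assignments: Iterable[str]) -> dict[str, float]:
    """Fraction of classified reads assigned to each genus (values sum to 1)."""
    counts = Counter(genus_assignments)
    total = sum(counts.values())
    # An empty sample has no defined abundances.
    return {genus: n / total for genus, n in counts.items()} if total else {}
```

For three reads assigned to Bacteroides and one to Prevotella, the abundances are 0.75 and 0.25.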
- the biologic information may comprise proteome information.
- the proteome information comprises information regarding abundance, localization, identity, post-translational modifications, or other protein information.
- the biologic information may comprise methylome information.
- methylome information comprises epigenetic modifications such as the location of 5-methylcytosine (5-mC), 5-hydroxymethylcytosine (5-hmC), CpG islands, or other modifications to nucleic acids.
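CpG islands are often identified with the criteria proposed by Gardiner-Garden and Frommer (1987): a region of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (number of CpG dinucleotides × length) / (number of C × number of G). The sketch below applies those thresholds to a single sequence. It illustrates the heuristic only and is not necessarily how the described methylome data are annotated; the function names are hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # str.count is non-overlapping, but "CG" cannot overlap itself, so this is exact.
    cpg = s.count("CG")
    return cpg * len(s) / (c * g)


def looks_like_cpg_island(seq: str) -> bool:
    """Apply Gardiner-Garden & Frommer-style thresholds to one sequence."""
    s = seq.upper()
    if len(s) < 200:
        return False
    gc_fraction = (s.count("C") + s.count("G")) / len(s)
    return gc_fraction > 0.5 and cpg_obs_exp(s) > 0.6
```

Genome-wide tools usually slide a window along the chromosome and merge qualifying windows rather than testing a single pre-cut sequence.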
- the biologic information may comprise metabolome information.
- metabolome refers to the small-molecule chemicals found within a biological sample.
- metabolome information comprises the presence of one or more small-molecule chemicals.
- the metabolome information comprises a qualitative measurement of one or more small-molecule chemicals.
- the metabolome information comprises a quantitative measurement of one or more small-molecule chemicals.
- the microbiome information comprises measurements of at least 100, at least 200, at least 300, at least 400, at least 500, at least 600, at least 700, at least 800, at least 900, at least 1000, at least 1100, at least 1200, at least 1300, at least 1400, or at least 1500 substances (e.g., molecules).
- Databases and visualizations described herein may comprise sensitive information pertaining to an individual’s health.
- a system provided herein comprises a platform for data security.
- platforms for data security comprising one or more of an access control for one or more users; a security framework; and biological data from an individual.
- one or more security measures are implemented via security frameworks to restrict access or protect an individual’s health information.
- Security frameworks in some instances comprise standards.
- security frameworks include HIPAA standards.
- security frameworks comprise the NIST Cybersecurity Framework. Access controls in some instances restrict access to certain individuals or groups of individuals. Access controls in some instances comprise passwords, biometrics, or other methods of user authentication.
- a platform for data security comprises one or more of an access control for one or more users; a security framework; and molecular data from an individual or biological species.
- a back end module is configured to host a platform for data security.
- a back end module is configured to perform user authentication and registration.
- access to genomic data is limited to authorized users who are a part of authorized organizations.
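Limiting access to authorized users who belong to authorized organizations amounts to an authentication step followed by an authorization step. The sketch below assumes password-based authentication with salted PBKDF2 hashing from Python's standard library; the allow-lists, the `User` record, and the function names are hypothetical and are not the described back end's API.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

# Hypothetical allow-lists; a real deployment would load these from a secured store.
AUTHORIZED_ORGS = {"org-a"}
AUTHORIZED_USERS = {"alice"}


@dataclass(frozen=True)
class User:
    username: str
    organization: str
    salt: bytes
    password_hash: bytes


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256; the iteration count here is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)


def register(username: str, organization: str, password: str) -> User:
    salt = os.urandom(16)  # a fresh random salt per user
    return User(username, organization, salt, hash_password(password, salt))


def can_access_genomic_data(user: User, password: str) -> bool:
    # Authentication: constant-time comparison of the derived keys.
    if not hmac.compare_digest(hash_password(password, user.salt), user.password_hash):
        return False
    # Authorization: both the user and the user's organization must be allowed.
    return user.username in AUTHORIZED_USERS and user.organization in AUTHORIZED_ORGS
```

Keeping authentication and authorization separate lets either side change (for example, swapping passwords for biometrics) without touching the organization-level access rules.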
- the system is accessible by at least 2, 3, 4, 5, 10, 12, 15, 20, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 300, 400, 500, 600, 700, 800, 900 or at least 1000 users concurrently.
- the system is accessible by 2-1000, 5-1000, 10-1000, 20-1000, 50-1000, 100-1000, 200-1000, 500-1000, or 500-10,000 users concurrently.
- the system can be applied in a variety of fields.
- the system provides useful data and analysis to pharmaceutical companies, including informaticians, bench scientists, medical directors, senior executive teams, or commercial organizations.
- data and analysis in some instances include analysis of clinical trial data for patient stratification and biomarker discovery, identification and in silico validation of novel genetic targets, discovery of novel disease and dose response biomarkers/signatures, compound repurposing and expanded indications for marketed drugs, rescue of failed clinical trial assets, real-time genetic analysis of adverse events, or targeted accelerated recruitment for clinical trials.
- the system in some instances offers analysis of specific cohorts, analysis of individual patients, or large scale analysis of variation in populations.
- Clinics, hospitals and cancer centers, including physicians and genetic counsellors, in some instances will find the system useful in the analysis of individuals, analysis of cohorts, wellness focus, or oncology focus.
- the data and analysis in some instances also has value to insurance companies, actuarial teams, or health economists.
- the system can serve as or enable a reference set of knowledge/evidence, a hypothesis generation engine, a platform for analysis of pharma’s own data, a platform for combination of pharma data and data and analysis provided by the system, a platform for combining data from multiple collaborators, a platform for sharing data within a company, etc.
- the system can similarly be used as part of a care tool to identify the most relevant results for treatment and prevention, a reference set of knowledge/evidence, or a tool to identify other physicians with similar patients and share knowledge with them.
- the system can be useful as part of a tool to detect individual care pathways and incentivize healthy living, or a tool to help insurers quantify the risk in the insured population.
- kits comprising reagents for acquiring biological information.
- the kit is configured to obtain genomic or transcriptome data.
- the kit is configured to obtain genomic, methylome, transcriptome or proteome data from single cells.
- kits comprising reagents for obtaining biological data from single cells, and instructions for using the kit.
- the instructions comprise links to a web-based portal or mobile based software application to import, analyze, and/or compare biological data obtained from the kit.
- the platforms, systems, media, and methods described herein include a digital processing device, or use of the same.
- the digital processing device includes one or more hardware central processing units (CPUs) or general purpose graphics processing units (GPGPUs) that carry out the device’s functions.
- the digital processing device further comprises an operating system configured to perform executable instructions.
- one or more resources related to the systems described herein is stored locally.
- the digital processing device is optionally connected to a computer network.
- the digital processing device is optionally connected to the Internet such that it accesses the World Wide Web.
- the digital processing device is optionally connected to a cloud computing infrastructure.
- the digital processing device is optionally connected to an intranet.
- the digital processing device is optionally connected to a data storage device.
- suitable digital processing devices include, by way of non-limiting examples, server computers, desktop computers, laptop computers, and notebook computers.
- the digital processing device includes an operating system configured to perform executable instructions.
- the operating system is, for example, software, including programs and data, which manages the device’s hardware and provides services for execution of applications.
- suitable server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®.
- suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX- like operating systems such as GNU/Linux®.
- the operating system is provided by cloud computing.
- the device includes a storage and/or memory device.
- the storage and/or memory device is one or more physical apparatuses used to store data or programs on a temporary or permanent basis.
- the device is volatile memory and requires power to maintain stored information.
- the device is non-volatile memory and retains stored information when the digital processing device is not powered.
- the non-volatile memory comprises flash memory.
- the volatile memory comprises dynamic random-access memory (DRAM).
- the non-volatile memory comprises ferroelectric random access memory (FRAM).
- the non-volatile memory comprises phase-change random access memory (PRAM).
- the device is a storage device including, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, magnetic disk drives, magnetic tape drives, optical disk drives, and cloud computing based storage.
- the storage and/or memory device is a combination of devices such as those disclosed herein.
- the digital processing device includes a display to send visual information to a user.
- the display is a cathode ray tube (CRT).
- the display is a liquid crystal display (LCD).
- the display is a thin film transistor liquid crystal display (TFT-LCD).
- the display is an organic light emitting diode (OLED) display.
- the OLED display is a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display.
- the display is a plasma display.
- the display is a video projector.
- the display is a wearable display.
- the display is a combination of devices such as those disclosed herein.
- the digital processing device includes an input device to receive information from a user.
- the input device is a keyboard.
- the input device is a pointing device including, by way of non-limiting examples, a mouse, trackball, track pad, joystick, game controller, or stylus.
- the input device is a touch screen or a multi-touch screen.
- the input device is a microphone to capture voice or other sound input.
- the input device is a video camera or other sensor to capture motion or visual input.
- the input device is a Kinect, Leap Motion, or the like.
- the input device is a combination of devices such as those disclosed herein.
- Non-transitory computer readable storage medium
- the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked digital processing device.
- a computer readable storage medium is a tangible component of a digital processing device.
- a computer readable storage medium is optionally removable from a digital processing device.
- a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, cloud computing systems and services, and the like.
- the program and instructions are permanently, substantially permanently, semi-permanently, or non-transitorily encoded on the media.
- Referring to FIG. 11, a block diagram is shown depicting an exemplary machine that includes a computer system 1100 (e.g., a processing or computing system) within which a set of instructions can execute for causing a device to perform or execute any one or more of the aspects and/or methodologies for static code scheduling of the present disclosure.
- the components in FIG. 11 are examples only and do not limit the scope of use or functionality of any hardware, software, embedded logic component, or a combination of two or more such components implementing particular embodiments.
- Computer system 1100 may include one or more processors 1101, a memory 1103, and a storage 1108 that communicate with each other, and with other components, via a bus 1140.
- the bus 1140 may also link a display 1132, one or more input devices 1133 (which may, for example, include a keypad, a keyboard, a mouse, a stylus, etc.), one or more output devices 1134, one or more storage devices 1135, and various tangible storage media 1136. All of these elements may interface directly or via one or more interfaces or adaptors to the bus 1140.
- the various tangible storage media 1136 can interface with the bus 1140 via storage medium interface 1126.
- Computer system 1100 may have any suitable physical form, including but not limited to one or more integrated circuits (ICs), printed circuit boards (PCBs), mobile handheld devices (such as mobile telephones or PDAs), laptop or notebook computers, distributed computer systems, computing grids, or servers.
- Computer system 1100 includes one or more processor(s) 1101 (e.g., central processing units (CPUs), general purpose graphics processing units (GPGPUs), or quantum processing units (QPUs)) that carry out functions.
- processor(s) 1101 optionally contains a cache memory unit 1102 for temporary local storage of instructions, data, or computer addresses.
- Processor(s) 1101 are configured to assist in execution of computer readable instructions.
- Computer system 1100 may provide functionality for the components depicted in FIG. 11 as a result of the processor(s) 1101 executing non-transitory, processor-executable instructions embodied in one or more tangible computer-readable storage media, such as memory 1103, storage 1108, storage devices 1135, and/or storage medium 1136.
- the computer-readable media may store software that implements particular embodiments, and processor(s) 1101 may execute the software.
- Memory 1103 may read the software from one or more other computer-readable media (such as mass storage device(s) 1135, 1136) or from one or more other sources through a suitable interface, such as network interface 1120.
- the software may cause processor(s) 1101 to carry out one or more processes or one or more steps of one or more processes described or illustrated herein. Carrying out such processes or steps may include defining data structures stored in memory 1103 and modifying the data structures as directed by the software.
- the memory 1103 may include various components (e.g., machine readable media) including, but not limited to, a random access memory component (e.g., RAM 1104) (e.g., static RAM (SRAM), dynamic RAM (DRAM), ferroelectric random access memory (FRAM), phase-change random access memory (PRAM), etc.), a read-only memory component (e.g., ROM 1105), and any combinations thereof.
- ROM 1105 may act to communicate data and instructions unidirectionally to processor(s) 1101.
- RAM 1104 may act to communicate data and instructions bidirectionally with processor(s) 1101.
- ROM 1105 and RAM 1104 may include any suitable tangible computer-readable media described below.
- a basic input/output system 1106 (BIOS), including basic routines that help to transfer information between elements within computer system 1100, such as during start-up, may be stored in the memory 1103.
- Fixed storage 1108 is connected bidirectionally to processor(s) 1101, optionally through storage control unit 1107.
- Fixed storage 1108 provides additional data storage capacity and may also include any suitable tangible computer-readable media described herein.
- Storage 1108 may be used to store operating system 1109, executable(s) 1110, data 1111, applications 1112 (application programs), and the like.
- Storage 1108 can also include an optical disk drive, a solid-state memory device (e.g., flash-based systems), or a combination of any of the above.
- Information in storage 1108 may, in appropriate cases, be incorporated as virtual memory in memory 1103.
- storage device(s) 1135 may be removably interfaced with computer system 1100 (e.g., via an external port connector (not shown)) via a storage device interface 1125.
- storage device(s) 1135 and an associated machine-readable medium may provide non-volatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for the computer system 1100.
- software may reside, completely or partially, within a machine-readable medium on storage device(s) 1135.
- software may reside, completely or partially, within processor(s) 1101.
- Bus 1140 connects a wide variety of subsystems.
- reference to a bus may encompass one or more digital signal lines serving a common function, where appropriate.
- Bus 1140 may be any of several types of bus structures including, but not limited to, a memory bus, a memory controller, a peripheral bus, a local bus, and any combinations thereof, using any of a variety of bus architectures.
- such architectures include an Industry Standard Architecture (ISA) bus, an Enhanced ISA (EISA) bus, a Micro Channel Architecture (MCA) bus, a Video Electronics Standards Association local bus (VLB), a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, an Accelerated Graphics Port (AGP) bus, a HyperTransport (HTX) bus, a serial advanced technology attachment (SATA) bus, and any combinations thereof.
- Computer system 1100 may also include an input device 1133.
- a user of computer system 1100 may enter commands and/or other information into computer system 1100 via input device(s) 1133.
- Examples of an input device(s) 1133 include, but are not limited to, an alpha-numeric input device (e.g., a keyboard), a pointing device (e.g., a mouse or touchpad), a touchpad, a touch screen, a multi-touch screen, a joystick, a stylus, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), an optical scanner, a video or still image capture device (e.g., a camera), and any combinations thereof.
- the input device is a Kinect, Leap Motion, or the like.
- Input device(s) 1133 may be interfaced to bus 1140 via any of a variety of input interfaces 1123 (e.g., input interface 1123) including, but not limited to, serial, parallel, game port, USB, FIREWIRE, THUNDERBOLT, or any combination of the above.
- when computer system 1100 is connected to network 1130, computer system 1100 may communicate with other devices connected to network 1130, such as mobile devices, enterprise systems, distributed computing systems, cloud storage systems, cloud computing systems, and the like. Communications to and from computer system 1100 may be sent through network interface 1120.
- network interface 1120 may receive incoming communications (such as requests or responses from other devices) in the form of one or more packets (such as Internet Protocol (IP) packets) from network 1130, and computer system 1100 may store the incoming communications in memory 1103 for processing.
- Computer system 1100 may similarly store outgoing communications (such as requests or responses to other devices) in the form of one or more packets in memory 1103 and communicate them to network 1130 from network interface 1120.
- Processor(s) 1101 may access these communication packets stored in memory 1103 for processing.
- Examples of the network interface 1120 include, but are not limited to, a network interface card, a modem, and any combination thereof.
- Examples of a network 1130 or network segment 1130 include, but are not limited to, a distributed computing system, a cloud computing system, a wide area network (WAN) (e.g., the Internet, an enterprise network), a local area network (LAN) (e.g., a network associated with an office, a building, a campus or other relatively small geographic space), a telephone network, a direct connection between two computing devices, a peer-to-peer network, and any combinations thereof.
- a network, such as network 1130 may employ a wired and/or a wireless mode of communication. In general, any network topology may be used.
- Information and data can be displayed through a display 1132.
- Examples of a display 1132 include, but are not limited to, a cathode ray tube (CRT), a liquid crystal display (LCD), a thin film transistor liquid crystal display (TFT-LCD), an organic light emitting diode (OLED) display such as a passive-matrix OLED (PMOLED) or active-matrix OLED (AMOLED) display, a plasma display, and any combinations thereof.
- the display 1132 can interface to the processor(s) 1101, memory 1103, and fixed storage 1108, as well as other devices, such as input device(s) 1133, via the bus 1140.
- the display 1132 is linked to the bus 1140 via a video interface 1122, and transport of data between the display 1132 and the bus 1140 can be controlled via the graphics control 1121.
- the display is a video projector.
- the display is a head-mounted display (HMD) such as a VR headset.
- suitable VR headsets include, by way of non-limiting examples, HTC Vive, Oculus Rift, Samsung Gear VR, Microsoft HoloLens, Razer OSVR, FOVE VR, Zeiss VR One, Avegant Glyph, Freefly VR headset, and the like.
- the display is a combination of devices such as those disclosed herein.
- computer system 1100 may include one or more other peripheral output devices 1134 including, but not limited to, an audio speaker, a printer, a storage device, and any combinations thereof.
- peripheral output devices may be connected to the bus 1140 via an output interface 1124.
- Examples of an output interface 1124 include, but are not limited to, a serial port, a parallel connection, a USB port, a FIREWIRE port, a THUNDERBOLT port, and any combinations thereof.
- computer system 1100 may provide functionality as a result of logic hardwired or otherwise embodied in a circuit, which may operate in place of or together with software to execute one or more processes or one or more steps of one or more processes described or illustrated herein.
- Reference to software in this disclosure may encompass logic, and reference to logic may encompass software.
- reference to a computer-readable medium may encompass a circuit (such as an IC) storing software for execution, a circuit embodying logic for execution, or both, where appropriate.
- the present disclosure encompasses any suitable combination of hardware, software, or both.
- DSP: digital signal processor
- ASIC: application specific integrated circuit
- FPGA: field programmable gate array
- a general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
- a processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- a software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
- An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
- the storage medium may be integral to the processor.
- the processor and the storage medium may reside in an ASIC.
- the ASIC may reside in a user terminal.
- the processor and the storage medium may reside as discrete components in a user terminal.
- suitable computing devices include, by way of non-limiting examples, server computers, desktop computers, laptop computers, notebook computers, sub-notebook computers, netbook computers, netpad computers, set-top computers, media streaming devices, handheld computers, Internet appliances, mobile smartphones, tablet computers, and personal digital assistants.
- Suitable tablet computers in various embodiments, include those with booklet, slate, and convertible configurations, known to those of skill in the art.
- the computing device includes an operating system configured to perform executable instructions.
- the operating system is, for example, software, including programs and data, which manages the device’s hardware and provides services for execution of applications.
- Operating systems in some instances are stored locally, or accessed via a network.
- server operating systems include, by way of non-limiting examples, FreeBSD, OpenBSD, NetBSD®, Linux, Apple® Mac OS X Server®, Oracle® Solaris®, Windows Server®, and Novell® NetWare®.
- suitable personal computer operating systems include, by way of non-limiting examples, Microsoft® Windows®, Apple® Mac OS X®, UNIX®, and UNIX-like operating systems such as GNU/Linux®.
- the operating system is provided by cloud computing.
- suitable mobile smartphone operating systems include, by way of non-limiting examples, Nokia® Symbian® OS, Apple® iOS®, Research In Motion® BlackBerry OS®, Google® Android®, Microsoft® Windows Phone® OS, Microsoft® Windows Mobile® OS, Linux®, and Palm® WebOS®.
- Non-transitory computer readable storage medium may be utilized as part of the systems and methods of the present invention.
- a computer system may be utilized as a device configured for use by a researcher, patient, partner, caretaker, or healthcare provider.
- the platforms, systems, media, and methods disclosed herein include one or more non-transitory computer readable storage media encoded with a program including instructions executable by the operating system of an optionally networked computing device.
- a computer readable storage medium is a tangible component of a computing device.
- a computer readable storage medium is optionally removable from a computing device.
- a computer readable storage medium includes, by way of non-limiting examples, CD-ROMs, DVDs, flash memory devices, solid state memory, magnetic disk drives, magnetic tape drives, optical disk drives, distributed computing systems including cloud computing systems and services, and the like.
- the program and instructions are permanently, substantially permanently, semipermanently, or non-transitorily encoded on the media.
- the platforms, systems, media, and methods disclosed herein include at least one computer program, or use of the same.
- a computer program includes a sequence of instructions, executable by one or more processor(s) of the computing device’s CPU, written to perform a specified task.
- Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), computing data structures, and the like, that perform particular tasks or implement particular abstract data types. In light of the disclosure provided herein, those of skill in the art will recognize that a computer program may be written in various versions of various languages.
- the functionality of the computer readable instructions may be combined or distributed as desired in various environments.
- a computer program comprises one sequence of instructions.
- a computer program comprises a plurality of sequences of instructions. In some embodiments, a computer program is provided from one location. In other embodiments, a computer program is provided from a plurality of locations. In various embodiments, a computer program includes one or more software modules or features. In various embodiments, a computer program includes, in part or in whole, one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof.
- one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof are utilized to perform the methods as described herein.
- one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof are utilized as part of the systems as described herein.
- one or more web applications, one or more mobile applications, one or more standalone applications, one or more web browser plug-ins, extensions, add-ins, or add-ons, or combinations thereof are utilized to fully or partially automate the systems and methods as described herein. In some embodiments, automation allows methods to be carried out which are beyond the limits of what can be processed by a human.
- a computer program includes a web application.
- a web application, in various embodiments, utilizes one or more software frameworks and one or more database systems.
- a web application is created upon a software framework such as Microsoft® .NET or Ruby on Rails (RoR).
- a web application utilizes one or more database systems including, by way of non-limiting examples, relational, non-relational, object oriented, associative, XML, and document oriented database systems.
- suitable relational database systems include, by way of non-limiting examples, Microsoft® SQL Server, MySQL™, and Oracle®.
- a web application, in various embodiments, is written in one or more versions of one or more languages.
- a web application may be written in one or more markup languages, presentation definition languages, client-side scripting languages, server-side coding languages, database query languages, or combinations thereof.
- a web application is written to some extent in a markup language such as Hypertext Markup Language (HTML), Extensible Hypertext Markup Language (XHTML), or extensible Markup Language (XML).
- a web application is written to some extent in a presentation definition language such as Cascading Style Sheets (CSS).
- CSS Cascading Style Sheets
- a web application is written to some extent in a client-side scripting language such as Asynchronous JavaScript and XML (AJAX), Flash® ActionScript, JavaScript, or Silverlight®.
- AJAX Asynchronous JavaScript and XML
- a web application is written to some extent in a server-side coding language such as Active Server Pages (ASP), ColdFusion®, Perl, Java™, JavaServer Pages (JSP), Hypertext Preprocessor (PHP), Python™, Ruby, Tcl, Smalltalk, WebDNA®, or Groovy.
- a web application is written to some extent in a database query language such as Structured Query Language (SQL).
- SQL Structured Query Language
- a web application integrates enterprise server products such as IBM® Lotus Domino®.
- a web application includes a media player element.
- a media player element utilizes one or more of many suitable multimedia technologies including, by way of non-limiting examples, Adobe® Flash®, HTML 5, Apple® QuickTime®, Microsoft® Silverlight®, Java™, and Unity®.
- an application provision system comprises one or more databases 1200 accessed by a relational database management system (RDBMS) 1210. Suitable RDBMSs include Firebird, MySQL, PostgreSQL, SQLite, Oracle Database, Microsoft SQL Server, IBM DB2, IBM Informix, SAP Sybase, Teradata, and the like.
- the application provision system further comprises one or more application servers 1220 (such as Java servers, .NET servers, PHP servers, and the like) and one or more web servers 1230 (such as Apache, IIS, GWS, and the like).
- the web server(s) optionally expose one or more web services via application programming interfaces (APIs) 1240.
- APIs application programming interfaces
- an application provision system alternatively has a distributed, cloud-based architecture 1300 and comprises elastically load-balanced, auto-scaling web server resources 1310 and application server resources 1320, as well as synchronously replicated databases 1330.
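The tiered layout above places an RDBMS (1210) behind application servers (1220) and web servers (1230) that expose APIs (1240). The following minimal Python sketch illustrates that data flow. It uses SQLite, one of the listed RDBMSs, through the standard-library `sqlite3` module. The `projects` table, its columns, and the function names are hypothetical and are not taken from the platform. `list_projects_json` plays the application-server role: it turns a query result into a JSON payload that a web API endpoint could return.

```python
import json
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database and create a hypothetical projects table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        " id INTEGER PRIMARY KEY,"
        " workspace TEXT NOT NULL,"
        " name TEXT NOT NULL)"
    )
    return conn


def list_projects_json(conn: sqlite3.Connection, workspace: str) -> str:
    """Application-server step: query the RDBMS and serialize rows for a web API."""
    rows = conn.execute(
        "SELECT id, name FROM projects WHERE workspace = ? ORDER BY name",
        (workspace,),  # parameterized query; user input is never interpolated
    ).fetchall()
    return json.dumps([{"id": pid, "name": name} for pid, name in rows])


conn = open_db()
conn.executemany(
    "INSERT INTO projects (workspace, name) VALUES (?, ?)",
    [("lab-a", "tumor-scDNA"), ("lab-a", "baseline"), ("lab-b", "pilot")],
)
print(list_projects_json(conn, "lab-a"))
```

A real deployment would place this logic behind a web framework and a server-grade RDBMS. The sketch only shows how a request passes between the tiers.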
- the web applications may be utilized as part of the systems as described herein.
- the web applications may be utilized to implement the systems as described herein.
- web applications are utilized to provide features or modules of the systems described herein.
- web applications are utilized to fully or partially automate systems and methods described herein. In some embodiments, automation allows methods to be carried out which are beyond the limits of what can be processed by a human.
- a computer program includes a mobile application provided to a mobile computing device.
- the mobile application is provided to a mobile computing device at the time it is manufactured.
- the mobile application is provided to a mobile computing device via the computer network described herein.
- a mobile application is created by techniques known to those of skill in the art using hardware, languages, and development environments known to the art. Those of skill in the art will recognize that mobile applications are written in several languages. Suitable programming languages include, by way of non-limiting examples, C, C++, C#, Objective-C, Java™, JavaScript, Pascal, Object Pascal, Python™, Ruby, VB.NET, WML, and XHTML/HTML with or without CSS, or combinations thereof.
- Suitable mobile application development environments are available from several sources. Commercially available development environments include, by way of non-limiting examples, AirplaySDK, alcheMo, Appcelerator®, Celsius, Bedrock, Flash Lite, .NET Compact Framework, Rhomobile, and WorkLight Mobile Platform. Other development environments are available without cost including, by way of non-limiting examples, Lazarus, MobiFlex, MoSync, and PhoneGap. Also, mobile device manufacturers distribute software developer kits including, by way of non-limiting examples, iPhone and iPad (iOS) SDK, Android™ SDK, BlackBerry® SDK, BREW SDK, Palm® OS SDK, Symbian SDK, webOS SDK, and Windows® Mobile SDK.
- iOS iPhone and iPad
- the mobile applications may be utilized as part of the systems as described herein.
- the mobile applications may be utilized to implement the systems as described herein.
- mobile applications are utilized to provide features or modules of the systems described herein.
- mobile applications are utilized to fully or partially automate systems and methods described herein. In some embodiments, automation allows methods to be carried out which are beyond the limits of what can be processed by a human.
- Web browser plug-in
- the computer program includes a web browser plug-in (e.g., extension, etc.).
- a plug-in is one or more software components that add specific functionality to a larger software application. Makers of software applications support plug-ins to enable third-party developers to create abilities which extend an application, to support easily adding new features, and to reduce the size of an application. When supported, plug-ins enable customizing the functionality of a software application. For example, plug-ins are commonly used in web browsers to play video, generate interactivity, scan for viruses, and display particular file types. Those of skill in the art will be familiar with several web browser plug-ins including, Adobe® Flash® Player, Microsoft® Silverlight®, and Apple® QuickTime®.
- the toolbar comprises one or more web browser extensions, add-ins, or add-ons. In some embodiments, the toolbar comprises one or more explorer bars, tool bands, or desk bands.
- plug-in frameworks are available that enable development of plug-ins in various programming languages, including, by way of non-limiting examples, C++, Delphi, Java™, PHP, Python™, and VB.NET, or combinations thereof.
- Web browsers are software applications, designed for use with network-connected computing devices, for retrieving, presenting, and traversing information resources on the World Wide Web. Suitable web browsers include, by way of non-limiting examples, Microsoft® Edge®, Mozilla® Firefox®, Google® Chrome, Apple® Safari®, Opera Software® Opera®, and KDE Konqueror. In some embodiments, the web browser is a mobile web browser. Mobile web browsers (also called microbrowsers, mini-browsers, and wireless browsers) are designed for use on mobile computing devices including, by way of non-limiting examples, handheld computers, tablet computers, netbook computers, sub-notebook computers, smartphones, and personal digital assistants (PDAs).
- PDAs personal digital assistants
- Suitable mobile web browsers include, by way of non-limiting examples, Google® Android® browser, RIM BlackBerry® Browser, Apple® Safari®, Palm® Blazer, Palm® WebOS® Browser, Mozilla® Firefox® for mobile, Microsoft® Internet Explorer® Mobile, Amazon® Kindle® Basic Web, Nokia® Browser, Opera Software® Opera® Mobile, and Sony® PSP™ browser.
- the platforms, systems, media, and methods disclosed herein include software, server, and/or database modules, or use of the same.
- software modules are created by techniques known to those of skill in the art using machines, software, and languages known to the art.
- the software modules disclosed herein are implemented in a multitude of ways.
- a software module comprises a file, a section of code, a programming object, a programming structure, or combinations thereof.
- a software module comprises a plurality of files, a plurality of sections of code, a plurality of programming objects, a plurality of programming structures, or combinations thereof.
- the one or more software modules comprise, by way of non-limiting examples, a web application, a mobile application, and a standalone application.
- software modules are in one computer program or application. In other embodiments, software modules are in more than one computer program or application. In some embodiments, software modules are hosted on one machine. In other embodiments, software modules are hosted on more than one machine. In further embodiments, software modules are hosted on a distributed computing platform such as a cloud computing platform. In some embodiments, software modules are hosted on one or more machines in one location. In other embodiments, software modules are hosted on one or more machines in more than one location.
- users query one or more databases to identify information about biological data in his or her dataset.
- a user may use an interface to display specific information about a variant, such as the variant’s role in cancer or other diseases.
- the platforms, systems, media, and methods disclosed herein include one or more databases, or use of the same.
- databases are suitable for storage and retrieval of, for example, patient, photo, video, skin condition, visit, physician, and insurance information.
- suitable databases include, by way of non-limiting examples, relational databases, non-relational databases, object oriented databases, object databases, entity-relationship model databases, associative databases, XML databases, document oriented databases, and graph databases. Further non-limiting examples include SQL, PostgreSQL, MySQL, Oracle, DB2, Sybase, and MongoDB.
- a database is Internet-based.
- a database is web-based.
- a database is cloud computing-based.
- a database is a distributed database.
- a database is based on one or more local computer storage devices.
- Databases may comprise information (e.g., annotations) regarding genetic variants.
- databases provide information on somatic, germline, or somatic and germline variants.
- a database comprises one or more of ClinVar, COSMIC, NCBI database of Genotypes and Phenotypes (dbGaP), gnomAD, 69 genomes from CGI, Personalized Genome Project, NCI Genomic Data Commons (GDC), cBioPortal, Intogen, and the Pediatric Cancer Genome Project.
- databases provide information on variants related to cancer or other disease.
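The variant lookups described above amount to keyed retrieval of annotations: a user queries sources such as ClinVar or COSMIC for a variant's role in cancer. The minimal sketch below assumes annotations from several sources have already been loaded into memory. The `AnnotationIndex` class, the `(chrom, pos, ref, alt)` key, and the field names are hypothetical. No real database records or APIs are used. Contig names are normalized so that `chr7` and `7` match, because sources differ on this convention (`str.removeprefix` requires Python 3.9 or later).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, 1-based position, ref, alt)


@dataclass(frozen=True)
class Annotation:
    source: str        # name of the database the record came from
    origin: str        # "somatic", "germline", or "somatic+germline"
    disease_role: str  # free-text note on cancer or other disease relevance


class AnnotationIndex:
    """Hypothetical in-memory index merging annotations from several databases."""

    def __init__(self) -> None:
        self._index: Dict[VariantKey, List[Annotation]] = {}

    @staticmethod
    def _normalize(key: VariantKey) -> VariantKey:
        chrom, pos, ref, alt = key
        # Treat "chr7" and "7" as the same contig; compare alleles case-insensitively.
        return (chrom.removeprefix("chr"), pos, ref.upper(), alt.upper())

    def add(self, key: VariantKey, ann: Annotation) -> None:
        self._index.setdefault(self._normalize(key), []).append(ann)

    def lookup(self, key: VariantKey) -> List[Annotation]:
        """Return every annotation recorded for the variant, from all sources."""
        return list(self._index.get(self._normalize(key), []))


# Placeholder records for illustration only; these are not real database entries.
idx = AnnotationIndex()
idx.add(("chr1", 1000, "A", "T"), Annotation("source_one", "somatic", "placeholder"))
idx.add(("1", 1000, "a", "t"), Annotation("source_two", "germline", "placeholder"))
print([a.source for a in idx.lookup(("1", 1000, "A", "T"))])
```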
- Account creation is provided as the first option for new users, as shown in FIG. 14A.
- The fields (1400) within the form must be filled out completely, and the terms and conditions must be accepted, before email verification.
- Prior to gaining access to the platform, the user must review and approve the terms and conditions (1410); the access and review capability is shown in FIG. 14B.
- A verification email is then sent containing a numeric token, which the user enters back into the platform to confirm their email address.
- Tabs at the login screen allow users to toggle between new account creation and login.
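The email verification step above sends a numeric token that the user returns to the platform. The minimal standard-library sketch below shows one way to issue and check such a token. The 6-digit length and 10-minute validity window are assumptions, not values stated in the source. The function names are hypothetical. `secrets.randbelow` provides a cryptographically strong value, and `hmac.compare_digest` compares in constant time.

```python
import hmac
import secrets
import time
from typing import Optional, Tuple

TOKEN_DIGITS = 6         # assumed token length
TOKEN_TTL_SECONDS = 600  # assumed validity window (10 minutes)


def issue_token(now: Optional[float] = None) -> Tuple[str, float]:
    """Return a zero-padded numeric token and the time at which it expires."""
    now = time.time() if now is None else now
    # secrets (not random) is the module intended for security tokens.
    token = str(secrets.randbelow(10 ** TOKEN_DIGITS)).zfill(TOKEN_DIGITS)
    return token, now + TOKEN_TTL_SECONDS


def verify_token(submitted: str, expected: str, expires_at: float,
                 now: Optional[float] = None) -> bool:
    """Accept the submitted token only if it matches and has not expired."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    # Compare bytes in constant time so timing does not reveal matching digits.
    return hmac.compare_digest(submitted.strip().encode(), expected.encode())
```

In practice, the token and its expiry would be stored server-side against the pending account. A managed service such as AWS Cognito (described later) can handle this step itself.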
- An administration console (1500) is provided to platform administrators for toggling access and assigning information to organizations and users. This console is not available to individual accounts unless they have been granted administrative access. The console provides information about the organizations with accounts on the platform, as shown in FIG. 15A.
- Selecting a specific workspace opens up a panel (1550) where all of an organization’s workspaces are viewable as in FIG. 15B.
- Selecting the edit icon allows a user to specify data ingress locations, such as BaseSpace Sequence Hub (a platform provided by Illumina®, 1570) and a path available to the user on Amazon Web Services Simple Storage Service (S3) (1580).
- BaseSpace Sequence Hub platform provided by Illumina®, 1570
- S3 Amazon Web Services Simple Storage Service
- Selecting the User tab on the console (1500) opens a user dashboard where a platform administrator can perform several functions. Queries can be made based on components of a user’s name, email, or organization (1590). Selecting the ellipsis icon allows the administrator to assign specific rights at the organization (1592) or workspace (1593) level, as in FIG. 15C. This also allows authorized users to access workspaces or organizations outside their own, should collaborations or agreements between organizations authorize such access.
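Rights can be assigned at either the organization level or the workspace level, including access to workspaces outside a user's own organization. A minimal in-memory sketch of such a rights check follows. One assumption is labeled in the code: an organization-level grant is treated as covering every workspace in that organization. The source does not state this. The `Grants` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Grants:
    """Hypothetical rights store with organization- and workspace-level grants."""
    org_rights: Dict[str, Set[str]] = field(default_factory=dict)
    workspace_rights: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def grant_org(self, user: str, org: str) -> None:
        self.org_rights.setdefault(user, set()).add(org)

    def grant_workspace(self, user: str, org: str, workspace: str) -> None:
        self.workspace_rights.setdefault(user, set()).add((org, workspace))

    def can_access(self, user: str, org: str, workspace: str) -> bool:
        # Assumption: an organization-level grant covers every workspace in it.
        if org in self.org_rights.get(user, set()):
            return True
        # A workspace-level grant covers only that one workspace.
        return (org, workspace) in self.workspace_rights.get(user, set())


grants = Grants()
grants.grant_org("alice", "org-a")                   # member with org-wide rights
grants.grant_workspace("bob", "org-a", "shared-ws")  # outside collaborator
print(grants.can_access("bob", "org-a", "shared-ws"))
print(grants.can_access("bob", "org-a", "private-ws"))
```

Keeping the two grant levels separate lets a collaboration open a single workspace without exposing the rest of the organization.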
- Project creation is done by selecting the New Project button (1610) at the top of a workspace’s project listing in FIG. 16A.
- the new project form in FIG. 16B allows users to specify a name (1620) for a project; the platform checks whether the name is already in use and reports it if so. Additional details about BioSkryb product lots (1630), the associated genome (1640), and sequencing library information (1650) can be provided.
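The source does not say how the platform decides that a project name is redundant. The sketch below assumes that names differing only in letter case, Unicode form, or whitespace count as duplicates. Both function names are hypothetical.

```python
import unicodedata
from typing import Iterable


def normalize_name(name: str) -> str:
    """Fold Unicode form, letter case, and runs of whitespace into one canonical key."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(folded.split())


def is_redundant(candidate: str, existing: Iterable[str]) -> bool:
    """Return True if the candidate duplicates an existing project name."""
    key = normalize_name(candidate)
    return any(normalize_name(name) == key for name in existing)


print(is_redundant("  Tumor  Panel ", ["tumor panel", "baseline"]))
print(is_redundant("Tumor Panel 2", ["tumor panel"]))
```

A stricter rule, such as exact string equality, would let near-identical names coexist. The looser rule catches accidental duplicates at the cost of rejecting names some users might consider distinct.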
- a new window opens where a user can specify a particular project (1670) when importing from BSSH (BaseSpace Sequence Hub®), as in FIG. 16C, or specific characters of a sample name (1680) for samples in BSSH or in the user’s S3 location, as in FIG. 16D. From there, a checkbox can be used to select all samples, or individual sample checkboxes can be used to select specific samples for a project (1690).
- BSSH BaseSpace Sequence Hub®
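In the sample selection step above, samples are picked by BSSH project or by characters in the sample name, from BSSH or S3. The following minimal sketch shows that filter. The `Sample` record, the `"bssh"`/`"s3"` source labels, and `find_samples` are hypothetical. Name matching is assumed to be a case-insensitive substring match.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Sample:
    name: str
    source: str             # "bssh" or "s3" (assumed labels)
    project: Optional[str]  # BSSH project, if the sample came from BSSH


def find_samples(samples: Sequence[Sample], pattern: str = "",
                 bssh_project: Optional[str] = None) -> List[Sample]:
    """Match samples by BSSH project and/or a case-insensitive name substring."""
    needle = pattern.lower()
    hits = []
    for s in samples:
        if bssh_project is not None and (s.source != "bssh" or s.project != bssh_project):
            continue
        if needle and needle not in s.name.lower():
            continue
        hits.append(s)
    return hits
```

The returned list corresponds to the candidate rows that the user would then tick individually or select all at once.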
- the platform offers analysis pipelines that process the user’s molecular data through a series of transformations automated by the platform. Immediately after project creation, the platform asks the user whether to queue a pipeline to run as soon as downloading completes, as in FIG. 17A.
- the launch window in FIG. 17B allows users to select which pipeline, and which version of that pipeline, to run on the samples.
- Selecting a secondary analysis pipeline allows users to customize parameters and metadata that provide additional instructions to steps in the pipeline, as seen in FIG. 17C. Specific tools/modules can be selected, as well as details about the chemistry.
- Pipelines are separated into secondary and tertiary analyses, determined by what inputs are provided into the pipeline.
- Secondary pipelines start from sequencing data (typically FASTQs) and tertiary pipelines start from outputs of secondary analyses (typically aligned files, VCFs or counting information).
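Pipelines are divided into secondary and tertiary tiers based on their inputs: sequencing reads for secondary analyses, and aligned files, VCFs, or counts for tertiary analyses. The sketch below classifies a run by the file extensions of its inputs. The extension sets are assumptions, since real pipelines may accept other formats. Mixed or unknown inputs are rejected rather than guessed at.

```python
from pathlib import PurePath
from typing import Iterable

# Assumed extension sets; real pipelines may accept other formats.
SECONDARY_INPUTS = {".fastq", ".fq"}
TERTIARY_INPUTS = {".bam", ".cram", ".vcf", ".tsv", ".csv"}


def _ext(path: str) -> str:
    """Return the data extension, ignoring a trailing .gz compression suffix."""
    suffixes = [s.lower() for s in PurePath(path).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    return suffixes[-1] if suffixes else ""


def pipeline_tier(inputs: Iterable[str]) -> str:
    """Classify a run as 'secondary' or 'tertiary' from its input files."""
    exts = {_ext(p) for p in inputs}
    if exts and exts <= SECONDARY_INPUTS:
        return "secondary"
    if exts and exts <= TERTIARY_INPUTS:
        return "tertiary"
    raise ValueError(f"unsupported or mixed inputs: {sorted(exts)}")


print(pipeline_tier(["cell1_R1.fastq.gz", "cell1_R2.fastq.gz"]))
print(pipeline_tier(["cell1.bam", "cell1.vcf.gz"]))
```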
- the platform displays details (1800) about the biosamples of the project, such as file size, total number of reads, read length, BioSkryb LotID, and upload date, as shown in FIG. 18A.
- Tertiary analysis pipelines can only be run after secondary analysis pipelines. If a user selects a tertiary analysis (1820), a window like that in FIG. 18B opens, listing the secondary analyses whose outputs can be leveraged. The user then selects a pipeline and advances to the next stage.
- a tertiary workflow progress bar (1830) is provided at the top of the window to show the user which stage of execution the window portrays.
- a window showing which tertiary analysis pipelines can be selected (one or many) is displayed in FIG. 18C.
- The variant filtering visualization application has modules containing at least: a locus-specific table showing variants called across all samples selected for pipeline output (1900), a table of criteria to use for filtering (1910), an ability to select genotype details (1920), and a table export feature (1930).
- Biomarkers characterized for specific protein effects or gene bodies can be leveraged in 1970.
- Polymorphic databases, such as (but not limited to) dbSNP and gnomAD, in which genotypes are associated with populations, can be leveraged, and genotypes above or below certain minor allele frequencies can be selected (1980).
- a log of each filtering step is provided at the bottom of the application, highlighting one or more steps, as shown in FIG. 19D. A user can go back to previous steps or copy and paste the syntax to go directly to a previously used filtering combination.
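The filtering workflow above applies criteria in sequence, such as a minor allele frequency cutoff. It logs each step, and users can paste a prior combination back in. The minimal sketch below shows one way to make such steps replayable. The `column op value; ...` syntax, the column names, and all function names are hypothetical. The platform's actual filter syntax is not given in the source. Values are parsed with `ast.literal_eval`, which accepts only Python literals.

```python
import ast
import operator
from typing import Any, Dict, List, Tuple

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

Step = Tuple[str, str, Any]  # (column, operator symbol, threshold or value)


def parse_steps(text: str) -> List[Step]:
    """Parse hypothetical 'column op value; ...' syntax back into filter steps."""
    steps: List[Step] = []
    for clause in (c.strip() for c in text.split(";")):
        if not clause:
            continue
        column, sym, raw = clause.split(maxsplit=2)
        if sym not in OPS:
            raise ValueError(f"unknown operator {sym!r}")
        steps.append((column, sym, ast.literal_eval(raw)))  # literals only, no code
    return steps


def to_syntax(steps: List[Step]) -> str:
    """Render steps in the same syntax parse_steps accepts, for copy and paste."""
    return "; ".join(f"{col} {sym} {val!r}" for col, sym, val in steps)


def apply_steps(rows: List[Dict[str, Any]], steps: List[Step]):
    """Apply filters in order, logging how many variants survive each step."""
    log: List[str] = []
    for column, sym, value in steps:
        before = len(rows)
        rows = [r for r in rows if column in r and OPS[sym](r[column], value)]
        log.append(f"{column} {sym} {value!r}: {before} -> {len(rows)} variants")
    return rows, log


variants = [
    {"gene": "GENE_A", "pop_af": 0.2, "depth": 30},
    {"gene": "GENE_B", "pop_af": 0.001, "depth": 5},
    {"gene": "GENE_C", "pop_af": 0.0005, "depth": 40},
]
steps = parse_steps("pop_af < 0.01; depth >= 10")
kept, log = apply_steps(variants, steps)
print([v["gene"] for v in kept])
print(log)
```

Going back to an earlier step corresponds to re-running `apply_steps` with a prefix of the step list, such as `steps[:1]`.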
- For users who choose copy number tertiary analyses, an application (FIG. 20) can be leveraged to evaluate chromosomal gains and losses for the samples in their project. Users can select some or all of the samples in their project by point and click, or by searching for matching strings of the sample sets they want to use (2000).
- a user can choose to select specific genomic regions of a sample (2020) and automatically initialize a table listing the genes that occupy that region (2030) for all selected samples in the application.
- the user can use any of the table’s filtering bars to select for specific features of interest, such as (but not limited to) chromosomal start and stop positions, genes, samples, or number of copies (2040).
- a user can download the feature table (2030) with any or no filters applied (2050).
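Selecting a genomic region populates a table of the genes that occupy it for each selected sample. The core operation is an interval-overlap test. The sketch below assumes 1-based, inclusive coordinates and uses a plain linear scan. An interval tree would be the usual choice for genome-wide annotation sets. The `Gene` record, the segment tuple layout, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_in_region(genes: Sequence[Gene], chrom: str, start: int, end: int) -> List[str]:
    """Return names of genes overlapping [start, end] on chrom (closed intervals)."""
    if start > end:
        raise ValueError("start must not exceed end")
    # Two closed intervals overlap when each starts before the other ends.
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= end and g.end >= start]


Segment = Tuple[str, str, int, int, float]  # (sample, chrom, start, end, copies)


def feature_table(segments: Sequence[Segment], genes: Sequence[Gene]):
    """One row per (segment, overlapping gene), ready for filtering or export."""
    rows = []
    for sample, chrom, start, end, copies in segments:
        for name in genes_in_region(genes, chrom, start, end):
            rows.append((sample, chrom, start, end, copies, name))
    return rows
```

The rows carry every column named for the filtering bars (positions, gene, sample, copies), so filtering and export can operate on the same structure.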
- EXAMPLE 8: To handle expression-based molecular information, such as that from transcriptomic or proteomic sequencing data, an application is created that leverages several modules showing different facets of the data.
- manipulating each of the modules creates a custom view into the data of a project, and, if the user so chooses, they can export the manipulation settings or import a previously used manipulation schema, as shown in FIG.
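Manipulation settings can be exported and later re-imported. A minimal JSON round-trip of such settings is sketched below. The `ViewSettings` fields, their defaults, and the `schema_version` check are hypothetical. They are chosen only to mirror the controls described for this application: excluded samples, gene-set choice, and a normalization toggle.

```python
import json
from dataclasses import asdict, dataclass, field, fields
from typing import List

SCHEMA_VERSION = 1


@dataclass
class ViewSettings:
    """Hypothetical manipulation schema shared by the application's modules."""
    excluded_samples: List[str] = field(default_factory=list)
    gene_set: str = "top_variable"
    n_genes: int = 500
    normalize_genes: bool = True
    schema_version: int = SCHEMA_VERSION


def export_settings(settings: ViewSettings) -> str:
    """Serialize the current view so it can be saved or shared."""
    return json.dumps(asdict(settings), indent=2, sort_keys=True)


def import_settings(text: str) -> ViewSettings:
    """Rebuild a view from an exported file, rejecting unknown schema versions."""
    data = json.loads(text)
    if data.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {data.get('schema_version')!r}")
    known = {f.name for f in fields(ViewSettings)}
    # Drop keys this version does not recognize instead of failing on them.
    return ViewSettings(**{k: v for k, v in data.items() if k in known})
```

Versioning the schema explicitly lets a saved view fail loudly, rather than load silently with the wrong meaning, if the settings format changes.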
- This table (2100) can be exported but also a user is allowed to add and import additional classification columns (2110), such as those identifying biomarker states from investigations of other molecular types on the platform.
- the application can also be configured to display relative expression values at the isoform-level (specific coding combinations of a gene) in 2120.
- a user can select which samples to remove from representation in the modules of the application by selecting or searching for sample names in 2130.
- a read count figure is initially shown to identify specific samples that may have abnormally high (or low) sequencing information that may skew graphical representations (2150). Samples removed in 2130 are shown in contrast to approved samples.
- a user can choose gene sets they wish to display on the heatmap from a variety of sources, such as: top variable genes (auto-calculated by the application), differentially expressed genes, custom gene lists, or known biological pathways (2170).
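The read count figure highlights samples with abnormally high or low counts. The source does not say how "abnormal" is defined. This sketch assumes the modified z-score of Iglewicz and Hoaglin, computed as 0.6745 × (count − median) ÷ MAD, where MAD is the median absolute deviation. It uses the commonly cited cutoff of 3.5. The function name is hypothetical. Median-based statistics are used so that the outliers being searched for do not distort the baseline.

```python
import statistics
from typing import Dict, List


def flag_read_count_outliers(counts: Dict[str, int], threshold: float = 3.5) -> List[str]:
    """Flag samples whose read counts sit far from the cohort median (both directions)."""
    values = list(counts.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: most samples are identical, so flag anything that differs.
        return [s for s, v in counts.items() if v != med]
    return [s for s, v in counts.items()
            if abs(0.6745 * (v - med) / mad) > threshold]


counts = {"S1": 1_000_000, "S2": 1_100_000, "S3": 950_000,
          "S4": 1_050_000, "S5": 9_000_000, "S6": 10_000}
print(flag_read_count_outliers(counts))
```

Read counts are often right-skewed, so applying the same rule to log-transformed counts is a reasonable variant.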
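The "top variable genes" option is auto-calculated by the application, but the source does not give the formula. A common approach, assumed here, ranks genes by across-sample variance of log1p-transformed expression and keeps the top n. Ties are broken by gene name so the selection is deterministic. The function name is hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def top_variable_genes(expr: Dict[str, Sequence[float]], n: int,
                       log: bool = True) -> List[str]:
    """Return the n genes with the highest across-sample variance."""
    def score(values: Sequence[float]) -> float:
        vals = [math.log1p(v) for v in values] if log else list(values)
        return statistics.pvariance(vals)

    # Sort by descending variance, then by name for a stable, reproducible order.
    ranked = sorted(expr, key=lambda gene: (-score(expr[gene]), gene))
    return ranked[:n]


expr = {"A": [1, 1, 1], "B": [0, 10, 100], "C": [5, 6, 7]}
print(top_variable_genes(expr, 2))
```

The log transform keeps a handful of very highly expressed genes from dominating the ranking purely through scale.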
- One of the platform’s tertiary analysis capabilities is performed as a module within the application as shown in FIG. 21D.
- PCA principal components analysis
- Samples can be colored by the phenotypic labels (2200) provided by the client in 2100.
- Gene or sample-level normalizations can be toggled on or off (2210), depending on specific numerical characteristics of the project’s samples.
- the user is able to select specific components to display (2220), highlighting different views of potential sample groupings along specific contributions of variability. The percent contribution to sample variability is shown for each component selected.
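The PCA module above offers a normalization toggle, user-selected components, and a percent contribution to variability for each component. It can be sketched with NumPy's SVD. This is an illustration under stated assumptions, not the platform's code. The input is taken to be a samples × genes matrix that is already transformed (for example, log-scaled). Only gene-level z-scaling is shown as the toggle. The function name `pca` is hypothetical. The percentage for component k is its squared singular value divided by the sum over all components.

```python
import numpy as np


def pca(expr, n_components: int = 2, scale_genes: bool = True):
    """PCA of a samples x genes matrix via SVD.

    Returns (scores, percent_variance); percent_variance[k] is the share of
    total variance captured by component k.
    """
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=0)                # center each gene
    if scale_genes:
        sd = x.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0                 # leave constant genes unscaled
        x = x / sd
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    var = s ** 2                          # proportional to variance per component
    percent = 100.0 * var / var.sum()
    scores = u[:, :n_components] * s[:n_components]
    return scores, percent[:n_components]


rng = np.random.default_rng(0)
data = rng.normal(size=(6, 4))            # toy data: 6 samples, 4 genes
scores, pct = pca(data, n_components=2)
print(scores.shape, np.round(pct, 1))
```

Plotting columns i and j of `scores` gives the view for a chosen pair of components. The corresponding entries of the percent vector label the axes.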
- UMAP Uniform Manifold Approximation and Projection
- the user can display results in a rendered three-dimensional view for the main graphic, as shown in FIG. 22B.
- a circos plot application is provided to users that can overlay multiple molecular data types in genomic context, as shown in FIG. 23A.
- chromosomes can also be selected (2320) to determine which region(s) are displayed on the plot.
- the circos plots on the application (2330) display selected data types and chromosomes determined by the user.
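The circos plot displays only the user-selected chromosomes and overlays several data types. The underlying layout step maps each genomic position to a point on a circle. The minimal sketch below gives each selected chromosome an arc proportional to its length, with a fixed gap between arcs. It places each data type on its own track radius. The function names, gap size, and toy chromosome lengths are hypothetical; a real plot would use the reference genome's lengths.

```python
import math
from typing import Dict, List, Tuple

Arc = Tuple[float, float]  # (start_deg, end_deg), clockwise from 12 o'clock


def chromosome_arcs(lengths: Dict[str, int], selected: List[str],
                    gap_deg: float = 2.0) -> Dict[str, Arc]:
    """Give each selected chromosome an arc proportional to its length."""
    total = sum(lengths[c] for c in selected)
    usable = 360.0 - gap_deg * len(selected)  # reserve one gap after each arc
    arcs: Dict[str, Arc] = {}
    cursor = 0.0
    for c in selected:
        span = usable * lengths[c] / total
        arcs[c] = (cursor, cursor + span)
        cursor += span + gap_deg
    return arcs


def to_xy(arcs: Dict[str, Arc], lengths: Dict[str, int], chrom: str,
          pos: int, radius: float) -> Tuple[float, float]:
    """Map a genomic position onto a circular track of the given radius."""
    start, end = arcs[chrom]
    theta = math.radians(start + (end - start) * pos / lengths[chrom])
    # x = r*sin, y = r*cos puts 0 degrees at the top and increases clockwise.
    return radius * math.sin(theta), radius * math.cos(theta)


lengths = {"chrA": 100, "chrB": 300}      # toy lengths, not a real genome
arcs = chromosome_arcs(lengths, ["chrA", "chrB"], gap_deg=0.0)
print(arcs)
```

Deselecting a chromosome simply recomputes the arcs over the remaining ones. Each overlaid data type (for example, copy number on one track and variants on another) reuses the same arcs with a different `radius`.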
- a web-based user interface was created that allowed rapid visualization of genomic data, having the architecture of FIG. 1B.
- a user logged into the interface, which displayed previous projects analyzed using the interface (FIG. 1C).
- the user then loaded sample data comprising genomic sequences for 10 single-cell experiments, which were available for viewing in a genome browser (FIGS. 2D-2E). Variants were annotated by querying databases (e.g., COSMIC) (FIG. 2C).
- Lineage trees for variants were generated and presented to the user (FIGS. 3B and 4A).
- Information for statistics of individual cells and populations and whole-genome sequencing quality metrics were accessed through the web-based interface.
- a circos plot representing clinically relevant mutations found in the samples was generated (FIGS. 5A-5B) in less than 100 ms (FIG. 6). Additional circos plots were generated with controls for which chromosomes to display and for different types of mutations (controlled by filters), as shown in FIG. 7.
- a clinician obtains cells from a tumor biopsy and subjects the cells to single-cell sequencing. After generating sequencing files for the cells, the clinician imports the files using the web-based user interface, identifies clinically relevant mutations using the interface, and prescribes specific therapies based on the presence or absence of these mutations. In some instances, driver mutations identified in a minority of cells are considered, such that a combination treatment is prescribed.
- The NIST Cybersecurity Framework and HIPAA have been utilized as the basis for the data security program.
- the NIST Cybersecurity Framework is a set of guidelines for mitigating organizational cybersecurity risks published by the US National Institute of Standards and Technology.
- the Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a federal law that requires the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge.
- HIPAA Health Insurance Portability and Accountability Act of 1996
- an application described herein aims to ensure the confidentiality, integrity, and availability of information using the following mechanisms:
- Access permissions were managed, incorporating the principles of least privilege and separation of duties. Being able to access data does not automatically give a user authorization to view that data.
- Network integrity was protected, incorporating network segregation where appropriate. If it is determined that segmentation is necessary, specific methods will be approved and adopted. Periodic internal vulnerability tests must be conducted and documented to validate controls are effective.
- Log-in procedures limited the number of unsuccessful log-in attempts, after which a user must contact the information system administrator to have his or her password reset.
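The log-in policy above locks an account after a limited number of unsuccessful attempts until an administrator resets it. A minimal in-memory sketch of that rule follows. The limit of 5 is an assumed value, because the source does not state one. `LoginGuard` is hypothetical and is not how AWS Cognito, the service described below, implements its own protections.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

MAX_FAILED_ATTEMPTS = 5  # assumed policy value


@dataclass
class LoginGuard:
    failures: Dict[str, int] = field(default_factory=dict)
    locked: Set[str] = field(default_factory=set)

    def record_attempt(self, user: str, success: bool) -> bool:
        """Record a login attempt; return True only if the login may proceed."""
        if user in self.locked:
            return False                    # even correct credentials are refused
        if success:
            self.failures.pop(user, None)   # a successful login resets the count
            return True
        self.failures[user] = self.failures.get(user, 0) + 1
        if self.failures[user] >= MAX_FAILED_ATTEMPTS:
            self.locked.add(user)           # administrator must reset the password
        return False

    def admin_reset(self, user: str) -> None:
        """Administrator action after a password reset: clear the lock and count."""
        self.locked.discard(user)
        self.failures.pop(user, None)
```

Refusing even a correct password once the account is locked is what forces the contact with the administrator that the policy requires.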
- Access control was governed by AWS IAM, which provides role-based access control to the databases and files.
- Data-in-transit is protected. Secure transmission protocols include encryption of data and e-mails, VPNs, TLS links, SSL links, and similar technologies.
- VPN virtual private network
- SSL links Secure Sockets Layer links
- User authentication was provided by AWS Cognito, a managed authentication service similar to Okta or Auth0. Automation was used to create groups for workspaces and organizations. Users were then allocated to these groups, which provide access to the files and data.
- AWS GuardDuty, a tool that uses AI to detect anomalous behavior, was used proactively to monitor the accounts for suspicious activity.
- Automatic log-off was implemented for protection against data leaks. Automatic controls to log users off after 15 minutes of inactivity were enabled. After being automatically logged off, a user must re-enter his or her username and password to resume the interrupted activity. Users may not disable this automatic log-off feature.
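The automatic log-off rule ends a session after 15 minutes of inactivity, with no user opt-out. It can be sketched as an idle-timeout check. The `Session` class is hypothetical. It takes an injectable clock (defaulting to `time.monotonic`, which is unaffected by wall-clock changes) so the timeout can be tested without waiting. Once expired, the session never revives, matching the requirement to re-enter credentials.

```python
import time
from typing import Callable

IDLE_TIMEOUT_SECONDS = 15 * 60  # 15 minutes of inactivity


class Session:
    """Hypothetical server-side session with a non-configurable idle timeout."""

    def __init__(self, user: str, clock: Callable[[], float] = time.monotonic) -> None:
        self.user = user
        self._clock = clock
        self._last_activity = clock()
        self.active = True

    def is_valid(self) -> bool:
        """Expire the session once the idle timeout has elapsed."""
        if self.active and self._clock() - self._last_activity >= IDLE_TIMEOUT_SECONDS:
            self.active = False  # user must log in again with username and password
        return self.active

    def touch(self) -> bool:
        """Register user activity; returns False if the session had already expired."""
        if not self.is_valid():
            return False
        self._last_activity = self._clock()
        return True
```

Because the timeout is a module constant rather than a per-user setting, there is no code path through which a user could disable it.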
- Configuration change control processes were in place. For example: 1) determining the types of changes to the information system that are configuration controlled; 2) approving configuration-controlled changes to the system with consideration for security; 3) documenting approved configuration-controlled changes to the system; 4) retaining and reviewing records of configuration-controlled changes to the system; 5) auditing activities associated with configuration-controlled changes to the system; and 6) coordinating and providing oversight for configuration change control activities through change request forms that must be approved by a Security Officer.
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to systems, platforms, and methods for processing and visualizing genomic data. The present invention also relates to systems, platforms, and methods for rapidly generating biomarker results. The present invention further relates to systems, platforms, and methods for identifying genomic correlations of diseases.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263317636P | 2022-03-08 | 2022-03-08 | |
| US63/317,636 | 2022-03-08 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2023172923A2 (fr) | 2023-09-14 |
| WO2023172923A3 WO2023172923A3 (fr) | 2023-11-09 |
Family
ID=87935927
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2023/063877 Ceased WO2023172923A2 (fr) | 2022-03-08 | 2023-03-07 | Systems and methods relating to bioinformatics |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2023172923A2 (fr) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119943165A (zh) * | 2024-05-10 | 2025-05-06 | 中国人民解放军军事科学院军事医学研究院 | Pathogenic microorganism database and construction method thereof |
| US12559794B2 (en) | 2018-01-29 | 2026-02-24 | St. Jude Children's Research Hospital, Inc. | Method for nucleic acid amplification |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2494077A4 (fr) * | 2009-10-27 | 2013-08-21 | Caris Mpi Inc | Molecular profiling for personalized medicine |
| EP2795501A2 (fr) * | 2011-12-21 | 2014-10-29 | Life Technologies Corporation | Methods and systems for in silico experimental design and execution of a biological workflow |
| JP7621967B2 (ja) * | 2019-03-28 | 2025-01-27 | Phase Genomics, Inc. | Systems and methods for karyotyping by sequencing |
| US20210142904A1 (en) * | 2019-05-14 | 2021-05-13 | Tempus Labs, Inc. | Systems and methods for multi-label cancer classification |
| US20210118559A1 (en) * | 2019-10-22 | 2021-04-22 | Tempus Labs, Inc. | Artificial intelligence assisted precision medicine enhancements to standardized laboratory diagnostic testing |
| US11264140B1 (en) * | 2020-12-16 | 2022-03-01 | Ro5 Inc. | System and method for automated pharmaceutical research utilizing context workspaces |
-
2023
- 2023-03-07 WO PCT/US2023/063877 patent/WO2023172923A2/fr not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2023172923A3 (fr) | 2023-11-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| McInnes et al. | | Pharmacogenetics at scale: an analysis of the UK Biobank |
| Robinson et al. | | Interpretable clinical genomics with a likelihood ratio paradigm |
| Rakocevic et al. | | Fast and accurate genomic analyses using genome graphs |
| Bycroft et al. | | Genome-wide genetic data on ~500,000 UK Biobank participants |
| US20210319907A1 (en) | | Multi-omic search engine for integrative analysis of cancer genomic and clinical data |
| Mercatelli et al. | | Web tools to fight pandemics: the COVID-19 experience |
| US11640859B2 (en) | | Data based cancer research and treatment systems and methods |
| US9910957B2 (en) | | Visualization, sharing and analysis of large data sets |
| Kim et al. | | KoVariome: Korean National Standard Reference Variome database of whole genomes with comprehensive SNV, indel, CNV, and SV analyses |
| Katainen et al. | | Discovery of potential causative mutations in human coding and noncoding genome with the interactive software BasePlayer |
| KR20180136933A (ko) | | Platform for visual synthesis of genome, microbiome, and metabolome data |
| US10964410B2 (en) | | System and method for detecting gene fusion |
| CN104871164A (zh) | | Genome browser system for processing and presenting nucleotide variations in genomic sequence data |
| Doig et al. | | PathOS: a decision support system for reporting high throughput sequencing of cancers in clinical diagnostic laboratories |
| Craig et al. | | Assessing and managing risk when sharing aggregate genetic variant data |
| Mechanic et al. | | Next generation analytic tools for large scale genetic epidemiology studies of complex diseases |
| Roy et al. | | SeqReporter: automating next-generation sequencing result interpretation and reporting workflow in a clinical laboratory |
| WO2023172923A2 (fr) | | Systems and methods relating to bioinformatics |
| AU2019359878A1 (en) | | Data based cancer research and treatment systems and methods |
| WO2024026376A2 (fr) | | Methods and systems for multiomic analysis |
| Kaplun et al. | | PGMD: a comprehensive manually curated pharmacogenomic database |
| Niehus et al. | | PopDel identifies medium-size deletions simultaneously in tens of thousands of genomes |
| Patel et al. | | Metapipeline-DNA: A comprehensive germline and somatic genomics Nextflow pipeline |
| Verrou et al. | | Protocol for unbiased, consolidated variant calling from whole exome sequencing data |
| Wang et al. | | ClinLabGeneticist: a tool for clinical management of genetic variants from whole exome sequencing in clinical genetic laboratories |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23767623; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 23767623; Country of ref document: EP; Kind code of ref document: A2 |