WO2019191784A1 - Automated feature engineering of hierarchical ensemble connectomes - Google Patents


Info

Publication number
WO2019191784A1
Authority
WO
WIPO (PCT)
Prior art keywords
networks
modality
network
data
hyperparameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2019/025260
Other languages
English (en)
Inventor
Derek PISNER
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Publication of WO2019191784A1
Legal status: Ceased

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 Two-dimensional [2D] image generation
    • G06T11/20 Drawing from basic elements
    • G06T11/26 Drawing of charts or graphs

Definitions

  • biomedical software intended for usage by experts from a variety of fields including but not limited to neuroimaging, psychology, systems biology, bioinformatics, computational psychiatry, computational linguistics, neurogenetics, ecology, cognitive science, computer science, object-oriented programming, high-powered computing, graph theory, ensemble sampling, machine-learning, and artificial intelligence.
  • a connectome refers to the set of elements and
  • a connectome can also be construed as encompassing the full web of phenotypic connectivity relationships among genetic, molecular, cognitive, behavioral, and even social traits.
  • a connectome typically exhibits a set of network properties with a complex topology. That is, their specific pattern of pairwise connectivity, considered independently or collectively across scales, is neither completely regular nor random.
  • This complex topology which can conceivably encapsulate person-specific network traits has the power to characterize deeply multivariate statistical relationships about a given individual.
  • a network (or ‘graph’) is simply a collection of connected objects, referred to as nodes (or vertices), where the connections between these nodes are referred to as edges.
  • nodes or vertices
  • edges
  • A common example of a network would be a social network, where each person is a node and each friendship connecting them is an edge.
  • Networks are typically represented in the form of n x n adjacency matrices where nodes and edges can be defined in any number of ways depending on the type of data being modeled.
  • additional thresholding is typically performed (by absolute edge weight, proportion of strongest weights, and other methods) to ensure an adequate level of sparsity in the graph that penalizes spurious (i.e. ‘false positive’) connections.
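As a minimal sketch of the "proportion of strongest weights" approach described above, the following assumes an undirected graph stored as a symmetric list-of-lists adjacency matrix. The function name `threshold_proportional` and the toy weights are illustrative assumptions, not part of the disclosure.

```python
def threshold_proportional(adj, keep):
    """Keep the strongest `keep` fraction of edges (by absolute weight)
    in a symmetric weighted adjacency matrix and zero out the rest."""
    n = len(adj)
    # Collect each undirected edge once, from the upper triangle.
    edges = [
        (abs(adj[i][j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if adj[i][j] != 0
    ]
    edges.sort(reverse=True)  # strongest edges first
    n_keep = round(keep * len(edges))
    out = [[0.0] * n for _ in range(n)]
    for _, i, j in edges[:n_keep]:
        out[i][j] = out[j][i] = adj[i][j]  # preserve symmetry
    return out


# Toy 4-node network with six weighted edges (hypothetical values).
adj = [
    [0.0, 0.9, 0.1, 0.4],
    [0.9, 0.0, 0.7, 0.2],
    [0.1, 0.7, 0.0, 0.05],
    [0.4, 0.2, 0.05, 0.0],
]
sparse = threshold_proportional(adj, keep=0.5)  # retains the 3 strongest edges
```

Thresholding by absolute edge weight would follow the same pattern, with the sort replaced by a simple comparison against a fixed cutoff.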
  • Graphs i.e. networks
  • graph analysis which includes implementing some subset or all of several dozen algorithms for
  • Global measures broadly refer to the network as a whole, whereas local measures broadly refer to the local properties of individual network nodes.
  • Global measures include measures of efficiency, clustering, paths/distance, assortativity/core structure, density, degree, and community structure.
  • Local measures include node centrality and local efficiency, among others. In the case of a layered structural-functional multigraph, centrality measures are instead referred to as “versatility.” Many, but not all, global and local graph measures can be calculated for any type of graph. Whereas graph analysis may yield derivative measures of network
  • connectomes can be used to represent a variety of brain network types.
  • Structural brain networks are based on some combination of white-matter, grey matter, and/or molecular properties that capture direct or indirect neural connections among disparate brain regions. These properties are typically captured at a macroscopic level using Magnetic Resonance Imaging (MRI), diffusion Magnetic Resonance Imaging (dMRI), Positron Emission Tomography (PET), Magnetic Resonance Spectroscopy (MRS), Infrared Imaging (IR), Single Photon Emission Computed Tomography (SPECT), and Computed Tomography (CT), but will likely be observable through other
  • fMRI functional Magnetic Resonance Imaging
  • fNIRS functional Near-Infrared Spectroscopy
  • EEG Electroencephalography
  • MEG Magnetoencephalography
  • Functional brain networks derived from these same modalities can also be described dynamically—i.e. as a time-series consisting of multiple functional brain networks across discrete sliding windows.
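The sliding-window description above can be sketched as follows. This is an illustrative example only: it assumes Pearson correlation as the per-window estimator, and the function names and toy time-series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences
    (assumes neither sequence is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sliding_window_networks(series, width, step):
    """Return one node-by-node correlation matrix per time window.

    `series` holds one time-series per node, all of equal length."""
    n_nodes, n_time = len(series), len(series[0])
    networks = []
    for start in range(0, n_time - width + 1, step):
        window = [ts[start:start + width] for ts in series]
        networks.append([
            [1.0 if i == j else pearson(window[i], window[j])
             for j in range(n_nodes)]
            for i in range(n_nodes)
        ])
    return networks


# Two hypothetical node signals whose coupling flips sign halfway through.
series = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 4, 6, 8, 8, 6, 4, 2],
]
dynamic = sliding_window_networks(series, width=4, step=4)
```

In this toy case the first window yields a strongly positive edge and the second a strongly negative one, which a single static network over the whole recording would average away.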
  • This perspective is a close relative to traditional functional connectivity that is sometimes referred to as‘effective’ connectivity in that it can facilitate dynamical causal modeling (DCM) or network information flow.
  • DCM dynamical causal modeling
  • dMRI data not only includes a given diffusion image (typically in NIfTI, .trk, .cifti, or other similar format) consisting of b0 reference images and as many ‘diffusion weighted’ volumes as there are directions acquired from the dMRI acquisition, but also includes an accompanying gradient table that characterizes the magnitude and orientation of those weighted volumes, all of which is stored in two accompanying text files - the b-values and b-vectors.
  • Structural or functional brain networks can be represented at various resolutions of nodes and edges that can be both weighted (i.e.
  • Brain networks can be both directed (i.e. directionality/asymmetry is encoded in edges based on some directionality of the connections) or undirected (directional information is not encoded in the edges).
  • nodes can be defined using any of several methods that include the following: 1) atlas-defined (i.e. based on some a priori digital brain atlas composed of sulci and gyri cortical surface representations, or subcortical volumes, whereby each relevant brain region is assigned some index as an intensity value); 2) anatomically-defined (i.e. based on an individual’s structural MRI image that has been digitally-parcellated into relevant brain regions); or 3) cluster-defined (i.e. based on spatially-distinct clusters of functional activation). Accordingly, nodes can be defined based on labels (i.e.
  • although nodes can be defined in several ways, they can further be ‘reduced by affinity’ (i.e. by selecting a subset of nodes that fall within the spatial constraints of RSNs or some manually-defined restricted network). For example, core sets of both 7 and 17 RSNs, as defined by Yeo et al. 2011 and redefined recently by Schaefer et al. 2018, have become key networks of interest for both research and clinical purposes.
  • Edges of brain graphs are defined using an entirely different set of techniques, depending on whether the graphs are derived from functional or structural neuroimaging modalities.
  • the edges are determined using a connectivity model ‘estimator’ applied to some individual X’s time-series data (i.e. from fMRI, fNIRS, EEG, MEG, or another functional neuroimaging modality).
  • these connectivity models are based on one of two primary ‘families’ of statistical relation: correlation and covariance.
  • the correlation family consists of both parametric and non-parametric approaches such as Pearson’s and Spearman’s rho correlation and partial correlation.
  • the covariance family consists of both traditional covariance estimation as well as a variety of Gaussian Graphical Models (GGM), in which the joint distribution of a set of random variables is assumed to be Gaussian and the pattern of zeros of the covariance matrix is encoded in terms of an undirected graph.
  • GGM Gaussian Graphical Models
  • the most common GGM is the inverse of the covariance matrix, also called the “precision” matrix, which is inherently sparse and thereby capable of representing only direct (as opposed to “indirect”) connections between nodes.
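To illustrate why a precision-based estimate isolates direct connections, the sketch below uses the closed-form first-order partial correlation for three nodes. For three variables this equals the normalized, sign-flipped off-diagonal entry of the precision matrix. The function name and the correlation values are illustrative assumptions.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation between x and y after removing the
    linear influence of a third node z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical chain x - z - y: x and y are each coupled to z at 0.8,
# and their marginal correlation (0.64) arises entirely through z.
marginal = 0.64
direct = partial_corr(r_xy=marginal, r_xz=0.8, r_yz=0.8)
```

Here the marginal correlation between x and y is substantial, yet the partial correlation is zero up to floating-point rounding, so an edge drawn from the precision matrix would be absent while one drawn from plain correlation would not.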
  • edges are most commonly determined by estimating the number and/or integrity of white-matter streamlines connecting nodes, which often consist of grey matter volumetric information.
  • This can be accomplished using either deterministic or probabilistic tractography, two common methods for using underlying directional information from dMRI (i.e. based on degree of isotropic diffusion of water molecules throughout the brain) to iteratively ‘track’ white-matter connections between nodes.
  • This tracking process intimately depends both on the type of diffusion model fit to the data and on the method of tractography used once the model is fit.
  • diffusion models include but are not limited to Constrained Spherical Deconvolution (CSD), tensor, ball-and-stick, and qball.
  • Structural and functional brain networks can also be modeled at higher resolutions.
  • at the “microscale”, human neural networks are described neuron-by-neuron.
  • a “mesoscale” connectome attempts to model anatomically and/or functionally distinct neuronal populations at a spatial resolution of hundreds of micrometers. Even at this broader scale, however, existing neuroimaging technology remains poorly suited for fine-grained study of brain networks.
  • a ‘transcriptome’ refers to the sum total of all the messenger RNA molecules expressed from the genes of an organism.
  • Subgraphs of the transcriptome are referred to as gene regulatory networks. Both the transcriptome and gene regulatory network subgraphs can be studied using weighted gene co-expression graph analysis.
  • the most basic co-expression inference network model that one can find in the literature consists of first calculating the linear pairwise correlation coefficient r of all possible pairs of genes, and then establishing a link between those gene pairs that show a ‘large enough’ value of r.
  • the natural assumption behind this construction process is that a large value of the correlation coefficient signifies some functional relationship among the pair of genes involved.
  • a fixed cutoff for the squared values of r is often used so that if r² is larger than the cutoff, then a link between the pair of genes is established; conversely, if r² is smaller, the gene pair remains unlinked.
  • the value of the fixed r cutoff is a freely varying hyperparameter.
  • edges that represent ‘reliable’ relationships among genes.
  • the networks thus inferred tend to contain a small number of edges, which results in a large number of isolated network vertices that, as in the case of brain networks, should be ‘pruned’ before extracting global or local network metrics from the graph.
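The cutoff-and-prune procedure described above might be sketched as follows. This is a simplified illustration under stated assumptions: the gene names, expression values, and the 0.8 cutoff are hypothetical, and the cutoff is applied to the squared correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def coexpression_network(expr, cutoff):
    """Link gene pairs whose squared correlation exceeds `cutoff`.

    Returns the edge dict and the list of isolated genes to prune."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r * r > cutoff:
            edges[(g1, g2)] = r
    connected = {gene for pair in edges for gene in pair}
    isolated = sorted(set(expr) - connected)
    return edges, isolated


# Hypothetical expression profiles across four samples.
expr = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [1, -1, 1, -1],
}
edges, isolated = coexpression_network(expr, cutoff=0.8)
```

Because the cutoff is a free hyperparameter, re-running the same function over a range of cutoff values is one simple way to produce an ensemble of co-expression graphs for the same individual.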
  • neural phenotypes both structural and functional
  • This approach leverages brain-wide atlases of gene expression, which quantify the transcriptional activity of thousands of genes across many different anatomical locations.
  • the broad anatomical and genomic coverage of brain wide gene expression atlases makes it possible to comprehensively map the molecular correlates of spatially distributed network properties, thus helping to bridge the gap between the transcriptome and connectome of the brain.
  • a semantic network is used when one has knowledge that is best understood as a set of concepts that are related to one another.
  • Most semantic networks are cognitively based and are often used as a form of knowledge representation. They also consist of arcs and nodes which can be organized into a taxonomic hierarchy.
  • Semantic networks can be directed or undirected graphs consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields.
  • semantic networks can be computationally generated using various forms of Natural Language Processing (NLP) which can parse metadata from text information acquired from audio recordings, text messages, emails, and other sources of semantic information. This metadata can then be used as a basis for defining connective relationships between words, phrases, and ideas based on a wide variety of criteria.
  • NLP Natural Language Processing
  • Definitional networks emphasize the subtype or is-a relation between a concept type and a newly defined subtype.
  • the resulting network also called a generalization or subsumption hierarchy, supports the rule of inheritance for copying properties defined for a supertype to all of its subtypes. Since definitions are true by definition, the information in these networks is often assumed to be necessarily true.
  • the information in an assertional network is assumed to be contingently true, unless it is explicitly marked with a modal operator.
  • They may be used to represent patterns of beliefs, causality, or inferences.
  • Executable networks include some mechanism, such as marker passing or attached procedures, which can perform inferences, pass messages, or search for patterns and associations.
  • Learning networks build or extend their representations by acquiring knowledge from examples.
  • the new knowledge may change the old network by adding and deleting nodes and arcs or by modifying numerical values, called weights, associated with the nodes and arcs.
  • Hybrid networks combine two or more of the previous techniques, either in a single network or in separate, but closely interacting networks.
  • Behavioral Networks
  • When investigating personality, nodes can represent cognitions, motivations, emotions, symptoms in the case of mental illness, and behavioral tendencies including geographical movement and activity patterns, that can vary across individuals or occasions. Nodes can be assessed by single items in questionnaires or interviews, or by aggregates of items, for instance personality facets or Ecological Momentary Assessment (EMA). The choice of an appropriate level of investigation (e.g., items, facets, or even broader traits) depends on which level is most useful for investigating the phenomenon of interest. For personality and psychopathology research, edge weights and directional encoding of edges are fundamental, because they allow distinguishing between intense and weak and between positive and negative associations among variables. Edge direction has been used in psychology particularly for representing temporal dependencies.
  • Examples of sources of data in psychology include participants' ratings on an object of interest (e.g., themselves, a peer, or a situation) collected only once (cross-sectional studies) or many times (e.g., as in EMA studies).
  • while networks can be computed on both cross-sectional and longitudinal datasets, disentangling the variation due to individuals’ deviations from routine versus situations of novelty is critical.
  • although correlation networks can be used, the most common method for cross-sectional data has been to elaborate partial correlation networks, which are equivalent to standardized Gaussian Graphical Models (GGMs).
  • the study of social networks extends the notion of behavioral networks to encompass the full web of social relations of an individual in terms of verbal and non-verbal, active and passive interpersonal interactivity, including face-face, social media, and written/ telephonic communications. These relations include friends, family, acquaintances, peer groups, romantic partners, colleagues, sports team members, leaders, and subordinates among other relationship types.
  • This web of social connections can be represented as a ‘sociogram’ - a graphical representation of both the immediate and distal social links that a person has.
  • Social networks can consist of direct or indirect linkages between an individual and others based upon shared attributes, shared attendance at events, or common affiliations.
  • connectomes as machine-learning features could provide the learner with a capacity for greater combinatorial generalization— i.e. constructing new inferences, predictions, and behaviors from known building blocks.
  • Given the power of networks to model high information complexity (i.e. ‘dimensionality’) from data, connectomes that represent networks about individual persons may provide an enriched source of person-specific features for machine learning problems concerned with making predictions about individuals.
  • connectome features may be especially useful for augmenting machine-learning to more accurately forecast individual outcomes.
  • the derivative network organizational measures resulting from graph analysis (i.e. many of which are scalar values) can themselves be used as features.
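As a hedged illustration of how scalar graph measures can be flattened into a feature vector, the sketch below computes one global measure (density) and one local measure (node degree) from an undirected adjacency matrix without self-loops. The function and feature names are assumptions chosen for this example.

```python
def graph_features(adj):
    """Return a flat dict of scalar features for one undirected graph.

    Density is a global measure; per-node degree is a local measure."""
    n = len(adj)
    degrees = [sum(1 for w in row if w != 0) for row in adj]
    n_edges = sum(degrees) / 2  # each undirected edge is counted twice
    features = {
        "density": n_edges / (n * (n - 1) / 2),
        "mean_degree": sum(degrees) / n,
    }
    # One local feature per node, keyed by node index.
    features.update({f"degree_node{i}": d for i, d in enumerate(degrees)})
    return features


# Hypothetical 4-node graph with edges 0-1, 1-2, and 0-3.
adj = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
features = graph_features(adj)
```

Each key in the returned dict would correspond to one column of a person-by-feature table passed to a downstream learner.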
  • one particularly promising avenue may be the use of connectome-embedding or ‘graph-analytic embedding’ to dramatically enhance the precision of diagnostic and treatment-matching algorithms in machine-learning of computational medicine.
  • connectomes that has the unique capability to perform ensemble sampling of connectome networks.
  • those connectomes corresponding to a given combination of hyperparameters can then either be selected or discarded based on any of a variety of feature-selection approaches.
  • a hyperparameter ‘grid-search’ can be conducted with cross-validation to determine those networks that maximally contribute to machine-learning model prediction. Networks can even be selected at random from the sampled ensemble, filtered by some minimum variance and collinearity threshold.
  • Bayesian techniques such as Markov chains can be used to guide the selection of ensemble graphs based on known priors about the impact of unique hyperparameter combinations on network reproducibility.
  • incorporating multiple networks (i.e. referring to the same person, but from different vantage points defined by different combinations of hyperparameters) into ensemble connectomes
  • more stable (and hence generalizable) machine-learning algorithms could be developed while accommodating for very high-dimensionality network features.
  • to use ensemble connectome features in machine learning more directly, a special variety of ensemble machine-learning algorithms, such as Random Forests or Gradient Boosting Machines, may be particularly appropriate.
  • ensemble methods such as these often produce more accurate solutions than a single model would.
  • the advantage of these methods is that they provide a collaborative filtering algorithm through an ensemble of ‘weak learners’.
  • Examples of ensemble machine learning methods, in a context where ensemble connectomes are used, might include majority or weighted voting (i.e. for classification problems) or simple or weighted averaging (i.e. for regression problems) of the connectomes.
  • Bootstrap Aggregating (also known as ‘Bagging’)
  • multiple predictive models can be generated based on multiple random subsamples of the connectome ensemble using bootstrap resampling.
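The aggregation schemes named above (majority voting, weighted averaging, and bootstrap resampling of the connectome ensemble) can be sketched in a few lines. This is a minimal illustration: it assumes that each connectome in the ensemble has already produced its own prediction for a person, and all names and values are hypothetical.

```python
import random
from collections import Counter


def majority_vote(labels):
    """Classification: return the label predicted by most ensemble members."""
    return Counter(labels).most_common(1)[0][0]


def weighted_average(values, weights):
    """Regression: average member predictions using per-member weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_subsamples(ensemble, n_models, seed=0):
    """Bagging: draw `n_models` resamples, with replacement, of the ensemble."""
    rng = random.Random(seed)
    return [[rng.choice(ensemble) for _ in ensemble] for _ in range(n_models)]


# Hypothetical per-connectome predictions for a single person.
label = majority_vote(["patient", "control", "patient"])
score = weighted_average([10.0, 20.0], [3.0, 1.0])

# Each bag would be used to train one predictive model.
ensemble = ["graph_a", "graph_b", "graph_c", "graph_d"]
bags = bootstrap_subsamples(ensemble, n_models=3)
```

In practice the weights might reflect each connectome's cross-validated performance, though the source does not prescribe a particular weighting scheme.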
  • ensemble connectomics does not necessitate that ensemble machine-learning algorithms be subsequently used; rather, in certain cases, deep learning and other alternative methods may be warranted whereby the consensus aggregation of the connectome ensemble occurs as a preliminary feature-selection step instead of during the model training process itself.
  • multidimensional connectotypes anchored to the same latent construct— the person— might lend to training more information-exhaustive models with an even greater capacity for combinatorial generalization.
  • where connectome networks unique to a given modality contain overlapping or complementary information with other networks generated in their ensemble, and/or networks across independent modalities contain comparable node-edge definitions, these can be combined hierarchically into so-called ‘multigraphs’.
  • These hypergraphs can be used as additional features that capture some further level of emergent complexity beyond that which is available when relying on each modality-specific graph considered independently.
  • if connectome features are to draw from multiple layers of individual data and across grids of hyperparameters, the compute resources needed to perform such ensemble sampling will necessarily be greater.
  • Connectome features derived from structural and functional brain data, for instance, have already proven difficult to obtain due to the often unwieldy computational expense of processing such data.
  • microscopic gene assays and macroscopic social networks often involve millions of data points whose information is virtually meaningless without computationally rigorous forms of data munging and dimensionality reduction. Consequently, the next wave of connectomic analysis will require immense computing power and a pipelining framework for graph generation and analytics that can accommodate it.
  • the processes implemented in biomedical software as proposed herein constitute a network feature-engineering tool, but the disclosed invention specifically outlines a pipeline of processes that can accommodate the unique feature-engineering needs of ensemble connectomes specific to data acquired about individual persons.
  • other tools presently exist for automated feature-engineering (e.g. Deep Feature Synthesis (DFS)), but none of these explicitly handle network data.
  • DFS Deep Feature Synthesis
  • the Brain Connectivity Toolbox, the Connectome Visualization Utility, and other related open-source research software provide libraries of tools that can be used for network estimation, graph analysis, and machine learning, but they do not provide workflows for performing these operations in the manner described in the claims herein.
  • NDMG NeuroData MRI Graphs
  • CPAC Configurable Pipeline for the Analysis of Connectomes
  • CMTK Connectome Mapper
  • the disclosed pipeline processes actually consist of a ‘meta-workflow’ that anchors several nested workflows, allowing for at least four layers of parallelization (across individuals, across data modalities, across hyperparameters, and across modality-specific vectorized functions) that consist of highly modular ‘nodes’ of workflow operation.
  • the disclosed processes implemented in software represent the hyperparameter iterable expansion of all user-specified hyperparameter combinations. Consequently, the proposed series of processes constitute the only existing technology capable of automatically optimizing its own resource scheduling to accommodate for its added computational burden.
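The ‘iterable expansion’ of user-specified hyperparameters can be pictured as a Cartesian product over the value lists supplied at runtime. The sketch below is only an illustration of that idea: the hyperparameter names and values are hypothetical and are not taken from the disclosed implementation.

```python
from itertools import product


def expand_hyperparameters(grid):
    """Expand a dict of value lists into one dict per unique combination."""
    names = list(grid)
    return [
        dict(zip(names, combo))
        for combo in product(*(grid[name] for name in names))
    ]


# Hypothetical user-specified iterables for a single modality.
grid = {
    "conn_model": ["corr", "partcorr", "cov"],
    "threshold": [0.1, 0.2],
    "atlas": ["atlas_a", "atlas_b"],
}
combos = expand_hyperparameters(grid)  # 3 * 2 * 2 = 12 graph specifications
```

Each resulting dict describes one graph in the ensemble, so the number of jobs grows multiplicatively with every additional iterable, which is why resource scheduling becomes a concern.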
  • the proposed invention is also interoperable with any existing or future graph generating workflow unique to other data modalities not stated herein.
  • Such third-party graph-estimation workflows can optionally be included as ‘plugins’ through external software dependencies such that those workflows too can be used to sample graph ensembles based on other person-specific data modalities not stated herein.
  • the claims described in the present disclosure showcase one such modality-specific sub-workflow (i.e. for generating consensus networks from functional MRI (fMRI) data) that can be executed using the disclosed pipeline.
  • The innovation here consists of a recipe for fully-automated and massively parallel network estimation routines, integrated across any person-specific data modality, and with the flexibility to vary any or all sets of network hyperparameters.
  • although the technology name is currently trademarked, and the associated machine-instructed syntax is copyrighted, the processes that constitute the fundamental innovation require protection.
  • the proposed invention will allow for the generation of more robust hierarchical and ensemble connectome features, the majority of which have never been created or studied before simply as a result of their computational inaccessibility. Further, the connectome outputs produced by the disclosed processes implemented in software will constitute a rich source of readily-available, person-specific network
  • the proposed processes implemented in software involve a four-stage process of graph generation and analysis, the combination of which could theoretically be implemented using any object-oriented programming language.
  • these four stages have so far been implemented using the Python programming language, with elements of JavaScript (e.g. D3, node.js) and C++, with SQF and R statistics interfaces as well.
  • Python syntax allows for maximal interoperability with existing and future workflows that produce person-specific graphs from raw data.
  • This prototype implementation (PyNets™) relies on several key package dependencies, all of which are open-source, but most heavily draws from: 1) Nipype
  • the four-stage workflow broadly consists of the following:
  • Stage 1 User input and parsing to configure and initialize a ‘meta-workflow’ of workflows.
  • Stage 2 (22) Modality-specific workflow selection (i.e. nested within the ‘meta-workflow’).
  • data inputs to Stage 1 are assumed to be maximally noise-free and preprocessed. While there are no strict requirements for what constitutes an adequate level of preprocessing, the assumption is that this level aligns with standard accepted practices, specific to the modality, for raw data processing in preparation for network analysis.
  • All or part of the technology disclosed herein may be implemented as a computer program that includes object-oriented instructions stored on one or more non-transitory machine-readable storage media, and that are executable on one or more processing devices. All or part of the systems and techniques described herein may be implemented as an apparatus, method, or process that may include one or more processing devices and memory to store executable instructions to implement the stated functions.
  • the details of one or more implementations are set forth in the accompanying drawings and the descriptions that follow. Other features, objects, and advantages will be apparent from the description and drawings, and from the specific claims. To provide concrete examples of the technology’s use in context, we additionally provide a set of two embodiments in the final section of the specification.
  • FIGURE 1 is a visualization of a hierarchical multigraph spanning seven independent data modalities that the disclosed invention is equipped to accommodate (from bottom to top: genetic transcriptome/gene regulatory network, molecular neural network, microstructural brain network, functional brain network, cognitive/semantic network, behavioral network, social network).
  • FIGURE 2 is a Directed Acyclic execution Graph (DAG) that depicts the sequence of operations and parallelism anticipated at the instantiation of the run script.
  • DAG Directed Acyclic execution Graph
  • each flow block corresponds to classes for an exemplar ensemble connectome workflow for a single individual for a single modality (fMRI).
  • the included classes of the workflow are modularized across workflow ‘layers’, which reflect various degrees of workflow ‘nesting’ and parallelism in a meta-programming framework.
  • background greyscale intensity reflects underlying nesting dimensions, i.e.
  • white corresponds to single-subject parent workflow input data, analysis, output data, and aggregation of output data
  • dark grey corresponds to the ‘meta-workflow’ that chains, synchronizes, and aggregates domain-specific workflows corresponding to separate data modalities (e.g. functional, structural, or genetic connectome workflows, etc.);
  • light grey corresponds to a nested modality-specific sub-workflow.
  • the multiple lines depicted across this figure reflect a mapping of ‘iterable’ hyperparameters specified at runtime that propagate as spawning threads of downstream stages in the workflow. When iterated, some parameters trigger further ‘exponential’ spawning of threads from upstream inputs, and are here depicted as grey boxes rather than ovals.
  • FIGURE 3 is a flow diagram that depicts all four stages of the workflow, with sub-stages, parameters, inputs, and outputs included at the level of function classes, with process descriptions included where appropriate (the numbers listed in highlighted rectangular boxes adjacent to flow blocks correspond to the numbers referenced in the detailed description in the section that follows).
  • In Stage 1, a variety of inputs (boolean, string, and numeric) are specified by the user (157)(153) as command-line options or through a Graphical User Interface (GUI) (156)(147) on a cloud or dedicated server. This can occur within a local, remote, or containerized environment (154)(155)(158), and involves running one of several available initialization scripts responsible for building and executing the workflows at runtime (4).
  • the types of options (7)(135)(9)(146) that can be specified by the user during this pre-runtime configuration stage (1), (133), (134), (135), (136) include setting modality-unspecific hyperparameter values (22)(24) as well as modality-specific hyperparameter values
  • Options also include the specification of file paths (11)(2)(60)(58)(142)(59) whose inclusion is used to determine the logic of the workflow stages that are subsequently instantiated (61). These options are then parsed at runtime and assigned to variables, which then further instantiate and configure the appropriate workflow through a set of conditional triggers whose logic consists of hard-coded parameter compatibility rules (5). All invalid combinations of runtime parameters (e.g. hyperparameters specific to one data modality are specified when no valid file input from that modality was correspondingly supplied by the user) are either ignored (i.e. in the case that other valid combinations of options are included) or preset to raise anything from a warning (177)(178) (i.e.
  • Command-line inputs can be applied to data for single individuals (6)(12)(16), or in ‘batch’ across multiple individuals (i.e. if file paths to multiple datasets are provided as input) (6)(14)(17)(18)(19).
  • minimally required file inputs must be specified manually with explicit path strings (or a browse option in the case of GUI usage), but can also be auto-detected using the metadata of structured databases (146) whose format follows an established or custom specification protocol (e.g. Brain Imaging Data Structure) (4).
  • the user-specified options which are unrelated to file inputs e.g. hyperparameter selection
  • single or multiple files available through the standardized data structure specified will be identified based on the user-specified list of unique identifiers and if they exist will be passed as lists of file inputs in a manner that mimics the manual specification of files (4)(6)(23)(25).
  • resource-handling (78) is also addressed initially by way of the initialization script (10) and user-specified runtime options (11).
  • initialization script 10
  • user-specified runtime options (11)
  • scheduler plugins (80) and CPU/GPU/Memory restrictions of the workflow (79) can either be auto-set (i.e. by detecting available resources to utilize), manually overridden based on user specification, or auto-set with accompanying dynamic job-scheduler optimization.
  • Figure 2 that can be exploited for network flow optimization analysis by dynamically referencing incoming resource profiler data against known priors of resource consumption stored in the DAG’s attributes.
  • the flow network analysis is useful in this scenario due to what can often be an overwhelming computational load when generating connectome ensembles.
  • the network flow optimization aims to balance the external supply of compute resources with the greedy compute demands of a set of scheduled ‘jobs’ that require some overarching prioritization heuristic to ensure maximal efficiency (88).
  • several user-defined objectives can be used for the flow optimization, including but not limited to: minimizing compute dollar spending (170), minimizing overall runtime duration (171), and minimizing computational-load (89).
  • two core runtime parameters (21) specified by the user (137) during Stage 1 (23) universally for all types of data modalities are: a) connectivity model type (27); and b) graph thresholding approach (25).
  • any one of a variety of connectivity models can be used, including covariance (29), correlation (30), and/or a modality-specific definition (43).
  • the covariance family (29) includes empirical covariance (35), Ledoit-Wolf ‘shrinkage’ covariance (37), and graphical lasso methods (36), which include various implementations of L1 and/or L2 penalized covariance
  • the correlation family (30) includes Pearson’s correlation (32) and L1 and/or L2 penalized forms of ‘partial’ correlation (33). If a method from the correlation family (30) is indicated, then 130 in Stage 3 (121) will automatically include a Fisher’s r-to-z transformation (102) to standardize the resulting graph edge weights.
  • modality-specific definitions of connectivity (43) can also be used beyond correlation/covariance. Examples of these include fiber count (50) and fiber integrity (49) for the dMRI modality, as well as spectral coherence (52) and phase synchrony (53) for the EEG/MEG modality.
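The r-to-z transformation applied to correlation-based edge weights is z = arctanh(r) = ½ ln((1 + r) / (1 − r)). A minimal sketch follows, assuming the graph is held as a square list-of-lists correlation matrix; the function name `fisher_r_to_z`, the zeroed diagonal, and the clipping constant `eps` are illustrative assumptions rather than details from the disclosure.

```python
import math
from typing import List


def fisher_r_to_z(matrix: List[List[float]],
                  eps: float = 1e-7) -> List[List[float]]:
    """Apply z = atanh(r) to every off-diagonal entry.

    The diagonal (self-correlation of 1.0) is set to 0.0, and values are
    clipped to (-1 + eps, 1 - eps) so that atanh never returns infinity.
    Both choices are assumptions made for this sketch.
    """
    n = len(matrix)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = max(-1.0 + eps, min(1.0 - eps, matrix[i][j]))
            out[i][j] = math.atanh(r)
    return out
```

The transform stretches values near ±1, which makes edge weights from different correlation-family models more comparable before graph measures are computed.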
  • multiple connectivity model types can be specified for the connectivity model (27) options, in which case they would be indicated by the user at (1)(138) as a comma-separated list of length L, and Stages 2-3 (22)(3) will be reinitiated for each of the graphs produced through the expanded L.
  • graph thresholding hyperparameters (b) (25) consist of multiple sub-parameters including types (65) and schemes (66) that can be specified.
  • Types (65) include: both global and local thresholding (69).
  • Global thresholding consists of proportional thresholding (67), absolute thresholding (68), and density thresholding (118).
  • Local-thresholding (10) further consists of multiple techniques for network reduction. These include but are not limited to a disparity filter or thresholding via the Minimum Spanning Tree (MST) (74). For each of the three forms of global thresholding (67)(68)(118), various thresholding schemes can be used.
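Two of the global thresholding types can be sketched on a symmetric weighted adjacency matrix. This is an illustration under stated assumptions, not the disclosed implementation: the matrix is a plain list of lists, edge strength is taken as absolute weight, and the function names `threshold_absolute` and `threshold_proportional` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def threshold_absolute(w: Matrix, tau: float) -> Matrix:
    """Zero every off-diagonal weight whose magnitude is below tau."""
    n = len(w)
    return [[w[i][j] if i != j and abs(w[i][j]) >= tau else 0.0
             for j in range(n)] for i in range(n)]


def threshold_proportional(w: Matrix, keep: float) -> Matrix:
    """Keep the strongest `keep` fraction of undirected edges.

    Only the upper triangle is read, so the input is assumed symmetric.
    """
    n = len(w)
    edges = sorted(
        ((abs(w[i][j]), i, j)
         for i in range(n) for j in range(i + 1, n) if w[i][j] != 0.0),
        reverse=True,
    )
    n_keep = int(round(keep * len(edges)))
    out = [[0.0] * n for _ in range(n)]
    for _, i, j in edges[:n_keep]:
        out[i][j] = w[i][j]
        out[j][i] = w[i][j]
    return out
```

Absolute thresholding fixes the cutoff and lets the number of surviving edges vary, whereas proportional thresholding fixes the fraction of surviving edges and lets the cutoff vary.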
  • Iterative thresholding (70) requires the additional input of minimum, maximum, and step interval values to generate a window of unique combinations of multiple thresholds over which to iterate the selected thresholding type(s).
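A window of thresholds can be generated from the minimum, maximum, and step interval values along the following lines. The function name `threshold_window` is hypothetical, and treating the maximum as inclusive is an assumption of this sketch.

```python
import math
from typing import List


def threshold_window(minimum: float, maximum: float, step: float,
                     ndigits: int = 6) -> List[float]:
    """Return evenly spaced thresholds from minimum to maximum inclusive.

    Values are computed as minimum + k * step and rounded, which avoids
    the drift that accumulates when a float is incremented repeatedly.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if maximum < minimum:
        raise ValueError("maximum must not be smaller than minimum")
    # The small tolerance keeps the endpoint when the division is inexact.
    n_steps = int(math.floor((maximum - minimum) / step + 1e-9))
    return [round(minimum + k * step, ndigits) for k in range(n_steps + 1)]
```

For example, `threshold_window(0.2, 0.5, 0.1)` yields the four target densities used in the worked examples of this disclosure.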
  • the length of the resulting window of thresholds will be multiplied by the other cumulative iterable parameters specified in Stage 1 (23), and separate graphs (i.e. one for each unique combination) will be produced by reinitiating Stages 2-3 (22)(3) accordingly.
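The multiplicative expansion of iterable hyperparameters into unique combinations is a Cartesian product. A minimal sketch using the standard library follows; the helper `expand_grid` and the keys `conn_model` and `density` are hypothetical names chosen for illustration.

```python
from itertools import product
from typing import Dict, Iterator, Sequence


def expand_grid(grid: Dict[str, Sequence]) -> Iterator[Dict[str, object]]:
    """Yield one dict per unique combination of hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical grid: two connectivity models and a window of four thresholds.
example_grid = {
    "conn_model": ["corr", "cov"],
    "density": [0.2, 0.3, 0.4, 0.5],
}
combos = list(expand_grid(example_grid))  # 2 * 4 = 8 combinations
```

Each yielded dictionary corresponds to one graph, so the length of `combos` is the number of times the graph-generation and graph-analysis stages would be reinitiated.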
  • hyperparameters specified by the user at runtime (138)(126)(127) are modality-specific, but impact a variety of fundamental characteristics of how the nodes and edges of a graph can be defined (i.e. the number of nodes or sizes of nodes).
  • Stage 1 (126)(23) trigger (and are passed into) Stage 2 (22), which involves generating domain-specific graphs by spawning corresponding domain-specific ‘nested’ workflows (34)(51)(60)(91)(54)(31)(90)(57) that serve to produce graphs using the various procedures specific to each unique modality.
  • the graphs produced from the nested workflows in Stage 2 (22) are then used as inputs for Stage 3 (3).
  • This stage first involves loading the ensemble of graphs generated from Stage 2 (22), optionally standardizing graph edge weights (z-scoring, log10, or confound regression) (34), and converting the graphs into network ‘objects’ (104) that can in turn be analyzed either as single-layer graphs (106), multi-layer (i.e. hierarchical) graphs (105), or both.
  • Automated graph analysis is then performed using a variety of conventional global and local graph analysis algorithms (107)(108)/(111)(112), some of which will be hard-coded to skip if they are deemed non-applicable a priori (e.g. in the case of disconnected graphs with isolated vertices where the ‘pruning’ option is not specified at runtime).
  • Some graph metrics are likewise hard-coded to be triggered only in the presence of multimodal data inputs (e.g. versatility (113) measures of genetic-brain structural multigraphs). These measures are calculated iteratively during the fully-automated graph analysis in Stage 3 (3) for each of the unique graphs produced in Stage 2 (22) for each unique combination of hyperparameters specified by the user in Stage 1 (23).
  • the graphs and other derivative and temporary files produced in Stage 2 (22), along with the graph measures calculated in Stage 3 (3), are saved to disk as lightweight, compressed file objects to avoid being held in memory cache while minimizing required disk space (i.e. in the case where large graph ensembles are to be produced) (109)(110)(114)(115).
  • These files can be saved as SQL-like entries or in simple text-based formats (.csv, .ssv, .tsv, .txt, .pkl, and others not disclosed herein). Consequently, a dictionary of associated Stage 2 (22) and Stage 3 (3) file output locations is made available to all nodes of the parent and nested workflows so that they can be parsed as needed at subsequent stages of execution.
  • the dataframes of derivative graph measures (109)/(114) produced in Stage 3 (3) become inputs for the consensus analyses that can be performed in Stage 4 (98).
  • the graph measures are aggregated from each unique graph in the current workflow’s ensemble that successfully spawned from Stages 2-3 (22)(3).
  • Any of a variety of user-configurable methods of consensus analysis (101)(100)(123)(99)(102) can then be used to generate a single annotated summary dataframe that is ultimately saved to disk (178).
  • this dataframe can include a simple aggregated database (98) of graph measures produced from Stage 3 (3), a database of basic measures of central tendency (e.g.
  • each dataframe that can be produced includes headers labeled by each hyperparameter combination so as to provide a reference for each corresponding graph measure (116), along with a profile of hyperparameter-dependent variance of each graph measure. This information is critical for use in cross-validation procedures (122) integral to machine-learning conducted by third-party tools beyond the scope of this disclosure.
  • the final summary dataframe is ultimately written to disk, signifying termination of all layers of workflows (142).
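One simple form of consensus analysis, computing central tendency and hyperparameter-dependent variance of each graph measure across the ensemble, can be sketched as follows. The input layout (one dictionary of measures per hyperparameter-combination label), the function name `summarize_ensemble`, and the choice of population variance are assumptions made for illustration.

```python
from statistics import mean, median, pvariance
from typing import Dict


def summarize_ensemble(
    measures: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Aggregate each graph measure across hyperparameter combinations.

    `measures` maps a hyperparameter-combination label to the graph
    measures computed for that graph. Graphs that failed to produce a
    given measure are simply left out of that measure's summary.
    """
    metrics = sorted({m for row in measures.values() for m in row})
    summary: Dict[str, Dict[str, float]] = {}
    for metric in metrics:
        values = [row[metric] for row in measures.values() if metric in row]
        summary[metric] = {
            "mean": mean(values),
            "median": median(values),
            "variance": pvariance(values),
            "n_graphs": len(values),
        }
    return summary
```

Keeping the per-combination labels as keys of the input preserves the reference between each graph measure and the hyperparameters that produced it.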
  • a user has noise-reduced functional MRI (fMRI) data for a given individual with an arbitrary unique identifier of 997. That dataset, called filtered_func_data_clean_standard.nii.gz, is in ‘nifti’ format and here consists of 182 volumes of 3D image matrices whose point values represent signal intensity of hemodynamic response as captured from the MRI machine.
  • the user also has the individual’s noise-reduced gene expression data in the form of a .txt file called‘expression_data.txt’ where rows correspond to a given genetic marker and columns consist of n independent measurements of the mRNA gene expression levels for each corresponding genetic marker.
  • the user wishes to generate hierarchical ensemble connectome features based on this multimodal genetic-fMRI data that can be used for subsequent ensemble machine-learning predictions of the given individual’s likelihood of Alzheimer’s disease.
  • DMN Default-Mode Network
  • Given that fMRI measurements of DMN connectivity have been shown to vary considerably across brain atlases, network sparsity, and connectivity model, the user wishes to perform an ensemble sampling across several relevant hyperparameters.
  • these include five different brain atlases (e.g. ‘coords_power_2011’, ‘coords_dosenbach_2010’, ‘aal’, ‘desikan’, ‘CPAC200’), iterative thresholding of the resulting networks to achieve multiple target densities (e.g. 0.2, 0.3, 0.4, 0.5), and connectivity defined in a number of ways (e.g. correlation, covariance, L1-penalized covariance, L2-penalized partial correlation).
  • multiple fixed ‘r’ minimum thresholds e.g. 0.4, 0.5, 0.6,
  • pynets_run.py is the pre-runtime configuration script that triggers the main workflow consisting of a series of nested sub-workflows (i.e. one specific to each modality; in this case, a sub-workflow for handling the fMRI data and a sub-workflow for handling the genetic data).
  • Command-line ‘flags’ are used to indicate various file modality types, and commas are used to separate ‘iterable’ hyperparameter values (semi-colons enable these hyperparameter windows to be specified per modality in the order of modality-specific data files specified).
  • FIGURE 2 depicts the fMRI components of the workflow, with accompanying description in the subsequent section.
  • a user has a multi-subject dataset consisting of 10 individuals with noise-reduced functional MRI (fMRI) data in ‘nifti’ format (here consisting of 182 volumes of 3D image matrices whose point values represent signal intensity of hemodynamic response as captured from the MRI machine), along with diffusion MRI (dMRI) data (here consisting of a b-values text file, a b-vectors text file, and a ‘nifti’-formatted file containing 72 ‘directional’ volumes of 3D image matrices whose point values represent signal intensity of water diffusion as captured from the MRI machine).
  • fMRI noise-reduced functional MRI
  • dMRI diffusion MRI
  • the user wishes to generate hierarchical ensemble connectome features for all individuals in the given dataset based on this multimodal dMRI-fMRI data that can be used for subsequent ensemble machine-learning predictions to determine their likelihood of responding favorably to a new experimental medication for their Major Depressive Disorder.
  • ECN Executive Control Network
  • functional brain network disturbances of the so-called ‘Executive Control Network’ (ECN) are a biomarker of interest for predicting treatment response in mood disorders like depression, particularly when underlying structural brain network information is also available to characterize the brain’s neuroplasticity (i.e. its adaptive potential to reshape neural connections based on different responses to stimuli).
  • both the fMRI and dMRI data files for each subject are used as inputs to the software.
  • BIDS Brain Imaging Data Structure
  • the entire BIDS-formatted dataset lives in his home directory: /home/users/user1/clinical_MRI_data.
  • Subjects in the dataset have the following unique identifiers: 01, 02, 03, 04, 05, 06, 07, 08, 09, 10.
  • the user will be able to process all of the individuals’ data simultaneously (i.e. in a single execution) to produce the hierarchical ensemble network features that he needs to make his machine-learning predictions.
  • Given that fMRI measurements of ECN connectivity have been shown to vary considerably across brain atlases, network sparsity, and connectivity model, the user wishes to perform an ensemble sampling across several relevant hyperparameters. These include five different brain atlases (e.g. ‘coords_power_2011’, ‘coords_dosenbach_2010’, ‘aal’, ‘desikan’, ‘CPAC200’), iterative thresholding of the resulting networks to achieve multiple target densities (e.g. 0.2, 0.3, 0.4, 0.5), and connectivity defined in a number of ways (e.g. correlation, covariance, L1-penalized covariance, L2-penalized partial correlation).
  • brain atlases e.g. ‘coords_power_2011’, ‘coords_dosenbach_2010’, ‘aal’, ‘desikan’, ‘CPAC200’
  • target densities of e.g. 0.2
  • structural brain networks derived from dMRI data are known to vary as a result of several hyperparameters that impact tractography.
  • the user will include a list of multiple step sizes (e.g. 0.1, 0.2, 0.3) and curvature thresholds (e.g. 3, 6, 12, 24, 48) to constrain the tractography, in addition to the same multiple definitions of brain atlas and threshold window used for the fMRI data.
  • the connectivity type for the dMRI data is measured beyond statistical covariance measures (e.g. fiber count, fractional anisotropy).
  • the user specifies the aforementioned hyperparameters and file inputs on the command-line of a Linux operating system running on a 500-CPU cloud server hosted by Amazon Web Services (AWS).
  • AWS Amazon Web Services
  • the user wants to ensure that the software’s execution is optimized for minimizing dollar compute cost when using all 500 cores, restricting overall memory usage to 500 GB, and verbosely logging all processing at runtime. That command line call appears as follows: pynets_run .
  • pynets_run.py is the pre-runtime configuration script that triggers the main workflow consisting of a series of nested sub-workflows (i.e. one specific to each modality; in this case, a sub-workflow for handling the fMRI data and a sub-workflow for handling the dMRI data).
  • Command-line ‘flags’ are used to indicate various file modality types, and commas are used to separate ‘iterable’ hyperparameter values (semi-colons enable these hyperparameter windows to be specified per modality in the order of modality-specific data files specified). Additionally, those flags that specify hyperparameters specific to each modality (i.e.
  • Files with the suffix '_neat.csv' within the dMRI subdirectory will contain the network properties for each of the 600 combinations of dMRI hyperparameters.
  • Files with the suffix '_neat.csv' within the fmri_dmri subdirectory will contain the network properties for each of the 48000 combinations of fMRI-dMRI multigraph hyperparameters.
  • Files with the suffix '_mean.csv' within each individual’s respective base subject directory contain consensus measures along with a dictionary of file paths and dataframes of each graph measure across all hyperparameters across both modalities. The latter can in turn be fed into an ensemble classifier to make predictions about each of the ten individuals’ likelihood of responding favorably to the experimental treatment for Major Depressive Disorder.
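The stated totals of 600 dMRI combinations and 48000 fMRI-dMRI multigraph combinations can be checked with a short calculation. The per-hyperparameter counts below are an assumed decomposition inferred from the example values in this scenario (five atlases, four target densities, four fMRI connectivity models, three step sizes, five curvature thresholds, and two dMRI connectivity types); the source states only the totals, and the variable names are hypothetical.

```python
from math import prod

# Assumed number of values per hyperparameter, taken from the example lists.
fmri_grid = {"atlases": 5, "densities": 4, "connectivity_models": 4}
dmri_grid = {
    "atlases": 5,
    "densities": 4,
    "step_sizes": 3,
    "curvature_thresholds": 5,
    "connectivity_models": 2,  # e.g. fiber count, fractional anisotropy
}

n_fmri = prod(fmri_grid.values())   # 5 * 4 * 4
n_dmri = prod(dmri_grid.values())   # 5 * 4 * 3 * 5 * 2
n_multigraph = n_fmri * n_dmri      # every fMRI graph paired with every dMRI graph

print(n_fmri, n_dmri, n_multigraph)
```

Under these assumptions the dMRI grid has 600 combinations and pairing each with one of 80 fMRI combinations gives 48000 multigraph combinations, matching the figures quoted for the output files.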

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

According to the present invention, existing methods for analyzing person-specific ‘connectomes’ are not computationally equipped for scalable, flexible, and integrated processing across multiple network resolutions and for drawing on disparate data modalities, a major obstacle to using ensembles and hierarchies of connectomes to solve person-specific machine-learning problems. The processes implemented in software according to the present invention consist of an end-to-end pipeline for deploying ensembles and hierarchies of network-generating workflows that can use multimodal, person-specific data to sample networks, extracted from that data, across a grid of network-defining hyperparameters. In essence, this pipeline enables users to perform ensemble sampling of connectomes for one or more given individuals based on any type of input phenotypic data, constructed from any data modality or any hierarchy of modalities at any scale, and based on any set of network-defining hyperparameters.
PCT/US2019/025260 2018-03-30 2019-04-01 Automated feature engineering of hierarchical ensemble connectomes Ceased WO2019191784A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US2018761596 2018-03-30
USPCT/US2018/761596 2018-03-30

Publications (1)

Publication Number Publication Date
WO2019191784A1 true WO2019191784A1 (fr) 2019-10-03

Family

ID=68060459

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/025260 Ceased WO2019191784A1 (fr) 2018-03-30 2019-04-01 Automated feature engineering of hierarchical ensemble connectomes

Country Status (1)

Country Link
WO (1) WO2019191784A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177902A (zh) * 2019-12-18 2020-05-19 北京安怀信科技股份有限公司 An overall design tool based on system parameters
CN111466876A (zh) * 2020-03-24 2020-07-31 山东大学 An auxiliary diagnosis system for Alzheimer's disease based on fNIRS and graph neural networks
WO2021157963A1 (fr) * 2020-02-03 2021-08-12 Samsung Electronics Co., Ltd. Method and apparatus for providing edge computing services
US11188850B2 (en) * 2018-03-30 2021-11-30 Derek Alexander Pisner Automated feature engineering of hierarchical ensemble connectomes
CN117216530A (zh) * 2022-12-30 2023-12-12 北京九章云极科技有限公司 A method, device, and system for determining model information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060094001A1 (en) * 2002-11-29 2006-05-04 Torre Vicent E Method and device for image processing and learning with neuronal cultures
US9519981B2 (en) * 2011-11-04 2016-12-13 Siemens Healthcare Gmbh Visualizing brain network connectivity
US20170120043A1 (en) * 2005-01-21 2017-05-04 Michael Sasha John Programming Adjustment for Brain Network Treatment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HASSAN ET AL.: "EEGNET: An Open Source Tool for Analyzing and Visualizing M/EEG Connectome", PLOS ONE, vol. 10, no. 9, pages e0138297, Retrieved from the Internet <URL:https://doi.org/10.1371/journal.pone.0138297> [retrieved on 20190625] *
SHI ET AL.: "Connectome imaging for mapping human brain pathways", MOLECULAR PSYCHIATRY, vol. 22, 2017, pages 1230 - 1240, XP055639061, Retrieved from the Internet <URL:https://www.ncbi.nlm.nih.gov/pubmed/28461700> [retrieved on 20190625] *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11188850B2 (en) * 2018-03-30 2021-11-30 Derek Alexander Pisner Automated feature engineering of hierarchical ensemble connectomes
CN111177902A (zh) * 2019-12-18 2020-05-19 北京安怀信科技股份有限公司 An overall design tool based on system parameters
CN111177902B (zh) * 2019-12-18 2023-10-17 北京安怀信科技股份有限公司 An overall design tool based on system parameters
WO2021157963A1 (fr) * 2020-02-03 2021-08-12 Samsung Electronics Co., Ltd. Method and apparatus for providing edge computing services
US11445039B2 (en) 2020-02-03 2022-09-13 Samsung Electronics Co., Ltd. Method and apparatus for providing edge computing services
CN111466876A (zh) * 2020-03-24 2020-07-31 山东大学 An auxiliary diagnosis system for Alzheimer's disease based on fNIRS and graph neural networks
CN111466876B (zh) * 2020-03-24 2021-08-03 山东大学 An auxiliary diagnosis system for Alzheimer's disease based on fNIRS and graph neural networks
CN117216530A (zh) * 2022-12-30 2023-12-12 北京九章云极科技有限公司 A method, device, and system for determining model information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19776057

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 19776057

Country of ref document: EP

Kind code of ref document: A1