Detailed Description
The application is described in further detail below by means of specific embodiments and with reference to the drawings. Like elements in different embodiments are given like, associated reference numerals. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials, or methods, in different situations. In some instances, certain operations related to the present application are not shown or described in the specification, in order to avoid obscuring the core portions of the present application; a detailed description of these operations is unnecessary, because persons skilled in the art can fully understand them from the description herein and from general knowledge in the art.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The ordinal terms applied to components herein, e.g. "first", "second", etc., are used merely to distinguish between the described objects and do not have any sequential or technical meaning. The term "coupled" as used herein includes both direct and indirect coupling, unless otherwise indicated.
The training paradigm of pre-training plus fine-tuning has been widely used and studied in the field of natural language processing in recent years. By performing generative pre-training on a large-scale corpus and then fine-tuning on a specific task, this paradigm can effectively improve the generalization capability and performance of a model. In recent years, the paradigm has also gradually been applied to the field of time series prediction. The data in time series prediction can be seen as a sequence in time, similar to a sequence of words in text. Therefore, the paradigm can be carried over from natural language processing to time series prediction: fine-tuning is performed on the basis of a pre-training model for a specific time series prediction task, so as to improve the performance and generalization capability of the model. Many language pre-training models use a masked self-supervised training mechanism, which not only greatly reduces the labor cost of training but also alleviates data scarcity to a certain extent.
The application provides a pre-training-based dynamic network flow regulation and control method and a system thereof. The dynamic network flow regulation and control method and the system thereof combine a space-time diagram neural network and a self-supervision time sequence generation model (namely a pre-training model) and accurately predict future multi-step flow information of dynamic network flow through long-time sequence characterization learning and fine-tuning training of the space-time diagram neural network. The application realizes accurate multi-step time sequence generation by pre-training the past long-time network flow time sequence data fragments in a self-supervision mode, and modifies the pre-training model to realize the downstream task of dynamic network flow prediction with lower training cost.
The technical scheme of the present application will be described in detail with reference to examples.
Referring to fig. 1, a method for dynamic network traffic regulation based on pre-training includes:
Step S100: acquiring historical time series data from a network, and segmenting the historical time series data to obtain a plurality of time series segments;
Step S200: inputting the time series segments into a trained predictive model for:
acquiring a potential feature representation of the time series segment;
constructing a discrete dependency graph based on the time series segments and the potential feature representations;
constructing a space-time diagram based on the discrete dependency graph and the latest time series segment in the time series segments;
predicting a future time series segment based on the potential feature representation of the latest time series segment and the space-time diagram to obtain a corresponding prediction result; wherein the future time series segment is located in the time period following the latest time series segment among the time series segments.
One purpose of constructing a space-time diagram is to convert the space-time dependencies in the historical time series data into a graph structure.
Historical time series data is ubiquitous in fields ranging from traffic and energy to economics. For example, in intelligent traffic systems, sensors deployed on a road network constantly record traffic conditions, such as the total number of vehicles passing each sensor, the vehicle speed, etc. The resulting large volume of data includes a plurality of time series, each from one traffic sensor. Similar examples also occur in power systems and financial systems, such as power consumption in multiple areas, changes in multiple stock curves, etc. In addition, the specific procedure of "acquiring historical time series data from the network" belongs to common general knowledge in the art, and thus will not be described herein.
The present application uses a framework called STEP (Spatial-Temporal graph neural networks are Enhanced by a Pre-training model), in which the space-time diagram neural network is enhanced by an extensible time-series pre-training model (i.e., the pre-training model in step S200). In some embodiments, referring to fig. 2, the pre-training model employs TSGPT. TSGPT is a generative pre-training model based on an encoder-decoder architecture and a self-attention mechanism; it implements self-supervised training without manual labeling by means of a masked autoencoding mechanism. TSGPT aims to efficiently learn temporal patterns from very long-term historical time series data and to generate segment-level representations (i.e., the potential feature representations described above). Because these segment-level representations contain rich context information, they help to solve the challenge that "the space-time diagram neural network (STGNN) submodule is not aware of context information outside the window". Furthermore, the learned segment representations make it possible to calculate correlations between time series segments using information from the entire long-term historical time series data, thus solving the challenge that "short-term information is unreliable for modeling discrete dependency graphs", i.e. the challenge of lacking a discrete dependency graph. The discrete dependency graph is represented by similarities (or correlations) between the time series segments. The application uses an efficient, unsupervised pre-training model that is trained by a masked autoencoding strategy. The trained pre-training model is able to effectively capture information of very long-term historical data (e.g., data of weeks or months) and to generate segment-level representations that correctly reflect complex patterns in the time series segments.
The graph structure learning sub-module used in the application is based on the representations of the unsupervised pre-training model. It learns the discrete dependency graph and uses a kNN graph, calculated from the representations of the unsupervised pre-training model, as regularization to guide the joint training of the graph structure and the space-time diagram neural network (STGNN) submodule.
The training process of the prediction model in step S200 generally includes:
S1, cutting and masking the historical time series data;
S2, constructing an encoder based on a Transformer;
S3, constructing a first decoder based on a Transformer;
S4, performing target reconstruction;
S5, selecting the encoder acquired in step S4;
S6, constructing the graph structure learning;
S7, constructing an enhanced downstream space-time diagram neural network;
and S8, training a prediction model to obtain the trained prediction model.
In some embodiments, for step S1, training samples of the time series segments are obtained from an original ultra-long time series (i.e., the historical time series data) using a sliding window. For example, traffic data at Th time points of each node in the network over a period of time (e.g., over two weeks) may be collected in real time by the SDN controller. The time points Th at which the historical data is collected can be defined by a person skilled in the art according to actual needs.
In some embodiments, at each time point Th, the SDN sub-controller may collect packet data associated with the nodes of the network. The data comprises the total length of the data packets, the packet loss rate, the controller memory load and the timestamp. These data are recorded to form a data point. The collected data points are organized into a sequence matrix S i (i.e., the historical time series data). Each row of the historical time series data S i represents a time point Th, and the columns correspond to the total length of the data packets, the packet loss rate, the controller memory load and the timestamp, respectively. The SDN sub-controller (i.e., the SDN controller) is an application in a Software Defined Network (SDN) that is responsible for flow control so as to make the network intelligent. The SDN controller is based on a protocol such as OpenFlow, which allows the server to tell the switches where to send data packets.
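As a minimal illustrative sketch, the organization of collected data points into the sequence matrix could look as follows. The field names, the helper `build_sequence_matrix`, and the sample values are assumptions made purely for illustration; they are not taken from the application or from any SDN controller API.

```python
import numpy as np

# Hypothetical column order: one column per recorded quantity.
COLUMNS = ("total_length", "packet_loss_rate", "controller_memory_load", "timestamp")

def build_sequence_matrix(data_points):
    """Stack data points (dicts) into a matrix: one row per time point, one column per field."""
    return np.array([[point[c] for c in COLUMNS] for point in data_points], dtype=float)

# Made-up sample values, one dict per collection time point.
data_points = [
    {"total_length": 1500, "packet_loss_rate": 0.01, "controller_memory_load": 0.40, "timestamp": 0},
    {"total_length": 1200, "packet_loss_rate": 0.02, "controller_memory_load": 0.45, "timestamp": 60},
    {"total_length": 900,  "packet_loss_rate": 0.00, "controller_memory_load": 0.42, "timestamp": 120},
]
S = build_sequence_matrix(data_points)
```

Each row of `S` then corresponds to one time point and each column to one of the recorded quantities.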
In some embodiments, a person skilled in the art may determine the window length L and the predicted time step Tf for the data slice according to the actual requirements, where the selection of Tf and L may be controlled by the SDN subcontroller for scheduling.
In some embodiments, the input historical time series data S i may be segmented into a plurality of time series segments of length L. The above time series segments are also used for predicting future time series segments of length L over the predicted time step Tf. The latest time series segments (i.e. the latest time series segment described above) are kept as a validation set. The person skilled in the art can define the latest time series segment as needed; for example, among the above time series segments, the segment covering the hour preceding the current time point may be taken as the latest time series segment.
In some embodiments, the masking rate may be set by those skilled in the art according to actual requirements. For example, the masking rate may be set to 75%, i.e., about 75% of the data will be masked in each time series slice. Only a portion of the data will be used in the generated output of the first decoder of the pre-training model.
In some embodiments, in each of the time-series segments, a portion of the data points may be randomly selected and a mask flag set according to the mask rate. These data points are not input into the predictive model (e.g., the second decoder of the predictive model and the graph structure learning sub-module), but are targeted for self-supervising tasks.
In some embodiments, a self-supervising task may be created in which the goal of the pre-training model is to generate data for the mask portion from the known fragments.
In some embodiments, the time series segments may be generated with a sliding window of size H+L. The data in the first H positions represents historical data and the data in the last L positions represents future data; these correspond, respectively, to the model input and the true value to be predicted in machine learning (i.e. the model takes the first H data points as input and tries to predict the last L data points). Each time the window slides, one time series segment is generated. For example, the historical length of the historical time series data in the database is H = P×L, where L is the length to be predicted. That is, the time series segment of the first P×L length is used as a training sample, and the latest time series of length L is used as the corresponding label. The value of P can be set by the user. For another example, the i-th time series in the historical time series data may be denoted S i. The input sequence S i is divided into P non-overlapping segments of length L, where the j-th time series segment may be denoted as Sj i. L is the usual length of the time series segments input to the space-time diagram neural network (STGNN) submodule. A subset of the non-overlapping segments is randomly masked at a high masking rate r of 75% and then reconstructed, which creates a challenging self-supervised task.
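The windowing and masking described above can be sketched as follows. This is a minimal sketch under stated assumptions: the series is one-dimensional, H = P×L, masking is applied to whole segments, and the function names `sliding_windows` and `split_and_mask` as well as the toy data are hypothetical.

```python
import numpy as np

def sliding_windows(series, H, L):
    """Yield (history, future) pairs from a 1-D series using a window of size H + L."""
    series = np.asarray(series, dtype=float)
    for start in range(len(series) - (H + L) + 1):
        yield series[start:start + H], series[start + H:start + H + L]

def split_and_mask(history, L, mask_rate=0.75, rng=None):
    """Split a history of length P*L into P non-overlapping segments and pick masked ones."""
    rng = np.random.default_rng() if rng is None else rng
    segments = history.reshape(-1, L)                 # shape (P, L)
    P = segments.shape[0]
    n_masked = int(round(mask_rate * P))
    masked_idx = np.sort(rng.choice(P, size=n_masked, replace=False))
    unmasked_idx = np.setdiff1d(np.arange(P), masked_idx)
    return segments, masked_idx, unmasked_idx

# Toy example: P = 8 segments of length L = 4, so H = P * L = 32.
L_seg, P = 4, 8
H = P * L_seg
series = np.arange(100, dtype=float)
pairs = list(sliding_windows(series, H, L_seg))
history, future = pairs[0]
segments, masked_idx, unmasked_idx = split_and_mask(
    history, L_seg, mask_rate=0.75, rng=np.random.default_rng(0)
)
```

In this sketch only the segments at `unmasked_idx` would be passed to the encoder, while the segments at `masked_idx` serve as reconstruction targets.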
In some embodiments, the encoder of the pre-training model of the present application includes an input embedding layer and a multi-head sparse self-attention sub-module (which includes a series of Transformer models with position coding). The encoder runs only on the unmasked segments. Specifically, referring to fig. 2, the input embedding layer is a linear projection that converts the unmasked segments into the hidden space: $U_j^i = W \cdot S_j^i + b$, where W and b are the learnable parameters, d is the dimension of the hidden space, $S_j^i$ is the input vector (e.g. each unmasked segment Sj i), and $U_j^i$ is the corresponding latent space representation. Thereafter, the position coding layer is used to add sequence information. For example, the position coding layer may traverse the positions and generate a position coding vector U pos for each position i. The position coding vectors are typically generated using sine and cosine functions to capture position information, for example in the commonly used form $U_{pos}(i, 2k) = \sin\left(i / 10000^{2k/d}\right)$ and $U_{pos}(i, 2k+1) = \cos\left(i / 10000^{2k/d}\right)$, where k represents the dimension index of the position code, d represents the dimension of the hidden space, and i represents the position index. The latent space representation $U_j^i$ and the position coding vector U pos are then added element-wise to obtain a new vector, which represents a feature vector to which the position information has been added, i.e., $X_{with\_position} = U_j^i + U_{pos}$. Then, the feature vector X with_position is input to the multi-head sparse self-attention sub-module composed of Transformer models, which may comprise a 4-layer self-attention Transformer model. Finally, the input sequence is encoded by the multi-head sparse self-attention sub-module; through training, the encoder obtains the potential feature representation H j i (i.e. the temporal characterization) of the j-th unmasked segment (such as a time series segment Sj i). The Transformer model is a neural network model based on a self-attention mechanism and is used for processing sequence data.
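The input embedding and position coding steps can be illustrated with a short sketch. It assumes the commonly used sinusoidal position code with an even hidden dimension d, uses random values in place of the learnable parameters W and b, and indexes positions by the original index of each unmasked segment; the function names are hypothetical and the attention layers are omitted.

```python
import numpy as np

def embed_segments(segments, W, b):
    """Linear projection mapping each length-L segment (row) to a d-dimensional vector."""
    return segments @ W.T + b

def sinusoidal_positions(positions, d):
    """Sine/cosine position vectors for the given position indices (d must be even)."""
    positions = np.asarray(positions, dtype=float)[:, None]   # shape (n, 1)
    k = np.arange(d // 2, dtype=float)[None, :]                # shape (1, d/2)
    angles = positions / np.power(10000.0, 2.0 * k / d)
    U_pos = np.empty((positions.shape[0], d))
    U_pos[:, 0::2] = np.sin(angles)    # even dimensions
    U_pos[:, 1::2] = np.cos(angles)    # odd dimensions
    return U_pos

rng = np.random.default_rng(0)
L_seg, d = 4, 8
segments = rng.normal(size=(8, L_seg))        # toy data: 8 segments of length 4
unmasked_idx = np.array([1, 5])               # assumed indices of the unmasked segments
W = rng.normal(scale=0.1, size=(d, L_seg))    # learnable in practice; random here
b = np.zeros(d)

U = embed_segments(segments[unmasked_idx], W, b)
X_with_position = U + sinusoidal_positions(unmasked_idx, d)
```

The resulting `X_with_position` has one d-dimensional row per unmasked segment and would be the input to the self-attention layers.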
In some embodiments, the first decoder of the pre-training model also includes a series of Transformer models (e.g., the masked multi-head self-attention sub-module in fig. 2). The first decoder reconstructs the potential feature representations back into lower semantic level, i.e., numerical level, information. The first decoder operates on the complete set of segments, including the masked segments. No further position vectors need to be added here, since position information has already been added to all segments in the encoder. The first decoder is only used during the pre-training phase to perform the sequence reconstruction task (i.e. the target reconstruction described above).
In some embodiments, the first decoder may use only a single-layer Transformer model. Finally, a multi-layer perceptron (e.g., a regression prediction layer in the pre-training model) is applied to make the prediction; its number of output dimensions is equal to the length of each time series segment. Specifically, given the potential feature representation H j i of segment j, the first decoder generates a corresponding reconstructed sequence.
In some embodiments, the Mean Absolute Error (MAE) between the original sequence (i.e., the above-described time series segment Sj i) and the reconstructed sequence is calculated: $\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|$. The mean absolute error is used as the loss function of the target reconstruction step. Here n represents the total number of data points, i.e. the number of data points used to calculate the MAE; $y_t$ represents the actual observed value (actual target value), i.e. an actual flow data point in the data; and $\hat{y}_t$ represents the predicted value generated by the first decoder for the corresponding data point.
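For illustration, the mean absolute error used as the reconstruction loss can be computed as in the following sketch. The toy arrays are made up, and the helper name `mean_absolute_error` is hypothetical.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all data points."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Two toy segments of length 3: original values and their reconstruction.
original = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
reconstructed = np.array([[1.5, 2.0, 2.0], [4.0, 7.0, 6.0]])
loss = mean_absolute_error(original, reconstructed)   # (0.5 + 1.0 + 2.0) / 6
```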
The encoder of the prediction model is the trained encoder of the pre-training model. The encoder of the pre-training model may assist in learning the graph structure required by the space-time diagram neural network (STGNN) submodule of the prediction model. The characterization produced by the encoder of the pre-training model may also be added to the space-time diagram neural network (STGNN) submodule as ultra-long history information.
It should be noted that, for the specific process by which the graph structure learning submodule of the present application constructs the discrete dependency graph corresponding to the time series segment Sj i based on the potential feature representation H j i of the time series segment (Sj i described above), reference may be made to the content relating to the "graph structure learning submodule" in paragraphs [0063] to [0072] of Chinese patent document CN115688871, "multi-element time-series prediction method and system based on pre-training enhancement", where the objective is to learn a discrete sparse graph. Specifically, the STEP framework is expected to learn a Bernoulli distribution parameter Θ ij, from which a discrete dependency graph can then be sampled. First, regularization based on the TSFormer representation is introduced, providing supervisory information for graph optimization. Specifically, $H^i = \Vert_{j=1}^{P} H_j^i$ is denoted as the feature of time series segment i, where $\Vert$ denotes a concatenation operation. A kNN graph A a is then calculated between all nodes. Here the sparsity of the learned graph can be controlled by setting different k. Benefiting from the capability of TSFormer, A a helps to guide the training of the graph structure so that it reflects the dependencies between nodes. Then, Θ ij is calculated as follows: $\Theta_{ij} = FC(relu(FC(Z^i \Vert Z^j)))$, $Z^i = relu(FC(H^i)) + G^i$, where Θ ij is a non-normalized probability. The first dimension represents the probability of a positive (i.e., there is a relationship between the time series) and the second dimension represents the probability of a negative. G i is a global feature of the time series segment i, obtained by applying a convolutional network to S train i, where S train i is the entire training-set time series and L train is the length of the training data set. S train i is static for all samples during training, which helps to make the training process more robust and accurate. The feature H i is dynamic across training samples, so as to reflect the dynamics of the dependency graph. Thus, the present application uses the cross entropy between Θ and the kNN graph A a as graph structure regularization: $\mathcal{L}_{graph} = \sum_{ij} \left[ -A^a_{ij} \log \Theta'_{ij} - (1 - A^a_{ij}) \log (1 - \Theta'_{ij}) \right]$, where $\Theta'_{ij} = softmax(\Theta_{ij})$ is the normalized probability. A final problem with discrete graph structure learning is that the sampling operation from Θ' ij to the adjacency matrix A is non-trivial, since sampling is not differentiable. Therefore, the Gumbel-Softmax reparameterization technique is applied in STEP: $A_{ij} = softmax((\Theta_{ij} + g)/\tau)$, where g is a vector of independent and identically distributed samples drawn from the Gumbel(0, 1) distribution and τ is the temperature parameter of the softmax. Gumbel-Softmax converges to a discrete state when τ → 0.
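The cross-entropy regularization and the Gumbel-Softmax sampling described above can be sketched in NumPy as follows. This is only an illustrative sketch: the fully connected layers that would produce Θ are replaced by hand-made logits, the array layout (N, N, 2) with the first channel as the "edge" logit is an assumption, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

def graph_regularization(theta, knn_adj, eps=1e-12):
    """Cross entropy between the normalized edge probability and a binary kNN graph."""
    p_edge = softmax(theta)[..., 0]
    ce = knn_adj * np.log(p_edge + eps) + (1 - knn_adj) * np.log(1 - p_edge + eps)
    return float(-np.sum(ce))

def gumbel_softmax_adjacency(theta, tau, rng):
    """Relaxed adjacency sample: softmax((theta + g) / tau) with g drawn from Gumbel(0, 1)."""
    u = rng.uniform(low=1e-12, high=1.0, size=theta.shape)
    g = -np.log(-np.log(u))                       # inverse-CDF Gumbel(0, 1) sampling
    return softmax((theta + g) / tau)[..., 0]

# Toy kNN graph over 3 nodes and logits that strongly agree with it.
knn_adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
signed = 10.0 * (2.0 * knn_adj - 1.0)
theta = np.stack([signed, -signed], axis=-1)      # shape (3, 3, 2)

loss = graph_regularization(theta, knn_adj)
A = gumbel_softmax_adjacency(theta, tau=0.1, rng=np.random.default_rng(0))
```

With a small temperature such as τ = 0.1, the entries of `A` lie close to 0 or 1, which mirrors the convergence to a discrete state as τ → 0.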
In some embodiments, the traffic data at the latest time point may be selected from the traffic data of each node of the network collected in real time by the SDN controller over a period of time (e.g., over two weeks), for example, the data of the previous hour at the current time point.
It should be noted that the discrete dependency graph is used to capture the spatial dependency relationship between nodes in the time series segment. For example, the connection relationship between nodes: edges in a discrete dependency graph represent whether there are connections or dependencies between different nodes (if there are edges between two nodes, there may be some kind of association or interaction between them); for another example, the strength of the dependency: the weights of the edges may represent the strength or weight of the dependencies between the nodes (which helps to quantify the degree of interaction between the nodes); as another example, the network structure: the topology of the graph may provide information about how the nodes are organized and connected (which is helpful in understanding the overall structure of the system); as another example, local and global dependencies: by analyzing the discrete dependency graph, it can be known which nodes are more susceptible to other nodes and which nodes play a key role in the whole network; for another example, time correlation: if time factors in the time series segments are also taken into account, the discrete dependency graph may also help understand how the nodes change over time, and if there are time-dependent dependencies.
The discrete dependency graph is a graph structure for representing the dependency relationship between nodes. In the graph, each node represents a representation of a time series segment, while the edges in the graph represent dependencies or associations between different time series segments. The purpose of constructing the discrete dependency graph is to aid in understanding the relationships between nodes in the time series segment for further analysis, prediction or control. Wherein all "nodes" refer to all time-series segments in the time-series data. Each time series segment corresponds to a node, wherein the characterization of the node is typically a characteristic representation of the time series segment. These node sets constitute the node sets of the discrete dependency graph.
Referring to fig. 3, the above-mentioned construction of the discrete dependency graph based on the time-series segment Sj i and the potential-feature representation H j i includes:
Step S21 of feature-based node representation: taking each time sequence segment Sj i as one node of the discrete dependency graph, and taking the potential feature representation H j i corresponding to the time sequence segment Sj i as a feature vector corresponding to the node;
Step S22: creating an initial adjacency matrix Aa; the dimension of the adjacency matrix Aa is the number of nodes multiplied by the number of nodes, the adjacency matrix Aa is used for representing the connection relation among the nodes, and the number of nodes is the total number of nodes in the discrete dependency graph;
Step S23 of automatically constructing connection relation: for each node, K nodes nearest to the node are calculated, and the K nodes nearest to the node are used as neighbor nodes of the node in the discrete dependency graph; determining the connection weight between the nodes according to the similarity between the nodes in the discrete dependency graph; for each node, connecting the node with K nodes nearest to the node according to the corresponding connection weight;
Step S24: the above step S23 of automatically constructing the connection relation is repeated until every node has established its connections, so as to obtain the discrete dependency graph.
Wherein, the adjacency matrix Aa is used for representing the connection relation between nodes. Each element Aa[i][j] of the adjacency matrix indicates whether a connection or dependency exists between node i and node j. The adjacency matrix is typically a binary matrix in which the value of Aa[i][j] can be 0 or 1, indicating no connection or the presence of a connection, respectively.
In some embodiments, in the step S23 of automatically constructing the connection relation, the metric value (i.e. the corresponding connection weight) may be filled into the corresponding position of the adjacency matrix Aa, so that the adjacency matrix Aa reflects the dependency relation between the nodes. For example, for node i and node j, if node j is one of the K nearest neighbors of node i, then the corresponding element in the adjacency matrix is Aa[i][j] = 1; otherwise, the corresponding element is Aa[i][j] = 0. The specific process of calculating the K nodes nearest to a node (for example, by using the KNN algorithm) belongs to the prior art in the field, so details thereof are not repeated here. KNN (k-Nearest Neighbor, the k nearest neighbor algorithm) is an existing and commonly used supervised learning method.
In some embodiments, in the step S23 of automatically constructing the connection relationship, the "similarity" in the "similarity between nodes according to the discrete dependency graph" may be characterized by cosine similarity. Therefore, the corresponding connection weight may be determined by the cosine similarity between the node and one of the K nodes. The connection weights may be determined using a binarization method. For example, if the cosine similarity is higher than a preset threshold, setting the corresponding connection weight to 1; otherwise, the corresponding connection weight is set to 0. The specific value of the preset threshold may be determined by a person skilled in the art according to actual needs, and the preset threshold is not limited herein.
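Steps S21 to S24 together with the cosine-similarity binarization can be illustrated by the following minimal sketch. The function name `knn_cosine_adjacency`, the toy feature vectors, and the choice to exclude self-connections are assumptions made for illustration.

```python
import numpy as np

def knn_cosine_adjacency(features, k, threshold=None):
    """Binary adjacency: Aa[i][j] = 1 if j is among the k most cosine-similar nodes to i
    (and, when a threshold is given, the similarity also exceeds that threshold)."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                              # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                   # a node is not its own neighbor
    n = sim.shape[0]
    adj = np.zeros((n, n), dtype=int)                # initial adjacency matrix (step S22)
    neighbors = np.argsort(-sim, axis=1)[:, :k]      # k nearest nodes per node (step S23)
    for i in range(n):
        for j in neighbors[i]:
            if threshold is None or sim[i, j] > threshold:
                adj[i, j] = 1
    return adj

# Toy feature vectors standing in for the potential feature representations.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
Aa = knn_cosine_adjacency(features, k=1)
Aa_strict = knn_cosine_adjacency(features, k=1, threshold=0.999)
```

Here nodes 0 and 1 point to each other, as do nodes 2 and 3; with the stricter threshold no pair is similar enough, so `Aa_strict` contains only zeros.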
It can be seen that, in step S21 of the feature-based node representation, each time series segment Sj i is regarded as a node in the discrete dependency graph and has a corresponding feature vector, so that each node carries the potential feature representation of the corresponding time series segment, thereby better capturing the information of the time series segment. Step S23 of automatically constructing the connection relation calculates the cosine similarity among the nodes and selects the K nearest neighbor nodes of each node, so that the connection relation among the nodes is constructed automatically, without manual definition, and the characterization quality of the data is improved. Capturing of spatial dependencies: the connection relation among the nodes is determined by the cosine similarity among the nodes, so that the spatial dependency among the nodes can be better captured, which facilitates the analysis and understanding of space-time data. Adjustable connection weight: in some embodiments, the connection weight is determined by a binarization method, so that a person skilled in the art can adjust the weight of the corresponding connection according to actual requirements, thereby better meeting different data analysis requirements. Efficiency and automation: automatically constructing the connection relation among the nodes (namely step S23 above) raises the efficiency of the analysis process and reduces the need for manual intervention, making the processing of the time series data more automatic. In summary, the steps for constructing the discrete dependency graph have the technical advantages of better feature representation, capture of spatial dependency, automatic construction of the connection relation, adjustable weights and the like, and these advantages help to improve the analysis and characterization quality of the spatio-temporal data.
In some embodiments, the potential feature representations H j i corresponding to all time series segments Sj i of the historical time series data may be input to the graph structure learning sub-module in order to train the graph structure learning sub-module.
In some embodiments, the connection between nodes may be determined by calculating cosine similarity based on the relationships between the time series segments.
It should be noted that the "relationship between time series segments" refers to the temporal relationship between different time series segments; for example, the relationship between two time series segments may be that their data trends are similar in a specific time period. The "connection between nodes" refers to the method of determining which nodes should be connected when constructing the node dependency graph. Measuring similarity between nodes based on cosine similarity refers to computing the cosine similarity between the representations of the time series segments corresponding to each pair of nodes. Cosine similarity is a common similarity measure that measures the cosine of the angle between two vectors. For example, if the corresponding cosine similarity is higher than a preset threshold, the nodes may be considered to be connected; otherwise they are not. For another example, assume that there are multiple sensor nodes in a network, each node recording temperature data over a different period of time. To construct a discrete dependency graph, the cosine similarity between the temperature data (i.e., time series segments) of each pair of nodes may be calculated. If the cosine similarity is above a preset threshold, the two nodes can be connected to indicate that they are similar in terms of temperature change.
In some embodiments, the K nearest neighbors (KNN) are calculated for all nodes in the discrete dependency graph: for each node, the K nodes closest to it are determined using a time interval distance metric.
It should be noted that the "time interval distance measure" described above relates to how the distance between nodes is calculated to determine the connection between them. Here, it refers to a distance metric method for calculating connections between nodes. Specifically, it includes time intervals or time differences between nodes. For example, it may be considered to use the time stamp between two nodes to calculate the time difference between them, which may be used as a distance metric. The K Nearest Neighbor (KNN) method is a method of determining connections between nodes using a time interval distance metric, which finds the nearest K nodes of each node, which are considered to have a connection with the target node.
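As a hedged illustration of a time interval distance metric, the following sketch treats the absolute difference between node timestamps as the distance and returns the K nearest nodes for each node. The one-timestamp-per-node representation, the function name `knn_by_time_interval`, and the sample values are assumptions.

```python
import numpy as np

def knn_by_time_interval(timestamps, k):
    """For each node, return the indices of the k nodes with the smallest time difference."""
    t = np.asarray(timestamps, dtype=float)
    dist = np.abs(t[:, None] - t[None, :])        # pairwise time differences
    np.fill_diagonal(dist, np.inf)                # exclude the node itself
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

# Hypothetical timestamps (e.g. segment start times in minutes).
timestamps = [0.0, 10.0, 25.0, 100.0]
neighbors = knn_by_time_interval(timestamps, k=2)
```

Each row of `neighbors` lists, nearest first, the nodes that would be connected to the corresponding target node.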
In some embodiments, the space-time diagram neural network submodule may be used to extract a characterization of the latest time series segment. The submodule can effectively encode the representation of the time series segments, and thus better understand and process the space-time information in the time series data. The above characterization is used for characterizing the features of the latest time series segment, i.e. the periodicity and trend of the data; the characterization may include these features in a more abstract form. A space-time diagram neural network model, i.e., a graph convolutional network (GCN), may perform graph structure learning to process space-time diagram data.
In some embodiments, only the traffic data of the last time-series segment (i.e., the latest time-series segment) of each node may be input to the fine-tuning learning architecture of the space-time diagram neural network (STGNN) submodule and trained, so as to obtain the corresponding spatial dependency relationship.
It should be noted that inputting only the flow data of the latest time series segment into the fine-tuning learning architecture of the space-time diagram neural network (STGNN) submodule and training it makes it possible to respond quickly to data changes while reducing cost, because a large amount of historical data does not need to be processed. Instead of directly employing an existing untrained space-time diagram neural network (STGNN) submodule, a fine-tuned, improved model based on the STGNN model (i.e., the fine-tuning learning architecture described above) is employed herein.
It should be noted that the function of the "space-time diagram neural network submodule" is to analyze and process the space-time relationships in the time-series segments. In particular, in the present application it uses the latest time-series segment as the central node to construct a space-time diagram, which is then analyzed and modeled using neural network techniques, so as to better understand and process the space-time information in the time-series segments.
It should be noted that the main purpose of constructing a space-time diagram is to transform the space-time relationships in the time-series segments into a graph structure so that the data can be better understood, represented and analyzed. This provides a more powerful tool for various space-time related tasks. When the space-time diagram is constructed, the connection relationships between nodes and the information of neighboring nodes can be used to learn higher-level feature representations, which improves the characterization of the data and hence the performance of subsequent tasks. In addition, various tasks such as prediction of time-series segments, anomaly detection, analysis and data clustering can be performed on the constructed space-time diagram, which provides richer information to support these tasks.
Referring to fig. 4, constructing a space-time diagram based on the discrete dependency graph and the latest time-series segment of the time-series segments S_i^j includes:
Step S30: taking the latest time-series segment S_i^P of the time-series segments S_i^j as the central node of the discrete dependency graph;
Step S31: calculating the degree of association between the central node and the non-latest time-series segments; wherein the degree of association is characterized by a corresponding similarity measure; the non-latest time-series segments are the time-series segments of S_i^j other than the latest time-series segment S_i^P;
Step S32: selecting a preset number of time sequence fragments with highest association degree with the center node from the non-latest time sequence fragments as neighbor nodes of the center node;
Step S33: connecting the central node and the neighbor nodes of the central node to form a space-time diagram; wherein the space-time diagram represents a space-time dependency relationship between the central node and a preset number of time-series segments.
In some embodiments, in step S31 the degree of association is characterized by the cosine similarity between the central node and each time-series segment of S_i^j other than the latest time-series segment S_i^P. The calculation of cosine similarity belongs to the prior art in the field and is therefore not described in detail here.
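As a non-limiting illustration, steps S30 to S33 with cosine similarity as the degree of association can be sketched as follows. The function names, the representation of segments as plain lists of numbers, and the convention that the last list is the latest segment are assumptions of this sketch only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_space_time_diagram(segments, k):
    """segments: list of equal-length segments; the last one is the latest."""
    center_idx = len(segments) - 1                      # Step S30: central node
    center = segments[center_idx]
    scored = [(cosine_similarity(center, seg), idx)     # Step S31: association
              for idx, seg in enumerate(segments[:-1])]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # highest similarity first
    top_k = scored[:k]                                  # Step S32: preset number
    # Step S33: connect the central node to each selected neighbor.
    edges = [(center_idx, idx, sim) for sim, idx in top_k]
    return center_idx, edges


# Hypothetical segments; the last entry plays the role of the latest segment.
segments = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [2.0, 2.1]]
print(build_space_time_diagram(segments, k=2))
```

The returned edge list carries the similarity as an edge weight, which may later serve as the connection weight of the space-time diagram.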
In some embodiments, the latest time-series segment S_i^P may be used as the central node of the space-time diagram. According to the initial discrete dependency graph, the neighbor nodes of S_i^P in the discrete dependency graph are found, and these neighbor nodes together with the central node constitute the space-time diagram.
It can be seen that the dynamic central node selection described above (i.e., step S30) selects the latest time-series segment from the time-series data as the central node of the discrete dependency graph. This means that the discrete dependency graph of the present application is dynamic, i.e., different central nodes can be selected at different moments to accommodate different data changes and requirements. The automatic construction of neighbor nodes (i.e., steps S31 to S32) builds the neighbor nodes by calculating the degree of association and setting a threshold value, which helps identify the time-series segments associated with the central node without manually defining neighbor nodes. The capturing of the space-time dependency (i.e., step S33) constructs a space-time diagram by connecting the central node with its neighbor nodes; the constructed space-time diagram helps to represent the space-time dependency between the central node and its related time-series segments (e.g., it helps to capture the spatial dependency of the latest time-series segment S_i^P), and thus to better understand and analyze the correlations in the time-series data.
In some embodiments, a characterization of the latest timing segment may be taken as an input feature corresponding to the center node, where the characterization of the latest timing segment may include a feature vector for the segment.
It can be seen that using the characterization of the latest time-series segment as the input feature of the central node allows the space-time diagram to provide a personalized feature characterization that better reflects the characteristics and information of the time-series segment.
In some embodiments, the connection relationship and the weight of the constructed space-time diagram may be integrated with the feature vector of the central node to form the input data. And then inputting the input data into the regression prediction layer sub-module.
It can be seen that the above-mentioned technical means of "integrating to form input data" can integrate the connection relationship and weight of the space-time diagram with the feature vector of the central node to form input data. This provides a comprehensive data representation that includes spatio-temporal dependencies and characteristic information to facilitate subsequent analysis and modeling. In summary, the foregoing steps provide technical effects of dynamic center node selection, automatic construction of neighbor nodes, capturing of space-time dependencies, personalized feature characterization, integration to form input data, and the like. These technical effects help to improve the quality of characterization and analysis of time series data, making it more suitable for modeling and analysis of spatio-temporal data.
In some embodiments, the predictive model is further used to: updating the potential feature representation of the central node of the time-space diagram according to the information acquired from the neighbor nodes of the time-space diagram; and predicting the future time sequence segment based on the potential characteristic representation of the latest time sequence segment and the updated time space diagram to obtain a corresponding prediction result. Referring to fig. 5, updating the potential feature representation of the center node of the time-space diagram according to the information collected from the neighbor nodes of the time-space diagram includes:
Step S40: collecting feature vectors corresponding to neighbor nodes from the neighbor nodes in the time-space diagram;
Step S41: aggregating feature vectors collected from different neighbor nodes to generate a new potential feature representation of the center node;
Step S42: inputting the new potential feature representation into a nonlinear activation function to obtain a nonlinear transformed new potential feature representation;
Step S43: the original potential feature representation of the central node is replaced by the new potential feature representation after nonlinear transformation.
In some embodiments, the potential feature representation of the center node of the time-space graph may be initialized before it is updated. For example, the potential feature representation of the center node may be initialized to the original feature vector of the center node or to another representation of it. A person skilled in the art can determine such other representations according to actual requirements. Thereafter, relevant information is collected from the neighboring nodes to update the potential feature representation of the central node, taking into account the relationship between the central node and its neighboring nodes.
In some embodiments, the manner of aggregation in step S41 includes: weighted averaging, summing or averaging.
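As a non-limiting illustration, steps S40 to S43 with the three aggregation modes named above can be sketched as follows. The function names are hypothetical, `tanh` is chosen here only as one example of a nonlinear activation function, and aggregating the neighbor vectors alone (without the central node's own vector) is an assumption of this sketch.

```python
import math


def aggregate(vectors, weights=None, mode="mean"):
    """Element-wise aggregation of equal-length feature vectors."""
    dim = len(vectors[0])
    if mode == "sum":
        return [sum(v[d] for v in vectors) for d in range(dim)]
    if mode == "mean":
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    if mode == "weighted_mean":
        total = sum(weights)
        return [sum(w * v[d] for w, v in zip(weights, vectors)) / total
                for d in range(dim)]
    raise ValueError("unknown aggregation mode: " + mode)


def update_center(neighbor_vecs, weights=None, mode="mean"):
    collected = [list(v) for v in neighbor_vecs]      # Step S40: collect
    aggregated = aggregate(collected, weights, mode)  # Step S41: aggregate
    activated = [math.tanh(x) for x in aggregated]    # Step S42: nonlinearity
    return activated                                  # Step S43: new representation


# Hypothetical neighbor features and connection weights.
center = [0.2, 0.1]
center = update_center([[1.0, 0.0], [3.0, 2.0]],
                       weights=[3.0, 1.0], mode="weighted_mean")
print(center)
```

Assigning the returned vector to `center` corresponds to replacing the original potential feature representation in step S43.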
It can be seen that the specific manner in which the potential feature representations of the central node are initialized described above means that different methods can be employed to initialize the potential feature representations of the central node, including, for example, the use of raw feature vectors or other types of characterizations.
It can be seen that, through the information propagation step described above (i.e., step S40), information can be collected efficiently from each neighbor node, for example by weighted averaging of the potential feature representations of the neighbor nodes, where the weights are typically calculated from the connection strength or another similarity metric; this helps information and features to propagate. The feature aggregation step (i.e., step S41) aggregates the collected information, for example by element-wise weighted averaging, summation, or another task-specific aggregation method, and thereby generates a new potential feature representation of the central node that includes the information of the neighbor nodes. The nonlinear transformation step (i.e., step S42) passes the new potential feature representation through a nonlinear activation function in order to better capture complex patterns and dependencies in the data, which increases the expressive power of the characterization. The feature updating step (i.e., step S43) assigns the nonlinearly transformed new potential feature representation to the central node in place of its original potential feature representation, so that the feature representation of the central node includes information about itself and its neighbor nodes, which improves the information richness of the feature. In summary, steps S40 to S43 achieve effective aggregation of information: collecting information from neighbor nodes, weighted averaging of the feature representations of the neighbor nodes, and applying a nonlinear transformation.
In general, according to the steps described above, technical effects including efficient feature representation initialization, aggregation and propagation of information, nonlinear transformation, and updating of features can be achieved, which helps to improve the characterization and analysis capabilities of spatio-temporal data, thereby better capturing associations and patterns in the data.
It should be noted that the above-mentioned "updating the potential feature representation of the center node of the time-space diagram according to the information collected from the neighbor nodes of the time-space diagram" helps to capture the spatial dependency of the latest time-series segment. For example, consider an urban traffic network in which each node represents an intersection and has a feature representation (e.g., the potential feature representation described above) that includes the traffic flow, speed, congestion and other information of that intersection. Suppose it is desired to analyze how traffic flows propagate through the city in order to predict future traffic conditions. In this scenario, the "spatial dependency" refers to the way traffic flows at different intersections affect each other: the traffic flow at one intersection may be affected by the traffic flow at an adjacent intersection, and if one intersection is congested, nearby intersections may become congested as well. Assume there is a central node V_i having an initial feature representation X_i and a set of neighbor nodes N_i, each neighbor node j in N_i having its own feature representation X_j. In some embodiments, a neural network layer may be used to update the feature representation of the central node as follows: X_i' = f(X_i, {X_j : j ∈ N_i}), where f is a function or neural network layer that combines the feature representation of the central node V_i with those of its neighbor nodes to generate the updated feature representation X_i'. The updated feature representation X_i' better captures the spatial dependency of the central node V_i.
It should be noted that the end-to-end training of the space-time diagram neural network submodule is as follows: the parameters of the model are optimized by jointly training the entire space-time diagram neural network submodule so that it is better adapted to the task, which includes space-time relationship modeling and prediction. End-to-end training helps the model learn more meaningful representations from the data and thus improves task performance. Here, the characterization H_i^P of the latest time-series segment S_i^P refers to a representation of the most recent time-series segment of each node i. It may include the features of the node and other information about the most recent time-series segment, i.e., it represents the most recent state of the node. The dependency representation H_gw corresponding to the characterization H_i^P refers to a representation of the dependency information of the node. Dependency information refers to the relationships between a node and its neighboring or otherwise related nodes, which involve temporal and spatial dependencies; that is, H_gw is used to represent these dependencies.
It should be noted that the purpose of fusing the latest segment representation H_P of each node with its corresponding dependency representation H_gw is to combine the latest state of the node (H_P) with its dependency information (H_gw) to obtain a more comprehensive and informative node representation. This helps the model better understand the time-series segments of the nodes, including their space-time relationships with other nodes, thereby improving the performance of the model in the task.
In some embodiments, a semantic projector SP may be introduced to convert the latest segment representation H_P into the semantic space of the dependency graph representation H_gw; the projective transformation matrix is learned through training, and the two representations are combined into a matrix H_final: H_final = SP(H_P) + H_gw. Finally, referring to fig. 6, the prediction is performed by the regression prediction layer submodule: given the future true value Y ∈ R^{T×N×d}, the mean absolute error is used as the regression loss.
It should be noted that the main purpose of "introducing a semantic projector SP" is to transform the latest segment representation H_P from its original representation space into the semantic space of the dependency graph representation H_gw, so that the two representations can be better fused and the model performance improved. The general flow of "converting the latest segment representation H_P into the semantic space of the dependency graph representation H_gw" is: 1) defining a projective transformation matrix: a matrix is defined that maps H_P from its original representation space to the semantic space; the matrix may be a learnable parameter of the model, learned through the training process; 2) learning the projective transformation matrix: during training, the model learns how to adjust the weights of the projective transformation matrix so as to retain the useful information of H_P to the greatest extent while mapping it into the semantic space; this is typically done by minimizing a loss function, whose design can be determined according to the task requirements; 3) applying the projective transformation: once the projective transformation matrix has been learned, the latest segment representation H_P is transformed by it, which can be done using matrix multiplication, and the result is fused with H_gw: H_final = SP(H_P) + H_gw; where SP denotes the semantic projector, H_P is the original latest segment representation, SP(H_P) is the transformed latest segment representation, and H_final is the fused representation.
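As a non-limiting illustration, the fusion H_final = SP(H_P) + H_gw can be sketched with the semantic projector reduced to a single matrix multiplication. The matrix `w` stands for an already learned projective transformation matrix; its values, the function names and the tiny dimensions are hypothetical.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def fuse(h_p, h_gw, w):
    """Return H_final = SP(H_P) + H_gw with SP(H_P) = H_P @ w."""
    projected = matmul(h_p, w)  # map H_P into the semantic space of H_gw
    return [[p + g for p, g in zip(p_row, g_row)]
            for p_row, g_row in zip(projected, h_gw)]


# Hypothetical values: N = 2 nodes, representation dimension d = 2.
h_p = [[1.0, 2.0], [3.0, 4.0]]    # latest segment representations
h_gw = [[1.0, 1.0], [1.0, 1.0]]   # dependency representations
w = [[0.5, 0.0], [0.0, 2.0]]      # learned projection matrix (assumed)
print(fuse(h_p, h_gw, w))
```

A multi-layer perceptron could take the place of the single matrix without changing the final addition.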
It should be noted that the specific process of forming the combined matrix H_final belongs to the prior art. For example, reference may be made to Chinese patent publication CN115688871, "pretraining enhanced multivariate time series prediction based on pretraining", paragraph [0077] and following: the general space-time graph neural network (STGNN) submodule takes as input the last segment (such as the latest time-series segment described above) and the discrete dependency graph, while the enhanced space-time graph neural network (STGNN) submodule also takes into account the representation of the input segment. Owing to the strong ability of the Transformer model to extract long-term dependencies, the above representation H_i^P contains rich context information. The STEP framework can be extended to almost any STGNN; here an existing approach, Graph WaveNet, is used as the back-end model. By combining graph convolution with dilated causal convolution, Graph WaveNet effectively captures the space-time dependencies. It predicts through a regression prediction layer (e.g., a multi-layer perceptron) based on its output latent hidden representation H_gw ∈ R^{N×d}. The Transformer representations H_i^P of the last segment P of all nodes i can be combined and represented as H_P ∈ R^{N×d}, and the representations of Graph WaveNet and TSFormer are fused by:
H_final = SP(H_P) + H_gw
where SP(·) is a semantic projector (e.g., implemented by a multi-layer perceptron) that converts H_i^P into the semantic space of H_gw. Finally, prediction is carried out through a regression prediction layer: given the future true value Y ∈ R^{T×N×d}, the mean absolute error is used as the regression loss:
L_regression = \frac{1}{T N C} \sum_{j=1}^{T} \sum_{i=1}^{N} \sum_{k=1}^{C} | \hat{Y}_{jik} - Y_{jik} |
where \hat{Y} denotes the prediction, N is the number of nodes, T is the number of prediction steps, and C is the dimension of the output (the last dimension of Y).
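As a non-limiting illustration, the mean absolute error used as the regression loss can be computed over a prediction and a true value that are both indexed by prediction step, node and output dimension. The function name and the sample values are hypothetical.

```python
def mean_absolute_error(y_pred, y_true):
    """Mean of |prediction - truth| over all steps, nodes and output dimensions."""
    total = 0.0
    count = 0
    for step_pred, step_true in zip(y_pred, y_true):          # T prediction steps
        for node_pred, node_true in zip(step_pred, step_true):  # N nodes
            for p, t in zip(node_pred, node_true):            # C output dimensions
                total += abs(p - t)
                count += 1
    return total / count


# Hypothetical values: T = 2 steps, N = 2 nodes, C = 1 output dimension.
y_pred = [[[1.0], [2.0]], [[3.0], [4.0]]]
y_true = [[[1.5], [2.0]], [[2.0], [6.0]]]
print(mean_absolute_error(y_pred, y_true))  # (0.5 + 0 + 1 + 2) / 4 = 0.875
```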
The space-time diagram neural network (STGNN) submodule and the graph structure learning submodule are trained in an end-to-end manner. The total loss function L of the two submodules may be expressed as:
L = L_regression + λ·L_graph,
where λ·L_graph is a gradually decaying graph regularization term. The graph regularization term L_graph constrains the learning of the model so that it better captures the relevance among data points, and λ is a hyperparameter that adjusts the weight of the graph regularization term. The weight λ is set to decay gradually during training so that the learned graph can go beyond the kNN graph: the model is initially more strongly constrained by graph regularization, and the constraint is gradually relaxed as training progresses so that the model increasingly focuses on the regression task. The graph regularization term is introduced in order to capture the correlation between data.
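As a non-limiting illustration, the total loss with a gradually decaying weight λ can be sketched as follows. The exponential decay schedule, the function names and the numeric values are assumptions of this sketch; the application only states that λ decays gradually during training.

```python
def decayed_lambda(lambda0, decay, epoch):
    """Weight of the graph regularization term at a given epoch (assumed schedule)."""
    return lambda0 * (decay ** epoch)


def total_loss(l_regression, l_graph, lam):
    """L = L_regression + lambda * L_graph."""
    return l_regression + lam * l_graph


# Hypothetical loss values: the graph term counts less and less as training proceeds.
for epoch in range(3):
    lam = decayed_lambda(lambda0=1.0, decay=0.5, epoch=epoch)
    print(epoch, lam, total_loss(l_regression=2.0, l_graph=4.0, lam=lam))
```

With these values the total loss falls from 6.0 to 4.0 to 3.0 even though both component losses are held fixed, which shows the shrinking influence of the graph term.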
It should be noted that the "correlation between captured data" means that the model tries to understand and learn the correlations or associations between different data points. One example is temporal correlation in time-series data: there may be a temporal correlation between data points at different points in time, and the model may attempt to capture it to better predict future data points. Another example is node dependency in graph data: there may be dependencies between the data points of different nodes; for example, there may be relationships between users in a social network, or the traffic at different intersections in a traffic network may affect each other. The model may also attempt to capture such node dependencies to better understand the network data.
It should be noted that "introducing a graph regularization term" generally means constraining the learning of a model, when training a machine learning model, by adding a regularization term (typically one based on a graph structure), so as to better capture the relevance between data points. In particular, a graph structure over the data is first defined, in which nodes represent data points and edges represent associations between them; this may be a predefined graph structure or one constructed automatically from the data. A regularization term associated with the graph structure, commonly referred to as a graph regularization term, is then defined; it is typically based on similarity, connectivity or other relevance metrics between nodes, a common example being the Laplacian regularization term. Finally, the graph regularization term is added to the loss function of the model, so that during training the model both fits the data and is constrained by the regularization term to better capture the relevance between data points.
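As a non-limiting illustration, one common form of the Laplacian regularization term sums, over the edges of the graph, the edge weight multiplied by the squared distance between the features of the two connected nodes. The function name, the edge-list format and the sample values are hypothetical, and the application does not state that this exact form is used for L_graph.

```python
def laplacian_regularizer(features, edges):
    """Sum over edges (i, j, w) of w * ||x_i - x_j||^2."""
    total = 0.0
    for i, j, weight in edges:
        squared_distance = sum((a - b) ** 2
                               for a, b in zip(features[i], features[j]))
        total += weight * squared_distance
    return total


# Hypothetical node features and weighted undirected edges (each listed once).
features = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
edges = [(0, 1, 1.0), (1, 2, 0.5)]
print(laplacian_regularizer(features, edges))  # 1.0 * 1 + 0.5 * 1 = 1.5
```

The term is small when strongly connected nodes have similar features, which is how it encourages the model to respect the graph structure.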
In some embodiments, stochastic gradient descent (SGD) may be employed to train the space-time graph neural network (STGNN) submodule and the graph structure learning submodule so as to minimize the total loss function L described above. For example, the gradient is back-propagated from the total loss function to the model parameters to update the parameters; multiple iterations are performed over the training data, and the model parameters are adjusted continuously so that the predicted result gradually approaches the real traffic value. The specific process of training the space-time graph neural network (STGNN) submodule and the graph structure learning submodule belongs to the prior art in the field and is therefore not described herein.
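As a non-limiting illustration, the iterative update performed by stochastic gradient descent can be shown on a toy one-parameter model y ≈ w·x trained with the absolute error. The model, the function name, the learning rate and the data are hypothetical stand-ins for the actual submodules and traffic data.

```python
import random


def sgd_fit(samples, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w * x by SGD on the absolute error |w * x - y|."""
    rng = random.Random(seed)
    data = list(samples)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in random order
        for x, y in data:
            error = w * x - y
            # Subgradient of |error| with respect to w is sign(error) * x.
            sign = (error > 0) - (error < 0)
            w -= lr * sign * x
    return w


# Hypothetical samples generated by y = 2 * x.
w = sgd_fit([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(w)  # close to 2.0
```

With a fixed learning rate the parameter keeps oscillating in a small band around the optimum; a decaying learning rate would tighten that band.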
In some embodiments, the time-series segments corresponding to the reserved verification set may be input into the regression prediction layer to predict the traffic data of the next time-series segment Tf of each node. This belongs to the stage in which the traffic data of the latest time-series segment is fed to the fine-tuning learning architecture of the space-time graph neural network (STGNN) submodule and trained (i.e., the regression prediction and continuous fine-tuning stage). The regression prediction and continuous fine-tuning stage may further include: initializing the parameters of the pre-training model, and initializing a regression model with the parameters of the pre-training model; preparing a labeled training data set, i.e., collecting a training data set with real traffic value labels and ensuring that each sample has a corresponding real traffic segment value at the future Tf moment; performing scheduling regulation with the SDN sub-controller, and monitoring whether the total time used by the prediction model to obtain the corresponding prediction result is smaller than the time corresponding to the range of future time steps that the prediction model can predict (such as the prediction range of the Tf-step sliding window); if the total time is greater than the time corresponding to the range of future time steps that the prediction model can predict, immediately stopping the prediction work of the prediction model; or, if the precision of the corresponding prediction result is lower than a preset precision threshold, immediately stopping the prediction work of the prediction model and retraining the prediction model. Here, the "prediction range of the Tf-step sliding window" refers to a window in the time-series segment that contains the range of future time steps predicted by the model. The size of this window is determined by the parameter Tf, which represents the number of future time steps to be predicted. For example, if Tf = 24, the "prediction range of the Tf-step sliding window" indicates that the model predicts data for the next 24 time steps: the window covers the data points of the 24 future time steps, starting from the current time step, and the model attempts to predict the data values in the time range from the current time step to 24 time steps later.
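As a non-limiting illustration, labeled samples for a Tf-step sliding window can be built by pairing each stretch of history with the Tf values that follow it. The function name, the history length and the toy series are hypothetical.

```python
def make_samples(series, history_len, tf):
    """Pair each history window with the next tf values as its label."""
    samples = []
    for start in range(len(series) - history_len - tf + 1):
        history = series[start:start + history_len]
        future = series[start + history_len:start + history_len + tf]
        samples.append((history, future))
    return samples


# Hypothetical traffic series of 10 time steps, 4 steps of history, Tf = 3.
series = list(range(10))
for history, future in make_samples(series, history_len=4, tf=3):
    print(history, "->", future)
```

Every sample built this way has a true label of exactly Tf future values, which matches the requirement that each sample has a corresponding real value at the future Tf moment.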
It should be noted that the "parameters of the pre-training model" include the weights, biases and other learnable parameters of the model, which are learned during the pre-training phase and capture general features of the input data. The "regression model" refers to a model for regression tasks (e.g., the regression prediction layer submodule of the prediction model), typically including one or more neural network layers. The regression model in the present application performs a regression task, i.e., it predicts the traffic data of the next time-series segment of each node from the input data. By initializing the regression model with the parameters of the pre-training model, the general features learned by the pre-training model in its previous task can be exploited to improve the performance of the regression model. Such transfer learning can generally accelerate and improve the learning of the model on a particular task.
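As a non-limiting illustration, initializing a regression model from pre-trained parameters can be sketched as copying every parameter whose name and size match, while leaving the remaining parameters at their fresh initial values. The parameter names, the dictionary layout and the function name are hypothetical.

```python
import copy


def init_from_pretrained(model_params, pretrained_params):
    """Copy matching pre-trained parameters into a fresh parameter dictionary."""
    initialized = copy.deepcopy(model_params)
    loaded = []
    for name, value in pretrained_params.items():
        # Only reuse parameters that exist in the new model with the same size.
        if name in initialized and len(initialized[name]) == len(value):
            initialized[name] = copy.deepcopy(value)
            loaded.append(name)
    return initialized, loaded


# Hypothetical parameters: the encoder is reused, the regression head is new.
model_params = {"encoder.w": [0.0, 0.0], "head.w": [0.0]}
pretrained_params = {"encoder.w": [0.3, -0.1], "decoder.w": [1.0]}
print(init_from_pretrained(model_params, pretrained_params))
```

Parameters that exist only in the pre-training model, such as the hypothetical `decoder.w`, are ignored, and those that exist only in the regression model are then learned during fine-tuning.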
In some embodiments, when coordination of network resource allocation and traffic prediction is required, the SDN sub-controller starts a prediction flow (i.e. step S200 described above). This step involves resource allocation and network traffic prediction. In this process, the SDN sub-controller needs to consider the current network topology, resource availability and existing traffic data to predict future traffic demands.
In some embodiments, the dynamic network traffic regulation method of the present application further includes:
Step S300: regulating and controlling the network according to the corresponding prediction result.
The above step S300 means that the SDN sub-controller will dynamically adjust the network resource allocation to meet the predicted traffic demand to ensure network performance and efficiency.
It should be noted that, a specific process of regulating and controlling the network according to the corresponding prediction result (e.g. using the SDN sub-controller) belongs to the prior art in the field, so that a detailed description is omitted herein.
It can be seen that the SDN sub-controller starts the prediction flow when it needs to coordinate network resource allocation and traffic prediction, which means that network resources can be adjusted in real time according to actual requirements, so as to improve network performance and efficiency.
In some embodiments, please refer to fig. 7, the dynamic network traffic control method of the present application further includes:
Step S400: monitoring the total time used by the prediction model for prediction to obtain a corresponding prediction result; and if the total time is greater than the time corresponding to the future time step number range which can be predicted by the prediction model, immediately stopping the prediction work of the prediction model.
In some embodiments, the dynamic network traffic regulation method of the present application further includes:
if the precision of the corresponding prediction result is lower than a preset precision threshold, immediately stopping the prediction work of the prediction model.
In some embodiments, if the total time is greater than the time corresponding to the future time step range that can be predicted by the prediction model, the prediction operation of the prediction model is stopped immediately to avoid exceeding the acceptable time range.
In some embodiments, if the precision of the corresponding prediction result is lower than the preset precision threshold, the prediction work of the prediction model is immediately stopped, and the prediction model is retrained.
In some embodiments, the SDN sub-controller may monitor the total time used by the prediction model to obtain the corresponding prediction result. The total time is the time elapsed from the start of the prediction to its completion. The total time is monitored to ensure that it is less than Tf, where Tf represents the prediction range of the sliding window.
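As a non-limiting illustration, the monitoring of the total prediction time and of the precision threshold can be sketched as follows. The function names, the step-by-step prediction callback and the injectable clock are hypothetical; in particular, checking the elapsed time before each prediction step is an assumption about how "immediately stopping" might be realized.

```python
import time


def monitored_predict(predict_step, tf, time_budget, clock=time.monotonic):
    """Predict tf steps, stopping as soon as the elapsed time exceeds the budget."""
    start = clock()
    outputs = []
    for step in range(tf):
        if clock() - start > time_budget:
            return "stopped_timeout", outputs  # stop the prediction work
        outputs.append(predict_step(step))
    return "completed", outputs


def quality_gate(precision, precision_threshold):
    """Decide whether a finished prediction is accepted or triggers retraining."""
    if precision < precision_threshold:
        return "stop_and_retrain"
    return "accept"


# Hypothetical usage with a trivial per-step predictor and a generous budget.
status, outputs = monitored_predict(lambda step: step * 2, tf=3, time_budget=60.0)
print(status, outputs, quality_gate(precision=0.95, precision_threshold=0.9))
```

Passing the clock as a parameter makes the time check testable without waiting for real time to elapse.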
It can be seen that when the SDN sub-controller performs resource allocation and network traffic prediction (as in step S200 above), the current network topology, resource availability and existing traffic data are combined, which helps to predict future traffic demands, so that resources are better allocated and network efficiency is improved.
It can be seen that the SDN sub-controller monitors the above-mentioned total time to ensure that the prediction is completed within an acceptable time range. This helps ensure that the network regulation process does not cause unnecessary delays or performance degradation. If the total time exceeds the prediction range (Tf), the SDN sub-controller can automatically end the prediction process to avoid unnecessary waste of resources and time delay.
It can be seen that if the accuracy of the corresponding prediction result is lower than the preset accuracy threshold, the SDN sub-controller may terminate the current prediction flow (i.e., perform a quality control operation) and retrain the prediction model. This helps to improve the accuracy and quality of the predictions. In general, these steps provide real-time network regulation, resource optimization and traffic prediction, monitoring of time efficiency, automatic termination of the flow, and quality control, which help to improve the performance, efficiency and reliability of the SDN network and ensure efficient utilization of network resources.
It should be noted that the pre-training-based dynamic network traffic regulation method and system provided by the application can be applied and implemented on the basis of the Intelligent Eco Networking (IEN) architecture, i.e., they can be used in or applied to the system architecture of the intelligent ecological network.
The above describes a pre-training-based dynamic network traffic regulation method. In some embodiments, the application also discloses a pre-training-based dynamic network traffic regulation system. Referring to fig. 8, the system includes:
The prediction module 100 is configured to obtain a historical time series segment from a network and to split it into a plurality of time series segments, and to input the time series segments into a trained prediction model, which predicts a future time series segment to obtain a corresponding prediction result;
wherein the prediction model comprises an encoder, a graph structure learning sub-module, and a second decoder;
the encoder is extracted from a pre-training model that has completed pre-training,
and is used for acquiring potential feature representations of the time series segments;
the graph structure learning sub-module is used for constructing a discrete dependency graph based on the time series segments and the potential feature representations;
the second decoder comprises a space-time graph neural network sub-module and a regression prediction layer sub-module;
wherein the space-time graph neural network sub-module is used for constructing a space-time graph based on the discrete dependency graph and the latest time series segment among the time series segments,
and for updating the potential feature representation of a central node of the space-time graph according to information acquired from neighbor nodes in the space-time graph;
the regression prediction layer sub-module is used for predicting the future time series segment based on the potential feature representation of the latest time series segment and on the space-time graph before or after the update.
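Two of the operations listed above, splitting the historical series into segments and updating a node's representation from its neighbors, can be sketched as follows. This is only an illustration under stated assumptions: the segments are taken to be non-overlapping and of fixed length, the discrete dependency graph is represented as a 0/1 adjacency matrix, and a plain mean over a node and its neighbors stands in for the space-time graph neural network. The function names are hypothetical.

```python
from typing import List


def split_into_segments(series: List[float], segment_len: int) -> List[List[float]]:
    """Split a series into non-overlapping segments, dropping the oldest leftover points
    so that the latest segment is always complete (an assumed convention)."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    start = len(series) % segment_len
    return [series[i:i + segment_len] for i in range(start, len(series), segment_len)]


def aggregate_neighbors(
    features: List[List[float]],
    adjacency: List[List[int]],
) -> List[List[float]]:
    """Update each node's feature vector with the mean of its own vector and those of
    its neighbors; adjacency[i][j] == 1 means node i receives information from node j."""
    updated = []
    for i, own in enumerate(features):
        group = [own] + [
            features[j] for j, edge in enumerate(adjacency[i]) if edge and j != i
        ]
        updated.append([sum(values) / len(group) for values in zip(*group)])
    return updated


# Example: seven samples split into segments of three (the oldest sample is dropped).
segments = split_into_segments([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 3)

# Example: three nodes; node 1 receives from nodes 0 and 2, node 2 has no neighbors.
node_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
dependency_graph = [[0, 1, 0], [1, 0, 1], [0, 0, 0]]
new_features = aggregate_neighbors(node_features, dependency_graph)
```

A learned aggregation, such as a weighted or attention-based one, would replace the simple mean in a trained model; the mean is used here only to make the neighbor-to-center information flow visible.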
It should be noted that the specific flow and technical effects of the dynamic network traffic regulation system are substantially the same as those of the dynamic network traffic regulation method described above; reference may be made to that description, which is not repeated here.
The foregoing describes a pre-training-based dynamic network traffic regulation system. Some embodiments of the application also disclose a computer-readable storage medium comprising a program executable by a processor to implement the method according to any of the embodiments of the application.
Reference is made to various exemplary embodiments herein. However, those skilled in the art will recognize that changes and modifications may be made to the exemplary embodiments without departing from the scope herein. For example, the various operational steps and components used to perform the operational steps may be implemented in different ways (e.g., one or more steps may be deleted, modified, or combined into other steps) depending on the particular application or taking into account any number of cost functions associated with the operation of the system.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. Additionally, as will be appreciated by one of skill in the art, the principles herein may be reflected in a computer program product on a computer-readable storage medium preloaded with computer-readable program code. Any tangible, non-transitory computer-readable storage medium may be used, including magnetic storage devices (hard disks, floppy disks, etc.), optical storage devices (CD-ROM, DVD, Blu-ray discs, etc.), flash memory, and/or the like. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions specified. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including means which implement the function specified. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified.
While the principles herein have been shown in various embodiments, many modifications of structure, arrangement, proportions, elements, materials, and components, which are particularly adapted to specific environments and operative requirements, may be used without departing from the principles and scope of the present disclosure. The above modifications and other changes or modifications are intended to be included within the scope of this document.
The foregoing detailed description has been given with reference to various embodiments. However, those skilled in the art will recognize that various modifications and changes may be made without departing from the scope of the present disclosure. Accordingly, the present disclosure is to be considered as illustrative and not restrictive in character, and all such modifications are intended to be included within the scope thereof. Also, benefits, other advantages, and solutions to problems have been described above with regard to various embodiments. The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, system, article, or apparatus. Furthermore, the term "couple" and any other variants thereof are used herein to refer to physical connections, electrical connections, magnetic connections, optical connections, communication connections, functional connections, and/or any other connection.
Those skilled in the art will recognize that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. Accordingly, the scope of the invention should be determined only by the following claims.