CN117675615A

CN117675615A - A dynamic network traffic control method and system based on pre-training

Info

Publication number: CN117675615A
Application number: CN202311652833.1A
Authority: CN
Inventors: 雷凯; 何亦凡; 李琦; 邱炜伟; 山金孝; 段洪琳; 张�雄; 袁国辉; 陈佩淑; 方欣
Original assignee: Beijing Red Date Technology Co ltd; Zhaoshang Xinzhi Technology Co ltd; Peking University Shenzhen Graduate School; Hangzhou Qulian Technology Co Ltd
Current assignee: Beijing Red Date Technology Co ltd; Zhaoshang Xinzhi Technology Co ltd; Peking University Shenzhen Graduate School; Hangzhou Qulian Technology Co Ltd
Priority date: 2023-12-01
Filing date: 2023-12-01
Publication date: 2024-03-08
Anticipated expiration: 2043-12-01
Also published as: CN117675615B

Abstract

The invention provides a dynamic network traffic control method and system based on pre-training. The method includes: obtaining historical time series data from the network, and obtaining multiple time series segments based on the historical time series data; inputting the time series segments into a trained prediction model, where the prediction model is used to obtain latent feature representations of the time series segments; constructing a discrete dependency graph based on the time series segments and the latent feature representations; constructing a spatio-temporal graph based on the discrete dependency graph and the latest of the time series segments; and predicting a future time series segment based on the latent feature representation of the latest time series segment and the spatio-temporal graph to obtain a corresponding prediction result, where the future time series segment lies in the period immediately following the latest time series segment. The method and system enable accurate long-term prediction of dynamic network traffic. They can be used not only in traditional SDN networks but also in new forms of SDN wide-area blockchain networks.

Description

Pre-training-based dynamic network flow regulation and control method and system thereof

Technical Field

The invention relates to the technical field of information, in particular to a pre-training-based dynamic network flow regulation and control method and a system thereof.

Background

With the rapid development of cloud computing, the Internet of Vehicles and Web3.0 blockchain distributed networks, the types and number of new services are increasing rapidly, and problems such as degraded quality of service (QoS) and reduced network stability arise from insufficient monitoring of network metrics. How to schedule network traffic reasonably, avoid network congestion, improve network resource utilization and guarantee the quality of user experience has become a problem that needs to be studied in the network field. One of the major challenges of traffic prediction is the long-term unpredictability of network traffic. In addition, the high spatial dynamics of blockchain networks, the Internet of Vehicles and satellite networks make it harder for the network to predict congestion accurately and suppress it in time. High dynamics is an important factor limiting performance improvements in some network systems, while accurate prediction of network dynamics can effectively support decisions concerning the network infrastructure. A software defined network (Software Defined Network, SDN) separates the control layer from the data layer. The control layer can allocate network resources uniformly from a global view, and a network manager can formulate various policies at the control layer to keep the network running stably and efficiently. SDN network controllers are gradually developing towards clustering, and their stronger computing power can support the operation of large models; the underlying SDN data transmission plane is also extending towards wider network boundaries, such as delay-sensitive networks and blockchain Internet Web3.0 networks. As the most promising technology for optimizing network resource management in the future, SDN is increasingly being applied in scenarios that demand high network performance, such as data center networks (Data Center Network, DCN).

On the other hand, artificial intelligence technology is advancing rapidly, and using deep learning for network traffic prediction is a new research direction that can effectively improve the performance and reliability of the network. Current network traffic prediction techniques based on deep learning fall mainly into three types. The first treats traffic data directly as time series data, ignores its spatio-temporal correlation, and predicts directly with a time series prediction method; its prediction accuracy inevitably suffers in a dynamic network environment. The second considers the spatio-temporal characteristics of traffic data and uses a spatio-temporal graph neural network (STGNN), which has become a popular method for multivariate time series (MTS) prediction in recent years and significantly improves prediction accuracy by jointly modeling the spatio-temporal patterns of MTS with a graph neural network and a sequence model. However, characterizing the time series and the spatial dependencies between them requires analysis of long-term historical data, which is impractical for cold-start problems such as newly added network links. The third further considers the scarcity of spatio-temporal data and uses a generative adversarial network (GAN) for data generation and prediction. Compared with other generative models, a GAN does not need to assume a data distribution in advance but samples directly from a distribution, so in theory it can fully approximate the real data. However, for complex data such as spatio-temporal data, this simple GAN-based approach, which requires no prior modeling, is too unconstrained to control, and model training is unstable and prone to failure.

Accordingly, there is a need for improvements over the prior art.

Disclosure of Invention

The invention mainly solves the technical problem of providing a pre-training-based dynamic network flow regulation method and a pre-training-based dynamic network flow regulation system, and aims to realize long-term accurate prediction of dynamic network flow.

According to a first aspect, in one embodiment, a method for dynamic network traffic regulation based on pre-training is provided. The method comprises the following steps:

acquiring historical time sequence data from a network, and segmenting the historical time sequence data to obtain a plurality of time sequence fragments;

inputting the time series segment into a trained predictive model,

the predictive model is used for:

acquiring a potential feature representation of the time series segment;

constructing a discrete dependency graph based on the time series segments and the potential feature representations;

constructing a space-time diagram based on the discrete dependency graph and the latest time sequence segment in the time sequence segments;

predicting a future time sequence segment based on the potential feature representation of the latest time sequence segment and the space-time diagram to obtain a corresponding prediction result; wherein the future time series segment is located at a next time period of a latest time series segment among the time series segments.

In an embodiment, the constructing a discrete dependency graph based on the time-series segments and the potential feature representation includes:

the step of feature-based node representation: taking each time sequence segment as one node of the discrete dependency graph, and taking potential characteristic representation corresponding to the time sequence segment as a characteristic vector corresponding to the node;

creating an initial adjacency matrix; the dimension of the adjacency matrix is the number of nodes multiplied by the number of nodes, the adjacency matrix is used for representing the connection relation among the nodes, and the number of the nodes represents the total number of the nodes in the discrete dependency graph;

automatically constructing a connection relation: for each node, K nodes nearest to the node are obtained through calculation, and the K nodes nearest to the node are used as neighbor nodes of the node in the discrete dependency graph; determining the connection weight between the nodes according to the similarity between the nodes in the discrete dependency graph; for each node, connecting the node with K nodes nearest to the node according to the corresponding connection weight;

repeating the step of automatically constructing the connection relation until each node establishes connection to obtain the discrete dependency graph.
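As a non-limiting illustration, the graph construction steps above can be sketched as follows. The sketch assumes cosine similarity as the similarity measure and uses the hypothetical names `cosine_similarity` and `build_knn_graph`; the application itself does not fix the metric or any function names.

```python
import math


def cosine_similarity(a, b):
    """Similarity between two feature vectors (assumed metric, not fixed by the text)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_knn_graph(features, k):
    """Build a weighted adjacency matrix for the discrete dependency graph.

    features[i] is the potential feature representation of node i
    (one node per time series segment).
    """
    n = len(features)
    # Initial adjacency matrix: number of nodes x number of nodes, no connections yet.
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Similarity of node i to every other node.
        scored = [
            (cosine_similarity(features[i], features[j]), j)
            for j in range(n)
            if j != i
        ]
        # The K most similar nodes become the neighbors of node i.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for similarity, j in scored[:k]:
            # The connection weight is the similarity between the two nodes.
            adjacency[i][j] = similarity
    return adjacency
```

Each row of the returned matrix holds at most K non-zero weights, so K directly controls the sparsity of the graph.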

In an embodiment, the constructing a space-time diagram based on the discrete dependency graph and the latest time-series segment of the time-series segments includes:

taking the latest time sequence segment in the time sequence segments as a central node of the discrete dependency graph;

calculating the association degree between the central node and the non-latest time sequence segment; wherein the association is characterized by adopting a corresponding similarity measure; wherein the non-latest time-series segment is a time-series segment other than the latest time-series segment among the time-series segments;

selecting a preset number of time sequence fragments with highest association degree with the center node from the non-latest time sequence fragments as neighbor nodes of the center node;

interconnecting the central node and the neighbor nodes of the central node to form the space-time diagram; wherein the space-time diagram represents a space-time dependency between the central node and the preset number of the time-series segments.
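As a non-limiting illustration, the steps above can be sketched as follows. The sketch assumes that the last entry of the input list is the latest time series segment and that cosine similarity characterizes the association degree; the names `association` and `build_space_time_graph` are hypothetical.

```python
import math


def association(a, b):
    """Association degree between two segments (cosine similarity is an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_space_time_graph(segment_features, num_neighbors):
    """Return (center, neighbors, edges) for a graph centered on the latest segment.

    segment_features[-1] is assumed to be the latest time series segment.
    """
    center = len(segment_features) - 1
    # Association degree between the central node and every non-latest segment.
    scored = [
        (association(segment_features[center], segment_features[j]), j)
        for j in range(center)
    ]
    # Keep the preset number of segments with the highest association degree.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    top = scored[:num_neighbors]
    neighbors = sorted(j for _, j in top)
    # Connect the central node to each selected neighbor.
    edges = {(center, j): score for score, j in top}
    return center, neighbors, edges
```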

In an embodiment, the prediction model is further used to: updating potential feature representations of a central node of the space-time diagram according to information acquired from neighbor nodes of the space-time diagram; and predicting the future time sequence segment based on the potential feature representation of the latest time sequence segment and the updated time-space diagram to obtain a corresponding prediction result.

In an embodiment, the updating the potential feature representation of the center node of the space-time diagram according to the information collected from the neighbor nodes of the space-time diagram includes:

collecting feature vectors corresponding to the neighbor nodes from the neighbor nodes in the space-time diagram;

aggregating the feature vectors collected from the different neighboring nodes to generate a new potential feature representation of the central node;

inputting the new potential feature representation into a nonlinear activation function to obtain a nonlinear transformed new potential feature representation;

and replacing the original potential feature representation of the central node by the new potential feature representation after the nonlinear transformation.
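As a non-limiting illustration, the update of the central node can be sketched as follows. Mean aggregation and a ReLU activation are assumptions made for the sketch, as are the names `relu` and `update_center_node`; the text above does not fix the aggregation function or the activation function.

```python
def relu(vector):
    """Nonlinear activation function (ReLU is an assumed choice)."""
    return [max(0.0, x) for x in vector]


def update_center_node(features, center, neighbors):
    """Return a copy of `features` in which the central node has been updated.

    features maps a node id to its feature vector.
    """
    if not neighbors:
        raise ValueError("the central node needs at least one neighbor")
    dim = len(features[center])
    # Collect the neighbors' feature vectors and aggregate them (mean is an assumption).
    aggregated = [
        sum(features[j][d] for j in neighbors) / len(neighbors) for d in range(dim)
    ]
    # Apply the nonlinear transformation, then replace the original representation.
    updated = dict(features)
    updated[center] = relu(aggregated)
    return updated
```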

In an embodiment, the dynamic network traffic regulation method further includes: regulating the network according to the corresponding prediction result.

In an embodiment, the dynamic network traffic regulation method further includes:

monitoring the total time taken by the prediction model to carry out the prediction and obtain the corresponding prediction result; and if the total time is greater than the time corresponding to the range of future time steps that the prediction model can predict, immediately stopping the prediction work of the prediction model.

In an embodiment, the dynamic network traffic regulation method further includes: if the accuracy of the corresponding prediction result is lower than a preset accuracy threshold, immediately stopping the prediction work of the prediction model.
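As a non-limiting illustration, the two stop conditions above can be sketched as a wrapper around the prediction call. The function `guarded_predict`, its parameters and its status strings are hypothetical. The sketch checks the elapsed time only after the prediction returns, which simplifies the "immediately stopping" behavior described above.

```python
import time


def guarded_predict(predict_fn, horizon_seconds, score_fn, accuracy_threshold,
                    clock=time.monotonic):
    """Run one prediction and discard it if it is too slow or too inaccurate.

    horizon_seconds: time covered by the future time steps the model can predict.
    score_fn: hypothetical callable returning the accuracy of a prediction result.
    """
    start = clock()
    result = predict_fn()
    elapsed = clock() - start
    # A prediction that takes longer than the horizon it covers is no longer useful.
    if elapsed > horizon_seconds:
        return None, "timeout"
    # A prediction below the preset accuracy threshold is rejected.
    if score_fn(result) < accuracy_threshold:
        return None, "low_accuracy"
    return result, "ok"
```

The `clock` parameter exists only so that the timing logic can be exercised deterministically.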

According to a second aspect, an embodiment provides a pre-training based dynamic network traffic regulation system. The system comprises:

the prediction module is configured to acquire historical time series data from a network, and segment the historical time series data to obtain a plurality of time series fragments;

inputting the time sequence segments into a trained prediction model to predict future time sequence segments through the trained prediction model so as to obtain corresponding prediction results;

wherein the future time series segment is located at a next time period of a latest one of the time series segments;

wherein the prediction model comprises an encoder, a graph structure learning sub-module and a second decoder;

the encoder is extracted from a pre-training model that has completed pre-training,

the encoder is configured to obtain a potential feature representation of the time series segment;

the graph structure learning submodule is used for constructing a discrete dependency graph based on the time sequence segments and the potential characteristic representation;

The second decoder comprises a space-time diagram neural network sub-module and a regression prediction layer sub-module;

the space-time diagram neural network submodule is used for constructing a space-time diagram based on the discrete dependency diagram and the latest time sequence fragment in the time sequence fragments, and updating potential characteristic representations of a central node of the space-time diagram according to information acquired from neighbor nodes of the space-time diagram;

the regression prediction layer submodule is used for predicting a future time series segment based on the potential feature representation of the latest time series segment and the pre-update space-time diagram or the post-update space-time diagram.

According to a third aspect, a computer-readable storage medium is provided in one embodiment. The computer-readable storage medium includes a program. The program is capable of being executed by a processor to implement a dynamic network traffic regulation method as described in any of the embodiments herein.

The beneficial effects of this application are:

according to the dynamic network flow regulation method and the system thereof, historical time sequence data are acquired from a network, and a plurality of time sequence fragments are obtained based on the historical time sequence data; inputting the time sequence segments into a trained prediction model, wherein the prediction model is used for acquiring potential characteristic representations of the time sequence segments; constructing a discrete dependency graph based on the time series segments and the potential feature representations; constructing a space-time diagram based on the discrete dependency diagram and the latest time sequence fragment in the time sequence fragments; predicting the future time sequence segment based on the potential feature representation and the space-time diagram of the latest time sequence segment to obtain a corresponding prediction result; wherein the future time series segment is located at a next time period of the latest time series segment among the time series segments. The dynamic network flow regulation and control method and the system thereof can realize long-term accurate prediction of dynamic network flow.

Drawings

FIG. 1 is a flow diagram of a pre-training based dynamic network traffic control method according to an embodiment;

FIG. 2 is a block diagram of a pre-training model of one embodiment;

FIG. 3 is a schematic flow diagram of constructing a discrete dependency graph in accordance with one embodiment;

FIG. 4 is a schematic flow diagram of constructing a space-time diagram for one embodiment;

FIG. 5 is a flow diagram of updating a potential feature representation of a central node, according to one embodiment;

FIG. 6 is a block diagram of a predictive model of one embodiment;

FIG. 7 is a flow chart of a dynamic network traffic control method according to another embodiment;

FIG. 8 is a block diagram of a dynamic network traffic regulation system according to one embodiment.

Detailed Description

The invention will be described in further detail below with reference to the drawings by means of specific embodiments, in which like elements in different embodiments are given like associated reference numbers. In the following embodiments, numerous specific details are set forth to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of these features may be omitted, or replaced by other elements, materials or methods, in different situations. In some instances, some operations related to the present application are not shown or described in the specification, to avoid obscuring the core of the present application; a detailed description of these operations is not necessary, because a person skilled in the art can fully understand them from the description herein and from general technical knowledge in the field.

Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.

The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning. The terms "coupled" and "connected," as used herein, are intended to encompass both direct and indirect coupling (coupling), unless otherwise indicated.

The training paradigm of pre-training (Pre-training) plus fine-tuning (Fine-tuning) has been widely used and studied in the field of natural language processing in recent years. By performing generative pre-training on a large-scale corpus and then fine-tuning on a specific task, it can effectively improve the generalization capability and performance of a model. More recently, this paradigm has also gradually been applied to time series prediction. The data in time series prediction can be regarded as a sequence in time, similar to a sequence of words in text. The paradigm can therefore be carried over to time series prediction: a pre-training model is fine-tuned for a specific time series prediction task to improve the performance and generalization capability of the model. Many language pre-training models use a masked self-supervised training mechanism, which not only greatly reduces the labor cost of training but also alleviates data scarcity to a certain extent.

The application provides a pre-training-based dynamic network flow regulation and control method and a system thereof. The method and system combine a space-time diagram neural network with a self-supervised time series generation model (namely, a pre-training model), and accurately predict multi-step future traffic information of dynamic network traffic through long time series representation learning and fine-tuning of the space-time diagram neural network. Accurate multi-step time series generation is achieved by pre-training on segments of past long-term network traffic time series data in a self-supervised manner, and the pre-training model is then adapted to the downstream task of dynamic network traffic prediction at low training cost.

The technical scheme of the present application will be described in detail with reference to examples.

Referring to fig. 1, a method for dynamic network traffic regulation based on pre-training includes:

step S100: acquiring historical time sequence data from a network, and segmenting the historical time sequence data to obtain a plurality of time sequence fragments;

step S200: inputting the time series segments into a trained predictive model for:

Acquiring a potential feature representation of the time series segment;

One purpose of constructing a space-time diagram is to convert the space-time dependencies in the historical time series data into a graph structure.

Historical time series data is ubiquitous, in fields ranging from traffic and energy to the economy. For example, in an intelligent traffic system, sensors deployed on the road network constantly record traffic conditions, such as the total number of vehicles passing each sensor and the vehicle speed. This vast amount of data comprises a plurality of time series, each from one traffic sensor. Similar examples occur in power systems and financial systems, such as power consumption in multiple areas and the price curves of multiple stocks. In addition, the specific procedure of "acquiring historical time series data from the network" belongs to common general knowledge in the art and is therefore not described here.

The present application uses a framework called STEP (Spatial-Temporal graph neural networks are Enhanced by a Pre-training model), in which the space-time diagram neural network is enhanced by a scalable time series pre-training model (i.e., the pre-training model in step S200). In some embodiments, referring to fig. 2, the pre-training model is TSGPT, a generative pre-training model based on an encoder-decoder architecture and a self-attention mechanism, in which self-supervised training without manual annotation is implemented by a masked autoencoding mechanism. TSGPT aims to learn temporal patterns efficiently from very long-term historical time series data and to generate segment-level representations (i.e., the above-mentioned latent feature representations). Because these segment-level representations contain rich context information, they help address the challenge that the space-time graph neural network (STGNN) sub-module is unaware of context information outside its window. Furthermore, the learned segment representations make it possible to calculate correlations between time series segments using information from the entire long-term historical time series data, which addresses the challenge that short-term information is unreliable for modeling the discrete dependency graph, i.e., the challenge of the missing discrete dependency graph. The discrete dependency graph is represented by the similarities (or correlations) between the time series segments. The present application uses an efficient, unsupervised pre-training model that is trained with a masked autoencoding strategy. The trained pre-training model can effectively capture information from very long-term historical data (e.g., data spanning weeks or months) and generate segment-level representations that correctly reflect complex patterns in the time series segments.
The graph structure learning sub-module used herein learns the discrete dependency graph from the representations of the unsupervised pre-training model, and uses the kNN graph computed from those representations as regularization to guide the joint training of the graph structure and the space-time graph neural network (STGNN) sub-module.

The training process of the prediction model in step S200 generally includes:

S1, segmenting and masking the historical time series data;

S2, constructing an encoder based on the Transformer;

S3, constructing a first decoder based on the Transformer;

S4, performing target reconstruction;

S5, selecting the encoder obtained in step S4;

S6, constructing the graph structure learning;

S7, constructing the enhanced downstream space-time diagram neural network;

S8, training the prediction model to obtain the trained prediction model.

In some embodiments, for step S1, training samples of the time series segments are obtained from an original ultra-long time series (i.e., the historical time series data) using a sliding window. For example, traffic data at time points Th for each node in the network over a period of time (e.g., over two weeks) may be collected in real time by the SDN controller. A person skilled in the art can define the time points Th at which the historical data is collected according to actual needs.

In some embodiments, at each time point Th, the SDN sub-controller may collect data about the packets associated with the nodes of the network. The data comprises the total packet length, the packet loss rate, the controller memory load and the time stamp, which are recorded together to form one data point. The collected data points are arranged into a sequence matrix $S^i$ (i.e., the historical time series data described above). Each row of the historical time series data $S^i$ represents one time point Th, and the columns correspond to the total packet length, the packet loss rate, the controller memory load and the time stamp. The SDN sub-controller (i.e., the SDN controller) is an application in a software defined network (SDN) that is responsible for flow control to keep the network intelligent. The SDN controller is based on a protocol such as OpenFlow, which allows the server to tell the switch where to send data packets.
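As a non-limiting illustration, arranging the collected data points into the sequence matrix can be sketched as follows. The class `DataPoint`, its field names and the function `build_sequence_matrix` are hypothetical, and the column order simply follows the order in which the fields are listed above.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    """One record collected at a time point Th (field names are illustrative)."""
    timestamp: float
    total_length: int
    packet_loss_rate: float
    controller_memory_load: float


def build_sequence_matrix(points):
    """Arrange data points into a matrix with one row per time point.

    Columns: total length, packet loss rate, controller memory load, time stamp.
    """
    ordered = sorted(points, key=lambda p: p.timestamp)
    return [
        [p.total_length, p.packet_loss_rate, p.controller_memory_load, p.timestamp]
        for p in ordered
    ]
```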

In some embodiments, a person skilled in the art may determine the window length L and the predicted time step Tf for the data slice according to the actual requirements, where the selection of Tf and L may be controlled by the SDN subcontroller for scheduling.

In some embodiments, the input historical time series data $S^i$ may be cut into a plurality of time series segments $S_j^i$ of length L. The time series segments $S_j^i$ are also used to predict the future time series segment of length L over the predicted time step Tf. The latest time series segments are kept as the validation set. A person skilled in the art can define the latest time series segment as needed; for example, the time series segment covering the hour preceding the current time point may be taken as the latest time series segment.

In some embodiments, the masking rate may be set by those skilled in the art according to actual requirements. For example, the masking rate may be set to 75%, i.e., about 75% of the data will be masked in each time series slice. Only a portion of the data will be used in the generated output of the first decoder of the pre-training model.

In some embodiments, in each of the time-series segments, a portion of the data points may be randomly selected and a mask flag set according to the mask rate. These data points are not input into the predictive model (e.g., the second decoder of the predictive model and the graph structure learning sub-module), but are targeted for self-supervising tasks.

In some embodiments, a self-supervising task may be created in which the goal of the pre-training model is to generate data for the mask portion from the known fragments.

In some embodiments, the time series segments may be generated with a sliding window of size H+L. The first H data points represent historical data and the last L data points represent future data; they correspond, respectively, to the model input and the ground truth to be predicted in machine learning (i.e., the model takes the first H data points as input and tries to predict the last L data points). Each time the window slides, one set of time series segments is generated. For example, the historical length of the historical time series data in the database is H = P×L, where L is the length to be predicted. That is, the first P×L data points are used as a training sample, and the latest L data points are used as the corresponding label. The value of P can be set by the user. For another example, let the i-th time series in the historical time series data be $S^i$. The input sequence $S^i$ is split into P non-overlapping segments of length L, where the j-th time series segment is denoted $S_j^i$. L is the usual length of the time series segments input to the space-time diagram neural network (STGNN) sub-module. A portion of the non-overlapping segments is randomly masked at a high masking rate r of 75% and then reconstructed, which creates a challenging self-supervised task.
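As a non-limiting illustration, the sliding window, the segmentation and the random masking described above can be sketched as follows for a single univariate series. The function names and the stride of one time step are assumptions made for the sketch.

```python
import random


def sliding_windows(series, history_len, future_len):
    """Yield (history, future) pairs from a window of size H + L moved one step at a time."""
    window = history_len + future_len
    return [
        (series[s:s + history_len], series[s + history_len:s + window])
        for s in range(len(series) - window + 1)
    ]


def split_into_segments(history, segment_len):
    """Split a history of length H = P x L into P non-overlapping segments of length L."""
    if len(history) % segment_len != 0:
        raise ValueError("history length must be a multiple of the segment length")
    return [history[i:i + segment_len] for i in range(0, len(history), segment_len)]


def random_mask(num_segments, mask_ratio, rng):
    """Choose which segment indices are masked and which stay visible to the encoder."""
    num_masked = int(num_segments * mask_ratio)
    masked = sorted(rng.sample(range(num_segments), num_masked))
    unmasked = [j for j in range(num_segments) if j not in masked]
    return masked, unmasked


if __name__ == "__main__":
    L, P = 2, 4                       # segment length and number of segments (example values)
    history, future = sliding_windows(list(range(20)), P * L, L)[0]
    segments = split_into_segments(history, L)
    masked, unmasked = random_mask(len(segments), 0.75, random.Random(0))
    print(segments, masked, unmasked)
```

Only the unmasked segments would be passed to the encoder; the masked ones serve as reconstruction targets.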

In some embodiments, the encoder of the pre-training model in the present application includes an input embedding layer and a multi-head sparse self-attention sub-module (which includes a series of Transformer models with position coding). The encoder runs only on the unmasked segments. Specifically, referring to fig. 2, the input embedding layer is a linear projection that converts each unmasked segment into the hidden space: $X = W \cdot S_j^i + b$, where W and b are learnable parameters, d is the dimension of the hidden space, $S_j^i$ is the input vector (e.g., each unmasked segment), and $X$ is the corresponding latent-space representation. Thereafter, a position-coding layer is used to add sequence information: for each position i, the position-coding layer generates a position-coding vector $U_{pos}$. The position-coding vectors are typically generated using sine and cosine functions to capture position information, for example in the standard sinusoidal form $U_{pos}(i, 2k) = \sin\left(i / 10000^{2k/d}\right)$ and $U_{pos}(i, 2k+1) = \cos\left(i / 10000^{2k/d}\right)$, where k indexes the dimension of the position code, d is the dimension of the hidden space, and i is the position index. The latent-space representation $X$ and the position-coding vector $U_{pos}$ are then added element-wise to obtain a new vector that carries the position information, i.e., $X_{with\_position} = X + U_{pos}$. The feature vector $X_{with\_position}$ is then input to the multi-head sparse self-attention sub-module composed of Transformer models, which may comprise four layers of self-attention Transformer models.
Finally, the input sequence is encoded by the multi-head sparse self-attention sub-module, and the latent feature representation $H_j^i$ of the j-th unmasked segment (e.g., time series segment $S_j^i$), i.e., its temporal representation, is obtained by training the encoder. The Transformer model is a neural network model based on a self-attention mechanism and is used for processing sequence data.
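As a non-limiting illustration, the input embedding and position coding performed before the self-attention sub-module can be sketched as follows. The sketch assumes the standard sinusoidal position coding (sine on even dimensions, cosine on odd dimensions, base 10000), which the text above describes only in general terms; the function names are hypothetical, and the weights would be learned in practice rather than supplied by hand.

```python
import math


def linear_embedding(segment, weight, bias):
    """Project a segment of length L into the hidden space of dimension d: W x + b."""
    return [
        sum(w * x for w, x in zip(row, segment)) + b
        for row, b in zip(weight, bias)
    ]


def positional_encoding(position, d):
    """Position-coding vector for one position (standard sinusoidal form, assumed)."""
    encoding = []
    for index in range(d):
        k = index // 2
        angle = position / (10000 ** (2 * k / d))
        encoding.append(math.sin(angle) if index % 2 == 0 else math.cos(angle))
    return encoding


def embed_with_position(segment, position, weight, bias):
    """Element-wise sum of the latent-space representation and the position code."""
    latent = linear_embedding(segment, weight, bias)
    u_pos = positional_encoding(position, len(latent))
    return [a + b for a, b in zip(latent, u_pos)]
```

The resulting vector is what would be fed to the Transformer layers of the encoder.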

In some embodiments, the first decoder of the pre-training model also includes a series of Transformer models (e.g., the masked multi-head self-attention sub-module in fig. 2). The first decoder reconstructs the latent feature representations back into information at a lower semantic level, i.e., the numerical level. The first decoder operates on the complete set of segments, including the masked segments. No position vectors need to be added here, since position information has already been added to all segments in the encoder. The first decoder is used only during the pre-training phase, to perform the sequence reconstruction task (i.e., the target reconstruction described above).

In some embodiments, the first decoder may use only a single-layer Transformer model. Finally, a multi-layer perceptron (e.g., the regression prediction layer in the pre-training model) is applied to make the prediction; its number of output dimensions equals the length of each time series segment. Specifically, given the latent feature representation $H_j^i$ of segment j, the first decoder generates the corresponding reconstructed sequence $\hat{S}_j^i$.

In some embodiments, the mean absolute error (MAE) between the original sequence (i.e., the above time-series segment $S_j^i$) and the reconstructed sequence $\hat{S}_j^i$ is calculated: $\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t|$. The mean absolute error is used as the loss function of the target reconstruction step. Here n is the total number of data points used to calculate the MAE; $y_t$ is the actual observed value, i.e., an actual traffic data point in the data; and $\hat{y}_t$ is the value predicted by the first decoder for the corresponding data point.
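As a small worked illustration of the reconstruction loss, the mean absolute error between an original segment and its reconstruction can be computed as in the sketch below. The function name and the sample values are hypothetical.

```python
import numpy as np

def mean_absolute_error(actual, predicted) -> float:
    """Average of |actual - predicted| over all data points."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical traffic segment and its reconstruction by the first decoder.
original = [1.0, 2.0, 3.0]
reconstructed = [1.0, 4.0, 0.0]
loss = mean_absolute_error(original, reconstructed)  # (0 + 2 + 3) / 3
```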

The encoder of the prediction model is the trained encoder of the pre-training model. The encoder of the pre-training model may assist in learning the graph structure required by the space-time graph neural network (STGNN) sub-module of the prediction model. The representations produced by the encoder of the pre-training model may also be supplied to the space-time graph neural network (STGNN) sub-module as very long-term historical information.

It should be noted that, for the specific process by which the graph structure learning sub-module of the present application constructs the discrete dependency graph corresponding to a time-series segment (e.g., $S_j^i$) from its latent feature representation $H_j^i$, reference may be made to paragraphs [0063] to [0072] of the Chinese patent document with publication number CN115688871, "Multivariate time series prediction method and system based on pre-training enhancement". The relevant content is as follows.

The graph structure learning sub-module aims to learn a discrete sparse graph. Specifically, the STEP framework learns a Bernoulli distribution parameter $\Theta_{ij}$, from which a discrete dependency graph can then be sampled. First, regularization based on the TSFormer representation is introduced to provide supervisory information for graph optimization. Specifically, $H^i = \Vert_{j=1}^{P} H_j^i$ is denoted as the feature of time series i, where $\Vert$ denotes the concatenation operation and P is the number of segments. A kNN graph $A^a$ among all nodes is then calculated; the sparsity of the learned graph can be controlled by setting different values of k. Benefiting from the capability of TSFormer, $A^a$ reflects the dependencies between nodes and thus helps train the graph structure. Then $\Theta_{ij}$ is calculated as follows: $\Theta_{ij} = FC(relu(FC(Z^i \Vert Z^j)))$, with $Z^i = relu(FC(H^i)) + G^i$, where $\Theta_{ij}$ holds unnormalized probabilities: the first dimension represents the probability of a positive (i.e., a relationship exists between the two time series) and the second dimension represents the probability of a negative. $G^i$ is the global feature of time series i and is obtained by applying a convolutional network to $S_{train}^i$, the time series of node i over the whole training set; $L_{train}$ is the length of the training data set.

$S_{train}^i$ is static for all samples during training, which helps make the training process more robust and accurate, whereas the feature $H^i$ is dynamic across training samples, reflecting the dynamics of the dependency graph. Therefore, the cross entropy between $\Theta$ and the kNN graph $A^a$ is used as the graph structure regularization:

$L_{graph} = \sum_{ij} \left[ -A^a_{ij} \log \Theta'_{ij} - (1 - A^a_{ij}) \log(1 - \Theta'_{ij}) \right]$

where $\Theta'_{ij} = softmax(\Theta_{ij})$ is the normalized probability. A final problem with discrete graph structure learning is that the sampling operation from $\Theta'_{ij}$ to the adjacency matrix A is not differentiable. Therefore, the Gumbel-Softmax reparameterization technique is applied in STEP: $A_{ij} = softmax((\Theta_{ij} + g)/\tau)$, where g is a vector of independent and identically distributed samples drawn from the Gumbel(0, 1) distribution and $\tau$ is the temperature parameter of the softmax. Gumbel-Softmax converges to a discrete state as $\tau \to 0$.
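The graph regularization and the Gumbel-Softmax sampling described above can be sketched in NumPy as follows. This is an illustrative sketch only: the tensor layout (a last axis of size 2 holding the "edge" and "no edge" scores), the function names, and the small epsilon used for numerical safety are assumptions, and the fully connected layers that produce the scores are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def graph_regularization(theta, knn_adj, eps=1e-12):
    """Cross entropy between the normalized edge probability and a kNN graph.

    theta:   (N, N, 2) unnormalized scores, index 0 = edge exists (assumed)
    knn_adj: (N, N) binary kNN adjacency matrix
    """
    p_edge = softmax(theta)[..., 0]
    loss = -knn_adj * np.log(p_edge + eps) - (1.0 - knn_adj) * np.log(1.0 - p_edge + eps)
    return float(loss.sum())

def gumbel_softmax_adjacency(theta, tau, rng):
    """Relaxed adjacency sample: softmax((theta + g) / tau), edge channel only."""
    u = rng.uniform(low=1e-10, high=1.0, size=theta.shape)
    g = -np.log(-np.log(u))                    # Gumbel(0, 1) noise
    return softmax((theta + g) / tau)[..., 0]
```

With a small temperature the sampled entries move toward 0 or 1, while a larger temperature keeps them soft, which is what allows gradients to pass through the sampling step.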

In some embodiments, the traffic data at the latest time point may be selected from the traffic data of each node of the network collected in real time by the SDN controller over a period of time (e.g., over two weeks), for example, the data of the previous hour at the current time point.

It should be noted that the discrete dependency graph is used to capture the spatial dependencies between nodes in the time-series segments. For example: (1) connection relationships between nodes: an edge in the discrete dependency graph indicates whether a connection or dependency exists between two nodes (if two nodes share an edge, there may be some association or interaction between them); (2) strength of dependencies: the edge weights may represent the strength of the dependencies between nodes, which helps quantify how strongly nodes interact; (3) network structure: the topology of the graph shows how the nodes are organized and connected, which helps in understanding the overall structure of the system; (4) local and global dependencies: analyzing the discrete dependency graph reveals which nodes are more susceptible to the influence of other nodes and which nodes play a key role in the whole network; (5) temporal correlation: if the time factor in the time-series segments is also taken into account, the discrete dependency graph can also help show how the nodes change over time and whether time-dependent dependencies exist.

The discrete dependency graph is a graph structure for representing the dependency relationship between nodes. In the graph, each node represents a representation of a time series segment, while the edges in the graph represent dependencies or associations between different time series segments. The purpose of constructing the discrete dependency graph is to aid in understanding the relationships between nodes in the time series segment for further analysis, prediction or control. Wherein all "nodes" refer to all time-series segments in the time-series data. Each time series segment corresponds to a node, wherein the characterization of the node is typically a characteristic representation of the time series segment. These node sets constitute the node sets of the discrete dependency graph.

Referring to Fig. 3, constructing the discrete dependency graph based on the time-series segments $S_j^i$ and the latent feature representations $H_j^i$ comprises:

step S21 (feature-based node representation): each time-series segment $S_j^i$ is taken as a node of the discrete dependency graph, and the latent feature representation $H_j^i$ corresponding to that time-series segment is taken as the feature vector of the node;

step S22: creating an initial adjacency matrix Aa; the dimension of the adjacency matrix Aa is the number of nodes multiplied by the number of nodes, the adjacency matrix Aa is used for representing the connection relation among the nodes, and the number of the nodes represents the total number of the nodes in the discrete dependency graph;

Step S23 of automatically constructing connection relation: for each node, K nodes nearest to the node are calculated, and the K nodes nearest to the node are used as neighbor nodes of the node in the discrete dependency graph; determining the connection weight between the nodes according to the similarity between the nodes in the discrete dependency graph; for each node, connecting the node with K nodes nearest to the node according to the corresponding connection weight;

step S24: the above step S23 of automatically constructing the connection relationship is repeated until each node establishes a connection to obtain a discrete dependency graph.

Wherein, the adjacency matrix Aa is used for representing the connection relation between nodes. Each element Aa [ i ] [ j ] of the adjacency matrix indicates whether a connection or dependency exists between node i and node j. The adjacency matrix is typically a binary matrix wherein the value of Aa [ i ] [ j ] can be 0 or 1, indicating no connection or the presence of a connection, respectively.

In some embodiments, in the step S23 of automatically constructing the connection relationship, the metric value (i.e. the corresponding connection weight) may be filled into the corresponding position of the adjacency matrix Aa, so as to reflect the dependency relationship between each node through the adjacency matrix Aa. For example, for node i and node j, if node j belongs to one of the k nearest neighbors of node i, then the corresponding element Aa [ i ] [ j ] =1 in the adjacency matrix; otherwise, the corresponding element Aa [ i ] [ j ] =0 in the adjacent matrix. The specific process of calculating the K nodes nearest to the node (for example, obtained by using the KNN algorithm) belongs to the prior art in the field, so that details thereof are not repeated here. KNN (k-NearestNeighbor, i.e., k nearest neighbor algorithm) is an existing common supervised learning method.
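Steps S21 to S24 can be illustrated with a short NumPy sketch that builds a binary adjacency matrix Aa from node feature vectors. The sketch assumes that "nearest" means highest cosine similarity and that a node is never its own neighbor; the function names are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(H):
    """Pairwise cosine similarity between the rows of H."""
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    Hn = H / np.clip(norms, 1e-12, None)       # guard against zero vectors
    return Hn @ Hn.T

def knn_adjacency(H, k):
    """Binary adjacency: Aa[i][j] = 1 if j is among the k nearest neighbors of i."""
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of nodes")
    sim = cosine_similarity_matrix(H)
    np.fill_diagonal(sim, -np.inf)             # exclude self-connections
    neighbors = np.argsort(-sim, axis=1, kind="stable")[:, :k]
    Aa = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    Aa[rows, neighbors.ravel()] = 1
    return Aa

# Four hypothetical node feature vectors (latent representations).
H = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
Aa = knn_adjacency(H, k=1)
```

Note that the resulting matrix is not necessarily symmetric, because j being among the nearest neighbors of i does not imply the converse.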

In some embodiments, in the step S23 of automatically constructing the connection relationship, the "similarity" in the "similarity between nodes according to the discrete dependency graph" may be characterized by cosine similarity. Therefore, the corresponding connection weight may be determined by the cosine similarity between the node and one of the K nodes. The connection weights may be determined using a binarization method. For example, if the cosine similarity is higher than a preset threshold, setting the corresponding connection weight to 1; otherwise, the corresponding connection weight is set to 0. The specific value of the preset threshold may be determined by a person skilled in the art according to actual needs, and the preset threshold is not limited herein.

It can be seen that, in step S21 (feature-based node representation), each time-series segment $S_j^i$ is treated as a node of the discrete dependency graph and is given a corresponding feature vector, so that each node carries the latent feature representation of its time-series segment and better captures the information in that segment. In step S23 (automatic construction of connection relations), the discrete dependency graph is built by calculating the cosine similarity between nodes and selecting the K nearest neighbors of each node, so that the connection relations are constructed automatically without being defined manually, which improves the characterization quality of the data. Capturing of spatial dependencies: because the connection relations are determined by the cosine similarity between nodes, the spatial dependencies between nodes can be captured better, which facilitates the analysis and understanding of spatio-temporal data. Adjustable connection weights: in some embodiments the connection weights are determined by binarization, so a person skilled in the art can adjust the weight of a connection according to actual requirements and thereby meet different data analysis needs. Efficiency and automation: automatically constructing the connection relations between nodes (i.e., step S23) improves the efficiency of the analysis and reduces the need for manual intervention, making the processing of time-series data more automatic. In summary, the steps for constructing the discrete dependency graph offer better feature representation, capture of spatial dependencies, automatic construction of connection relations, and adjustable weights, all of which help improve the analysis and characterization quality of spatio-temporal data.

In some embodiments, the latent representations $H_j^i$ of all time-series segments $S_j^i$ corresponding to the historical time-series data may be input to the graph structure learning sub-module in order to train the graph structure learning sub-module.

In some embodiments, the connection between nodes may be determined by calculating cosine similarity based on the relationships between the time series segments.

It should be noted that the "relationship between time series segments" refers to how different time-series segments relate to one another over time; for example, two time-series segments may show similar data trends within a specific time period. The "connection between nodes" refers to how it is decided, when constructing the node dependency graph, which nodes should be connected. Measuring the similarity between nodes by cosine similarity means calculating the cosine similarity between the representations of the time-series segments corresponding to each pair of nodes. Cosine similarity is a common similarity measure, namely the cosine of the angle between two vectors. For example, if the cosine similarity is higher than a preset threshold, the two nodes may be considered connected; otherwise they are not. As another example, assume that there are multiple sensor nodes in a network, each recording temperature data over different periods of time. To construct a discrete dependency graph, the cosine similarity between the temperature data (i.e., the time-series segments) of each pair of nodes may be calculated. If the cosine similarity is above the preset threshold, the two nodes are connected, indicating that their temperature changes are similar.
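The temperature sensor example can be sketched as follows. One assumption is added for the sake of a meaningful example: each series is centered (its mean is subtracted) before the cosine similarity is taken, because raw temperature readings are all positive and would otherwise look similar regardless of trend. The function name, sensor values, and threshold are hypothetical.

```python
import numpy as np

def threshold_graph(series, threshold, center=True):
    """Connect two nodes when the cosine similarity of their series exceeds threshold."""
    X = np.asarray(series, dtype=float)
    if center:
        X = X - X.mean(axis=1, keepdims=True)  # compare trends, not levels
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    sim = Xn @ Xn.T
    adj = (sim > threshold).astype(int)
    np.fill_diagonal(adj, 0)                   # no self-connections
    return adj

# Sensors A and B follow the same trend; sensor C moves the opposite way.
temperatures = [
    [20.0, 21.0, 23.0, 22.0],   # sensor A
    [10.0, 11.0, 13.0, 12.0],   # sensor B
    [25.0, 24.0, 22.0, 23.0],   # sensor C
]
adjacency = threshold_graph(temperatures, threshold=0.8)
```

In this example only sensors A and B end up connected, since their centered series point in the same direction.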

In some embodiments, the K nearest neighbors (KNN) are calculated for all nodes in the discrete dependency graph: for each node, the K nodes closest to it are determined using a time-interval distance metric.

It should be noted that the "time interval distance measure" described above relates to how the distance between nodes is calculated to determine the connection between them. Here, it refers to a distance metric method for calculating connections between nodes. Specifically, it includes time intervals or time differences between nodes. For example, it may be considered to use the time stamp between two nodes to calculate the time difference between them, which may be used as a distance metric. The K Nearest Neighbor (KNN) method is a method of determining connections between nodes using a time interval distance metric, which finds the nearest K nodes of each node, which are considered to have a connection with the target node.
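A minimal sketch of nearest-neighbor selection under a time-interval distance metric is given below. It assumes that every node carries a single timestamp and that the distance between two nodes is the absolute difference of their timestamps; the function name and the sample timestamps are hypothetical.

```python
import numpy as np

def knn_by_time_interval(timestamps, k):
    """For each node, return the indices of the k nodes closest to it in time."""
    t = np.asarray(timestamps, dtype=float)
    if not 0 < k < t.size:
        raise ValueError("k must satisfy 0 < k < number of nodes")
    dist = np.abs(t[:, None] - t[None, :])     # pairwise time differences
    np.fill_diagonal(dist, np.inf)             # a node is not its own neighbor
    return np.argsort(dist, axis=1, kind="stable")[:, :k]

# Hypothetical timestamps (e.g., minutes) of four nodes.
neighbors = knn_by_time_interval([0.0, 10.0, 11.0, 50.0], k=1)
```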

In some embodiments, the space-time diagram neural network sub-module may be used to extract the characterization $H_P^i$ of the latest time-series segment $S_P^i$. It encodes the time-series segment into a representation effectively, so that the spatio-temporal information in the time-series data can be better understood and processed. The characterization $H_P^i$ describes the features of the latest time-series segment $S_P^i$, such as the periodicity and trend of the data, and expresses these features in a more abstract form. The space-time diagram neural network model, e.g., a graph convolutional network (GCN), can perform graph structure learning in order to process space-time diagram data.

In some embodiments, only the traffic data of the last time-series segment (i.e., the latest time-series segment) of each node may be input to the fine-tuning learning architecture of the space-time diagram neural network (STGNN) sub-module and trained, so as to obtain the corresponding spatial dependency relationship.

It should be noted that feeding the traffic data of the latest time-series segment into the fine-tuning learning architecture of the space-time diagram neural network (STGNN) sub-module and training it allows the model to respond quickly to data changes while reducing cost, since a large amount of historical data does not need to be processed. Rather than directly using an existing untrained space-time graph neural network (STGNN) sub-module, the present application uses a fine-tuned, improved variant of the STGNN model (i.e., the fine-tuning learning architecture described above).

It should be noted that the function of the "space-time diagram neural network submodule" is to analyze and process the space-time relationship in the time-series segment. In particular, it uses the latest time series segments as central nodes to construct a time-space diagram, which is then analyzed and modeled using neural network techniques. In this application, it uses the latest time series segment as a central node to construct a time-space diagram in order to better understand and process the time-space information in the time series segment.

It should be noted that the main purpose of constructing a space-time diagram is to transform the space-time relationship in the time series segment into a graphic structure for better understanding, representing and analyzing the data. This may provide a more powerful tool and method for various spatiotemporal related tasks. Wherein when constructing the time space diagram, the connection relation between nodes and the information between neighboring nodes can be used to learn the higher-level feature representation. This helps to improve the characterization of the data to improve the performance of subsequent tasks. In addition, based on the constructed space-time diagram, various tasks such as prediction of time series fragments, anomaly detection, analysis, data clustering and the like can be performed. And the space-time diagram may provide more rich information to support these tasks.

Referring to Fig. 4, constructing the space-time diagram based on the discrete dependency graph and the latest time-series segment among the time-series segments $S_j^i$ comprises:

step S30: taking the latest time-series segment $S_P^i$ among the time-series segments $S_j^i$ as the central node of the discrete dependency graph;

step S31: calculating the degree of association between the central node and the non-latest time-series segments; wherein the degree of association is characterized by a corresponding similarity measure, and the non-latest time-series segments are the time-series segments $S_j^i$ other than the latest time-series segment $S_P^i$;

step S32: selecting a preset number of time sequence fragments with highest association degree with the center node from the non-latest time sequence fragments as neighbor nodes of the center node;

step S33: connecting the central node and the neighbor nodes of the central node to form a space-time diagram; wherein the space-time diagram represents a space-time dependency relationship between the central node and a preset number of time-series segments.
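Steps S30 to S33 can be sketched as follows. The sketch assumes that the segment representations are stored oldest first, so the last row is the latest segment, and that the degree of association is the cosine similarity; the function name and the edge-list output format are hypothetical.

```python
import numpy as np

def build_space_time_graph(features, num_neighbors):
    """Return the central node index and its weighted edges to the top neighbors.

    features: (P, d) array, one row per time-series segment, oldest first.
    """
    F = np.asarray(features, dtype=float)
    center = F.shape[0] - 1                                    # step S30
    norms = np.linalg.norm(F, axis=1)
    denom = np.clip(norms[:-1] * norms[center], 1e-12, None)
    sims = (F[:-1] @ F[center]) / denom                        # step S31
    top = np.argsort(-sims, kind="stable")[:num_neighbors]     # step S32
    edges = [(center, int(j), float(sims[j])) for j in top]    # step S33
    return center, edges

# Four hypothetical segment representations; the last one is the latest segment.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.1]]
center, edges = build_space_time_graph(feats, num_neighbors=2)
```

Each edge is a tuple of the central node, a neighbor node, and the association score, which can later serve as a connection weight.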

In some embodiments, in step S31, the degree of association is characterized by the cosine similarity between the central node and each time-series segment $S_j^i$ other than the latest time-series segment $S_P^i$. The calculation of cosine similarity is known in the art and is not described in detail here.

In some embodiments, the latest time-series segment $S_P^i$ may be taken as the central node of the space-time diagram; based on the initial discrete dependency graph, the neighbor nodes of $S_P^i$ in the discrete dependency graph are found, and these neighbor nodes together with the central node constitute the space-time diagram.

It can be seen that the dynamic central node selection described above (i.e., step S30) selects the latest time-series segment from the time-series data as the central node of the discrete dependency graph. This means that the discrete dependency graph of the present application is dynamic: a different central node can be selected at each moment, to adapt to changes in the data and in requirements. The automatic construction of neighbor nodes (i.e., steps S31 to S32) builds the neighbor set by calculating the degree of association and applying a preset selection threshold. This helps identify the time-series segments associated with the central node without the neighbor nodes having to be defined manually. The capturing of space-time dependencies (i.e., step S33) constructs the space-time diagram by connecting the central node with its neighbor nodes, and the resulting space-time diagram helps represent the space-time dependencies between the central node and its associated time-series segments (e.g., it helps capture the spatial dependencies of the latest time-series segment $S_P^i$). This helps to better understand and analyze the correlations in the time-series data.

In some embodiments, a characterization of the latest timing segment may be taken as an input feature corresponding to the center node, where the characterization of the latest timing segment may include a feature vector for the segment.

The technical means of "taking the characterization of the latest time-series segment as the input feature corresponding to the central node" shows that the space-time diagram can provide a segment-specific feature characterization that better reflects the characteristics and information of the time-series segment.

In some embodiments, the connection relationship and the weight of the constructed space-time diagram may be integrated with the feature vector of the central node to form the input data. And then inputting the input data into the regression prediction layer sub-module.

It can be seen that the above-mentioned technical means of "integrating to form input data" can integrate the connection relationship and weight of the space-time diagram with the feature vector of the central node to form input data. This provides a comprehensive data representation that includes spatio-temporal dependencies and characteristic information to facilitate subsequent analysis and modeling. In summary, the foregoing steps provide technical effects of dynamic center node selection, automatic construction of neighbor nodes, capturing of space-time dependencies, personalized feature characterization, integration to form input data, and the like. These technical effects help to improve the quality of characterization and analysis of time series data, making it more suitable for modeling and analysis of spatio-temporal data.

In some embodiments, the predictive model is further used to: updating the potential feature representation of the central node of the time-space diagram according to the information acquired from the neighbor nodes of the time-space diagram; and predicting the future time sequence segment based on the potential characteristic representation of the latest time sequence segment and the updated time space diagram to obtain a corresponding prediction result. Referring to fig. 5, updating the potential feature representation of the center node of the time-space diagram according to the information collected from the neighbor nodes of the time-space diagram includes:

step S40: collecting feature vectors corresponding to neighbor nodes from the neighbor nodes in the time-space diagram;

step S41: aggregating feature vectors collected from different neighbor nodes to generate a new potential feature representation of the center node;

step S42: inputting the new potential feature representation into a nonlinear activation function to obtain a nonlinear transformed new potential feature representation;

step S43: the original potential feature representation of the central node is replaced by the new potential feature representation after nonlinear transformation.

In some embodiments, the potential feature representation of the center node of the time-space graph may be initialized before updating the potential feature representation. For example, the potential feature representation of the center node may be initialized to the original feature vector or other representation of the center node. The person skilled in the art can determine the other characteristics described above according to the actual requirements. Thereafter, relevant information is collected from the neighboring nodes to update the potential feature representation of the central node by taking into account the relationship between the central node and its neighboring nodes.

In some embodiments, the manner of aggregation in step S41 includes: weighted averaging, summing or averaging.
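Steps S40 to S43, together with the aggregation options listed above, can be sketched as follows. The sketch follows the steps literally, so the new representation is built from the neighbor features only; the choice of ReLU as the nonlinear activation, the function name, and the sample values are assumptions.

```python
import numpy as np

def update_center_node(features, center, neighbors, weights=None, mode="weighted_mean"):
    """Return a copy of features in which the center row has been updated."""
    X = np.array(features, dtype=float)            # copy, input stays untouched
    collected = X[list(neighbors)]                 # step S40: collect
    if mode == "sum":                              # step S41: aggregate
        aggregated = collected.sum(axis=0)
    elif mode == "mean":
        aggregated = collected.mean(axis=0)
    elif mode == "weighted_mean":
        w = np.ones(len(collected)) if weights is None else np.asarray(weights, dtype=float)
        aggregated = (w[:, None] * collected).sum(axis=0) / w.sum()
    else:
        raise ValueError(f"unknown aggregation mode: {mode}")
    activated = np.maximum(aggregated, 0.0)        # step S42: ReLU (assumed)
    X[center] = activated                          # step S43: replace
    return X

# Hypothetical features: node 0 is the center, nodes 1 and 2 are its neighbors.
feats = [[1.0, -2.0], [3.0, 4.0], [-5.0, -6.0]]
updated = update_center_node(feats, center=0, neighbors=[1, 2], weights=[3.0, 1.0])
```

The weights could, for instance, be the connection weights or similarity scores of the space-time diagram, as the description suggests.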

It can be seen that the specific manner in which the potential feature representations of the central node are initialized described above means that different methods can be employed to initialize the potential feature representations of the central node, including, for example, the use of raw feature vectors or other types of characterizations.

It can be seen that through the above step of information dissemination (i.e., step S40), information can be efficiently collected from each neighbor node, for example, by weighted averaging of potential feature representations of neighbor nodes, where weights are typically calculated based on connection strength or other similarity metrics. This helps to better spread information and features; the step of feature aggregation described above (i.e., step S41) describes an aggregation process of information, including weighted averages at the element level, summary functions, or other task-specific aggregation methods, that facilitate the generation of new potential feature representations of the central node, including information of neighboring nodes; the step of nonlinear transformation described above (i.e., step S42) introduces nonlinear transformation to better capture complex patterns and dependencies in the data by passing new latent feature representations to nonlinear activation functions. This helps to increase the expressive power of the characterization; the step of updating the feature (i.e. step S43) allocates the new potential feature representation after the nonlinear transformation to the central node to replace the original potential feature representation of the central node, which means that the feature representation of the central node will include information about itself and neighboring nodes, thereby improving the information richness of the feature; in summary, by the above steps S40 to S43, effective aggregation of information can be achieved. This includes collecting information from neighbor nodes, weighted averaging feature representations of neighbor nodes, and applying a nonlinear transformation. 
In general, according to the steps described above, technical effects including efficient feature representation initialization, aggregation and propagation of information, nonlinear transformation, and updating of features can be achieved, which helps to improve the characterization and analysis capabilities of spatio-temporal data, thereby better capturing associations and patterns in the data.

It should be noted that the above "updating the potential feature representation of the center node of the time-space diagram according to the information collected from the neighbor nodes of the time-space diagram" helps capture the spatial dependencies of the latest time-series segment. For example, consider a city traffic network in which each node represents an intersection and has a feature representation (e.g., the potential feature representation described above) that includes the traffic flow, speed, congestion and similar information for that intersection. Suppose the goal is to analyze how traffic flow propagates through the city in order to predict future traffic conditions. In this scenario, the "spatial dependency" refers to the way traffic flows at different intersections affect each other: the traffic flow at one intersection may be affected by the traffic flow at an adjacent intersection, and if one intersection is congested, nearby intersections may become congested as well. Assume a central node $V_i$ with an initial feature representation $X_i$ and a set of neighbor nodes $N_i$, each neighbor having its own feature representation $X_j$, where $j \in N_i$. In some embodiments, a neural network layer may be used to update the feature representation of the central node as follows: $X_i' = f(X_i, \{X_j : j \in N_i\})$, where f is a function or neural network layer that combines the feature representation $X_i$ of the central node $V_i$ with the feature representations $X_j$ of its neighbor nodes to generate the updated feature representation $X_i'$. The updated feature representation $X_i'$ better captures the spatial dependencies of the central node $V_i$.

It should be noted that the end-to-end training of the space-time diagram neural network sub-module proceeds as follows: the parameters of the model are optimized by jointly training the entire space-time diagram neural network sub-module so that it is better adapted to the task, which includes spatio-temporal relationship modeling and prediction. End-to-end training helps the model learn more meaningful representations from the data and thus improves task performance. Here, the characterization $H_P^i$ of the latest time-series segment $S_P^i$ is a representation of the most recent time-series segment of each node i. It may include the features of the node and other information about the most recent time-series segment, and it represents the most recent state of the node. The dependency characterization $H_{gw}$ corresponding to the characterization $H_P^i$ of the latest time-series segment $S_P^i$ is a representation of the dependency information of a node. Dependency information is information about the relationships between a node and its neighboring or otherwise related nodes, covering both temporal and spatial dependencies; $H_{gw}$ encodes these dependencies.

It should be noted that the purpose of fusing the latest segment representation Hp of each node with its corresponding dependency representation $H_{gw}$ is to combine the latest state of the node (Hp) with its dependency information ($H_{gw}$), so as to obtain a more comprehensive and informative node representation. This helps the model better understand the time-series segments of the node, including the space-time relationships between it and other nodes, and thereby improves the performance of the model on the task.

In some embodiments, a semantic projector SP may be introduced to convert the latest segment representation Hp into the semantic space of the dependency graph representation Hgw; the projective transformation matrix is learned through training, and the two representations are combined into the matrix $H_{final}$: $H_{final} = SP(H_P) + H_{gw}$. Finally, referring to Fig. 6, the prediction $\hat{Y}$ is made by the regression prediction layer sub-module. Given the future true value $Y \in R^{T \times N \times C}$ (T prediction steps, N nodes, output dimension C), the mean absolute error is used as the regression loss.

It should be noted that the main function of "introducing a semantic projector SP" is to transform the latest segment representation Hp from its original representation space into the semantic space of the dependency representation $H_{gw}$, so that the two representations can be fused better and model performance is improved. The general flow of "converting the latest segment representation Hp into the semantic space of the dependency graph representation Hgw" is: 1) defining a projective transformation matrix: a projective transformation matrix is defined to map Hp from its original representation space to the semantic space; the matrix may be a learnable parameter of the model, learned during training; 2) learning the projective transformation matrix: during training, the model learns how to adjust the weights of the projective transformation matrix so as to retain as much of the useful information in Hp as possible while mapping it to the semantic space; this is typically done by minimizing a loss function whose design depends on the task requirements; 3) applying the projective transformation: once the projective transformation matrix has been learned, the latest segment representation Hp is transformed by it, for example by matrix multiplication, and the result is added to $H_{gw}$: $H_{final} = SP(H_P) + H_{gw}$, where SP denotes the semantic projector, Hp is the original latest segment representation, SP(Hp) is the transformed latest segment representation, and $H_{final}$ is the resulting fused representation.

The specific procedure for forming the combined matrix $H_{final}$ is well known in the art. For example, reference may be made to paragraph [0077] and the following paragraphs of the Chinese patent document with publication number CN115688871, the pre-training-enhancement-based multivariate time series prediction method and system, which state the following. A usual space-time graph neural network (STGNN) sub-module takes as input the last segment (such as the latest time series segment described above) and the discrete dependency graph, while the enhanced space-time graph neural network (STGNN) sub-module also takes the representation of the input segment into account. Owing to the strong ability of the Transformer model to extract long-term dependencies, this representation contains rich context information. The STEP framework can be extended to almost any STGNN; here an existing method, Graph WaveNet, is used as the back-end model. Graph WaveNet effectively captures space-time dependencies by combining graph convolution with dilated causal convolution. Based on its output latent hidden representation $H_{gw} \in R^{N \times d}$, the prediction is performed by a regression prediction layer, e.g. a multi-layer perceptron. The Transformer representations $H^i_P$ of the last segment P of all nodes i can be combined as $H_P \in R^{N \times d}$. The representations of Graph WaveNet and TSFormer are fused in the following manner:

$$H_{final} = SP(H_P) + H_{gw}$$

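where SP(·) is the semantic projector (e.g., implemented by a multi-layer perceptron) that converts $H^i_P$ into the semantic space of $H_{gw}$. Finally, the prediction $\hat{Y}$ is made by a regression prediction layer. Given the future true value $Y \in R^{T \times N \times d}$, the mean absolute error is used as the regression loss:

$$L_{regression} = \frac{1}{T N C} \sum_{j=1}^{T} \sum_{i=1}^{N} \sum_{k=1}^{C} \left| \hat{Y}_{ijk} - Y_{ijk} \right|$$

where N is the number of nodes, T is the number of prediction steps, and C is the dimension of the output, i.e. the size of the last axis of Y (written d above).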
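As a small worked example of the mean absolute error described above, the sketch below averages the absolute difference between prediction and true value over all T steps, N nodes and C output channels. The function name `mae_loss` and the toy traffic values are illustrative assumptions.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # shape: T steps x N nodes x C channels

def mae_loss(y_pred: Tensor3, y_true: Tensor3) -> float:
    """Mean absolute error over every (step, node, channel) entry."""
    total = 0.0
    count = 0
    for pred_step, true_step in zip(y_pred, y_true):
        for pred_node, true_node in zip(pred_step, true_step):
            for p, t in zip(pred_node, true_node):
                total += abs(p - t)
                count += 1
    return total / count

# T = 2 steps, N = 2 nodes, C = 1 channel (hypothetical traffic values).
y_true = [[[10.0], [20.0]], [[12.0], [18.0]]]
y_pred = [[[11.0], [19.0]], [[12.0], [22.0]]]

loss = mae_loss(y_pred, y_true)
print(loss)  # (1 + 1 + 0 + 4) / 4 = 1.5
```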

The space-time graph neural network (STGNN) sub-module and the graph structure learning sub-module are trained in an end-to-end fashion. The total loss function L of the space-time graph neural network (STGNN) sub-module and the graph structure learning sub-module may be expressed as:

$$L = L_{regression} + \lambda L_{graph},$$

where $\lambda L_{graph}$ is a gradually decaying graph regularization term. The graph regularization term $L_{graph}$ constrains the learning of the model so that it better captures the correlation between data points, and $\lambda$ is a hyperparameter that adjusts the weight of the graph regularization term. The weight $\lambda$ is set to decay gradually during training so that the learned graph can go beyond the kNN graph: the model is initially constrained more strongly by graph regularization, and the constraint is relaxed as training progresses so that the model focuses increasingly on the regression task.
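The effect of a decaying weight can be sketched as follows. The text does not specify the decay schedule, so the step-wise halving used here, together with the names `graph_weight` and `total_loss` and all default values, is an assumption made only for illustration.

```python
def graph_weight(step: int, lambda0: float = 1.0, decay: float = 0.5, every: int = 100) -> float:
    """Hypothetical schedule: multiply lambda by `decay` once every `every` training steps."""
    return lambda0 * decay ** (step // every)

def total_loss(l_regression: float, l_graph: float, step: int) -> float:
    """L = L_regression + lambda * L_graph, with lambda taken from the schedule above."""
    return l_regression + graph_weight(step) * l_graph

# The same pair of loss values weighs the graph term less and less over time.
print(total_loss(2.0, 4.0, step=0))    # 2.0 + 1.0  * 4.0 = 6.0
print(total_loss(2.0, 4.0, step=250))  # 2.0 + 0.25 * 4.0 = 3.0
```

Early in training the graph term dominates the total, while later the regression term does, which matches the behaviour described above.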

It should be noted that "capturing the correlation between data" means that the model tries to understand and learn the correlations or dependencies between different data points. One example is temporal correlation in time series data: data points at different points in time may be correlated, and the model may attempt to capture this temporal correlation to better predict future data points. Another example is node dependency in graph data: the data points of different nodes may depend on one another. For example, there may be relationships between users in a social network, or the traffic at different intersections in a traffic network may affect each other. The model may also attempt to capture such node dependencies to better understand the network data.

It should be noted that the above "introducing a graph regularization term" generally refers to constraining the learning of a machine learning model by adding a regularization term (typically one based on a graph structure) during training, so as to better capture the relevance between data points. In particular, a graph structure over the data is first defined, where nodes represent data points and edges represent associations between them. This may be a predefined graph structure or may be constructed automatically from the data. Next, a regularization term associated with the graph structure, commonly referred to as a graph regularization term, is defined. This term is typically based on similarity, connectivity, or other relevance metrics between nodes; one common example is the Laplacian regularization term. Finally, the graph regularization term is added to the loss function of the model, so that during training the model not only tries to fit the data but is also constrained by the regularization term to better capture the relevance between data points.
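As an illustration of the Laplacian regularization term mentioned above, the sketch below computes the sum over edges of w_ij * ||x_i - x_j||^2, which for a symmetric weight matrix W equals tr(X^T (D - W) X). This is the generic textbook form, not necessarily the term used in the application, and the function name and toy graph are hypothetical.

```python
from typing import List

def laplacian_penalty(features: List[List[float]], adjacency: List[List[float]]) -> float:
    """Sum over undirected edges (i < j) of w_ij * squared distance between node features."""
    n = len(features)
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            weight = adjacency[i][j]
            if weight:
                sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                penalty += weight * sq_dist
    return penalty

# A path graph 0 - 1 - 2 with unit edge weights.
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]

rough = [[1.0], [1.0], [3.0]]    # node 2 differs sharply from its neighbour
smooth = [[1.0], [1.0], [1.0]]   # connected nodes agree

print(laplacian_penalty(rough, adjacency))   # 0 + 1 * (1 - 3)^2 = 4.0
print(laplacian_penalty(smooth, adjacency))  # 0.0
```

The penalty is small when connected nodes carry similar features, so adding it to the loss pushes the model toward representations that respect the graph.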

In some embodiments, stochastic gradient descent (SGD) may be employed to train the space-time graph neural network (STGNN) sub-module and the graph structure learning sub-module so as to minimize the total loss function L described above. For example, the gradient of the total loss function is back-propagated to the model parameters to update them. Multiple iterations are performed over the training data, and the model parameters are adjusted continuously so that the predicted result gradually approaches the real traffic value. The specific process of training the space-time graph neural network (STGNN) sub-module and the graph structure learning sub-module belongs to the prior art in the field and is therefore not described herein.
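The update rule can be illustrated on a deliberately tiny model. The sketch below fits a single weight w in y ≈ w * x by SGD on the absolute error; it is a toy stand-in for the STGNN training described above, and the function name, learning rate, epoch count and data are all illustrative assumptions.

```python
import random
from typing import List, Tuple

def sgd_fit(samples: List[Tuple[float, float]], lr: float = 0.01,
            epochs: int = 100, seed: int = 0) -> float:
    """Fit y ~ w * x by stochastic gradient descent on |w * x - y|."""
    rng = random.Random(seed)
    data = list(samples)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a random order each epoch
        for x, y in data:
            error = w * x - y
            # Subgradient of |error| with respect to w is sign(error) * x.
            sign = (error > 0) - (error < 0)
            w -= lr * sign * x
    return w

# Hypothetical samples generated by y = 2 * x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = sgd_fit(samples)
print(round(w, 1))
```

With a constant learning rate the weight ends up oscillating in a narrow band around the true value 2.0; a real training run would typically decay the learning rate or use an adaptive optimizer.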

In some embodiments, the time series segments corresponding to the reserved validation set may be input into the regression prediction layer to predict the traffic data of the next time series segment Tf of each node. This belongs to the stage in which the space-time graph neural network (STGNN) sub-module is fine-tuned and trained on the traffic data of the latest time series segment (namely, the regression prediction and continuous fine-tuning stage). The regression prediction and continuous fine-tuning stage may further include: initializing the parameters of the pre-training model, and initializing a regression model using the parameters of the pre-training model; preparing a labeled training data set, that is, collecting a training data set labeled with real traffic values and ensuring that each sample has a corresponding real traffic segment value at the future time Tf; performing scheduling and regulation using the SDN sub-controller, and monitoring whether the total time used by the prediction model to obtain the corresponding prediction result is less than the time corresponding to the range of future time steps that the prediction model can predict (such as the prediction range of the Tf-step sliding window); if the total time is greater than the time corresponding to the range of future time steps that the prediction model can predict, immediately stopping the prediction work of the prediction model; or, if the precision of the corresponding prediction result is lower than the preset precision threshold, immediately stopping the prediction work of the prediction model and retraining the prediction model.

The "prediction range of the Tf-step sliding window" refers to a window in the time series that covers the range of future time steps predicted by the model. The size of this window is determined by the parameter Tf, which represents the number of future time steps to be predicted. For example, if Tf = 24, the model predicts the data of the next 24 time steps: the window covers the data points of the 24 future time steps, starting from the current time step.
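The sliding-window idea can be sketched as follows: each sample pairs a history window with the Tf values that follow it. The history length `t_in`, the function name `make_windows` and the toy series are assumptions introduced only for this illustration.

```python
from typing import List, Tuple

def make_windows(series: List[float], t_in: int, t_f: int) -> List[Tuple[List[float], List[float]]]:
    """Slide over the series; pair each history of length t_in with the next t_f values."""
    windows = []
    for start in range(len(series) - t_in - t_f + 1):
        history = series[start:start + t_in]
        future = series[start + t_in:start + t_in + t_f]
        windows.append((history, future))
    return windows

series = [float(v) for v in range(10)]       # hypothetical traffic readings
pairs = make_windows(series, t_in=4, t_f=3)  # Tf = 3 future steps per sample

print(len(pairs))   # 4
print(pairs[0])     # ([0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(pairs[-1])    # ([3.0, 4.0, 5.0, 6.0], [7.0, 8.0, 9.0])
```

With Tf = 24, as in the example above, each target window would hold 24 values instead of 3.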

It should be noted that the "parameters of the pre-training model" include the weights, biases, and other learnable parameters of the model, which are learned during the pre-training phase and capture general features of the input data. The "regression model" refers to a model for regression tasks (e.g., the regression prediction layer sub-module of the prediction model) and typically includes one or more neural network layers. In this application, the regression model performs the regression task of predicting, from the input data, the traffic data of the next time series segment of each node. By initializing the regression model with the parameters of the pre-training model, the general features that the pre-training model learned in its previous task can be exploited to improve the performance of the regression model. This transfer learning approach can generally accelerate and improve the learning of the model on a particular task.
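A minimal sketch of this initialization, assuming that models are represented as plain name-to-values dictionaries: parameters whose names exist in both models are copied from the pre-training model, and everything else keeps its fresh initialization. The parameter names and values are hypothetical.

```python
import copy
from typing import Dict, List

Params = Dict[str, List[float]]

def init_from_pretrained(pretrained: Params, regression_model: Params) -> Params:
    """Copy every pre-trained parameter whose name also exists in the regression model."""
    initialised = copy.deepcopy(regression_model)
    for name, value in pretrained.items():
        if name in initialised:
            initialised[name] = copy.deepcopy(value)
    return initialised

# Hypothetical parameter sets: the encoder is shared, the other layers are not.
pretrained = {"encoder.weight": [0.3, -0.1], "encoder.bias": [0.05], "decoder.weight": [0.9]}
fresh = {"encoder.weight": [0.0, 0.0], "encoder.bias": [0.0], "regression.weight": [0.0]}

model = init_from_pretrained(pretrained, fresh)
print(model["encoder.weight"])     # [0.3, -0.1], taken from pre-training
print(model["regression.weight"])  # [0.0], left at its fresh initialization
```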

In some embodiments, when coordination of network resource allocation and traffic prediction is required, the SDN sub-controller starts a prediction flow (i.e. step S200 described above). This step involves resource allocation and network traffic prediction. In this process, the SDN sub-controller needs to consider the current network topology, resource availability and existing traffic data to predict future traffic demands.

In some embodiments, the dynamic network traffic regulation method of the present application further includes:

step S300: and regulating and controlling the network according to the corresponding prediction result.

The above step S300 means that the SDN sub-controller will dynamically adjust the network resource allocation to meet the predicted traffic demand to ensure network performance and efficiency.

It should be noted that, a specific process of regulating and controlling the network according to the corresponding prediction result (e.g. using the SDN sub-controller) belongs to the prior art in the field, so that a detailed description is omitted herein.

It can be seen that the SDN sub-controller starts the prediction flow when it needs to coordinate network resource allocation and traffic prediction, which means that network resources can be adjusted in real time according to actual requirements, so as to improve network performance and efficiency.

In some embodiments, please refer to fig. 7, the dynamic network traffic regulation method of the present application further includes:

step S400: monitoring the total time used by the prediction model for prediction to obtain a corresponding prediction result; and if the total time is greater than the time corresponding to the future time step number range which can be predicted by the prediction model, immediately stopping the prediction work of the prediction model.

And if the precision of the corresponding prediction result is lower than a preset precision threshold value, immediately stopping the prediction work of the prediction model.

In some embodiments, the dynamic network traffic regulation method of the present application further includes: and if the precision of the corresponding prediction result is lower than a preset precision threshold value, immediately stopping the prediction work of the prediction model.

In some embodiments, if the total time is greater than the time corresponding to the future time step range that can be predicted by the prediction model, the prediction operation of the prediction model is stopped immediately to avoid exceeding the acceptable time range.

In some embodiments, if the precision of the corresponding prediction result is lower than the preset precision threshold, the prediction work of the prediction model is immediately stopped, and the prediction model is retrained.

In some embodiments, the SDN sub-controller may monitor the total time used by the prediction model to obtain the corresponding prediction result. The total time is the time elapsed from the start of the prediction to its completion. The total time is monitored to ensure that it is less than Tf, where Tf represents the prediction range of the sliding window.

It can be seen that when the SDN sub-controller performs resource allocation and network traffic prediction (as in step S200 above), the current network topology, resource availability and existing traffic data are combined, which helps to predict future traffic demands, so that resources are better allocated and network efficiency is improved.

It can be seen that the SDN sub-controller monitors the above-mentioned total time to ensure that the prediction is completed within an acceptable time frame. This helps ensure that the network regulation process does not cause unnecessary delays or performance degradation. If the total time exceeds the prediction range (Tf), the SDN sub-controller can automatically end the prediction process to avoid unnecessary resource waste and time delay.

It can be seen that if the accuracy of the corresponding prediction result is lower than the preset accuracy threshold, the SDN sub-controller may terminate the current prediction flow (i.e. perform the quality control operation) and retrain the prediction model. This helps to improve the accuracy and quality of the predictions. In general, these steps cover real-time network regulation, resource optimization and traffic prediction, time monitoring, automatic termination of the flow, and quality control, which help to improve the performance, efficiency, and reliability of the SDN network and ensure efficient utilization of network resources.
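The two checks described above (time budget and accuracy threshold) can be sketched as a small supervisor function. This is a simplification under stated assumptions: it measures the elapsed time after the prediction call returns rather than interrupting a running prediction, and the names `supervise_prediction`, `score`, `budget_seconds` and the status strings are hypothetical rather than part of any SDN controller API.

```python
import time
from typing import Callable, List, Optional, Tuple

def supervise_prediction(
    predict: Callable[[], List[float]],
    score: Callable[[List[float]], float],
    budget_seconds: float,
    min_accuracy: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, Optional[List[float]]]:
    """Run one prediction and discard it if it was too slow or too inaccurate."""
    start = clock()
    result = predict()
    elapsed = clock() - start
    if elapsed > budget_seconds:
        return "stopped_timeout", None   # took longer than the prediction range
    if score(result) < min_accuracy:
        return "stopped_retrain", None   # caller should retrain the model
    return "ok", result

status, forecast = supervise_prediction(
    predict=lambda: [10.0, 12.0, 11.0],  # stand-in for the prediction model
    score=lambda values: 0.95,           # stand-in for an accuracy measure
    budget_seconds=5.0,
    min_accuracy=0.9,
)
print(status, forecast)  # ok [10.0, 12.0, 11.0]
```

Passing the clock as a parameter keeps the time check easy to exercise with a fake clock; a production controller would more likely run the prediction in a separate worker so that it can actually be stopped midway.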

It should be noted that the pre-training-based dynamic network flow regulation method and the system thereof provided by the application can be applied and implemented based on the intelligent ecological network IEN. That is, the pre-training-based dynamic network traffic regulation method and system thereof can also be adopted or applied to the system architecture of the intelligent ecological network (Intelligent Eco Networking, abbreviated as IEN).

The above describes the pre-training-based dynamic network traffic regulation method. In some embodiments, the application also discloses a pre-training-based dynamic network traffic regulation system. Referring to fig. 8, the system includes:

the prediction module 100 is configured to obtain historical time series data from a network, and segment the historical time series data to obtain a plurality of time series segments; and to input the time series segments into a trained prediction model so as to predict future time series segments through the trained prediction model and obtain corresponding prediction results;

the prediction model comprises an encoder, a graph structure learning sub-module and a second decoder;

the encoder is extracted from a pre-trained model that has been pre-trained,

the encoder is used for acquiring potential characteristic representations of the time series segments;

wherein the space-time graph neural network sub-module is used for constructing a space-time graph based on the discrete dependency graph and the latest time series segment among the time series segments,

and updating the potential feature representation of the central node of the space-time graph according to the information acquired from the neighbor nodes of the space-time graph;

the regression prediction layer sub-module is used for predicting the future time series segment based on the potential feature representation of the latest time series segment and the space-time graph before update or the space-time graph after update.

It should be noted that the specific flow and technical effects of the dynamic network traffic regulation system are substantially the same as those of the dynamic network traffic regulation method described above; reference may be made to the description of the method, which is not repeated herein.

The foregoing describes the pre-training-based dynamic network traffic regulation system. Some embodiments of the present application also disclose a computer-readable storage medium comprising a program executable by a processor to implement the method of any of the embodiments of the present application.

Reference is made to various exemplary embodiments herein. However, those skilled in the art will recognize that changes and modifications may be made to the exemplary embodiments without departing from the scope herein. For example, the various operational steps and components used to perform the operational steps may be implemented in different ways (e.g., one or more steps may be deleted, modified, or combined into other steps) depending on the particular application or taking into account any number of cost functions associated with the operation of the system.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. Additionally, as will be appreciated by one of skill in the art, the principles herein may be reflected in a computer program product on a computer-readable storage medium preloaded with computer-readable program code. Any tangible, non-transitory computer-readable storage medium may be used, including magnetic storage devices (hard disks, floppy disks, etc.), optical storage devices (CD-ROM, DVD, Blu-ray discs, etc.), flash memory, and/or the like. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions specified. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including means which implement the function specified. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified.

While the principles herein have been shown in various embodiments, many modifications of structure, arrangement, proportions, elements, materials, and components, which are particularly adapted to specific environments and operative requirements, may be used without departing from the principles and scope of the present disclosure. The above modifications and other changes or modifications are intended to be included within the scope of this document.

The foregoing detailed description has been given with reference to various embodiments. However, those skilled in the art will recognize that various modifications and changes may be made without departing from the scope of the present disclosure. Accordingly, the present disclosure is to be considered as illustrative and not restrictive in character, and all such modifications are intended to be included within the scope thereof. Also, benefits, other advantages, and solutions to problems have been described above with regard to various embodiments. The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, system, article, or apparatus. Furthermore, the term "couple" and any other variants thereof are used herein to refer to physical connections, electrical connections, magnetic connections, optical connections, communication connections, functional connections, and/or any other connection.

Those skilled in the art will recognize that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. Accordingly, the scope of the invention should be determined only by the following claims.

Claims

1. A pre-training-based dynamic network traffic regulation method, comprising:

inputting the time series segment into a trained predictive model, the predictive model being for:

acquiring a potential feature representation of the time series segment;

2. The dynamic network traffic regulation method of claim 1, wherein the constructing a discrete dependency graph based on the time-series segments and the potential feature representation comprises:

3. The dynamic network traffic regulating method according to claim 2, wherein said constructing a space-time graph based on the discrete dependency graph and the latest time series segment among the time series segments comprises:

4. The dynamic network traffic regulation method of claim 1, wherein the predictive model is further configured to:

updating the potential feature representation of a central node of the space-time graph according to information acquired from neighbor nodes of the space-time graph;

and predicting the future time series segment based on the potential feature representation of the latest time series segment and the updated space-time graph to obtain a corresponding prediction result.

5. The dynamic network traffic regulation method according to claim 4, wherein updating the potential feature representation of the central node of the space-time graph based on information acquired from the neighbor nodes of the space-time graph comprises:

6. The dynamic network traffic regulating method according to claim 5, further comprising:

and regulating and controlling the network according to the corresponding prediction result.

7. The dynamic network traffic regulating method according to claim 1 or 6, further comprising:

monitoring the total time used by the prediction model for carrying out the prediction to obtain the corresponding prediction result;

and if the total time is greater than the time corresponding to the future time step number range which can be predicted by the prediction model, immediately stopping the prediction work of the prediction model.

8. The dynamic network traffic regulating method according to claim 1 or 6, further comprising: and if the precision of the corresponding prediction result is lower than a preset precision threshold value, immediately stopping the prediction work of the prediction model.

9. A pre-training based dynamic network traffic regulation system, comprising:

the prediction module is configured to acquire historical time series data from a network, and segment the historical time series data to obtain a plurality of time series fragments; inputting the time sequence segments into a trained prediction model to predict future time sequence segments through the trained prediction model so as to obtain corresponding prediction results;

the encoder is extracted from a pre-trained model that has been pre-trained,

10. A computer-readable storage medium comprising a program executable by a processor to implement the dynamic network traffic regulation method according to any one of claims 1 to 8.