WO2024021990A1 - 一种路径确定的方法及相关设备 - Google Patents

一种路径确定的方法及相关设备 Download PDF

Info

Publication number
WO2024021990A1
WO2024021990A1 PCT/CN2023/103818 CN2023103818W WO2024021990A1 WO 2024021990 A1 WO2024021990 A1 WO 2024021990A1 CN 2023103818 W CN2023103818 W CN 2023103818W WO 2024021990 A1 WO2024021990 A1 WO 2024021990A1
Authority
WO
WIPO (PCT)
Prior art keywords
network device
network
network devices
communication
paths
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/103818
Other languages
English (en)
French (fr)
Inventor
单良
温华锋
吴涛
李军
吴钦志
王炳权
龚翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to EP23845203.1A priority Critical patent/EP4557688A4/en
Publication of WO2024021990A1 publication Critical patent/WO2024021990A1/zh
Priority to US19/027,916 priority patent/US20250168116A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/24Multipath
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/02Topology update or discovery
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/12Shortest path evaluation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/38Flow based routing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/42Centralised routing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/64Routing or path finding of packets in data switching networks using an overlay routing layer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/74Address processing for routing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/12Avoiding congestion; Recovering from congestion
    • H04L47/125Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering

Definitions

  • the present application relates to the field of communications, and in particular, to a path determination method and related equipment.
  • the forwarding device may include a router, a switch, a virtual machine, etc.
  • the communication device As the scale of the network becomes larger and larger, it may be necessary to forward the data flows between different communication devices through multi-layer network forwarding.
  • this multi-layer network there are often multiple paths between the communication device corresponding to the source address of the data flow and the communication device corresponding to the destination address of the data flow.
  • the communication device When the communication device as a forwarding device forwards a data flow, it selects a path among the multiple paths based on local policies, and forwards the data flow based on the locally selected path.
  • each communication device serving as a forwarding device there are often multiple communication devices serving as forwarding devices, and the path determination of each communication device serving as a forwarding device is based on its own local data flow, which can easily lead to conflicts in the paths determined by different forwarding devices. Affects the forwarding efficiency of data flow.
  • This application provides a path determination method and related equipment to improve the forwarding efficiency of data flows.
  • the first aspect of the present application provides a method for path determination, which method is executed by a first network device, or the method is executed by some components (such as a processor, a chip or a chip system, etc.) in the first network device, or The method may also be implemented by a logic module or software that can realize all or part of the functions of the first network device.
  • the method is described by taking the example that the method is executed by a first network device.
  • the first network device may be a router, a switch, a virtual machine, etc.
  • the first network device obtains first topology information.
  • the first topology information includes connection relationships between N second network devices and P third network devices. Any second network device is any first network device.
  • the upstream network device of the three network devices N is an integer greater than or equal to 2, and P is an integer greater than or equal to 1; the first network device obtains the communication relationship of M data streams, and each piece of data in the M data streams
  • the communication relationship of the flow includes source address information and destination address information.
  • M is an integer greater than or equal to 2.
  • the M data flows are transmitted to the P third network devices through the N second network devices respectively; the first network The device determines M paths based on the communication relationships of the M data flows and the first topology information.
  • the M paths respectively correspond to the M data flows.
  • the M paths indicate the P data flows through the N second network devices.
  • the third network device transmits the M data flow paths; the first network device sends the M paths to the N second network devices respectively.
  • the first network device obtains the first topology information including the connection relationships between N second network devices and P third network devices, and the first network device obtains the communication relationships of the M data streams Afterwards, the first network device determines and sends M paths to the N second network devices according to the communication relationships of the M data flows and the first topology information. Thereafter, N second network devices may respectively send the M data streams to P third network devices based on the M paths.
  • the first network device serves as a path determination device, and the path determination basis of the first network device is the connection relationship between the N second network devices and the P third network devices and the communication relationship of the M data streams.
  • the first network device can determine the path based on global information to avoid path conflicts. Conflicts improve the forwarding efficiency of data flows.
  • any one of the M data streams may be a unidirectional data stream or a bidirectional data stream, which is not limited in this application.
  • the communication relationship of the bidirectional data flow may only include source address information and destination address information of a certain flow direction.
  • the communication relationship of the bidirectional data flow may also include source address information and destination address information corresponding to the two flow directions, which are not limited here.
  • the M data flows pass through the P third network devices to the K fourth network devices.
  • Transmission, K is an integer greater than or equal to 1;
  • the M data streams include a first data stream and a second data stream, and the source address information of the first data stream and the source address information of the second data stream correspond to different The second network device, the destination address information of the first data flow and the destination address information of the second data flow correspond to the same fourth network device,
  • the M paths include the first path and the second path, and the first path and The first data flow corresponds to the second data flow, the second path corresponds to the second data flow, and the first path and the second path correspond to different third network devices.
  • any fourth network device is different from any second network device.
  • At least one network device among the N second network devices and at least one network device among the K fourth network devices are the same network device.
  • N and K are equal, and the N second network devices and the K fourth network devices are the same network devices.
  • the M paths determined by the first network device include a first path corresponding to the first data flow and a second path corresponding to the second data flow, and the first path and the second path correspond to Different third network devices, wherein the source address information of the first data flow and the source address information of the second data flow correspond to different second network devices, and the destination address information of the first data flow is different from the second data flow.
  • the destination address information of the data flow corresponds to the same fourth network device.
  • any fourth network device is a downstream network device of any third network device, after the first data flow and the second data flow are respectively sent to different third network devices through different second network devices, the different The third network device respectively sends the first data flow and the second data flow to the same fourth network device. Therefore, it is possible to avoid network congestion caused by data flows from different second network devices being transmitted through the same third network device and then being transmitted to the same fourth network device through the same third network device, thereby improving the first The transmission efficiency of the data stream and the second data stream.
  • the M paths also indicate egress ports of the M data flows on the N second network devices.
  • the M paths determined by the first network device in addition to indicating the paths for transmitting the M data streams through the N second network devices to the P third network devices, the M paths determined by the first network device also indicate the The egress ports of the M data flows on the N second network devices, so that the N second network devices can clearly send the egress ports of the M data flows after receiving M paths.
  • the M data streams include a third data stream and a fourth data stream
  • the source address information of the third data stream and the source address information of the fourth data stream correspond to
  • the M paths include a third path and a fourth path, the third path corresponds to the third data flow, the fourth path corresponds to the fourth data flow, and the third path corresponds to the third data flow. The four paths are different.
  • the M paths determined by the first network device include a third path corresponding to the third data flow and a fourth path corresponding to the fourth data flow, and the third path and the fourth path are different.
  • the source address information of the third data flow and the source address information of the fourth data flow correspond to the same second network device.
  • the third data stream and the fourth data stream are transmitted through different paths through the same second network device. Therefore, network congestion caused when the data flow from the same second network device is transmitted through the same path can be avoided, thereby improving the transmission efficiency of the third data flow and the fourth data flow.
  • the topology information also includes connection relationships between P third network devices and K fourth network devices.
  • an implementation method for the first network device to determine M paths is provided, so that the first network device determines the M paths based on the mapping relationship between the N second network devices and the K fourth network devices. path.
  • the first network device determining the M paths according to the first mapping relationship includes: the first network device determining the first sorting information according to the first mapping relationship, the first A sorting information is used to indicate the sorting of the number of second network devices corresponding to the K fourth network devices; the first network device sequentially traverses the egress ports of the N second network devices according to the first sorting information, Obtain a second mapping relationship; wherein the second mapping relationship is used to indicate the mapping relationship between the egress ports of the N second network devices and the K fourth network devices; the first network device is based on the second mapping The relationship determines the M paths.
  • an implementation method for the first network device to determine M paths is provided, so that the first network device can determine M paths based on The first mapping relationship, the first sorting information and the second mapping relationship determined in sequence determine the M paths.
  • the first network device sequentially traverses the egress ports of the N second network devices according to the first sorting information
  • the second mapping relationship obtained includes: the first network device The device sequentially traverses the egress ports of the N second network devices according to the first sorting information to obtain a third mapping relationship; wherein the third mapping relationship is used to indicate the second port corresponding to each fourth network device.
  • the third mapping relationship is used to indicate the optional number of egress ports of the second network device corresponding to each fourth network device, wherein the larger the value of the optional number is, the larger the value of the corresponding fourth network device is.
  • the smaller the uncertainty of the optional path of the network device conversely, the smaller the value of the optional number, the greater the uncertainty of the corresponding optional path of the fourth network device.
  • the second mapping relationship determined based on the third mapping relationship can be traversed first for the egress port of the second network device corresponding to the fourth network device with less uncertainty in the optional path, so as to improve the accuracy of the solution. property, to avoid subsequent conflicts among the M paths determined based on the second mapping relationship.
  • the method further includes: the first network device acquiring second topology information, the second topology information including A second network devices and the P third network devices.
  • the connection relationship between, at least one second network device among the A second network devices is the same as at least one second network device among the N second network devices, and the A is an integer greater than or equal to 1; the A network device obtains the communication relationship of B data streams.
  • the communication relationship of each data stream in the B data streams includes source address information and destination address information.
  • B is an integer greater than or equal to 1.
  • the B data streams Transmit to the P third network devices respectively through the A second network devices; after the first network device determines M paths based on the communication relationships of the M data flows and the topology information, the method also includes: The first network device determines B paths according to the communication relationship of the B data flows and the topology information. The B paths respectively correspond to the B data flows. The B paths indicate the direction to the P through the A second network device. Paths through which the M third network devices send the M data flows; wherein the egress ports of the second network devices corresponding to the B paths are different from the egress ports of the second network devices corresponding to the M paths; the first network device Send the B paths to the A second network devices respectively.
  • the egress ports of the second network device corresponding to the B paths determined by the first network device are different from the egress ports of the second network device corresponding to the M paths, so as to avoid M data flows and B data flows.
  • the data transmission efficiency of the M data flows and the B data flows is improved when traffic conflicts occur when corresponding to the egress port of the same second network device.
  • the first network device obtaining the communication relationships of the M data streams includes: the first network device respectively receiving the communication relationships of the M data streams from the N second network devices. communication relationship.
  • the first network device can obtain the communication relationships of the M data streams based on respectively receiving the communication relationships of the M data streams from the N second network devices, so that the first network The communication relationship of the M data streams is obtained through interaction between the device and the N second network devices.
  • the communication relationships of the M data streams are preconfigured on the first network device to avoid an increase in overhead and delay caused by interactions between different network devices.
  • the M data streams correspond to one of multiple artificial intelligence (artificial intelligence, AI) collective communication tasks.
  • AI artificial intelligence
  • the M data flows correspond to a long-term steady-state traffic task, and the traffic size of the data flows in the long-term steady-state traffic task is greater than a preset threshold within a certain period of time.
  • the first network device is a controller or a network device among the P third network devices.
  • the first network device that executes the method to determine and send M paths can be a controller or one of the P third network devices to improve the flexibility of solution implementation.
  • the second aspect of the present application provides a method for determining a path.
  • the method is executed by a second network device, or the method is executed by some components (such as a processor, a chip or a chip system, etc.) in the second network device, or The method may also be implemented by a logic module or software that can realize all or part of the functions of the second network device.
  • the method is described by taking the example that the method is executed by a second network device.
  • the second network device may be a router, a switch, a virtual machine, etc.
  • the second network device sends the communication relationship of Q data streams to the first network device.
  • the communication relationship of each data stream in the Q data streams includes source address information and destination address information.
  • Q is greater than or An integer equal to 1; the second network device receives Q paths from the first network device, the Q The paths indicate the paths used by the second network device to transmit the Q data streams; the second network device transmits the Q data streams based on the Q paths.
  • the second network device receives an instruction from the first network device indicating that the second network device transmits the Q data streams.
  • Q paths are used, and the second network device transmits the Q data streams based on the Q paths.
  • the first network device serves as a device for determining paths, and the first network device can determine M paths corresponding to M data flows transmitted between N second network devices and P third network devices. Therefore, compared with the implementation in which N second network devices only use local data flows as the basis for path determination, which may easily lead to path conflicts, in the above method, the first network device can determine the path based on global information to avoid path conflicts. Conflicts improve the forwarding efficiency of data flows.
  • any one of the Q data streams can be a unidirectional data stream or a bidirectional data stream, which is not limited in this application.
  • the communication relationship of the bidirectional data flow can only include source address information and destination address information of a certain flow direction.
  • the communication relationship of the bidirectional data flow may also include source address information and destination address information corresponding to the two flow directions, which are not limited here.
  • the path information also indicates the egress ports of the Q data flows on the second network device.
  • the Q paths received by the second network device in addition to indicating the paths for transmitting the Q data flows through the second network device, the Q paths received by the second network device also indicate that the Q data flows are transmitted through the second network device. so that after receiving Q paths, the second network device can clearly send the egress ports of the Q data flows.
  • the Q data streams include a third data stream and a fourth data stream
  • the source address information of the third data stream and the source address information of the fourth data stream correspond to
  • the path information includes a third path and a fourth path
  • the third path corresponds to the third data flow
  • the fourth path corresponds to the fourth data flow
  • the third path corresponds to the fourth data flow.
  • the paths are different.
  • the Q paths received by the second network device include a third path corresponding to the third data flow and a fourth path corresponding to the fourth data flow, and the third path and the fourth path are different.
  • the source address information of the third data flow and the source address information of the fourth data flow correspond to the same second network device.
  • the third data stream and the fourth data stream are transmitted through different paths through the same second network device. Therefore, network congestion caused when the data flow from the same second network device is transmitted through the same path can be avoided, thereby improving the transmission efficiency of the third data flow and the fourth data flow.
  • the Q data streams correspond to one task among multiple AI collective communication tasks.
  • the Q data flows correspond to a long-term steady-state traffic task, and the traffic size of the data flows in the long-term steady-state traffic task is greater than a preset threshold within a certain period of time.
  • a third aspect of the present application provides a communication device that can implement the method in the above first aspect or any possible implementation of the first aspect.
  • the device includes corresponding units or modules for performing the above method.
  • the units or modules included in the device can be implemented by software and/or hardware.
  • the device can be a first network device, or the device can be a component (such as a processor, a chip or a chip system, etc.) in the first network device, or the device can also be a device that can implement all or part of the first network device.
  • Logic modules or software for device functionality are examples of the devices.
  • the device includes a transceiver unit and a processing unit; the transceiver unit is used to obtain first topology information.
  • the first topology information includes connection relationships between N second network devices and P third network devices.
  • Any second network The device is an upstream network device of any third network device, N is an integer greater than or equal to 2, and P is an integer greater than or equal to 1; the transceiver unit is also used to obtain the communication relationship of the M data streams.
  • the communication relationship of each data flow in the flow includes source address information and destination address information.
  • M is an integer greater than or equal to 2.
  • the M data flows pass through the N second network devices to the P third network devices. Transmission; the processing unit is configured to determine M paths based on the communication relationships of the M data streams and the first topology information.
  • the M paths respectively correspond to the M data streams, and the M paths indicate passing through the N second
  • the network device transmits the M data flow paths to the P third network devices; the transceiver unit is also used to send the M paths to the N
  • the M data streams are transmitted to K fourth network devices through the P third network devices, where K is an integer greater than or equal to 1;
  • the M data streams include The first data stream and the second data stream, the source address information of the first data stream and the source address information of the second data stream correspond to different second network devices, and the destination address information of the first data stream is the same as the second data stream.
  • the destination address information of the data flow corresponds to the same fourth network device.
  • the M paths include a first path and a second path. The first path corresponds to the first data flow, and the second path corresponds to the second data flow. , the first path and the second path correspond to different third network devices.
  • the M paths also indicate egress ports of the M data flows on the N second network devices.
  • the M data streams include a third data stream and a fourth data stream
  • the source address information of the third data stream and the source address information of the fourth data stream correspond to
  • the M paths include a third path and a fourth path, the third path corresponds to the third data flow, the fourth path corresponds to the fourth data flow, and the third path corresponds to the third data flow. The four paths are different.
  • the M data streams are transmitted to K fourth network devices through the P third network devices, and K is a positive integer; the processing unit is specifically configured to: according to the M The communication relationship of the M data streams and the first topology information determine a first mapping relationship.
  • the first mapping relationship is used to indicate the second network device corresponding to the source address information of each of the M data streams and the M data streams.
  • a mapping relationship between fourth network devices corresponding to the target address information of each data stream in the data stream; the M paths are determined according to the first mapping relationship.
  • the processing unit is specifically configured to: determine first sorting information according to the first mapping relationship, where the first sorting information is used to indicate the Kth fourth network device corresponding to Sorting of the number of the two network devices; sequentially traversing the egress ports of the N second network devices according to the first sorting information to obtain a second mapping relationship; wherein the second mapping relationship is used to indicate the N second Mapping relationship between the egress port of the network device and the K fourth network devices; determining the M paths based on the second mapping relationship.
  • the processing unit is specifically configured to: sequentially traverse the egress ports of the N second network devices according to the first sorting information to obtain a third mapping relationship; wherein, the The third mapping relationship is used to indicate an optional number of egress ports of the second network device corresponding to each fourth network device; the second mapping relationship is determined based on the third mapping relationship.
  • the transceiver unit is also configured to obtain second topology information, where the second topology information includes connection relationships between A second network devices and the P third network devices.
  • the second topology information includes connection relationships between A second network devices and the P third network devices.
  • at least one second network device among the A second network devices is the same as at least one second network device among the N second network devices, and A is an integer greater than or equal to 1;
  • the transceiver unit is also used to Obtain the communication relationship of B data streams.
  • the communication relationship of each data stream in the B data streams includes source address information and destination address information.
  • B is an integer greater than or equal to 1.
  • the B data streams pass through the A second network devices transmit to the P third network devices; the processing unit is also used to determine B paths based on the communication relationships of the B data flows and the topology information, and the B paths are respectively connected to the B data flows.
  • the B paths indicate paths for sending the M data streams to the P third network devices through the A second network devices; wherein, the egress ports of the second network devices corresponding to the B paths are different from the The M paths correspond to the outgoing ports of the second network device; the transceiver unit is also used to send the B paths to the A second network devices respectively.
  • the transceiver unit is specifically configured to receive the communication relationships of the M data streams from the N second network devices respectively.
  • the M data streams correspond to one of multiple artificial intelligence AI collective communication tasks.
  • the first network device is a controller or a network device among the P third network devices.
  • the component modules of the communication device can also be used to perform the steps performed in each possible implementation manner of the first aspect, and achieve corresponding technical effects.
  • the component modules of the communication device can also be used to perform the steps performed in each possible implementation manner of the first aspect, and achieve corresponding technical effects.
  • a fourth aspect of the present application provides a communication device that can implement the method in the above second aspect or any possible implementation manner of the second aspect.
  • the device includes corresponding units or modules for performing the above method.
  • the units or modules included in the device can be implemented by software and/or hardware.
  • the device can be a second network device, or the device can be a component (such as a processor, a chip or a chip system, etc.) in the second network device, or the device can also be a device that can implement all or part of the second network.
  • Logic modules or software for device functionality are examples of the devices.
  • the device includes a transceiver unit and a processing unit; the processing unit is used to determine the communication relationship of Q data streams.
  • the communication relationship of each data stream includes source address information and destination address information, Q is an integer greater than or equal to 1;
  • the transceiver unit is used to send the communication relationship of Q data streams to the first network device;
  • the transceiver unit is also used to Receive Q paths from the first network device, the Q paths indicate the paths used by the second network device to transmit the Q data streams;
  • the transceiver unit is also used to transmit the Q data streams based on the Q paths.
  • the path information also indicates the egress ports of the Q data flows on the second network device.
  • the Q data streams include a third data stream and a fourth data stream
  • the source address information of the third data stream and the source address information of the fourth data stream correspond to
  • the path information includes a third path and a fourth path
  • the third path corresponds to the third data flow
  • the fourth path corresponds to the fourth data flow
  • the third path corresponds to the fourth data flow.
  • the paths are different.
  • the Q data streams correspond to one of multiple artificial intelligence AI collective communication tasks.
  • the component modules of the communication device can also be used to perform the steps performed in each possible implementation manner of the second aspect, and achieve corresponding technical effects.
  • the component modules of the communication device can also be used to perform the steps performed in each possible implementation manner of the second aspect, and achieve corresponding technical effects.
  • a fifth aspect of this application provides a communication device.
  • the communication device includes at least one processor.
  • the at least one processor is coupled to memory. This memory is used to store programs or instructions.
  • the at least one processor is used to execute the program or instructions, so that the device implements the foregoing first aspect or the method described in any possible implementation manner of the first aspect, or so that the device implements the foregoing second aspect or the third aspect.
  • the sixth aspect of the present application provides a communication device, including at least one logic circuit and an input and output interface; the logic circuit is used to perform the method described in the aforementioned first aspect or any possible implementation of the first aspect, or , the logic circuit is used to perform the method described in the aforementioned second aspect or any possible implementation manner of the second aspect.
  • a seventh aspect of the present application provides a computer-readable storage medium for storing computer instructions; when the computer instructions are executed by a processor, the processor executes the above-mentioned first aspect or any possible implementation manner of the first aspect.
  • the method described above, or the processor performs the method described in the above second aspect or any possible implementation manner of the second aspect.
  • An eighth aspect of the present application provides a computer program product (or computer program).
  • the computer program product includes instructions.
  • the processor executes the first aspect or the first aspect.
  • the method of any possible implementation of the aspect, or the processor executes the above second aspect or the method of any possible implementation of the second aspect.
  • a ninth aspect of the present application provides a chip system.
  • the chip system includes at least one processor for supporting a communication device to implement the functions involved in the above-mentioned first aspect or any possible implementation of the first aspect, or, It is used to support the communication device to implement the functions involved in the above second aspect or any possible implementation manner of the second aspect.
  • the chip system may also include a memory for storing necessary program instructions and data of the communication device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the chip system further includes an interface circuit that provides program instructions and/or data to the at least one processor.
  • a tenth aspect of the present application provides a communication system, which includes the first network device in any of the above aspects.
  • the communication system includes one or more second network devices in any of the above aspects.
  • the communication system includes one or more third network devices in any of the above aspects.
  • the communication system includes one or more fourth network devices in any of the above aspects.
  • Figure 1a is a schematic diagram of the communication system provided by this application.
  • FIG 1b is another schematic diagram of the communication system provided by this application.
  • Figure 1c is another schematic diagram of the communication system provided by this application.
  • FIG. 1d is another schematic diagram of the communication system provided by this application.
  • Figure 1e is another schematic diagram of the communication system provided by this application.
  • Figure 1f is another schematic diagram of the communication system provided by this application.
  • FIG. 2a is a schematic diagram of the communication scenario involved in this application.
  • Figure 2b is another schematic diagram of the communication scenario involved in this application.
  • Figure 2c is another schematic diagram of the communication scenario involved in this application.
  • Figure 2d is another schematic diagram of the communication scenario involved in this application.
  • Figure 2e is another schematic diagram of the communication scenario involved in this application.
  • Figure 2f is another schematic diagram of the communication scenario involved in this application.
  • Figure 2g is another schematic diagram of the communication scenario involved in this application.
  • FIG. 3 is a schematic diagram of the path determination method provided by this application.
  • Figure 4a is another schematic diagram of the path determination method provided by this application.
  • Figure 4b is another schematic diagram of the path determination method provided by this application.
  • FIG. 5a is a schematic diagram of the AI collective communication process provided by this application.
  • Figure 5b is another schematic diagram of the AI collective communication process provided by this application.
  • Figure 5c is another schematic diagram of the AI collective communication process provided by this application.
  • FIG. 6 is another schematic diagram of the AI collective communication process provided by this application.
  • FIG. 7 is another schematic diagram of the AI collective communication process provided by this application.
  • FIG. 8 is a schematic diagram of the communication device provided by this application.
  • Figure 9 is another schematic diagram of the communication device provided by this application.
  • system and “network” in the embodiments of this application may be used interchangeably.
  • “At least one” means one or more, and “plurality” means two or more.
  • “And/or” describes the relationship between associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A alone exists, A and B exist simultaneously, and B alone exists, where A, B can be singular or plural.
  • the character “/” generally indicates that the related objects are in an “or” relationship.
  • “At least one of the following” or similar expressions thereof refers to any combination of these items, including any combination of a single item (items) or a plurality of items (items).
  • At least one of A, B, and C includes A, B, C, AB, AC, BC, or ABC.
  • the ordinal numbers such as “first” and “second” mentioned in the embodiments of this application are used to distinguish multiple objects and are not used to limit the order, timing, priority or importance of multiple objects. degree.
  • the forwarding device may include a router, a switch, a virtual machine, etc.
  • the system includes multiple customer edge (CE) devices, such as user edge device 101 and user edge device 102, as well as other user edge devices that may exist; the system also includes multiple network devices, For example, network device 103, network device 104, and network device 105, as well as other network devices that may exist.
  • CE customer edge
  • the user edge device can serve as an entry device for the data flow transmitted by the communication system.
  • the user edge device can connect to the sending end of the data flow, that is, the device indicated by the source address information of the data flow.
  • the user edge device can serve as an egress device for the data flow transmitted by the communication system.
  • the user edge device can connect to the receiving end of the data flow, that is, the device indicated by the destination address information of the data flow.
  • the sending end of the data flow or the receiving end of the data flow can be a terminal device, a server, a virtual machine and other devices with data sending and receiving requirements.
  • network device 103, network device 104 and network device 105 are routers, switches, virtual machines and other devices. These network devices can be used to transmit data flows from one user edge device to another user edge device.
  • the user edge device may be called an edge switching device.
  • FIG. 1b An implementation example is shown in Figure 1b.
  • the arrow in the figure points to the direction of the data flow.
  • the data flow sequentially flows from the source device indicated by the source address information of the data flow to N (N is an integer greater than or equal to 2) second devices.
  • N N is an integer greater than or equal to 2
  • P P is an integer greater than or equal to 1
  • P is greater than or equal to 2 as an example
  • K K is an integer greater than or equal to 2 fourth network devices
  • the data stream is received by the destination device indicated by the destination address information.
  • the N second network devices are upstream devices of the P third network devices.
  • the P third network devices can also be called downstream devices of the N second network devices.
  • the P third network devices are upstream devices of the K fourth network devices.
  • the K fourth network devices may also be called downstream devices of the P third network devices.
  • scenario example shown in Figure 1b may also include a controller connected to each second network device.
  • the path determination method provided in this application may be executed by the controller or any third network device. implement.
  • the source device of a certain data flow and the destination device of the data flow may both be connected to the same network device, or the source device of a certain data flow and the destination device of another data flow may be connected to the same network device. It is possible that both are connected to the same network device, or the source device of a certain data flow and the destination device of the data flow may not be connected to the same network device, or the source device of a certain data flow is not connected to the same network device of another data flow.
  • the destination device may not be connected to the same network device.
  • the N second network devices may be completely different from the K fourth network devices, or the N second network devices may be partially or identically identical to the K fourth network devices.
  • N second network devices are completely different from K fourth network devices, that is, any fourth network device is different from any second network device.
  • FIG. 1d Another implementation example is shown in Figure 1d.
  • the value of N is equal to the value of K, and the N second network devices and the K fourth network devices are exactly the same.
  • the multi-layer network composed of N second network devices, P third network devices and K fourth network devices may be a three-layer network including an access layer, an aggregation layer and a core layer.
  • the layer network may also be a two-layer network including spine nodes and leaf nodes, or other implementations, which are not limited in this application.
  • the above multi-layer network is taken as an example of a two-layer network including Spine nodes and Leaf nodes.
  • N second network devices and K fourth network devices are completely identical as an example for explanation.
  • Leaf nodes can be connected to one or more servers (Servers), and the data flows interacted between different servers can be forwarded through Leaf nodes.
  • Servers servers
  • the data flow between the two Servers can be interacted through the forwarding of the same Leaf node; when the two Servers are connected to different Leaf nodes, the data flow between the two Servers The data flow between them needs to be forwarded through the respective connected Leaf nodes and Spine nodes to achieve interaction.
  • the communication process between the two-layer networks will be mainly introduced for description.
  • the following embodiments can also be applied to the communication process between any two networks in other networks.
  • the following embodiments may be applied to the network between the access layer and the aggregation layer, or may be applied to the network between the aggregation layer and the core layer.
  • the following embodiments may be applied to the network, or may be applied to the network between the access layer and the core layer.
  • the above-mentioned multi-layer network is still taken as a two-layer network including Spine nodes and Leaf nodes as an example.
  • the implementation process of the following embodiments mainly involves network devices (such as Spine nodes and Leaf nodes) for Improvement of at least one of a receiving module for receiving data, a processing module for processing data locally, and a sending module for sending data.
  • ECMP Equal-Cost Multi-Path
  • Devices at multi-path bifurcations will select paths based on specific policies and send packets to different paths to achieve traffic load sharing.
  • the ECMP mechanism uses the characteristic fields of the data packet (such as source media access control (MAC) address, destination MAC address, IP quintuple information, etc.) as hash factors and generates them through the hash (HASH) algorithm.
  • HASH-KEY value selects a member link in the load sharing link to forward the data packet based on the HASH-KEY value.
  • the communication device as a forwarding device forwards a data flow, it selects a path among the multiple paths based on local policies, and forwards the data flow based on the locally selected path.
  • the following will exemplarily describe the implementation process of determining a path based on local selection, taking the architecture shown in Figure 1b as an example and combining the implementation examples of Figures 2a to 2g.
  • FIG. 2a when server A (Server-A) needs to communicate with server B (Server-B) and server C (Server-C), it needs to go through the switching device A (Switch-A) and the switching device.
  • the data streams involved in Figure 2a include four, namely data stream 1 for transmitting messages 1, 5, and 9 in Figure 2a, data stream 2 for transmitting messages 5, 6, and 10, and data stream 3, 7, and 11 for transmitting messages Data flow 3, transmission message 4, 8, 12 data flow 4.
  • Switch-A is an implementation example of the second network device
  • Switch-1, Switch-2, Switch-3 and Switch-4 are implementation examples of the third network device
  • Switch-B is Implementation example of fourth network device.
  • Server-A communicates with Server-B and Server-C through a multi-layer network
  • there are 4 identical paths between Switch-A and Switch-B in the network which are 4 equal-cost paths.
  • ECMP is calculated through hash. The hash-key value selects and forwards the data packet among the member links of equal cost, so that the four data flows are transmitted separately on the four ECMP.
  • the implementation process of determining the path based on local selection can be implemented based on load sharing technology.
  • the most common ones based on granularity are Packet-based Load Balancing, Flow-based Load Balancing, and Flowlet-based Load Balancing. .
  • Implementation method one the implementation process of packet-by-packet load sharing.
  • the switching device forwards the Nth packet to path i, the N+1th packet to path i+1, and so on.
  • the port is polled.
  • the mathematical description of its behavior is that the path selection is based on the number of the packet modulo the number of optional equivalent paths. The purpose is to evenly distribute all packets on the equal-cost member links of the next hop.
  • Server-A sends four messages to Server-B in chronological order, namely messages 1-4 in the figure. These four messages also arrive in the order in which they were sent.
  • Switch-A forwards packets 1 and 3 to the path connected to Switch-1 according to the packet-by-packet load balancing policy, and Switch-A forwards packets 2 and 4 to the path connected to Switch-2 superior.
  • Implementation method two is the implementation process of flow-by-flow load sharing.
  • the ECMP mechanism is relied upon to realize the selection of multiple next-hop mechanisms.
  • This mechanism uses the characteristic fields of the data packet (such as source MAC address, destination MAC address, IP quintuple information, etc.) as hash factors, and uses HASH The algorithm generates a HASH-KEY value, and then selects a member link in the load-sharing link to forward the data packet based on the HASH-KEY value. For data packets with different characteristic fields, since their HASH-KEY values may be different, different member links may be selected for forwarding; for data packets with the same characteristic field, since their HASH-KEY values are the same, different member links may be selected for forwarding. Select the same member link for forwarding. In this way, load-sharing forwarding of different data flows on different member links is realized, and the timing of arrival of each data packet in the same data flow at the receiving end is ensured.
  • Server-A transmits data flow 1 (recorded as Flow 1) containing messages 1, 2, 5, and 6 to Server-D.
  • Server-B transmits data flow 2 (recorded as Flow 2) containing messages 3 and 4 to Server-C.
  • Switch-A performs hash calculation based on the five-tuple information of the incoming port packets. Messages 1, 2, 5, and 6 obtain the same Key value and take the same path on the next hop of Switch-A. In the same way, packets 3 and 4 take another downstream path of Switch-A.
  • the disadvantage of flow-by-flow load sharing is that it does not consider the utilization of each member link in the load sharing link, which results in unbalanced load sharing among member links, especially when large data flows occur. It will aggravate the congestion of the selected member links and even cause packet loss.
  • conflict One type of conflict can be called a local conflict.
  • the reason is that the hash-key results calculated by the hash algorithm for different input feature fields are the same, resulting in different flows being forwarded to the same path, resulting in conflicts.
  • FIG. 2e An exemplary implementation process of local conflict is shown in Figure 2e.
  • the IP addresses of Server-A, Server-B, Server-C, and Server-D are: 0.0.0.1, 0.0.0.2, 0.0.0.3, 0.0.0.4 respectively.
  • Server-A sends packets 1, 2 to Server-C.
  • Server-B sends packets 3, 4 to Server-D.
  • the HASH function on Switch-A is "source ip + destination ip".
  • the path calculation method performed locally by Switch-A is as follows:
  • the message sent by Server-A to Server-C the HASH function calculation result is 4, the number of paths is 2, the HASH function calculation result modulo the number of paths is 0, that is, the 0th path is selected (the path number starts from 0 ).
  • the HASH function calculation result is 6 and the number of paths is 2.
  • the HASH function calculation result modulo the number of paths is 0, that is, the 0th path is selected (the path number starts from 0 ).
  • Another type of conflict can be called a global conflict.
  • the reason is that the current load sharing technology uses a distributed decision-making mechanism and lacks a global perspective.
  • the switching device cannot predict and control the flow of upstream traffic, resulting in traffic conflicts.
  • Flow-1 which contains packets 1 and 2
  • Flow-2 which contains packets 3 and 4
  • Flow-1 passes through Switch-A
  • the member link to Switch-1 is selected to forward the traffic.
  • Flow-2 passes through Switch-B
  • the member link to Switch-1 is also selected to forward flow-2.
  • the destination switching devices of Flow-1 and Flow-2 are both local to Switch-C, and there is only one link between Switch-1 and Switch-C. Therefore, Flow-1 and Flow-2 will conflict on the downstream link of Switch-1, and in the current distributed decision-making load sharing technology, the global conflict problem cannot be foreseen and controlled to avoid, and such conflicts will be serious. Affect network performance.
  • Implementation method three is the implementation process of sub-flow load sharing.
  • the network device when forwarding a data packet, the network device will determine the time interval between the data packet to be forwarded and the previous data packet in the data flow to which it belongs. If it is greater than the maximum link transmission delay of each member link in the load sharing link ( flowlet_gap_time), the data packet to be forwarded is considered to be the first packet of a new sub-flow (Flowlet); if it is less than the maximum link transmission delay of each member link in the load-sharing link, it is regarded as the same Flowlet as the previous data packet. . Based on the Flowlet, the device selects the lighter-loaded member link in the current load sharing link for forwarding. Data packets in the same Flowlet select the same member link for forwarding.
  • FIG. 2g An implementation example is shown in Figure 2g.
  • Server-A and Server-B respectively send Flow-1 and Flow-2 to Server-C through the network.
  • Packets 1, 2, and 3 in Flow-1 are forwarded by the link between Switch-1 and Switch-A and are recorded as data flow 1-1 (Flowlet1-1).
  • Packets 4 and 5 in Flow-2 are If the flow is not the same, it is forwarded by another equal-cost member link, that is, the link between Switch-2 and Switch-A, which is recorded as data flow 2-1 (Flowlet2-1). Because the distance between messages 6 and 7 in Flow-1 is greater than flowlet_gap, message 6 is detected as the first packet of the new flowlet. Then the new flowlet composed of messages 6 and 7 is different from message 1.
  • Flowlet1-2 data flow 1-2
  • Flowlet1-1 in Flow-1 that is, packets 1, 2, and 3, are forwarded through the link between Switch-A and Switch-1
  • Flowlet1-2 and Flow-1 in Flow-1 2 means that packets 4, 5, 6, and 7 are forwarded through the link between Switch-A and Switch-2.
  • the host side cannot actively construct a flowlet; on the other hand, if the host side is forced to use a send-stop-send mechanism to forcibly construct a flowlet, it will affect throughput when the network is not congested and the link utilization will be low; and when the network is congested , there is no guarantee that the interval of the constructed flowlet is greater than the flowlet gap set by the switch, which will cause the receiving end to be out of order and trigger retransmission.
  • the Flowlet-based multi-path load sharing technology has the disadvantages of both packet-based and flow-based load sharing technologies, namely the risk of load imbalance and packet disorder.
  • flowlet-based load sharing technology needs to be considered, balancing maximizing network throughput and minimizing packet disorder, and accurately setting the flowlet_gap_time value.
  • flowlet_gap_time is not a static parameter, and the flowlet_gap_time value needs to be dynamically adjusted through feedback from network performance indicators.
  • this mechanism of dynamic flowlet_gap_time value will affect the throughput rate of the network and fail to achieve the global optimal value.
  • each communication device in a multi-layer network, there are often multiple communication devices as forwarding devices, and each communication device as a forwarding device needs to perform a path selection process among multiple paths based on its own local policy before it can determine forwarding path, resulting in a less efficient path determination method.
  • the path determination method is prone to causing other problems. For example, in the first implementation method, it is easy to cause local traffic conflicts on the switching device. In the second implementation method, it is easy to cause global conflicts and local conflicts. In the third implementation method, it is easy to affect the throughput rate of the network.
  • this application provides a path determination method and related equipment to improve the forwarding efficiency of data flows. This will be described in further detail below with reference to the accompanying drawings.
  • Figure 3 is a schematic diagram of an implementation of a path determination method provided in this application.
  • the method includes the following steps. It should be noted that this method can be executed by the first network device, or this method can also be executed by some components (such as a processor, a chip or a chip system, etc.) in the first network device, or this method can also be executed by a capable Logic modules or software implementations that implement all or part of the functions of the first network device. In the following embodiments, the method is described by taking the example that the method is executed by a first network device.
  • the first network device may be a router, a switch, a virtual machine, etc.
  • the first network device obtains the first topology information.
  • the first topology information obtained by the first network device in step S301 includes connection relationships between N second network devices and P third network devices, and any second network device is any third network device.
  • the upstream network device of the network device N is an integer greater than or equal to 2
  • P is an integer greater than or equal to 1.
  • the first network device is a controller or one of the P third network devices.
  • the first network device that executes the method to determine and send the M paths may be a controller, or may be one of the P third network devices to improve the flexibility of solution implementation.
  • the first network device may be the controller in Figure 1b, or the first network device may be any third network device in Figure 1b.
  • the data stream sent by the server may need to pass through the server connected to the server. Forwarding of the Leaf node connected to the Leaf node (for ease of reference later, marked as the source Leaf node), Spine node, and the other Server (for ease of reference later, marked as the destination Leaf node).
  • the source Leaf node may be the N second network devices indicated by the first topology information in step S301
  • the spine node may be the P third network devices indicated by the first topology information in step S301
  • the destination Leaf node may be The K fourth network devices mentioned later.
  • the first network device may be any Spine node connected to each Leaf node in Figure 1e, or the first network device may be a controller (not shown in the figure) connected to each Leaf node in Figure 1e , so that the first network device sends M paths to each Leaf node in the subsequent step S304.
  • a certain Leaf node is both the source Leaf node of one direction of the data flow and the destination Leaf node of the other direction of the data flow.
  • a certain Leaf node It is the source Leaf node of the data flow, or the destination Leaf node of the data flow.
  • the same Leaf node can be used to transmit different data streams. In a certain data stream, the Leaf node may be the source Leaf node, and in another data stream, the Leaf node may be the source Leaf node or the destination Leaf. node.
  • the first network device obtains the communication relationships of M data streams.
  • the communication relationship of each of the M data streams obtained by the first network device in step S302 includes source address information and destination address information.
  • M is an integer greater than or equal to 2.
  • the M data streams The flows are respectively transmitted to the P third network devices through the N second network devices.
  • step S301 does not limit the implementation process of step S301 and step S302. That is, the first network device may first execute step S301 and then execute step S302. The first network device may also execute step S302 first and then execute step S302. S301.
  • any one of the M data streams may be a unidirectional data stream or a bidirectional data stream, which is not limited in this application.
  • the communication relationship of the bidirectional data flow may only include source address information and destination address information of a certain flow direction.
  • the communication relationship of the bidirectional data flow may also include source address information and destination address information corresponding to the two flow directions, which are not limited here.
  • the first network device obtaining the communication relationships of the M data streams includes: the first network device respectively receiving the communication relationships of the M data streams from the N second network devices. Specifically, the first network device may obtain the communication relationships of the M data streams based on respectively receiving the communication relationships of the M data streams from the N second network devices, so that the first network device and The communication relationships of the M data streams are obtained through interaction between the N second network devices.
  • step S302 the communication relationships of the M data streams are pre-configured on the first network device to avoid an increase in overhead and delay caused by interactions between different network devices.
  • the first topology information obtained by the first network device in step S301 may be determined by the respective topology information sent by the N second network devices, or the first topology information may be pre-configured. on the first network device.
  • the communication relationship of the first topology information in step S301 and the M data streams in step S302 is determined based on the respective communication relationships and topology information sent by the first network device through the N second network devices.
  • each second network device needs to establish a flow table for local large flows, including the following process.
  • the second network device adds the five-tuple and characteristic information of the flow (for example: first packet time, number of bytes) to the original flow table.
  • the second network device sets a threshold value for large and small flows based on the number of bytes in the collection period, and filters out small flows while retaining large flows.
  • the second network device converts the original flow table into a tuple flow table, retaining the source IP and destination IP.
  • the purpose is to retain the key fields required by the global optimal path allocation algorithm and save storage space.
  • the second network device aggregates local topology knowledge from the filtered tuple flow table to form a local traffic information table.
  • Implementation examples of the local traffic information table are shown in Tables 1 to 3.
  • the second network device can delete the aged table entries.
  • all communication nodes of a task can also be provided in the form of files, such as using a host file (hostfile) to explicitly provide all communication devices of the task. With the explicit communication algorithm name, the communication relationship of the current task can be obtained.
  • hostfile host file
  • the second network device aggregates the built flow table to the first network device, so that the first network device obtains the communication relationships of the M data flows in step S302.
  • Table 4 an implementation example of the table obtained by aggregating the above Tables 1 to 3 is shown in Table 4 below.
  • the above implementation process can be expressed as the implementation process in Figure 4a, including "step 401. Establish a flow table for local large flows and aggregate communication relationships.” It should be understood that the "global computing node” in the implementation process shown in Figure 4a is the first network device, and the “edge switching node” is the N second network devices.
  • the first network device periodically acquires local traffic communication relationships and topology information. And aggregated to the first network device, the first network device integrates the local traffic information tables of the aggregated N second network devices, and divides communication clusters through communication relationships to determine the M data flows.
  • the first network device can determine whether the M data streams currently to be calculated belong to a single task scenario or a multi-task scenario. If it is a scenario with multiple tasks, resource allocation needs to be performed for each individual task.
  • the implementation process can be expressed as "Step 402. Obtain local topology information and aggregate the topology information of the complete task" in the implementation process in Figure 4a. .
  • the first network device can also perform "Step 403 to determine whether all nodes are covered” in Figure 4a, and, if it is determined that all nodes are covered, the first network device performs "Step 405. Guide according to the aggregation flow table All tasks and all step communication relationships"; if it is determined that all nodes are not covered, the first network device executes the implementation process in "Step 404. Allocate resources by tasks according to communication relationships" in Figure 4a.
  • step 404 in Figure 4a The implementation process corresponding to step 404 in Figure 4a will be further described below, including the following processes.
  • the number of Tor occurrences is traversed and distributed in descending order.
  • the first network device converts the communication relationships of the same task into a communication relationship matrix based on the topology information and the first packet time. Furthermore, the first network device converts the communication relationship matrix into a mapping matrix from Dtor to Stor, and allocates resources starting from the ToR that appears the most.
  • the available communication relationship matrix is shown in Table 5
  • the Dtor-SID mapping matrix is shown in Table 6 below
  • the Dtor-Stor mapping matrix is shown in Table 7 below.
  • the M data streams obtained by the first network device in step S302 correspond to one of multiple artificial intelligence (artificial intelligence, AI) collective communication tasks.
  • the M data flows correspond to a long-term steady-state traffic task, and the traffic size of the data flows in the long-term steady-state traffic task is greater than a preset threshold within a certain period of time.
  • the solution provided by this application can be applied to large flow scenarios where communication relationships change periodically or where convergence time requirements are strict. It is mainly subdivided into long-term steady-state flow scenarios and AI collective communication scenarios.
  • the characteristics of the communication relationship in the long-term steady-state traffic scenario are: the communication relationship generally does not change, and can also be regarded as having an infinite change period.
  • the flow table is updated and aged as the task starts and ends. The process only requires that after the new task starts, the edge switching device obtains the local traffic communication relationship and aggregates it into a traffic information table on the first network device, which can be used as the input of the global optimal path allocation algorithm for calculation. But if the same destination IP appears, but the source IP is different. This indicates that there is incast traffic in this scenario.
  • the Incast scenario of long steady-state traffic is not a problem to be solved by this invention. If such a communication relationship is encountered, it does not need to be calculated by the global optimal path allocation algorithm.
  • the communication relationship characteristics of the AI collective communication scenario are: inter-device communication in the AI collective communication scenario usually divides the entire process into multiple sub-phases (phases) according to the algorithm. In each phase, the communication relationships of the devices change, and at the beginning of each phase, all devices in the communication cluster need to be synchronized. This imposes high requirements on the convergence speed of the algorithm.
  • the following is an illustrative description of the three main types of algorithms used in AI collective communication scenarios.
  • the three types of algorithms include Halving-Doubling (H-D) algorithm, Ring algorithm, and Tree algorithm.
  • the HD algorithm is divided into two stages: the first stage performs a reduce-scatter operation for Halving, as shown in Figure 5a.
  • a total of (log 2 N) sub-phases are executed, where N is the number of communication nodes.
  • N is the number of communication nodes.
  • the characteristics of the communication relationship in the AI collective communication scenario are: the inter-device communication in the AI collective communication scenario usually divides the entire process into multiple sub-phases (phases) according to the algorithm. In different phases, the communication relationships of the devices change, and the communication step size is, where n is the current phase and N is the number of communication nodes. And at the beginning of each phase, all devices in the communication cluster need to be synchronized. This imposes high requirements on the convergence speed of the algorithm.
  • the second stage performs a global collection (all-gather) operation for Doubling, as shown in Figure 5b. Same as Halving, a total of log 2 N phases are executed, so a total of 2*log 2 N phases need to be executed in the entire HD process. Different from Halving, every time a phase is advanced, the amount of data interacted between nodes doubles.
  • N is the number of communication nodes.
  • the aforementioned Table 4 can be expressed as the communication relationship in Figure 5c. It can be seen that all phase communication relationships of the all-reduce process supported by the H-D algorithm are recovered from the communication relationship of a task. For all communication devices, all local streams of the same server are sorted, and the positions with the same number after sorting are the same phase. The advantage of this sorting is that it avoids clock synchronization of the entire network.
  • the Ring algorithm is different from the H-D algorithm mainly in that the communication relationship of the entire communication process does not change.
  • the Ring algorithm is divided into two parts like the HD algorithm.
  • the first step is scatter-reduce
  • the second step is all-gather.
  • communication nodes will exchange data so that each node can obtain a piece of data as the final result.
  • the nodes will exchange these final blocks so that all nodes have the completed data.
  • N communication nodes Each node will receive and send data N-1 times respectively.
  • N communication nodes perform a total of 2 (N-1) phases, but in all phases, the communication relationships of the nodes do not change. Therefore, the communication relationship processing method of this type of algorithm is the same as that of long-term steady-state traffic. It only needs to calculate the optimal path of one phase to obtain the global optimal path calculation result of all phases.
  • the difference between the Tree algorithm and the Ring algorithm is that the communication relationship of the Tree algorithm changes in each phase.
  • the difference from the H-D algorithm is that in the same phase of the Tree algorithm, the communication objects that send and receive data are different.
  • the communication relationship characteristics of the entire all-reduce stage For example, as shown in Figure 7, it is the communication relationship characteristics of the entire all-reduce stage.
  • the Tree algorithm two communication relationship trees need to be operated. The characteristics of the two trees are that the leaf nodes of the first tree are the ridge nodes of the second tree, and the ridge nodes of the first tree are the leaf nodes of the second tree. In addition, leaf nodes under the same spine node cannot send or receive data to the spine node at the same time.
  • the Tree algorithm is also divided into two steps: the first step is reduction, and the second step is broadcast.
  • the reduce phase at the i-th time, the devices whose communication relationship is connected by the dotted-line link in the figure can communicate. At the i+1-th time, the devices whose communication relationship is connected by the solid-line link in the figure can communicate.
  • the communication relationship has the same characteristics as the communication relationship in the reduce phase.
  • the three algorithms change the communication relationship of nodes, including the communication relationship characteristics of long-term and stable traffic.
  • the communication characteristics of long-term and stable traffic are similar to the Ring algorithm and will not change during the entire communication process between devices.
  • the node communication relationship of the H-D algorithm changes periodically, and the Halving process and the Doubling process are reciprocal.
  • the communication relationship is that the sending and receiving objects are the same.
  • the communication relationship between nodes of the Tree algorithm is similar to the H-D algorithm and will change periodically.
  • the same node under the Tree algorithm sends and receives data to different objects.
  • the solution provided by this application can be applied to scenarios where the overall network bandwidth is sufficient.
  • the overall bandwidth of the network is sufficient, assuming that there are n flows with a bandwidth of d in the network, which are distributed among m equivalent links with a capacity of c. If nd > mc, it means that the overall bandwidth of the network is insufficient.
  • the load balancing mechanism alone cannot solve the congestion problem, and collaborative scheduling algorithms or congestion control algorithms are required to jointly solve the network congestion problem.
  • the present invention is applicable to the scenario where nd ⁇ mc. For example: the AI collective communication scenarios mentioned above are dominated by large flows, the communication relationship changes periodically, and the algorithm convergence time has strict requirements; the long-term steady-state traffic scenario where the communication relationship is relatively fixed, etc.
  • the first network device determines M paths based on the communication relationships of the M data flows and the first topology information.
  • step S303 the first network device obtains the first topology information based on the communication relationships of the M data flows in step S303.
  • the M paths determined by the communication relationship and the first topology information respectively correspond to M data flows, and the M paths indicate the paths for transmitting the M data flows through the N second network devices to the P third network devices. .
  • the M data streams are transmitted to K fourth network devices through the P third network devices, where K is a positive integer; the first network device performs the transmission according to the M data in step S403.
  • the process of determining the M paths based on the communication relationships of the flows and the first topology information specifically includes: the first network device determines a first mapping relationship based on the communication relationships of the M data streams and the first topology information.
  • the first mapping relationship Used to indicate the mapping relationship between the second network device corresponding to the source address information of each of the M data flows and the fourth network device corresponding to the destination address information of each of the M data flows; The first network device determines the M paths according to the first mapping relationship.
  • the topology information also includes connection relationships between P third network devices and K fourth network devices.
  • any fourth network device is different from any second network device.
  • At least one network device among the N second network devices and at least one network device among the K fourth network devices are the same network device.
  • N and K are equal, and the N second network devices and the K fourth network devices are the same network devices.
  • an implementation method for the first network device to determine M paths is provided, so that the first network device determines the M paths based on the mapping relationship between the N second network devices and the K fourth network devices. .
  • the first network device determining the M paths according to the first mapping relationship includes: the first network device determining first sorting information according to the first mapping relationship, and the first sorting information is In order to indicate the sorting of the number of second network devices corresponding to the K fourth network devices; the first network device sequentially traverses the egress ports of the N second network devices according to the first sorting information to obtain the second mapping relationship; wherein, the second mapping relationship is used to indicate the mapping relationship between the egress ports of the N second network devices and the K fourth network devices; the first network device determines the M based on the second mapping relationship path.
  • an implementation manner for the first network device to determine M paths is provided, so that the first network device determines the M paths based on the first mapping relationship, the first sorting information, and the second mapping relationship determined in sequence.
  • the first network device sequentially traverses the egress ports of the N second network devices according to the first sorting information, and obtaining the second mapping relationship includes: the first network device according to the first sorting information.
  • a sorting information sequentially traverses the egress ports of the N second network devices to obtain a third mapping relationship; wherein the third mapping relationship is used to indicate the egress port of the second network device corresponding to each fourth network device.
  • An optional number of ports; the first network device determines the second mapping relationship based on the third mapping relationship.
  • the third mapping relationship is used to indicate the optional number of egress ports of the second network device corresponding to each fourth network device, wherein the larger the value of the optional number, the more the corresponding fourth network device The smaller the uncertainty of the optional path, and conversely, the smaller the value of the optional number, the greater the uncertainty of the corresponding optional path of the fourth network device.
  • the second mapping relationship determined based on the third mapping relationship can be traversed first for the egress port of the second network device corresponding to the fourth network device with less uncertainty in the optional path, so as to improve the accuracy of the solution. property, to avoid subsequent conflicts among the M paths determined based on the second mapping relationship.
  • the first network device sends the M paths to the N second network devices respectively.
  • the first network device after the first network device determines M paths in step S303, the first network device sends the M paths to the N second network devices respectively in step S304.
  • the M data streams are transmitted to K fourth network devices through the P third network devices, where K is an integer greater than or equal to 1; the first network device obtains in step S302
  • the M data streams corresponding to the communication relationship include a first data stream and a second data stream.
  • the source address information of the first data stream and the source address information of the second data stream correspond to different second network devices.
  • the destination address information of the data flow and the destination address information of the second data flow correspond to the same fourth network device.
  • the M paths determined by the first network device in step S303 include a first path corresponding to the first data flow, and a second path corresponding to the second data flow. A path and the second path correspond to different third network devices.
  • the M paths determined by the first network device include a first path corresponding to the first data flow and a second path corresponding to the second data flow, and the first path and the second path correspond to different A third network device, wherein the source address information of the first data flow and the source address information of the second data flow correspond to different second network devices, and the destination address information of the first data flow and the second data flow
  • the destination address information corresponds to the same fourth network device.
  • any fourth network device is a downstream network device of any third network device, after the first data flow and the second data flow are respectively sent to different third network devices through different second network devices, the different The third network device respectively sends the first data flow and the second data flow to the same fourth network device. Therefore, it is possible to avoid network congestion caused by data flows from different second network devices being transmitted through the same third network device and then being transmitted to the same fourth network device through the same third network device, thereby improving the first The transmission efficiency of the data stream and the second data stream.
  • the M paths determined by the first network device in step S303 also indicate the egress ports of the M data flows on the N second network devices. Specifically, in addition to indicating the paths for transmitting the M data flows through the N second network devices to the P third network devices, the M paths determined by the first network device also indicate the M data streams. The egress ports of the data flows on the N second network devices are so that the N second network devices can clearly send the egress ports of the M data flows after receiving M paths.
  • the implementation method shown in Figure 3 further includes: the first network device obtains second topology information, where the second topology information includes A second network devices and the P third network devices.
  • the connection relationship between, at least one second network device among the A second network devices is the same as at least one second network device among the N second network devices, and the A is an integer greater than or equal to 1;
  • the A network device obtains the communication relationship of B data streams.
  • the communication relationship of each data stream in the B data streams includes source address information and destination address information.
  • B is an integer greater than or equal to 1.
  • the B data streams Transmit to the P third network devices respectively through the A second network devices; after the first network device determines M paths based on the communication relationships of the M data flows and the topology information, the method also includes: The first network device determines B paths according to the communication relationship of the B data flows and the topology information. The B paths respectively correspond to the B data flows. The B paths indicate the direction to the P through the A second network device. Paths for a third network device to send the M data streams; among which, the B The egress ports of the second network device corresponding to the M paths are different from the egress ports of the second network device corresponding to the M paths; the first network device sends the B paths to the A second network devices respectively.
  • the egress ports of the second network device corresponding to the B paths determined by the first network device are different from the egress ports of the second network device corresponding to the M paths, so as to avoid that the M data flows and the B data flows correspond to Traffic conflicts generated at the outgoing port of the same second network device improve the data transmission efficiency of M data streams and B data streams.
  • the M data streams corresponding to the communication relationship obtained by the first network device in step S302 include a third data stream and a fourth data stream, and the source address information of the third data stream and the third data stream are The source address information of the four data streams corresponds to the same second network device.
  • the M paths determined by the first network device in step S303 include a third path and a fourth path, the third path corresponds to the third data flow, the fourth path corresponds to the fourth data flow, and the third path corresponds to the fourth data flow. The third path is different from the fourth path.
  • the M paths determined by the first network device include a third path corresponding to the third data flow and a fourth path corresponding to the fourth data flow, and the third path and the fourth path are different.
  • the source address information of the third data flow and the source address information of the fourth data flow correspond to the same second network device.
  • the third data stream and the fourth data stream are transmitted through different paths through the same second network device. Therefore, network congestion caused when the data flow from the same second network device is transmitted through the same path can be avoided, thereby improving the transmission efficiency of the third data flow and the fourth data flow.
  • the first network device acquires the first topology information including the connection relationship between N second network devices and P third network devices in step S301, and the first network device obtains the first topology information in step S302.
  • the first network device determines based on the communication relationships of the M data streams and the first topology information in step S303 and sends M data to the N second network devices in step S304. path.
  • N second network devices may respectively send the M data streams to P third network devices based on the M paths.
  • the first network device serves as a device for determining paths, and the first network device can determine M paths corresponding to M data flows transmitted between N second network devices and P third network devices.
  • the first network device can determine the path based on global information to avoid path conflicts. Conflicts improve the forwarding efficiency of data flow.
  • the network device corresponding to the source address information i.e., the second network device
  • the network device corresponding to the destination address information i.e., the fourth network device
  • the destination rack Taking a top-of-rack (destination top of rack, Dtor) as an example, the implementation process of the above step S303 is illustratively explained.
  • the M paths determined in the above step S303 can be used to solve the optimal path allocation problem, where the optimal path allocation problem is a mathematically accurate coverage problem and is an NP-complete problem. It is described as that the full set
  • the problem is that S is required to be an exact coverage of X, that is, the Stor upstream ports corresponding to all Dtors cannot be allocated repeatedly. This achieves the purpose of assigning an optimal path to each flow in the network.
  • the entire optimal path allocation calculation process is divided into the following steps. Among them, the algorithm used to solve this problem in step S303 may be called a flow matrix algorithm (flow matrix algorithm, FMA).
  • FMA flow matrix algorithm
  • the implementation process of the first network device in step S303 can be represented as the implementation process in the dotted box 4 in Figure 4a (ie, "Step 407. Calculate the global optimal path according to the task"), including the following implementation process.
  • Step 4071 The first network device obtains the input communication relationship [SIP, DIP].
  • Step 4072 The first network device establishes a Dtor to Stor mapping matrix based on the communication relationship and topology information.
  • Step 4073 The first network device traverses the traffic matrix in the Dtor dimension, and the negative effect value (NEV) guides the matrix column iteration sequence.
  • NEV negative effect value
  • Step 4074 The first network device uses the horizontal degree of freedom (HDOF) value and the vertical degree of freedom (VDOF) value to guide the matrix row calculation sequence within the iteration.
  • HDOF horizontal degree of freedom
  • VDOF vertical degree of freedom
  • Step 4075 Determine whether the for loop traversing Dtor is completed. If so, execute step 4076. If not, repeat step 4073.
  • Step 4076 Output the optimal allocation result.
  • Communication relationship identification can also be used as one of the inputs in “Step 407.
  • the corresponding data stream is used to perform the AI collective communication task in 4061 or the steady-state flow communication task in 4062.
  • Steps 1 and 2 will be introduced below with some implementation examples.
  • step 4071 an implementation of the first network device aggregating the traffic information of the task communication cluster is shown in Table 8.
  • the complete flow table gathered by the first network device includes: flow number, flow table generation switch information, communication relationship pair, that is, [SIP, DIP], first packet time and number of bytes, several important items.
  • this flow table needs to be initialized.
  • the calculation-related information in the table includes the switch generated by the flow table and the communication IP pair [SIP, DIP].
  • the communication unidirectional flow table in the original table is converted into a bidirectional flow table, as shown in Table 9 and Table 10.
  • step 4072 the original bidirectional flow table is converted into a matrix of Dtor to Stor mapping, as shown in Table 11.
  • the original communication relationship table is first converted into a Dtor to SIP mapping matrix, and then the Stor where the SIP is located is found in the flow table generation switching device item in the original table. Therefore, it is finally converted into a mapping relationship matrix from Dtor to Stor.
  • the core of the global optimal path allocation algorithm is to calculate and process the Dtor-Stor mapping relationship matrix.
  • Dtor-Stor mapping matrix shown in Table 11 is an implementation example of the first mapping relationship in the aforementioned step S303.
  • Step 4073, step 4074 and step 4075 will be introduced below through some implementation examples.
  • the Dtor-Stor mapping matrix and ToR available port matrix need to be initialized.
  • the first network device converts the aggregated original flow table into a Dtor to Stor mapping matrix, as shown in Table 11.
  • the columns in the Dtor-Stor mapping matrix are represented by Dtor (Dtor), and the rows are represented by the Stor egress port number of the flow to Dtor.
  • the entries in the matrix represent the Stor (Stor) corresponding to the traffic to the Dtor (Dtor) corresponding to the column.
  • the entry ‘-1’ in the Dtor to Stor mapping matrix represents traffic flowing only through the local ToR.
  • two steps are required during the initialization phase: the first step is to convert the original flow table into a relationship matrix of Dtor to Stor mapping.
  • the second step is to generate the available outbound port matrix of ToR, as shown in Table 12.
  • the columns of the matrix are defined as the numbers of ToRs.
  • a row is defined as the egress port number that can be used as an alternative when the flow passes through each ToR. In the entire optimal path calculation stage, it is the calculation operation of the Dtor to Stor mapping matrix and the ToR available port matrix.
  • x represents the Dtor to Stor mapping relationship matrix, which represents the row index
  • j represents the column index of the matrix
  • k represents the element of the matrix corresponding to cell (i, j)
  • the first network device calculates the traffic matrix according to the FMA algorithm.
  • the FMA algorithm performs iterative calculations in the Dtor dimension of the traffic matrix, that is, traversing each column of the Dtor to Stor mapping matrix.
  • the algorithm selects the traversal order based on the negative effect value of each column, that is, the size of the NEV indicator value.
  • NEV is defined as the total number of entries minus 1, as shown in Table 13.
  • Table 13 shows the calculation before the first round of iteration. Except for Dtor-4 and Dtor-5, the elements of all Dtor columns have four unique values. According to the NEV definition, the value of its negative effects is 4-1, which is 3. In the case of the same NEV values, the selection of columns in the current iteration cycle is based on the natural Dtor order and the minimum value is selected to enter the current iteration cycle. When the NEV values are different: for example, Dtor-0 is selected for calculation in the first iteration cycle. The elements are 1, 2, 3, 7. Before the calculation of the second iteration cycle, the NEV value of Dtor-1 is 7-1, which is 6. The NEV value of Dtor-3 is 5-1, which is 4.
  • Dtor-1 A value smaller than Dtor-1 is also the smallest among all Dtor columns to be traversed. Then according to the traversal order constraints. Dtor-3 will enter the iteration calculation in second place. In this way, the certainty of the algorithm can be guaranteed. It can be seen from Table 13 that Dtor-4 and Dtor-5 do not perform NEV calculation at the beginning of the current iteration cycle. The reason is that the two Dtors have local traffic, which will cause the traffic out of the local switching device to be less than the number of outgoing ports. Take Table 13 as an example: Dtor-4 and Dtor-5 each have only 2 flows that need to be allocated and calculated on their respective 4 outgoing ports. The total number of allocation methods is, a total of 12 allocation methods.
  • the FMA algorithm traverses and iterates all Dtors through the above method.
  • the first network device directs ToR port allocation through the degree of freedom value. Specifically, after selecting the Dtor of the current iteration through NEV, it is necessary to calculate and allocate the egress port number on Stor for different traffic with the same Dtor in this iteration. As shown in Table 14.
  • the Dtor of the first round of iteration is selected as Dtor-0 based on the method described in Table 13, and its elements include traffic from 1, 2, 3, and 7Stor.
  • the traffic distribution calculation is performed on the Stor's egress port based on the longitudinal degree of freedom values, that is, VDOF and HDOF.
  • HDOF is defined as the total number of available output ports with the same number in the sub-matrix of this round of iterative calculation.
  • the algorithm stipulates that allocation calculations are performed in ascending order of HDOF values. If the HDOF values are the same, allocation is performed in natural order from small to large. calculate.
  • Table 14 in the initialization phase, all ToR outgoing ports are in a state to be allocated and calculated, so in the sub-matrix of the first round of iteration consisting of 4 ToRs, all outgoing ports with the same number are available. So the total HDOF value is 4.
  • the sub-traffic matrix composed of each iteration has strong randomness.
  • Calculation of ToR output port allocation based on the constraints of the HDOF value ensures that the optimal solution can be calculated in each iteration cycle.
  • Another indicator is VDOF, which is defined as the number of ports that can be selected by the corresponding TOR after this round of iterative calculation.
  • the algorithm stipulates that the allocation calculation is carried out in order from small to large VDOF values. If the VDOF values are the same, the allocation calculation is carried out in the natural order from small to large. As shown in Table 14, in the current iteration cycle, traffic with Dtor of 0 needs to select one port on each of the outgoing ports of the Stor.
  • the current number of optional ports for each Stor is the initial 4, so after the current cycle allocation calculation is completed, the number of optional ports for each Stor in the current sub-traffic matrix is 4-1, which is 3. Therefore, the port allocation calculation should be performed in the natural order of ToR in the current iteration cycle.
  • the blocks with an "x" symbol in Table 14 represent the calculation and allocation results of the outgoing ports on Stor for several flows of Dtor-0 in this round of iteration.
  • the global optimal path allocation algorithm completes the entire calculation process in four dimensions based on three index values. Because in addition to processing the dimension calculation of the three indicators themselves, the time dimension and historical calculation results are also taken into consideration to ensure the maximum deterministic distribution of this iteration and to ensure that subsequent iterations can continuously calculate the optimal solution until the end of the iteration.
  • the global optimal path allocation result can be calculated, as shown in Table 15.
  • the global optimal path allocation algorithm needs to traverse the traffic matrix in respective dimensions based on three indicator values, so its time complexity is O(n3), where n is the number of switch ports.
  • the calculation results can be traversed and checked on the rows and columns of the traffic matrix according to the traffic non-crossing determination rules, so the time complexity of the verification results is O(n2), where n is the number of switch ports.
  • the first network device records the calculated optimal path allocation result into the original flow table, as shown in Table 16.
  • the next hop of Stor the key output of path planning
  • the output calculation result matrix can be converted into a path planning table associated with the Stor outlet port.
  • the results are synchronized to the edge switching equipment of the network, guiding the traffic to select the egress port corresponding to the next hop of the edge switching node, thereby completing the global path allocation process.
  • the core innovation point of the embodiment of the present application lies in the multi-layer network system including N second network devices and P third network devices.
  • the first network device obtains the communication relationship and topology information
  • the first network device performs optimization calculation through the global optimal path allocation algorithm based on the gathered communication relationship information and network topology information.
  • the first network device sends the calculated optimization results to the N second network devices.
  • the N second network devices select routes based on the received optimization calculation results as local traffic path allocation guidance to achieve load sharing and control network congestion. Therefore, when N second network devices perform service packet forwarding, the global optimal path allocation algorithm is used to calculate the optimal path for all traffic, thereby converging in one step without the need for hashing on a local single device.
  • Function calculation routing solves the local conflict problem in the flow-by-flow multi-path load sharing method and the global conflict problem caused by the local decision-making mechanism; the business packet forwarding paths with fixed communication relationships are consistent, solving the packet-by-packet multi-path load sharing method The problem of out-of-order packets.
  • the N second network devices obtain traffic communication relationship information entering the network, including: source IP, destination IP, and first packet time.
  • the N second network devices obtain local topology information. and aggregate the obtained local traffic communication relationship information and topology information to the first network device.
  • the first network device combines the communication relationship information and topology information of the converged edge switching nodes into a network traffic information table.
  • the network information table records at least one edge node topology information and communication relationship.
  • the network information table is input to the global optimal path allocation algorithm, and the columns of the network information table are mappings of traffic destination edge switching nodes to all source switching nodes to this destination edge switching node.
  • the rows in the network information table represent the egress port numbers of the source switching nodes to be allocated.
  • the first network device calculates and outputs a network traffic path allocation table through a global optimal path allocation algorithm.
  • the network traffic path allocation table contains network path allocation information for traffic within N second network devices.
  • the columns of the network path allocation table map the traffic destination edge switching node to all source switching nodes to this destination edge switching node.
  • the rows of the network path allocation table represent the traffic to the destination edge node that is allocated to the source edge node. Output port.
  • the communication device 800 can implement the functions of the communication device (ie, the first network device or the second network device) in the above method embodiment, and therefore can also implement the above method. Beneficial effects possessed by the embodiment.
  • the communication device 800 When the communication device 800 is used to implement the function of the aforementioned first network device, the communication device includes a transceiver unit 801 and a processing unit 802; the transceiver unit 801 is used to obtain first topology information, and the first topology information includes N
  • the transceiver unit 801 is also used to obtain the communication relationship of M data streams.
  • the communication relationship of each data stream in the M data streams includes source address information and destination address information.
  • M is an integer greater than or equal to 2.
  • the M data streams are respectively transmitted to the P third network devices through the N second network devices;
  • the processing unit 802 is configured to determine M paths according to the communication relationships of the M data streams and the first topology information.
  • the M paths respectively correspond to M data streams, and the M paths indicate paths for transmitting the M data streams through the N second network devices to the P third network devices;
  • the transceiver unit 801 is also used to The M paths are sent to the N second network devices respectively.
  • the M data streams are transmitted to K fourth network devices through the P third network devices, where K is an integer greater than or equal to 1;
  • the M data streams include the first data stream and a second data stream, the source address information of the first data stream and the source address information of the second data stream correspond to different second network devices, and the destination address information of the first data stream and the second data stream
  • the destination address information corresponds to the same fourth network device
  • the M paths include a first path and a second path, the first path corresponds to the first data flow, the second path corresponds to the second data flow, and the A path and the second path correspond to different third network devices.
  • the M paths also indicate the egress ports of the M data flows on the N second network devices.
  • the M data streams include a third data stream and a fourth data stream, and the source address information of the third data stream and the source address information of the fourth data stream correspond to the same second network.
  • the M paths include a third path and a fourth path, the third path corresponds to the third data flow, the fourth path corresponds to the fourth data flow, and the third path is different from the fourth path.
  • the M data streams are transmitted to K fourth network devices through the P third network devices, where K is a positive integer; the processing unit 802 is specifically configured to: according to the M data streams
  • the communication relationship and the first topology information determine a first mapping relationship.
  • the first mapping relationship is used to indicate the second network device corresponding to the source address information of each of the M data streams and the second network device in the M data streams.
  • a mapping relationship between fourth network devices corresponding to the target address information of each data flow; the M paths are determined according to the first mapping relationship.
  • the processing unit 802 is specifically configured to determine first sorting information according to the first mapping relationship, where the first sorting information is used to indicate second network devices corresponding to the K fourth network devices. Sorting of the number; sequentially traverse the egress ports of the N second network devices according to the first sorting information to obtain a second mapping relationship; wherein the second mapping relationship is used to indicate the N second network devices. Mapping relationship between the egress port and the K fourth network devices; determining the M paths based on the second mapping relationship.
  • the processing unit 802 is specifically configured to: sequentially traverse the egress ports of the N second network devices according to the first sorting information to obtain a third mapping relationship; wherein, the third mapping The relationship is used to indicate an optional number of egress ports of the second network device corresponding to each fourth network device; the second mapping relationship is determined based on the third mapping relationship.
  • the transceiver unit 801 is also configured to obtain second topology information.
  • the second topology information includes connection relationships between A second network devices and the P third network devices.
  • the A At least one second network device among the N second network devices is the same as at least one second network device among the N second network devices, and A is an integer greater than or equal to 1; the transceiver unit 801 is also used to obtain B
  • the communication relationship of each data stream in the B data streams includes source address information and destination address information.
  • B is an integer greater than or equal to 1.
  • the B data streams pass through the A respectively.
  • the second network device transmits to the P third network devices; the processing unit 802 is also configured to determine B paths according to the communication relationships of the B data flows and the topology information, and the B paths respectively correspond to the B data flows. , the B paths indicate paths for sending the M data streams to the P third network devices through the A second network devices; wherein, the egress ports of the second network devices corresponding to the B paths are different from the M The egress ports of the second network devices corresponding to the paths; the transceiver unit 801 is also configured to send the B paths to the A second network devices respectively.
  • the transceiver unit 801 is specifically configured to receive the communication relationships of the M data streams from the N second network devices respectively.
  • the M data streams correspond to one of multiple artificial intelligence AI collective communication tasks.
  • the first network device is a controller or one of the P third network devices.
  • the device When the communication device 800 is used to implement the function of the aforementioned second network device, the device includes a transceiver unit 801 and a processing unit 802; the processing unit 802 is used to determine the communication relationships of Q data streams.
  • the communication relationship of each data stream includes source address information and destination address information, and Q is an integer greater than or equal to 1; the transceiver unit 801 is used to send the communication relationships of Q data streams to the first network device; the transceiver unit 801 also Used to receive Q paths from the first network device, the Q paths indicate the paths used by the second network device to transmit the Q data streams; the transceiver unit 801 is also used to transmit the Q data streams based on the Q paths. data flow.
  • the path information also indicates the egress ports of the Q data flows on the second network device.
  • the Q data streams include a third data stream and a fourth data stream
  • the source address information of the third data stream and the source address information of the fourth data stream correspond to the second network Device
  • the path information includes a third path and a fourth path
  • the third path corresponds to the third data flow
  • the fourth path corresponds to the fourth data flow
  • the third path and the fourth path are different.
  • the Q data streams correspond to one task among multiple artificial intelligence AI collective communication tasks.
  • FIG. 9 is a schematic structural diagram of a communication device 900 provided by an embodiment of the present application.
  • the communication device 900 performs the functions of the first network device in FIG. 3 and related embodiments; wherein, the communication device 1000 performs the functions of the second network device in FIG. 3 and related embodiments.
  • the communication device 900 performs the functions of the second network device in FIG. 3 and related embodiments; wherein, the communication device 1000 performs the functions of the first network device in FIG. 3 and related embodiments.
  • the communication device 900 shown in FIG. 9 includes a memory 902 and at least one processor 901.
  • the processor 901 implements the method in the above embodiment by reading instructions stored in the memory 902, or the processor 901 can also implement the method in the above embodiment by using internally stored instructions.
  • the processor 901 implements the method in the above embodiment by reading the instructions stored in the memory 902
  • the memory 902 stores instructions for implementing the method provided by the above embodiment of the present application.
  • At least one processor 901 is one or more CPUs, or a single-core CPU, or a multi-core CPU.
  • At least one processor 901 can also be used to execute the implementation process corresponding to the processing unit 602 in the embodiment shown in FIG. 6 and achieve corresponding beneficial effects, which will not be described again here.
  • the memory 902 includes, but is not limited to, RAM, ROM, EPROM, flash memory, optical memory, etc.
  • the memory 902 stores operating system instructions.
  • the communication device After the program instructions stored in the memory 902 are read by the at least one processor 901, the communication device performs the corresponding operations in the aforementioned embodiments.
  • the communication device shown in FIG. 9 also includes a network interface 903.
  • the network interface 903 may be a wired interface, such as an FDDI or GE interface; the network interface 903 may also be a wireless interface.
  • the network interface 903 is used to perform data sending and receiving in FIG. 3 and related embodiments.
  • the network interface 903 can also be used to execute the implementation process corresponding to the transceiver unit 601 in the embodiment shown in FIG. 6 and achieve corresponding beneficial effects, which will not be described again here.
  • the network interface 903 has the functions of receiving data and sending data.
  • the function of "receiving data” and the function of “sending data” can be integrated and implemented in the same transceiver interface, or the function of "receiving data” and the function of “sending data” can be implemented. ” functions can be implemented in different interfaces, and there are no limitations here.
  • the network interface 903 may include one or more interfaces for implementing the function of "receiving data” and the function of "sending data”.
  • the processor 901 reads the program instructions in the memory 902, for other functions that the communication device 900 can perform, please refer to the descriptions in the previous method embodiments.
  • the communication device 900 also includes a bus 904.
  • the processor 901 and the memory 902 are usually connected to each other through the bus 904, but may also be connected to each other in other ways.
  • the communication device 900 also includes an input and output interface 905.
  • the input and output interface 905 is used to connect with an input device and receive relevant configuration information input by the user or other devices that can be linked with the communication device 900 through the input device.
  • Input devices include but are not limited to keyboards, touch screens, microphones, etc.
  • the communication device 900 provided by the embodiment of the present application is used to perform the method performed by the communication device (first network device) provided by the above method embodiments, and achieve corresponding beneficial effects.
  • the communication device 900 when the communication device 900 performs the functions of the first network device in FIG. 3 and related embodiments; the communication device 900 obtains the first connection relationship between N communication devices 1000 and P third network devices. A piece of topology information, and after the communication device 900 obtains the communication relationships of the M data streams, the communication device 900 determines and sends M paths to the N communication devices 1000 based on the communication relationships of the M data streams and the first topology information. . Thereafter, N communication devices 1000 may respectively send the M data streams to P third network devices based on the M paths. In other words, the communication device 900 serves as a device for determining paths. The communication device 900 can determine M paths corresponding to M data streams transmitted between N communication devices 1000 and P third network devices. Therefore, the communication device 900 can determine the path based on the global information to avoid path conflicts and improve the forwarding efficiency of the data flow.
  • the communication device 900 when the communication device 900 performs the functions of the second network device in FIG. 3 and related embodiments; after the communication device 900 sends the communication relationships of Q data streams to the communication device 1000, the communication device 900 receives The communication device 1000 instructs the communication device 900 to use Q paths when transmitting the Q data streams, and the communication device 900 transmits the Q data streams based on the Q paths.
  • the communication device 1000 serves as a device for determining paths.
  • the communication device 1000 can determine M paths corresponding to M data streams transmitted between N communication devices 900 and P third network devices. Therefore, the communication device 1000 can determine the path based on the global information to avoid path conflicts and improve the forwarding efficiency of the data flow.
  • An embodiment of the present application also provides a communication system, which includes at least a first network device and N second network devices.
  • the communication system also includes P third network devices.
  • the communication system also includes K fourth network devices.
  • each network device can also apply other methods involved in the foregoing embodiments and achieve corresponding technical effects, which will not be described again here.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or may be Integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

本申请提供了一种路径确定的方法及相关设备,用于提升数据流的转发效率。在该方法中,第一网络设备获取包含有N个第二网络设备和P个第三网络设备之间的连接关系的第一拓扑信息,并且该第一网络设备获取M条数据流的通信关系之后,该第一网络设备根据该M条数据流的通信关系和该第一拓扑信息确定并向N个第二网络设备发送M个路径。此后,N个第二网络设备可以基于该M个路径分别向P个第三网络设备发送该M条数据流。从而,相比于N个第二网络设备仅基于本地数据流作为路径确定依据而容易导致路径冲突的实现方式,在该方法中,第一网络设备能够基于全局信息实现路径的确定,以避免路径冲突,提升数据流的转发效率。

Description

一种路径确定的方法及相关设备
本申请要求于2022年07月27日提交中国国家知识产权局,申请号为202210891496.0,发明名称为“一种路径确定的方法及相关设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及通信领域,尤其涉及一种路径确定的方法及相关设备。
背景技术
在通信网络中,不同通信装置之间不存在直连链路的情况下,不同通信装置之间交互的数据流需要经过其他通信装置的转发。其中,该其他通信装置可以称为转发设备,例如该转发设备可以包括路由器、交换机或虚拟机等。
目前,随着组网的规模越加庞大,有可能需要通过多层网络的转发才可以实现不同通信装置之间数据流的交互。而在该多层网络中,数据流的源地址对应的通信装置和数据流的目的地址对应的通信装置之间往往存在多条路径。作为转发设备的通信装置在转发数据流的时候,会基于本地策略在该多条路径中进行路径选择,并基于本地选择的路径转发数据流。
然而,在多层网络中,往往存在多个作为转发设备的通信装置,而每一个作为转发设备的通信装置的路径确定依据为各自的本地数据流,容易导致不同转发设备确定的路径存在冲突,影响数据流的转发效率。
发明内容
本申请提供了一种路径确定的方法及相关设备,用于提升数据流的转发效率。
本申请第一方面提供了一种路径确定的方法,该方法由第一网络设备执行,或者,该方法由第一网络设备中的部分组件(例如处理器、芯片或芯片系统等)执行,或者该方法还可以由能实现全部或部分第一网络设备功能的逻辑模块或软件实现。在第一方面及其可能的实现方式中,以该方法由第一网络设备执行为例进行描述,该第一网络设备可以为路由器、交换机、虚拟机等。在该方法中,第一网络设备获取第一拓扑信息,该第一拓扑信息包括N个第二网络设备和P个第三网络设备之间的连接关系,任一第二网络设备为任一第三网络设备的上游网络设备,N为大于或等于2的整数,P为大于或等于1的整数;该第一网络设备获取M条数据流的通信关系,该M条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,M为大于或等于2的整数,该M条数据流分别通过该N个第二网络设备向该P个第三网络设备传输;该第一网络设备根据该M条数据流的通信关系和该第一拓扑信息确定M个路径,该M个路径分别与M条数据流对应,该M个路径指示通过该N个第二网络设备向该P个第三网络设备传输该M条数据流的路径;该第一网络设备分别向该N个第二网络设备发送该M个路径。
基于上述技术方案,第一网络设备获取包含有N个第二网络设备和P个第三网络设备之间的连接关系的第一拓扑信息,并且该第一网络设备获取M条数据流的通信关系之后,该第一网络设备根据该M条数据流的通信关系和该第一拓扑信息确定并向N个第二网络设备发送M个路径。此后,N个第二网络设备可以基于该M个路径分别向P个第三网络设备发送该M条数据流。换言之,第一网络设备作为确定路径的设备,该第一网络设备的路径确定依据为N个第二网络设备和P个第三网络设备之间的连接关系以及M条数据流的通信关系。从而,相比于N个第二网络设备仅基于本地数据流作为路径确定依据而容易导致路径冲突的实现方式,在上述方法中,第一网络设备能够基于全局信息实现路径的确定,以避免路径冲突,提升数据流的转发效率。
应理解,M条数据流中的任一数据流可以为单向数据流,也可以为双向数据流,本申请对此不做限定。其中,若M条数据流中存在某一数据流为双向数据流,则该M条数据流的通信关系中,该双向数据流的通信关系可以仅包含某一流向的源地址信息以及目的地址信息,该双向数据流的通信关系也可以包含两个流向分别对应的源地址信息以及目的地址信息,此处不做限定。
在第一方面的一种可能的实现方式中,该M条数据流通过该P个第三网络设备向K个第四网络设备 传输,K为大于或等于1的整数;该M条数据流包括第一数据流和第二数据流,该第一数据流的源地址信息与该第二数据流的源地址信息对应于不同的第二网络设备,该第一数据流的目的地址信息与该第二数据流的目的地址信息对应于同一第四网络设备,该M个路径包括第一路径和第二路径,该第一路径与该第一数据流对应,该第二路径与该第二数据流对应,该第一路径与该第二路径对应于不同的第三网络设备。
可选地,任一第四网络设备与任一第二网络设备均不相同。
可选地,N个第二网络设备中的至少一个网络设备与K个第四网络设备中的至少一个网络设备为相同的网络设备。
可选地,N与K相等,且N个第二网络设备与K个第四网络设备为相同的网络设备。
基于上述技术方案,第一网络设备确定的M个路径包括与第一数据流对应的第一路径以及与第二数据流对应的第二路径,并且,该第一路径与该第二路径对应于不同的第三网络设备,其中,该第一数据流的源地址信息与该第二数据流的源地址信息对应于不同的第二网络设备,该第一数据流的目的地址信息与该第二数据流的目的地址信息对应于同一第四网络设备。换言之,由于任一第四网络设备为任一第三网络设备的下游网络设备,第一数据流和第二数据流通过不同的第二网络设备分别向不同的第三网络设备发送之后,该不同的第三网络设备分别向同一第四网络设备发送该第一数据流和第二数据流。从而,可以避免来自于不同第二网络设备的数据流通过同一第三网络设备传输之后,再通过同一第三网络设备向同一第四网络设备传输的过程中产生的网络拥塞,以提升该第一数据流和该第二数据流的传输效率。
在第一方面的一种可能的实现方式中,该M个路径还指示该M条数据流在该N个第二网络设备上的出端口。
基于上述技术方案,第一网络设备确定的M个路径除了指示通过该N个第二网络设备向该P个第三网络设备传输该M条数据流的路径之外,该M个路径还指示该M条数据流在该N个第二网络设备上的出端口,以便于该N个第二网络设备接收M个路径之后,能够明确发送该M条数据流的出端口。
在第一方面的一种可能的实现方式中,该M条数据流包括第三数据流和第四数据流,该第三数据流的源地址信息和该第四数据流的源地址信息对应于同一第二网络设备,该M个路径包括第三路径和第四路径,该第三路径和该第三数据流对应,该第四路径和该第四数据流对应,该第三路径和该第四路径不同。
基于上述技术方案,第一网络设备确定的M个路径包括与第三数据流对应的第三路径以及与第四数据流对应的第四路径,并且,该第三路径和该第四路径不同。其中,该第三数据流的源地址信息和该第四数据流的源地址信息对应于同一第二网络设备。换言之,第三数据流和第四数据流通过同一第二网络设备分别通过不同的路径进行传输。从而,可以避免来自于同一第二网络设备的数据流通过相同路径传输的过程中产生的网络拥塞,以提升该第三数据流和该第四数据流的传输效率。
在第一方面的一种可能的实现方式中,该M条数据流通过该P个第三网络设备向K个第四网络设备传输,K为正整数;该第一网络设备根据该M条数据流的通信关系和该第一拓扑信息确定M个路径包括:该第一网络设备根据该M条数据流的通信关系和该第一拓扑信息确定第一映射关系,该第一映射关系用于指示该M条数据流中每条数据流的源地址信息对应的第二网络设备与该M条数据流中每条数据流的目标地址信息对应的第四网络设备之间的映射关系;该第一网络设备根据该第一映射关系确定该M个路径。
可选地,该拓扑信息还包括P个第三网络设备和K个第四网络设备之间的连接关系。
基于上述技术方案,提供了第一网络设备确定M个路径的一种实现方式,以便于第一网络设备基于该N个第二网络设备与K个第四网络设备之间的映射关系确定该M个路径。
在第一方面的一种可能的实现方式中,该第一网络设备根据该第一映射关系确定该M个路径包括:该第一网络设备根据该第一映射关系确定第一排序信息,该第一排序信息用于指示该K个第四网络设备对应的第二网络设备的数量的排序;该第一网络设备根据该第一排序信息依次对该N个第二网络设备的出端口进行遍历,得到第二映射关系;其中,该第二映射关系用于指示该N个第二网络设备的出端口与该K个第四网络设备之间的映射关系;该第一网络设备基于该第二映射关系确定该M个路径。
基于上述技术方案,提供了第一网络设备确定M个路径的一种实现方式,以便于第一网络设备基于 依次确定的第一映射关系、第一排序信息以及第二映射关系确定该M个路径。
在第一方面的一种可能的实现方式中,该第一网络设备根据该第一排序信息依次对该N个第二网络设备的出端口进行遍历,得到第二映射关系包括:该第一网络设备根据该第一排序信息依次对该N个第二网络设备的出端口进行遍历,得到第三映射关系;其中,该第三映射关系用于指示每个该第四网络设备对应的该第二网络设备的出端口的可选数量;该第一网络设备基于该第三映射关系确定该第二映射关系。
基于上述技术方案,第三映射关系用于指示每个该第四网络设备对应的该第二网络设备的出端口的可选数量,其中,该可选数量的取值越大则对应的第四网络设备的可选路径的不确定性就越小,反之,该可选数量的取值越小则对应的第四网络设备的可选路径的不确定性就越大。为此,基于第三映射关系所确定的第二映射关系能够优先为可选路径的不确定性较小的第四网络设备对应的该第二网络设备的出端口进行遍历,以提升方案的准确性,避免后续基于该第二映射关系所确定的M个路径产生冲突。
在第一方面的一种可能的实现方式中,该方法还包括:该第一网络设备获取第二拓扑信息,该第二拓扑信息包括A个第二网络设备和该P个第三网络设备之间的连接关系,该A个第二网络设备中的至少一个第二网络设备与该N个第二网络设备中的至少一个第二网络设备相同,该A为大于或等于1的整数;该第一网络设备获取B条数据流的通信关系,该B条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,该B为大于或等于1的整数,该B条数据流分别通过该A个第二网络设备向该P个第三网络设备传输;在该第一网络设备根据该M条数据流的通信关系和该拓扑信息确定M个路径之后,该方法还包括:该第一网络设备根据该B条数据流的通信关系和该拓扑信息确定B个路径,该B个路径分别与B条数据流对应,该B个路径指示通过该A个第二网络设备向该P个第三网络设备发送该M条数据流的路径;其中,该B个路径对应的第二网络设备的出端口不同于该M个路径对应的第二网络设备的出端口;该第一网络设备分别向该A个第二网络设备发送该B个路径。
基于上述技术方案,第一网络设备确定的B个路径对应的第二网络设备的出端口不同于该M个路径对应的第二网络设备的出端口,以避免M条数据流和B条数据流对应于同一第二网络设备的出端口时产生的流量冲突,提升M条数据流和B条数据流的数据传输效率。
在第一方面的一种可能的实现方式中,该第一网络设备获取M条数据流的通信关系包括:该第一网络设备分别接收来自该N个第二网络设备的该M条数据流的通信关系。
基于上述技术方案,第一网络设备可以基于分别接收来自该N个第二网络设备的该M条数据流的通信关系的方式,以获取该M条数据流的通信关系,以便于该第一网络设备和N个第二网络设备之间进行交互的方式获得该M条数据流的通信关系。
可选地,该M条数据流的通信关系预配置于该第一网络设备,以避免不同网络设备之间的交互所造成的开销以及时延增加。
在第一方面的一种可能的实现方式中,该M条数据流对应于多个人工智能(artificial intelligence,AI)集合通信任务中的一个任务。
可选地,该M条数据流对应于长稳态流量任务,该长稳态流量任务中的数据流的流量大小在一定的时长内大于预设阈值。
在第一方面的一种可能的实现方式中,该第一网络设备为控制器或该P个第三网络设备中的一个网络设备。
基于上述技术方案,执行该方法确定并发送M个路径的第一网络设备可以为控制器,也可以为P个第三网络设备中的一个网络设备,以提升方案实现的灵活性。
本申请第二方面提供了一种路径确定的方法,该方法由第二网络设备执行,或者,该方法由第二网络设备中的部分组件(例如处理器、芯片或芯片系统等)执行,或者该方法还可以由能实现全部或部分第二网络设备功能的逻辑模块或软件实现。在第二方面及其可能的实现方式中,以该方法由第二网络设备执行为例进行描述,该第二网络设备可以为路由器、交换机、虚拟机等。在该方法中,第二网络设备向第一网络设备发送Q条数据流的通信关系,该Q条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,Q为大于或等于1的整数;该第二网络设备接收来自第一网络设备的Q个路径,该Q 个路径指示该第二网络设备传输该Q条数据流时使用的路径;该第二网络设备基于该Q个路径传输该Q条数据流。
基于上述技术方案,第二网络设备向第一网络设备发送Q条数据流的通信关系的关系之后,该第二网络设备接收来自第一网络设备的指示该第二网络设备传输该Q条数据流时使用的Q个路径,并且,该第二网络设备基于该Q个路径传输该Q条数据流。换言之,第一网络设备作为确定路径的设备,该第一网络设备能够确定在N个第二网络设备和P个第三网络设备之间传输的M条数据流对应的M个路径。从而,相比于N个第二网络设备仅基于本地数据流作为路径确定依据而容易导致路径冲突的实现方式,在上述方法中,第一网络设备能够基于全局信息实现路径的确定,以避免路径冲突,提升数据流的转发效率。
应理解,Q条数据流中的任一数据流可以为单向数据流,也可以为双向数据流,本申请对此不做限定。其中,若Q条数据流中存在某一数据流为双向数据流,则该Q条数据流的通信关系中,该双向数据流的通信关系可以仅包含某一流向的源地址信息以及目的地址信息,该双向数据流的通信关系也可以包含两个流向分别对应的源地址信息以及目的地址信息,此处不做限定。
在第二方面的一种可能的实现方式中,该路径信息还指示该Q条数据流在该第二网络设备上的出端口。
基于上述技术方案,第二网络设备接收的Q个路径除了指示通过该第二网络设备传输该Q条数据流的路径之外,该Q个路径还指示该Q条数据流在该第二网络设备上的出端口,以便于该第二网络设备接收Q个路径之后,能够明确发送该Q条数据流的出端口。
在第二方面的一种可能的实现方式中,该Q条数据流包括第三数据流和第四数据流,该第三数据流的源地址信息和该第四数据流的源地址信息对应于该第二网络设备,该路径信息包括第三路径和第四路径,该第三路径和该第三数据流对应,该第四路径和该第四数据流对应,该第三路径和该第四路径不同。
基于上述技术方案第二网络设备接收的Q个路径包括与第三数据流对应的第三路径以及与第四数据流对应的第四路径,并且,该第三路径和该第四路径不同。其中,该第三数据流的源地址信息和该第四数据流的源地址信息对应于同一第二网络设备。换言之,第三数据流和第四数据流通过同一第二网络设备分别通过不同的路径进行传输。从而,可以避免来自于同一第二网络设备的数据流通过相同路径传输的过程中产生的网络拥塞,以提升该第三数据流和该第四数据流的传输效率。
在第二方面的一种可能的实现方式中,该Q条数据流对应于多个AI集合通信任务中的一个任务。
可选地,该Q条数据流对应于长稳态流量任务,该长稳态流量任务中的数据流的流量大小在一定的时长内大于预设阈值。
本申请第三方面提供了一种通信装置,该装置可以实现上述第一方面或第一方面任一种可能的实现方式中的方法。该装置包括用于执行上述方法的相应的单元或模块。该装置包括的单元或模块可以通过软件和/或硬件方式实现。例如,该装置可以为第一网络设备,或者,该装置可以为第一网络设备中的组件(例如处理器、芯片或芯片系统等),或者该装置还可以为能实现全部或部分第一网络设备功能的逻辑模块或软件。
该装置包括收发单元和处理单元;该收发单元用于获取第一拓扑信息,该第一拓扑信息包括N个第二网络设备和P个第三网络设备之间的连接关系,任一第二网络设备为任一第三网络设备的上游网络设备,N为大于或等于2的整数,P为大于或等于1的整数;该收发单元还用于获取M条数据流的通信关系,该M条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,M为大于或等于2的整数,该M条数据流分别通过该N个第二网络设备向该P个第三网络设备传输;该处理单元用于根据该M条数据流的通信关系和该第一拓扑信息确定M个路径,该M个路径分别与M条数据流对应,该M个路径指示通过该N个第二网络设备向该P个第三网络设备传输该M条数据流的路径;该收发单元还用于分别向该N个第二网络设备发送该M个路径。
在第三方面的一种可能的实现方式中,该M条数据流通过该P个第三网络设备向K个第四网络设备传输,K为大于或等于1的整数;该M条数据流包括第一数据流和第二数据流,该第一数据流的源地址信息与该第二数据流的源地址信息对应于不同的第二网络设备,该第一数据流的目的地址信息与该第二 数据流的目的地址信息对应于同一第四网络设备,该M个路径包括第一路径和第二路径,该第一路径与该第一数据流对应,该第二路径与该第二数据流对应,该第一路径与该第二路径对应于不同的第三网络设备。
在第三方面的一种可能的实现方式中,该M个路径还指示该M条数据流在该N个第二网络设备上的出端口。
在第三方面的一种可能的实现方式中,该M条数据流包括第三数据流和第四数据流,该第三数据流的源地址信息和该第四数据流的源地址信息对应于同一第二网络设备,该M个路径包括第三路径和第四路径,该第三路径和该第三数据流对应,该第四路径和该第四数据流对应,该第三路径和该第四路径不同。
在第三方面的一种可能的实现方式中,该M条数据流通过该P个第三网络设备向K个第四网络设备传输,K为正整数;该处理单元具体用于:根据该M条数据流的通信关系和该第一拓扑信息确定第一映射关系,该第一映射关系用于指示该M条数据流中每条数据流的源地址信息对应的第二网络设备与该M条数据流中每条数据流的目标地址信息对应的第四网络设备之间的映射关系;根据该第一映射关系确定该M个路径。
在第三方面的一种可能的实现方式中,该处理单元具体用于:根据该第一映射关系确定第一排序信息,该第一排序信息用于指示该K个第四网络设备对应的第二网络设备的数量的排序;根据该第一排序信息依次对该N个第二网络设备的出端口进行遍历,得到第二映射关系;其中,该第二映射关系用于指示该N个第二网络设备的出端口与该K个第四网络设备之间的映射关系;基于该第二映射关系确定该M个路径。
在第三方面的一种可能的实现方式中,该处理单元具体用于:根据该第一排序信息依次对该N个第二网络设备的出端口进行遍历,得到第三映射关系;其中,该第三映射关系用于指示每个该第四网络设备对应的该第二网络设备的出端口的可选数量;基于该第三映射关系确定该第二映射关系。
在第三方面的一种可能的实现方式中,该收发单元还用于获取第二拓扑信息,该第二拓扑信息包括A个第二网络设备和该P个第三网络设备之间的连接关系,该A个第二网络设备中的至少一个第二网络设备与该N个第二网络设备中的至少一个第二网络设备相同,该A为大于或等于1的整数;该收发单元还用于获取B条数据流的通信关系,该B条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,该B为大于或等于1的整数,该B条数据流分别通过该A个第二网络设备向该P个第三网络设备传输;该处理单元还用于根据该B条数据流的通信关系和该拓扑信息确定B个路径,该B个路径分别与B条数据流对应,该B个路径指示通过该A个第二网络设备向该P个第三网络设备发送该M条数据流的路径;其中,该B个路径对应的第二网络设备的出端口不同于该M个路径对应的第二网络设备的出端口;该收发单元还用于分别向该A个第二网络设备发送该B个路径。
在第三方面的一种可能的实现方式中,该收发单元具体用于分别接收来自该N个第二网络设备的该M条数据流的通信关系。
在第三方面的一种可能的实现方式中,该M条数据流对应于多个人工智能AI集合通信任务中的一个任务。
在第三方面的一种可能的实现方式中,该第一网络设备为控制器或该P个第三网络设备中的一个网络设备。
本申请第三方面中,通信装置的组成模块还可以用于执行第一方面的各个可能实现方式中所执行的步骤,并实现相应的技术效果,具体均可以参阅第一方面,此处不再赘述。
本申请第四方面提供了一种通信装置,该装置可以实现上述第二方面或第二方面任一种可能的实现方式中的方法。该装置包括用于执行上述方法的相应的单元或模块。该装置包括的单元或模块可以通过软件和/或硬件方式实现。例如,该装置可以为第二网络设备,或者,该装置可以为第二网络设备中的组件(例如处理器、芯片或芯片系统等),或者该装置还可以为能实现全部或部分第二网络设备功能的逻辑模块或软件。
该装置包括收发单元和处理单元;该处理单元用于确定Q条数据流的通信关系,该Q条数据流中的 每条数据流的通信关系包括源地址信息和目的地址信息,Q为大于或等于1的整数;该收发单元用于向第一网络设备发送Q条数据流的通信关系;该收发单元还用于接收来自第一网络设备的Q个路径,该Q个路径指示该第二网络设备传输该Q条数据流时使用的路径;该收发单元还用于基于该Q个路径传输该Q条数据流。
在第四方面的一种可能的实现方式中,该路径信息还指示该Q条数据流在该第二网络设备上的出端口。
在第四方面的一种可能的实现方式中,该Q条数据流包括第三数据流和第四数据流,该第三数据流的源地址信息和该第四数据流的源地址信息对应于该第二网络设备,该路径信息包括第三路径和第四路径,该第三路径和该第三数据流对应,该第四路径和该第四数据流对应,该第三路径和该第四路径不同。
在第四方面的一种可能的实现方式中,该Q条数据流对应于多个人工智能AI集合通信任务中的一个任务。
本申请第四方面中,通信装置的组成模块还可以用于执行第二方面的各个可能实现方式中所执行的步骤,并实现相应的技术效果,具体均可以参阅第二方面,此处不再赘述。
本申请第五方面提供了一种通信装置。该通信装置包括至少一个处理器。该至少一个处理器与存储器耦合。该存储器用于存储程序或指令。该至少一个处理器用于执行该程序或指令,以使该装置实现前述第一方面或第一方面任意一种可能的实现方式所述的方法,或,以使该装置实现前述第二方面或第二方面任意一种可能的实现方式所述的方法。
本申请第六方面提供了一种通信装置,包括至少一个逻辑电路和输入输出接口;该逻辑电路用于执行如前述第一方面或第一方面任意一种可能的实现方式所述的方法,或,该逻辑电路用于执行如前述第二方面或第二方面任意一种可能的实现方式所述的方法。
本申请第七方面提供一种计算机可读存储介质,用于存储计算机指令;当计算机指令被处理器执行时,该处理器执行如上述第一方面或第一方面任意一种可能的实现方式所述的方法,或,该处理器执行如上述第二方面或第二方面任意一种可能的实现方式所述的方法。
本申请第八方面提供一种计算机程序产品(或称计算机程序),该计算机程序产品包括指令,当该计算机程序产品中的指令被处理器执行时,该处理器执行上述第一方面或第一方面任意一种可能实现方式的方法,或,该处理器执行上述第二方面或第二方面任意一种可能实现方式的方法。
本申请第九方面提供了一种芯片系统,该芯片系统包括至少一个处理器,用于支持通信装置实现上述第一方面或第一方面任意一种可能的实现方式中所涉及的功能,或,用于支持通信装置实现上述第二方面或第二方面任意一种可能的实现方式中所涉及的功能。
在一种可能的设计中,该芯片系统还可以包括存储器,存储器,用于保存该通信装置必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。可选的,该芯片系统还包括接口电路,该接口电路为该至少一个处理器提供程序指令和/或数据。
本申请第十方面提供了一种通信系统,该通信系统包括上述任一方面中的第一网络设备。
可选地,该通信系统包括上述任一方面中的一个或多个第二网络设备。
可选地,该通信系统包括上述任一方面中的一个或多个第三网络设备。
可选地,该通信系统包括上述任一方面中的一个或多个第四网络设备。
其中,第二方面至第十方面中任一种设计方式所带来的技术效果可参见上述第一方面中不同实现方式所带来的技术效果,在此不再赘述。
附图说明
图1a为本申请提供的通信系统的一个示意图;
图1b为本申请提供的通信系统的另一个示意图;
图1c为本申请提供的通信系统的另一个示意图;
图1d为本申请提供的通信系统的另一个示意图;
图1e为本申请提供的通信系统的另一个示意图;
图1f为本申请提供的通信系统的另一个示意图;
图2a为本申请涉及的通信场景的一个示意图;
图2b为本申请涉及的通信场景的另一个示意图;
图2c为本申请涉及的通信场景的另一个示意图;
图2d为本申请涉及的通信场景的另一个示意图;
图2e为本申请涉及的通信场景的另一个示意图;
图2f为本申请涉及的通信场景的另一个示意图;
图2g为本申请涉及的通信场景的另一个示意图;
图3为本申请提供的路径确定的方法的一个示意图;
图4a为本申请提供的路径确定的方法的另一个示意图;
图4b为本申请提供的路径确定的方法的另一个示意图;
图5a为本申请提供的AI集合通信过程的一个示意图;
图5b为本申请提供的AI集合通信过程的另一个示意图;
图5c为本申请提供的AI集合通信过程的另一个示意图;
图6为本申请提供的AI集合通信过程的另一个示意图;
图7为本申请提供的AI集合通信过程的另一个示意图;
图8为本申请提供的通信装置的一个示意图;
图9为本申请提供的通信装置的另一个示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行描述。
本申请实施例中的术语“系统”和“网络”可被互换使用。“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A、同时存在A和B、单独存在B的情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如“A,B和C中的至少一个”包括A,B,C,AB,AC,BC或ABC。以及,除非有特别说明,本申请实施例提及“第一”、“第二”等序数词是用于对多个对象进行区分,不用于限定多个对象的顺序、时序、优先级或者重要程度。
需要说明的是,本申请中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其他实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。
在通信网络中,不同通信装置之间不存在直连链路的情况下,不同通信装置之间交互的数据流需要经过其他通信装置的转发。其中,该其他通信装置可以称为转发设备,例如该转发设备可以包括路由器、交换机或虚拟机等。下面将结合图1a至图1c的实现示例,对本申请涉及的通信系统进行介绍。
参见图1a,为本申请实施例提供的通信系统的架构示意图。如图1a所示,该系统包括多个用户边缘(customer edge,CE)设备,例如用户边缘设备101和用户边缘设备102,以及可能存在的其他用户边缘设备;该系统还包括多个网络设备,例如网络设备103、网络设备104和网络设备105,以及可能存在的其他网络设备。
在图1a中,用户边缘设备可以作为通信系统传输的数据流的入口设备,在这种情况下,该用户边缘设备可以连接数据流的发送端,即数据流的源地址信息所指示的设备。或者,用户边缘设备可以作为通信系统传输的数据流的出口设备,在这种情况下,该用户边缘设备可以连接数据流的接收端,即数据流的目的地址信息所指示的设备。其中,数据流的发送端或数据流的接收端可以为终端设备、服务器、虚拟机等具备数据收发需求的设备。
可选的,在图1a中,网络设备103、网络设备104和网络设备105是路由器(router)、交换机(switch)、虚拟机等设备。这些网络设备可以用于将数据流从某一个用户边缘设备传输至另一个用户边缘设备。
可选地,用户边缘设备可以称为边缘交换设备。
以图1a的实现场景为例,随着组网的规模越加庞大,有可能需要通过多层网络的转发才可以实现不同通信装置之间数据流的交互。换言之,图1a中的不同网络设备之间可以组网成多层网络。下面将结合一些实现示例,对本申请提供的通信场景中的多层网络进行介绍。
一种实现示例如图1b所示,图中箭头的指向为数据流的流向,数据流依次从数据流的源地址信息指示的源设备流向N(N为大于或等于2的整数)个第二网络设备、P(P为大于或等于1的整数,图示中以P大于或等于2为例)个第三网络设备以及K(K为大于或等于2的整数)个第四网络设备之后,该数据流被目的地址信息指示的目的设备所接收。在图1b的数据流流向可知,N个第二网络设备为P个第三网络设备的上游设备,相应的,P个第三网络设备也可以称为N个第二网络设备的下游设备。类似地,P个第三网络设备为K个第四网络设备的上游设备,相应的,K个第四网络设备也可以称为P个第三网络设备的下游设备。
可选地,图1b所示场景示例中,还可以包括与各个第二网络设备连接的控制器,本申请提供的路径确定的方法可以由该控制器执行,也可以由任一第三网络设备执行。
此外,考虑到网络的复杂性,某一个数据流的源设备与该数据流的目的设备有可能均连接至同一个网络设备,或者,某一个数据流的源设备与另一数据流的目的设备有可能均连接至同一个网络设备,或者,某一个数据流的源设备与该数据流的目的设备有可能不是连接至同一个网络设备,或者,某一个数据流的源设备与另一数据流的目的设备有可能不是连接至同一个网络设备。换言之,N个第二网络设备有可能与K个第四网络设备完全不同,或者,N个第二网络设备有可能与K个第四网络设备部分相同或者完全相同。下面将结合更多的实现示例进一步介绍。
一种实现示例如图1c所示,N个第二网络设备与K个第四网络设备完全不同,即任一第四网络设备与任一第二网络设备均不相同。
另一种实现示例如图1d所示,N的取值与K的取值相等,且N个第二网络设备与K个第四网络设备完全相同。
可选地,N个第二网络设备、P个第三网络设备以及K个第四网络设备所组成的多层网络可以为包含有接入层、汇聚层和核心层的三层网络,该多层网络也可以为包含有脊(Spine)节点和叶(Leaf)节点的两层网络,或者是其他实现,本申请不做限定。
在图1e中,以上述多层网络为包含有Spine节点和Leaf节点的两层网络为例。在图1e中,以N个第二网络设备与K个第四网络设备完全相同为例进行说明。其中,Leaf节点分别可以连接至一个或多个服务器(Server),不同Server之间交互的数据流可以通过Leaf节点转发。其中,当两个Server连接至同一Leaf节点时,这两个Server之间的数据流通过该同一Leaf节点的转发即可实现交互;当两个Server连接至不同Leaf节点时,这两个Server之间的数据流需要通过各自连接的Leaf节点以及Spine节点的转发才可实现交互。
需要说明的是,为了便于描述,在后文实施例中主要介绍两层网络之间的通信过程进行描述。而在方案的实际应用中,后文实施例还可以应用于其它网络中的任意两个网络之间的通信过程。例如,对于包含有接入层、汇聚层和核心层的三层网络而言,可以在接入层和汇聚层之间的网络应用后文实施例,也可以在汇聚层和核心层之间的网络应用后文实施例,也可以在接入层和核心层之间的网络应用后文实施例。
在图1f中,此处仍以上述多层网络为包含有Spine节点和Leaf节点的两层网络为例,后文实施例的实现过程主要涉及网络设备(例如Spine节点和Leaf节点)中用于接收数据的接收模块、用于本地处理数据的处理模块和用于发送数据的发送模块中至少一个模块的改进。
以上述多层网络为数据中心网络为例。随着云计算、5G、大数据、物联网、AI等新技术的跨越式发展,以及无人驾驶汽车、5G智造工厂、智能风控、人脸识别等应用的成熟商用,对数据中心网络要求越来越高,要求数据中心提供无丢包、高吞吐和低时延的无损网络。随着数据中心网络包含的业务逐渐丰富,组网的规模也随之越加庞大。所以,数据中心网络多采用多根分层的网络拓扑。在不同层交换设备间存在等价的成员链路。在网际协议(internet protocol,IP)网络中,多条不同链路到达同一目 的地址的网络环境中。当多条路由的路由优先级和路由度量都相同时,这几条路由就称为等价路由则会形成等价多路径(Equal-Cost Multi-Path,ECMP)。在多路径分叉位置的设备,会基于特定的策略进行路径选择,将报文发往不同的路径实现流量负载分担。ECMP机制是将数据包的特征字段(例如:源媒体接入控制(media access control,MAC)地址、目的MAC地址、IP五元组信息等)作为哈希因子,通过哈希(HASH)算法生成哈希-键(HASH-KEY)值,然后根据HASH-KEY值在负载分担链路中选取一条成员链路对数据包进行转发。这时,对于具有不同特征字段的数据包,由于其HASH-KEY值可能不一样,因此可能会选取不同的成员链路进行转发;对于具有相同特征字段的数据包,由于其HASH-KEY值一样,因此会选取相同的成员链路进行转发。这样,既实现了不同数据流在不同成员链路上的负载分担转发,也保证了同一数据流中各数据包到达接收端的时序性。
换言之,在上述多层网络中,数据流的源地址对应的通信装置和数据流的目的地址对应的通信装置之间往往存在多条路径。作为转发设备的通信装置在转发数据流的时候,会基于本地策略在该多条路径中进行路径选择,并基于本地选择的路径转发数据流。下面将以前述图1b所示架构为例,结合图2a至图2g的实现示例,对基于本地选择确定路径的实现过程进行示例性描述。
如图2a所示,服务器A(Server-A)需要分别与服务器B(Server-B)和服务器C(Server-C)之间通信时,需要经过包含有交换设备A(Switch-A)和交换设备B(Switch-B),以及包含有交换设备1(Switch-1)、交换设备2(Switch-2)、交换设备3(Switch-3)和交换设备4(Switch-4)的多层网络。在图2a中涉及的数据流包括四个,即图2a中传输报文1,5,9的数据流1,传输报文5,6,10的数据流2,传输报文3,7,11的数据流3,传输报文4,8,12的数据流4。由数据流向可知,在图2a中,Switch-A为第二网络设备的实现示例,Switch-1,Switch-2,Switch-3和Switch-4为第三网络设备的实现示例,Switch-B为第四网络设备的实现示例。当Server-A与Server-B及Server-C经过多层网络通信时,网络中Switch-A与Switch-B之间存在4条相同的路径即为4条等价路径,ECMP通过hash计算,根据hash-key值为数据报文选在等价的成员链路中进行选择并转发,使得四个数据流在该四个ECMP上分别进行传输。
具体地,基于本地选择确定路径的实现过程可以基于负载分担技术实现。其中,基于负载分担技术按粒度划分常见的有逐包负载分担(Packet-based Load Balancing),逐流负载分担(Flow-based Load Balancing),以及逐子流负载分担(Flowlet-based Load Balancing)技术。下面将通过一些实现示例这三种实现过程进行描述。
实现方式一,逐包负载分担的实现过程。
具体地,逐包多路径负载分担的实现过程中,交换设备将第N个报文转发至路径i,将第N+1个报文转发至路径i+1,以此类推,在交换设备出端口进行轮询。其行为的数学描述为,路径选择为报文的编号对可选等价路径个数进行取模运算。目的是将所有报文平均分布在下一跳的等价成员链路上。
一种实现示例如图2b所示,Server-A向Server-B按时间顺序先后发出送四个报文即图中1-4个报文,此4个报文也以发送的先后顺序到达了Switch-A,Switch-A按逐包负载分担的策略将报文1和3转发至与Switch-1相连的路径上,并且Switch-A将报文2和4转发至与Switch-2相连的路径上。
由图2b所示示例可知,在实现方式一中,可以做到等价成员路径均匀分担网络负载。但是因不同的路径转发时延有差异,会导致报文到达接收端时产生乱序。如图2c所示,由于报文1和报文3所在的链路与报文2和报文4所在的链路的传输时延有可能不同,将会造成报文2和4早于报文1和3到达Server-B。其中,乱序现象会因重传影响网络性能。
实现方式二,逐流负载分担的实现过程。
具体地,依赖ECMP机制实现多个下一跳机制的选择,这种机制是将数据包的特征字段(比如源MAC地址、目的MAC地址、IP五元组信息等)作为哈希因子,通过HASH算法生成HASH-KEY值,然后根据HASH-KEY值在负载分担链路中选取一条成员链路对数据包进行转发。对于具有不同特征字段的数据包,由于其HASH-KEY值可能不一样,因此可能会选取不同的成员链路进行转发;对于具有相同特征字段的数据包,由于其HASH-KEY值一样,因此会选取相同的成员链路进行转发。这样,既实现了不同数据流Flow在不同成员链路上的负载分担转发,也保证了同一数据流Flow中各数据包到达接收端的时序性。
一种实现示例如图2d所示,Server-A向Server-D传送包含报文1,2,5,6的数据流1(记为Flow 1)。Server-B向Server-C传送包含报文3,4的数据流2(记为Flow 2)。Switch-A通过入端口报文的五元组信息进行hash计算,报文1,2,5,6得到相同的Key值,在Switch-A的下一跳走相同的路径。同理,报文3,4同走Switch-A的另一条下游路径。
由图2d所示示例可知,逐流负载分担的缺点是没有考虑负载分担链路中各成员链路的利用率,从而会出现成员链路之间的负载分担不均衡,尤其当大数据流出现时会加剧所选中成员链路的拥塞甚至引起丢包。
此外,逐流负载分担会形成两类冲突,造成网络负载不均衡,影响业务性能。下面将分别通过图2e和图2f进行介绍。
一类冲突可以称为本地冲突,原因是由于输入的不同特征字段通过hash算法计算的hash-key结果相同,从而导致不同的流被转发至相同的路径上,导致冲突发生。
示例性的,本地冲突的实现过程如图2e所示。Server-A、Server-B、Server-C、Server-D的IP地址分别为:0.0.0.1、0.0.0.2、0.0.0.3、0.0.0.4,Server-A向Server-C发送报文1,2,Server-B向Server-D发送报文3,4,Switch-A上的HASH函数为“源ip+目的ip”。为此,Switch-A在本地执行的路径计算方法如下:
1.Server-A发送到Server-C的报文:HASH函数计算结果为4,路径数量为2,HASH函数计算结果模路径数量的值为0,即选择第0条路径(路径编号从0开始)。
2.Server-B发送到Server-D的报文,HASH函数计算结果为6,路径数量为2,HASH函数计算结果模路径数量的值为0,即选择第0条路径(路径编号从0开始)。
从路径选择结果可知,从Server-A和Server-B传送的所有报文,均被转发到了Switch-A到Switch-1之间的路径上,Switch-A到Switch-2之间路径上没有报文,导致网络负载不均衡,业务性能受损。
另一类冲突可以称为全局冲突,原因是由于当前负载分担技术采用分布式决策机制,缺少全局视角,交换设备对上游流量的流向无法预测和控制,从而导致的流量冲突。
示例性的,全局冲突的实现过程如图2f所示。包含报文1和报文2的Flow-1是从Server-A到Server-C的一条流,包含报文3和报文4的Flow-2是从Server-B到Server-D的一条流。当Flow-1经过Switch-A时,选择到Switch-1之间的成员链路对流量进行转发。当Flow-2经过Switch-B时,也选择了到Switch-1之间的成员链路对flow-2进行转发。Flow-1和Flow-2的目的交换设备同在Switch-C本地,而Switch-1与Switch-C之间只有一条链路相连。所以,Flow-1和Flow-2将在Switch-1下游链路上发生冲突,且在当前的分布式决策的负载分担技术中,全局冲突问题是无法预见和控制避免的,此类冲突将严重影响网络性能。
实现方式三,逐子流负载分担的实现过程。
具体地,网络设备在转发数据包时会判断待转发数据包与其所属数据流中上一个数据包之间的时间间隔,若大于负载分担链路中各成员链路的最大链路传输时延(flowlet_gap_time),则认为待转发数据包是一个新子流(Flowlet)的首包;若小于负载分担链路中各成员链路的最大链路传输时延,则与上一个数据包作为同一个Flowlet。设备基于Flowlet选取当前负载分担链路中负载较轻的成员链路进行转发,同一Flowlet中的数据包选取相同的成员链路进行转发。
一种实现示例如图2g所示,Server-A,Server-B分别发送Flow-1和Flow-2经过网络到Server-C。Flow-1中的报文1,2,3由Switch-1和Switch-A之间的链路转发记为数据流1-1(Flowlet1-1),Flow-2的报文4,5因为是不相同的Flow,则由另一条等价成员链路,即Switch-2和Switch-A之间的链路进行转发记为数据流2-1(Flowlet2-1)。Flow-1中的报文6,7因为与之前的报文间隔大于flowlet_gap,所以报文6被检测为是新flowlet的首包,则由报文6,7组成的新flowlet与报文1,2,3所组成flowlet,虽然是相同的flow,但为不同的flowlet,需要通过不同的等价成员链路进行转发记为数据流1-2(Flowlet1-2)。所以,如图2g中所示,Flow-1中的Flowlet1-1即报文1,2,3通过Switch-A到Switch-1之间链路转发;Flow-1中的Flowlet1-2和Flow-2即报文4,5,6,7则通过Switch-A到Switch-2之间链路转发。
在实现方式三的逐子流负载分担的实现过程中,可以基于报文间距将一整条流分成多条子流的方式。并且,能够根据网络特征对不同子流选路:如根据链路利用率,出端口队列深度等。但是,该实现方式仍存在不足之处。一方面,主机侧无法主动构造flowlet;另一方面,如果强行主机侧采用发-停-发的机制强行构造flowlet,在网络无拥塞时影响吞吐,链路利用率较低;而网络有拥塞时,无法保证构造的flowlet的间隔大于交换机设置的flowlet gap,从而导致接收端乱序,触发重传。综上所述,基于Flowlet的多路径负载分担技术,同时具有基于包和基于流的负载分担技术的缺点,即负载不均衡和报文乱序的风险。同时,基于flowlet的负载分担技术需要考虑,权衡网络吞吐最大化和报文乱序最小化,精确设置flowlet_gap_time值。但是,在权衡网络吞吐和报文乱序过程中,flowlet_gap_time并不是一个静态的参数,需要通过网络性能指标的反馈,动态调整所述flowlet_gap_time值。这种动态flowlet_gap_time值的机制一方面会影响网络的吞吐率,无法达到全局最优值。
由上述实现方式一至实现方式三的实现过程可知,存在如下技术问题:
一方面,在多层网络中,往往存在多个作为转发设备的通信装置,而每一个作为转发设备的通信装置都需要基于各自的本地策略在多条路径中进行路径选择的过程,才可以确定转发路径,导致该路径确定方式的效率较低。另一方面,在上述三种实现方式中,由于各个作为转发设备的通信装置在本地做决策的方式,由于缺乏网络传输的多个数据流的全局规划,使得该路径确定方式容易造成其它的问题,例如在实现方式一容易造成的交换设备本地流量冲突,在实现方式二中容易造成全局冲突以及本地冲突,在实现方式三中容易影响网络的吞吐率。
为了解决上述问题,本申请提供了一种路径确定的方法及相关设备,用于提升数据流的转发效率。下面将结合附图进一步详细说明。
请参阅图3,为本申请提供的一种路径确定的方法的一个实现示意图,该方法包括如下步骤。需要说明的是,该方法可以由第一网络设备执行,或者,该方法也可以由第一网络设备中的部分组件(例如处理器、芯片或芯片系统等)执行,或者该方法还可以由能实现全部或部分第一网络设备功能的逻辑模块或软件实现。在下述实施例中,以该方法由第一网络设备执行为例进行描述,该第一网络设备可以为路由器、交换机、虚拟机等。
S301.第一网络设备获取第一拓扑信息。
本实施例中,第一网络设备在步骤S301中获取的第一拓扑信息包括N个第二网络设备和P个第三网络设备之间的连接关系,任一第二网络设备为任一第三网络设备的上游网络设备,N为大于或等于2的整数,P为大于或等于1的整数。
在一种可能的实现方式中,该第一网络设备为控制器或该P个第三网络设备中的一个网络设备。具体地,执行该方法确定并发送M个路径的第一网络设备可以为控制器,也可以为P个第三网络设备中的一个网络设备,以提升方案实现的灵活性。
作为一种实现示例,如前述图1b所示场景示例中,第一网络设备可以为图1b中的控制器,或者,第一网络设备可以为图1b中的任一第三网络设备。
示例性的,以前述图1e所示场景为例。数据流的源地址信息所指示的一个Server向数据流的目的地址信息所指示的另一个Server发送数据流的过程中,该一个Server所发送的数据流有可能分别需要经过该一个Server所连接的Leaf节点(为便于后文引用,记为源Leaf节点)、Spine节点以及该另一个Server(为便于后文引用,记为目的Leaf节点)所连接的Leaf节点的转发。其中,源Leaf节点可以为步骤S301中第一拓扑信息所指示的N个第二网络设备,Spine节点可以为步骤S301中第一拓扑信息所指示的P个第三网络设备,目的Leaf节点可以为后文提及的K个第四网络设备。
可选地,第一网络设备可以为图1e中与各个Leaf节点连接的任一Spine节点,或者,第一网络设备可以为图1e中与各个Leaf节点连接的控制器(图中未示出),以便于该第一网络设备在后续步骤S304中向各个Leaf节点发送M个路径。
应理解,在图1b所示示例,在不同的数据流中同一Leaf节点可能具备不同的角色。
例如,当数据流为双向数据流的情况下,某一个Leaf节点既是该数据流的一个流向的源Leaf节点,也是该数据流的另一个流向的目的Leaf节点。又如,当数据流为单向数据流的情况下,某一个Leaf节点 是该数据流的源Leaf节点,或者是该数据流的目的Leaf节点。又如,同一个Leaf节点可以分别用于传输不同的数据流,在某一数据流中该Leaf节点可能是源Leaf节点,而在另一数据流中该Leaf节点可能是源Leaf节点或目的Leaf节点。
S302.第一网络设备获取M条数据流的通信关系。
本实施例中,第一网络设备在步骤S302中获取的M条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,M为大于或等于2的整数,该M条数据流分别通过该N个第二网络设备向该P个第三网络设备传输。
需要说明的是,本申请对步骤S301和步骤S302的实现过程不做限定,即,第一网络设备可以先执行步骤S301后执行步骤S302,该第一网络设备也可以先执行步骤S302后执行步骤S301。
应理解,M条数据流中的任一数据流可以为单向数据流,也可以为双向数据流,本申请对此不做限定。其中,若M条数据流中存在某一数据流为双向数据流,则该M条数据流的通信关系中,该双向数据流的通信关系可以仅包含某一流向的源地址信息以及目的地址信息,该双向数据流的通信关系也可以包含两个流向分别对应的源地址信息以及目的地址信息,此处不做限定。
在一种可能的实现方式中,该第一网络设备获取M条数据流的通信关系包括:该第一网络设备分别接收来自该N个第二网络设备的该M条数据流的通信关系。具体地,第一网络设备可以基于分别接收来自该N个第二网络设备的该M条数据流的通信关系的方式,以获取该M条数据流的通信关系,以便于该第一网络设备和N个第二网络设备之间进行交互的方式获得该M条数据流的通信关系。
可选地,在步骤S302中,M条数据流的通信关系预配置于该第一网络设备,以避免不同网络设备之间的交互所造成的开销以及时延增加。
类似地,在前述步骤S301中,第一网络设备在步骤S301中获取的第一拓扑信息可以是通过N个第二网络设备所发送的各自的拓扑信息确定,或者,该第一拓扑信息预配置于该第一网络设备。
示例性的,下面将以第一网络设备通过N个第二网络设备所发送的各自的通信关系和拓扑信息确定步骤S301中的第一拓扑信息以及步骤S302中的M条数据流的通信关系为例,进行示例性说明。其中,每个第二网络设备需对本地大流建立流表,包括如下过程。
首先,当一条新流进入第二网络设备时,第二网络设备将此条流的五元组及特征信息(例如:首包时间,字节数)添加至原始流表。
其次,第二网络设备基于采集周期的字节数设置大小流门限值,并过滤去掉小流,保留大流。
再次,第二网络设备将原始流表转化成二元组流表,保留源IP和目的IP。目的是保留全局最优路径分配算法所需的关键字段,节省存储空间。
最后,第二网络设备将过滤后的二元组流表汇聚本地拓扑知识,组成本地流量信息表。该本地流量信息表的实现示例如表1至表3所示。
表1
表2

表3
可选地,对于前后两次获取的流表,第二网络设备可以对已经老化的表项进行删除动作。另外,还可以以文件的形式提供一个任务的全部通信节点,例如使用主机文件(hostfile)的形式,将任务的所有通信设备显式提供。配合显式通信算法名称,就可以获得当前任务的通信关系。
此后,第二网络设备获取本地流量信息后,将建成的流表汇聚到第一网络设备,使得第一网络设备得到步骤S302中的M条数据流的通信关系。
可选地,对上述表1至表3进行汇聚得到的表格的实现示例如下表4所示。
表4
可选地,上述实现过程可以表示为图4a中的实现过程,包括“步骤401.对本地大流建立流表并聚合通信关系”。应理解,图4a所示实现过程中的“全局计算节点”即为第一网络设备,“边缘交换节点”即为N个第二网络设备。
此外,第一网络设备周期性对本地流量通信关系和拓扑信息进行获取。并汇聚到第一网络设备,第一网络设备对汇聚的N个第二网络设备的本地流量信息表进行整合,并通过通信关系划分通信集群,以确定该M条数据流。
可选地,N个第二网络设备将本地的通信关系和拓扑信息汇聚到第一网络设备后,第一网络设备可以明确当前需要计算的M条数据流属于单个任务场景还是多任务场景。如果是多个任务的场景,需要对每个单独的任务进行资源分配工作,该实现过程可以表示为图4a中的实现过程中“步骤402.获取本地拓扑信息,并汇聚完整任务的拓扑信息”。
可选地,第一网络设备还可以执行图4a中,“步骤403判断是否覆盖全部节点”,并且,在确定覆盖全部节点的情况下,第一网络设备执行“步骤405.根据汇聚流表指导全部任务急全部step通信关系”;在确定未覆盖全部节点的情况下,第一网络设备执行图4a中“步骤404.根据通信关系按任务分资源”中的实现过程。
下面将对行图4a中步骤404对应的实现过程进一步描述,包括如下过程。
4041.输入通信关系。
4042.根据根据通信关系和拓扑信息建立Dtor到Stor的映射矩阵。
4043.Tor出现的次数降序进行遍历分配。
4044.按Dtor上行端口顺序分配链路资源。
4045.判断是否遍历全部Dtor,若是则执行步骤4046,若否则重新执行步骤4043。
4046.判断是否有其他通信关系待分配资源,若是则执行步骤4047,若否则重新执行步骤4042。
4047.资源分配结束。
由上述实现过程可知,第一网络设备根据拓扑信息和首包时间将相同任务的通信关系转化成通信关系矩阵。并且,第一网络设备将通信关系矩阵转换成Dtor到Stor的映射矩阵,并从出现次数最多的ToR开始分配资源。如图4b中所示场景中,可得通信关系矩阵如表5所示,Dtor-SID映射矩阵如下表6所示,Dtor-Stor映射矩阵如下表7所示。
表5
表6
表7

在上述表5至表7的实现示例中,ToR-1出现的次数最多,则从ToR-1开始分配资源。即,第一网络设备在分配资源时,将ToR的上行端口在Spine个数上轮询分配。如图4b所示,ToR-1的出端口在S-1和S-2上轮询,遍历全部ToR后结束资源分配过程。实施例分配结果如图4b中不同S节点(包括S1、S2)与T节点(包括T1、T2、T3和T4)之间的连接关系所示,为当前任务所分配到的链路资源。
在一种可能的实现方式中,第一网络设备在步骤S302中获取的M条数据流对应于多个人工智能(artificial intelligence,AI)集合通信任务中的一个任务。可选地,该M条数据流对应于长稳态流量任务,该长稳态流量任务中的数据流的流量大小在一定的时长内大于预设阈值。
示例性地,本申请提供的方案可以适用于,通信关系周期性变化的,或者是收敛时间要求严格的大流场景下,主要细分为长稳态流场景和AI集合通信场景。
可选地,长稳态流量场景的通信关系特点是:通信关系一般不发生改变,也可视为变化周期无穷大。此场景下,流表随任务开始和结束进行更新和老化。流程上只需要在新任务开始后,边缘交换设备获取本地流量通信关系,并在第一网络设备上聚合成流量信息表,即可作为全局最优路径分配算法的输入进行计算。但如果出现相同目的IP,而源IP不同。说明此场景存在多打一(Incast)流量。长稳态流量的Incast场景不属于本发明要解决问题。如果遇到此类通信关系,则不需要通过全局最优路径分配算法计算。
可选地,AI集合通信场景的通信关系特点是:AI集合通信场景的设备间通信根据算法通常将整个过程分成多个子阶段(phase)。在各个phase中,设备的通信关系都发生变化,且在每个phase开始的时候,通信集群内所有设备需要做同步。这对算法的收敛速度要求很高。下面将对AI集合通信场景主要使用三类算法进行示例性描述,该三类算法包括Halving-Doubling(H-D)算法,Ring算法,Tree算法。
一种可能的实现方式中,H-D算法分为两个阶段:第一个阶段为Halving做分散规约(reduce-scatter)操作,如图5a所示。共执行(log2N)个子阶段(phase),其中N为通信节点个数。每前进一个phase,节点间交互的数据量减半。
具体地,由于AI集合通信场景的通信关系特点是:AI集合通信场景的设备间通信根据算法通常将整个过程分成多个子阶段(phase)。在不同phase中,设备的通信关系都发生变化,通信的步长为,其中n为当前的phase,N为通信节点个数。且在每个phase开始的时候,通信集群内所有设备需要做同步。这对算法的收敛速度要求很高。
第二个阶段为Doubling做全局收集(all-gather)操作,如图5b所示。与Halving相同,共执行log2N个phase,所以在整个H-D过程中共需要进行2*log2N个phase。与Halving不同,每前进一个phase,节点间交互的数据量增倍。
综上,在H-D算法工作的AI集合通信的设备间通信的全局规约(all-reduce)阶段,一共需要2*log2N个phase。其中,N为通信节点个数。在全局最优路径分配算法对网络内流量最优路径计算前,需要对整个all-reduce阶段的全部phase内的流量通信关系进行还原,在全局最优路径分配算法计算阶段也将全部phase内的最优路径进行计算,得到最优结果。
如图5c所示,前述表4可以表示为图5c中的通信关系。可见,从一个任务的通信关系恢复出H-D算法支撑的all-reduce过程的全部phase通信关系。对于所有的通信设备,对相同的server本地的所有流进行排序,排序后的相同编号的位置,即为相同的phase。这种排序的优势在于可避免进行整网的时钟同步。
一种可能的实现方式中,Ring算法与H-D算法不同,主要在于其整个通信过程的通信关系不发生改变。
示例性的,如图6所示,为整个all-reduce阶段通信关系特征。其中,Ring算法同H-D算法一样分成两部分进行,第一步为规约分散(scatter-reduce),第二步为all-gather。在scatter-reduce部分,通信节点将交换数据,使每个节点可得到最终结果一块数据。在之后的all-gather部分,节点将交换这些最终的块,以便全部节点得到完成的数据。在scatter-reduce和all-gather过程中,N个通信节点中 的每一个节点都将分别接受和发送N-1次数据。每次都会发送K/N的数据量,其中K表示的是完整数据中不同节点上相加的值的总和。因此,传输到每个节点和从每个节点发送出的数据总量表示为:
DataTransferred=2(N-1)K/N;
可见,在这个all-reduce过程中,N个通信节点共进行2(N-1)个phase,但在所有的phase中,节点的通信关系没有发生任何变化。所以,此类算法的通信关系处理方式同长稳态流量相同,只需计算出一个phase的最优路径即为全部phase全局最优路径计算结果。
一种可能的实现方式中,Tree算法与Ring算法不同之处在于,Tree算法在每个phase阶段通信关系会发生变化。与H-D算法不同之处在于,Tree算法在相同的phase内,发送和接收数据的通信对象是不相同的。
示例性的,如图7所示,为整个all-reduce阶段通信关系特征。其中,在Tree算法中,需要对两棵通信关系树进行操作。两棵树的特点为,第一棵树的叶节点为第二棵树的脊节点,第一棵树的脊节点为第二棵树的叶节点。另外,相同脊节点下的叶节点不可同时对脊节点发送或接收数据。在整个设备间通信过程中,Tree算法也分为两个步骤:第一步为归约(reduce),第二步为广播(broadcast)。在reduce阶段,第i时刻,其通信关系为图中虚线链路连接的设备可进行通信,在第i+1时刻,其通信关系为图中实线链路连接的设备可进行通信。在broadcast阶段,通信关系与reduce阶段的通信关系特征相同。
综上所述,三种算法使节点的通信关系发生了改变,包括长稳流量的通信关系特征均不相同。长稳流量与Ring算法的通信特征相似,在整个设备间通信的过程中不会发生变化。H-D算法的节点通信关系是周期性发生改变,且Halving过程与Doubling过程为互逆。在相同phase内,通信关系是发送和接收对象相同。Tree算法的节点间通信关系类似H-D算法,周期性会发生改变。但与H-D算法不同的是,在相同phase内,Tree算法下的同一节点发送和接收数据的对象不相同。通过以上特征可以对流量进行识别,并完成全部phase的通信关系。另外,还可以通过文件制定当前通信的算法:如在文件中显式提供Ring,H-D,Tree或Static通信的算法标志。
可选地,本申请提供的方案可以适用于网络带宽整体充足的场景。所述网络整体带宽充足表示为,假设网络内有n条带宽为d的流,要分布在m条容量为c的等价链路中,如果nd﹥mc,表示网络总体带宽短缺。单靠负载均衡机制无法解决拥塞问题,需要协同调度算法或拥塞控制算法共同解决网络拥塞问题。本发明适用于nd≤mc的场景下。例如:上述提及的以大流为主,通信关系周期变化,算法收敛时间有严格要求的AI集合通信场景;通信关系相对固定的长稳态流量场景等。
S303.第一网络设备根据该M条数据流的通信关系和该第一拓扑信息确定M个路径。
本实施例中,第一网络设备在步骤S301中获取第一拓扑信息并且在步骤S302中获取M条数据流的通信关系之后,在步骤S303中,该第一网络设备根据该M条数据流的通信关系和该第一拓扑信息确定的M个路径分别与M条数据流对应,该M个路径指示通过该N个第二网络设备向该P个第三网络设备传输该M条数据流的路径。
在一种可能的实现方式中,该M条数据流通过该P个第三网络设备向K个第四网络设备传输,K为正整数;该第一网络设备在步骤S403中根据该M条数据流的通信关系和该第一拓扑信息确定M个路径的过程具体包括:该第一网络设备根据该M条数据流的通信关系和该第一拓扑信息确定第一映射关系,该第一映射关系用于指示该M条数据流中每条数据流的源地址信息对应的第二网络设备与该M条数据流中每条数据流的目标地址信息对应的第四网络设备之间的映射关系;该第一网络设备根据该第一映射关系确定该M个路径。
可选地,该拓扑信息还包括P个第三网络设备和K个第四网络设备之间的连接关系。
可选地,任一第四网络设备与任一第二网络设备均不相同。
可选地,N个第二网络设备中的至少一个网络设备与K个第四网络设备中的至少一个网络设备为相同的网络设备。
可选地,N与K相等,且N个第二网络设备与K个第四网络设备为相同的网络设备。
具体地,提供了第一网络设备确定M个路径的一种实现方式,以便于第一网络设备基于该N个第二网络设备与K个第四网络设备之间的映射关系确定该M个路径。
在一种可能的实现方式中,该第一网络设备根据该第一映射关系确定该M个路径包括:该第一网络设备根据该第一映射关系确定第一排序信息,该第一排序信息用于指示该K个第四网络设备对应的第二网络设备的数量的排序;该第一网络设备根据该第一排序信息依次对该N个第二网络设备的出端口进行遍历,得到第二映射关系;其中,该第二映射关系用于指示该N个第二网络设备的出端口与该K个第四网络设备之间的映射关系;该第一网络设备基于该第二映射关系确定该M个路径。具体地,提供了第一网络设备确定M个路径的一种实现方式,以便于第一网络设备基于依次确定的第一映射关系、第一排序信息以及第二映射关系确定该M个路径。
在一种可能的实现方式中,该第一网络设备根据该第一排序信息依次对该N个第二网络设备的出端口进行遍历,得到第二映射关系包括:该第一网络设备根据该第一排序信息依次对该N个第二网络设备的出端口进行遍历,得到第三映射关系;其中,该第三映射关系用于指示每个该第四网络设备对应的该第二网络设备的出端口的可选数量;该第一网络设备基于该第三映射关系确定该第二映射关系。具体地,第三映射关系用于指示每个该第四网络设备对应的该第二网络设备的出端口的可选数量,其中,该可选数量的取值越大则对应的第四网络设备的可选路径的不确定性就越小,反之,该可选数量的取值越小则对应的第四网络设备的可选路径的不确定性就越大。为此,基于第三映射关系所确定的第二映射关系能够优先为可选路径的不确定性较小的第四网络设备对应的该第二网络设备的出端口进行遍历,以提升方案的准确性,避免后续基于该第二映射关系所确定的M个路径产生冲突。
S304.该第一网络设备分别向该N个第二网络设备发送该M个路径。
本实施例中,第一网络设备在步骤S303中确定M个路径之后,该第一网络设备在步骤S304中分别向该N个第二网络设备发送该M个路径。
在一种可能的实现方式中,该M条数据流通过该P个第三网络设备向K个第四网络设备传输,K为大于或等于1的整数;第一网络设备在步骤S302所获取的通信关系对应的M条数据流包括第一数据流和第二数据流,该第一数据流的源地址信息与该第二数据流的源地址信息对应于不同的第二网络设备,该第一数据流的目的地址信息与该第二数据流的目的地址信息对应于同一第四网络设备。此外,第一网络设备在步骤S303所确定的M个路径包括第一路径和第二路径,该第一路径与该第一数据流对应,该第二路径与该第二数据流对应,该第一路径与该第二路径对应于不同的第三网络设备。
具体地,第一网络设备确定的M个路径包括与第一数据流对应的第一路径以及与第二数据流对应的第二路径,并且,该第一路径与该第二路径对应于不同的第三网络设备,其中,该第一数据流的源地址信息与该第二数据流的源地址信息对应于不同的第二网络设备,该第一数据流的目的地址信息与该第二数据流的目的地址信息对应于同一第四网络设备。换言之,由于任一第四网络设备为任一第三网络设备的下游网络设备,第一数据流和第二数据流通过不同的第二网络设备分别向不同的第三网络设备发送之后,该不同的第三网络设备分别向同一第四网络设备发送该第一数据流和第二数据流。从而,可以避免来自于不同第二网络设备的数据流通过同一第三网络设备传输之后,再通过同一第三网络设备向同一第四网络设备传输的过程中产生的网络拥塞,以提升该第一数据流和该第二数据流的传输效率。
在一种可能的实现方式中,第一网络设备在步骤S303所确定的M个路径还指示该M条数据流在该N个第二网络设备上的出端口。具体地,第一网络设备确定的M个路径除了指示通过该N个第二网络设备向该P个第三网络设备传输该M条数据流的路径之外,该M个路径还指示该M条数据流在该N个第二网络设备上的出端口,以便于该N个第二网络设备接收M个路径之后,能够明确发送该M条数据流的出端口。
在一种可能的实现方式中,图3所示实现方法还包括:该第一网络设备获取第二拓扑信息,该第二拓扑信息包括A个第二网络设备和该P个第三网络设备之间的连接关系,该A个第二网络设备中的至少一个第二网络设备与该N个第二网络设备中的至少一个第二网络设备相同,该A为大于或等于1的整数;该第一网络设备获取B条数据流的通信关系,该B条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,该B为大于或等于1的整数,该B条数据流分别通过该A个第二网络设备向该P个第三网络设备传输;在该第一网络设备根据该M条数据流的通信关系和该拓扑信息确定M个路径之后,该方法还包括:该第一网络设备根据该B条数据流的通信关系和该拓扑信息确定B个路径,该B个路径分别与B条数据流对应,该B个路径指示通过该A个第二网络设备向该P个第三网络设备发送该M条数据流的路径;其中,该B 个路径对应的第二网络设备的出端口不同于该M个路径对应的第二网络设备的出端口;该第一网络设备分别向该A个第二网络设备发送该B个路径。具体地,第一网络设备确定的B个路径对应的第二网络设备的出端口不同于该M个路径对应的第二网络设备的出端口,以避免M条数据流和B条数据流对应于同一第二网络设备的出端口时产生的流量冲突,提升M条数据流和B条数据流的数据传输效率。
在一种可能的实现方式中,第一网络设备在步骤S302所获取的通信关系对应的M条数据流包括第三数据流和第四数据流,该第三数据流的源地址信息和该第四数据流的源地址信息对应于同一第二网络设备。此外,第一网络设备在步骤S303所确定的M个路径包括第三路径和第四路径,该第三路径和该第三数据流对应,该第四路径和该第四数据流对应,该第三路径和该第四路径不同。具体地,第一网络设备确定的M个路径包括与第三数据流对应的第三路径以及与第四数据流对应的第四路径,并且,该第三路径和该第四路径不同。其中,该第三数据流的源地址信息和该第四数据流的源地址信息对应于同一第二网络设备。换言之,第三数据流和第四数据流通过同一第二网络设备分别通过不同的路径进行传输。从而,可以避免来自于同一第二网络设备的数据流通过相同路径传输的过程中产生的网络拥塞,以提升该第三数据流和该第四数据流的传输效率。
基于上述技术方案,第一网络设备在步骤S301中获取包含有N个第二网络设备和P个第三网络设备之间的连接关系的第一拓扑信息,并且该第一网络设备在步骤S302中获取M条数据流的通信关系之后,该第一网络设备在步骤S303中根据该M条数据流的通信关系和该第一拓扑信息确定并在步骤S304中向N个第二网络设备发送M个路径。此后,N个第二网络设备可以基于该M个路径分别向P个第三网络设备发送该M条数据流。换言之,第一网络设备作为确定路径的设备,该第一网络设备能够确定在N个第二网络设备和P个第三网络设备之间传输的M条数据流对应的M个路径。从而,相比于N个第二网络设备仅基于本地数据流作为路径确定依据而容易导致路径冲突的实现方式,在上述方法中,第一网络设备能够基于全局信息实现路径的确定,以避免路径冲突,提升数据流的转发效率。
下面将以源地址信息对应的网络设备(即第二网络设备)为源架顶式交换机(source top of rack,Stor),且目的地址信息对应的网络设备(即第四网络设备)为目的架顶式交换机(destination top of rack,Dtor)为例,对上述步骤S303的实现过程进行示例性说明。
在一种可能的实现方式中,上述步骤S303中确定的M个路径可以用于解决最优路径分配问题,其中,最优路径分配问题是一个数学上的精准覆盖问题,属于NP完全问题。描述为全集X是所有IP对DIP所在的Dtor(Dtor)对应的Stor(Stor)的上行端口,而子集S是每个Dtor对应的Stor的上行端口。问题是要求S是X的精确覆盖,即所有Dtor对应的Stor上行端口不能重复分配。从而达成为网络内每条流分配一条最优路径的目的。整个最优路径分配计算过程分为如下一个步骤。其中,在步骤S303中用于解决该问题的算法可以称为流量矩阵算法(flow matrix algorithm,FMA)。
示例性的,第一网络设备在步骤S303的实现过程可以表示为图4a中虚线框4中的实现过程(即“步骤407.按任务全局最优路径计算”),包括如下实现过程。
步骤4071.第一网络设备获取输入的通信关系[SIP,DIP]。
步骤4072.第一网络设备根据通信关系和拓扑信息建立Dtor到Stor映射矩阵。
步骤4073.第一网络设备在Dtor维度遍历流量矩阵,负作用值(negative effect value,NEV)指导矩阵列迭代顺序。
步骤4074.第一网络设备在横向自由度值(horizontal degree of freedom,HDOF)值和纵向自由度(vertical degree of freedom,VDOF)值指导迭代内矩阵行计算顺序。
步骤4075.判断是否完成遍历Dtor的for循环,若是则执行步骤4076,若否则重复执行步骤4073。
步骤4076.输出最优分配结果。
可选地,如图4a所示,“步骤406.通信关系识别”得到的识别结果也可以作为“步骤407.按任务全局最优路径计算”的输入之一,以便于明确当前输入的通信关系对应的数据流用于执行4061中的AI集合通信任务,还是用于执行4062中的稳态流通信任务。
下面将通过一些实现示例介绍步骤1和步骤2。
在步骤4071中,第一网络设备汇聚任务通信集群的流量信息的一个实现如表8所示。
表8
如表8所示。第一网络设备汇聚的完整流表中包括:流编号,流表生成交换机信息,通信关系对,即[SIP,DIP],首包时间和字节数几个重要项。
可选地,作为算法的输入,需要对此流表进行初始化处理。首先,需要在原始流表中继续筛选出计算过程中需要的信息,表中与计算相关的信息有流表生成的交换机以及通信IP对[SIP,DIP]。其次,将原始表中的通信单向流表转换成双向流表,如表9和表10所示。
表9
表10
此外,因为全局最优路径分配算法是在Dtor维度上迭代计算的。所以,在步骤4072中,将原始双向流表转换成Dtor到Stor映射的矩阵,如表11所示。
表11
在步骤4071和步骤4072中,首先将原始通信关系表转化成Dtor到SIP的映射矩阵,而后SIP所在的Stor是在原始表中的流表生成交换设备项中找到的。所以,最终转换成Dtor到Stor的映射关系矩阵。全局最优路径分配算法的核心就是对Dtor-Stor映射关系矩阵进行计算处理。
可以理解的是,表11所示Dtor-Stor映射矩阵即为前述步骤S303中的第一映射关系的一个实现示例。
下面将通过一些实现示例介绍步骤4073、步骤4074和步骤4075。
在步骤4073中,需要初始化Dtor-Stor映射矩阵和ToR可用端口矩阵。具体地,第一网络设备将汇聚的原始流表转化成Dtor到Stor映射矩阵,如表11所示。Dtor-Stor映射矩阵中的列表示为Dtor(Dtor),行表示为到Dtor的流在Stor出端口号。矩阵中的条目表示到所在列对应的Dtor(Dtor)的流量所对应的Stor(Stor)。另外,在Dtor到Stor映射矩阵中的条目‘-1’表示为只流经本地ToR的流量。此外,在初始化阶段需要进行两步操作:第一步操作是将原始流表转化成Dtor到Stor映射的关系矩阵。第二步操作是生成ToR的可用出端口矩阵,如表12所示。
表12

在表12中,矩阵的列定义为ToR的编号。行定义为流经过每个ToR时可作为备选的出端口编号。在整个最优路径计算的阶段,即是对Dtor到Stor映射矩阵,以及ToR可用出端口矩阵的计算操作。
上述提到基于流的负载分担会有全局冲突的风险,而全局冲突形成的条件是:不同的流经过不同的Stor,流向相同的ToR内的节点,且选择相同的Spine作为中继。所以,在处理Dtor到Stor映射矩阵的时候,如果满足公式1的要求,则可保证避免所述全局冲突问题。
在公式1中,x表示Dtor到Stor映射关系矩阵,表示的是行角标,j表示矩阵的列角标,而k表示的是cell(i,j)对应的矩阵的元素,保证等式成立的意味对应的元素在矩阵的行和列上是互斥的,也称元素在所述行,列空间上具备唯一性。对应网络中的流量没有重合的链路,即不会发生本地冲突和全局冲突问题,使得每条流的路径均为最优分配。
在步骤4073中,第一网络设备对算法输入数据结构初始化后,根据FMA算法对流量矩阵进行计算。FMA算法在流量矩阵的Dtor维度进行迭代计算,即Dtor到Stor映射矩阵的各列进行遍历。算法通过各列负作用值,即NEV指标值的大小选择遍历顺序。其中,NEV定义为出现的条目的总个数减1,如表13所示。
表13
在表13中,当选择一条流在Stor出端口进行分配计算时,在确保流量无冲突的约束条件下,具有相同Dtor的其他流量会受第一条流计算结果的影响,被动地在各自Stor其他编号的出端口上进行分配计算。这种由于一条流的分配计算对其他流的计算结果的影响大小值即为负作用值。根据NEV指标定义,在Dtor遍历的每轮迭代开始前对待计算的流量矩阵的列进行NEV计算。
示例性的,表13中给出了第一轮迭代前的计算,除Dtor-4和Dtor-5外,全部Dtor列的元素均为4个不重复的值。根据NEV定义,其负作用的值均为4-1,即为3。在NEV值相同的情况,当前迭代周期的列的选择根据自然Dtor的顺序选择最小值进入当前迭代周期。当NEV值不相同的情况:例如,第一迭代周期选择Dtor-0进行计算。元素为1,2,3,7。在第二个迭代周期计算之前,Dtor-1的NEV值为7-1,即为6。而Dtor-3的NEV值为5-1,为4。小于Dtor-1的值,也是全部待遍历Dtor列中最小的。则根据遍历顺序约束条件。Dtor-3将以第二顺位进入迭代计算。这种方式,可以保证算法的确定性。从表13中可见Dtor-4和Dtor-5在当前迭代周期开始时,是不进行NEV计算的。原因是所述两个Dtor有本地流量,这将导致流出本地交换设备的流量少与出端口的数量。以表13为例:Dtor-4和Dtor-5分别只有2条流需要在各自的4个出端口上分配计算。分配方式总数为,共为12种分配方法。这将严重影响算法的确定性,和后续迭代 计算的正确性,甚至导致后续迭代无法计算出最优解。所以,需要将有本地流量的Dtor遍历顺序后置。FMA算法通过所述方法,对全部Dtor进行遍历迭代。
在步骤4074和步骤4075中,第一网络设备通过自由度值指导ToR端口分配。具体地,通过NEV选定当前迭代的Dtor后,在此迭代内需为同Dtor的不同流量,计算分配其在Stor上的出端口号。如表14中所示。
表14
在表14中,基于表13所述的方法选择第一轮迭代的Dtor为Dtor-0,其元素包括来自1,2,3,7Stor的流量。在当前迭代周期内,根据纵向自由度值,即VDOF和HDOF对流量在Stor的出端口上进行分配计算。
示例性的,VDOF和HDOF指标指导流量矩阵行分配计算顺序的实现过程如下所示。其中,HDOF定义为本轮迭代计算的子矩阵中,编号相同出端口可用数量总数,算法规定按HDOF值从小到大顺序进行分配计算,如HDOF值相同,则按自然顺序有小到大进行分配计算。如表14所示,在初始化阶段,全部ToR的出端口都是待分配计算的状态,所以在由4个ToR组成的第一轮迭代的子矩阵中,编号相同的出端口都是可用的,所以全部HDOF值为4。在遍历Dtor的迭代过程中,每轮迭代组成的子流量矩阵具有很强随机性。根据HDOF值的约束条件进行ToR出端口分配计算,可保证在每个迭代周期内能计算得到最优解。另一个指标为VDOF,定义为本轮迭代计算后,对应TOR可选出端口数量。算法规定按VDOF值从小到大顺序进行分配计算,如VDOF值相同,则按自然顺序有小到大进行分配计算。如表14所示,在当前迭代周期内,Dtor为0的流量,需在所在Stor的出端口上各选择1个端口。而当前每个Stor可选的端口数为初始化的4个,所以在当前周期分配计算结束后,当前子流量矩阵中,每个Stor可选的端口数均为4-1,为3个。所以当前迭代周期应按ToR自然顺序进行端口分配计算。表14中带“x”符号的块代表着在本轮迭代中,为Dtor-0的几条流,在Stor上出端口的计算分配结果。
可以理解的是,表12至表14任一表格中的实现过程为前述步骤S303中的第三映射关系的一个实现示例。
由上述步骤4073至步骤4075的实现过程可知,根据FMA算法计算规则,全局最优路径分配算法根据3个指标值,在4个维度上完成整个计算过程。因为除了处理三个指标本身维度计算的同时,都考虑到了时间维度上,历史计算结果,保证本迭代最大确定性分配,以及保证后续迭代过程可持续计算出最优解,直到迭代结束。通过上述迭代规则,遍历全部Dtor,可计算出全局最优路径分配结果,如表15所示。
表15
可以理解的是,表15所示全局最优路径分配结果即为前述步骤S303中的第二映射关系的一个实现示例。
可以理解的是,全局最优路径分配算法如上所述,需要根据3个指标值在各自维度上对流量矩阵进行遍历计算,所以其时间复杂度为O(n3),其中n为交换机端口数。计算结果可根据流量无交叉判定规则对流量矩阵的行和列遍历检查,所以验证结果的时间复杂度为O(n2),其中n为交换机端口数。
此后,第一网络设备将计算得到的最优路径分配结果记录到原始流表中,如表16所示。
表16

由上述表16可知,经过流量路径编排计算后,可以得到路径规划的关键输出Stor下一跳。通过输出的计算结果矩阵可转化成与Stor出端口相关联的路径规划表。并将结果同步给网络的边缘交换设备,指导流量在边缘交换节点下一跳对应的出端口选择,从而完成全局路径分配流程。
综上所述,本申请实施例的核心创新点在于,在包括N个第二网络设备和P个第三网络设备组成的多层网络系统中。第一网络设备在获取通信关系以及拓扑信息之后,第一网络设备根据汇聚到的通信关系信息及网络拓扑信息,通过全局最优路径分配算法进行最优化计算。第一网络设备将计算得到最优化结果发送给N个第二网络设备。最后,N个第二网络设备根据接收到的最优化计算结果作为本地流量路径分配指导进行选路,实现负载分担,控制网络拥塞。从而,在N个第二网络设备执行业务报文转发时,是通过全局最优路径分配算法,为所有流量计算出最优的路径,从而一步收敛,而不需要在局部单台设备上进行hash函数计算选路,解决了逐流多路径负载分担方法中的本地冲突问题以及由于局部决策机制导致的全局冲突问题;通信关系固定的业务报文转发路径一致,解决了逐包多路径负载分担方法中报文乱序的问题。
可选地,N个第二网络设备获取进入网络的流量通信关系信息,包括:源IP,目的IP,首包时间。N个第二网络设备获取本地拓扑信息。且将获取的本地流量通信关系信息和拓扑信息汇聚到第一网络设备。
可选地,第一网络设备将汇聚的边缘交换节点的通信关系信息以及拓扑信息,组合成网络流量信息表。所述网络信息表中记录有至少一个边缘节点拓扑信息和通信关系。并将所述网络信息表作为全局最优路径分配算法输入,所述网络信息表的列为流量目的边缘交换节点到所有到此目的边缘交换节点的源交换节点的映射。网络信息表的行表示,待分配的源交换节点的出端口号。
可选地,第一网络设备通过全局最优路径分配算法计算输出得到网络流量路径分配表。所述网络流量路径分配表中有N个第二网络设备内流量的网络路径分配信息。网络路径分配表的列为流量目的边缘交换节点到所有到此目的边缘交换节点的源交换节点的映射,网络路径分配表的行表示为到所述目的边缘节点的流量在源边缘节点分配到的出端口。
请参阅图8,本申请实施例提供了一种通信装置,该通信装置800可以实现上述方法实施例中通信装置(即第一网络设备或第二网络设备)的功能,因此也能实现上述方法实施例所具备的有益效果。
当该通信装置800用于实现前述第一网络设备的功能时,该通信装置所包含收发单元801和处理单元802;该收发单元801用于获取第一拓扑信息,该第一拓扑信息包括N个第二网络设备和P个第三网络设备之间的连接关系,任一第二网络设备为任一第三网络设备的上游网络设备,N为大于或等于2的整数,P为大于或等于1的整数;该收发单元801还用于获取M条数据流的通信关系,该M条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,M为大于或等于2的整数,该M条数据流分别通过该N个第二网络设备向该P个第三网络设备传输;该处理单元802用于根据该M条数据流的通信关系和该第一拓扑信息确定M个路径,该M个路径分别与M条数据流对应,该M个路径指示通过该N个第二网络设备向该P个第三网络设备传输该M条数据流的路径;该收发单元801还用于分别向该N个第二网络设备发送该M个路径。
在一种可能的实现方式中,该M条数据流通过该P个第三网络设备向K个第四网络设备传输,K为大于或等于1的整数;该M条数据流包括第一数据流和第二数据流,该第一数据流的源地址信息与该第二数据流的源地址信息对应于不同的第二网络设备,该第一数据流的目的地址信息与该第二数据流的目的地址信息对应于同一第四网络设备,该M个路径包括第一路径和第二路径,该第一路径与该第一数据流对应,该第二路径与该第二数据流对应,该第一路径与该第二路径对应于不同的第三网络设备。
在一种可能的实现方式中,该M个路径还指示该M条数据流在该N个第二网络设备上的出端口。
在一种可能的实现方式中,该M条数据流包括第三数据流和第四数据流,该第三数据流的源地址信息和该第四数据流的源地址信息对应于同一第二网络设备,该M个路径包括第三路径和第四路径,该第三路径和该第三数据流对应,该第四路径和该第四数据流对应,该第三路径和该第四路径不同。
在一种可能的实现方式中,该M条数据流通过该P个第三网络设备向K个第四网络设备传输,K为正整数;该处理单元802具体用于:根据该M条数据流的通信关系和该第一拓扑信息确定第一映射关系,该第一映射关系用于指示该M条数据流中每条数据流的源地址信息对应的第二网络设备与该M条数据流中每条数据流的目标地址信息对应的第四网络设备之间的映射关系;根据该第一映射关系确定该M个路径。
在一种可能的实现方式中,该处理单元802具体用于:根据该第一映射关系确定第一排序信息,该第一排序信息用于指示该K个第四网络设备对应的第二网络设备的数量的排序;根据该第一排序信息依次对该N个第二网络设备的出端口进行遍历,得到第二映射关系;其中,该第二映射关系用于指示该N个第二网络设备的出端口与该K个第四网络设备之间的映射关系;基于该第二映射关系确定该M个路径。
在一种可能的实现方式中,该处理单元802具体用于:根据该第一排序信息依次对该N个第二网络设备的出端口进行遍历,得到第三映射关系;其中,该第三映射关系用于指示每个该第四网络设备对应的该第二网络设备的出端口的可选数量;基于该第三映射关系确定该第二映射关系。
在一种可能的实现方式中,该收发单元801还用于获取第二拓扑信息,该第二拓扑信息包括A个第二网络设备和该P个第三网络设备之间的连接关系,该A个第二网络设备中的至少一个第二网络设备与该N个第二网络设备中的至少一个第二网络设备相同,该A为大于或等于1的整数;该收发单元801还用于获取B条数据流的通信关系,该B条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,该B为大于或等于1的整数,该B条数据流分别通过该A个第二网络设备向该P个第三网络设备传输;该处理单元802还用于根据该B条数据流的通信关系和该拓扑信息确定B个路径,该B个路径分别与B条数据流对应,该B个路径指示通过该A个第二网络设备向该P个第三网络设备发送该M条数据流的路径;其中,该B个路径对应的第二网络设备的出端口不同于该M个路径对应的第二网络设备的出端口;该收发单元801还用于分别向该A个第二网络设备发送该B个路径。
在一种可能的实现方式中,该收发单元801具体用于分别接收来自该N个第二网络设备的该M条数据流的通信关系。
在一种可能的实现方式中,该M条数据流对应于多个人工智能AI集合通信任务中的一个任务。
在一种可能的实现方式中,该第一网络设备为控制器或该P个第三网络设备中的一个网络设备。
当该通信装置800用于实现前述第二网络设备的功能时,该装置包括收发单元801和处理单元802;该处理单元802用于确定Q条数据流的通信关系,该Q条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,Q为大于或等于1的整数;该收发单元801用于向第一网络设备发送Q条数据流的通信关系;该收发单元801还用于接收来自第一网络设备的Q个路径,该Q个路径指示该第二网络设备传输该Q条数据流时使用的路径;该收发单元801还用于基于该Q个路径传输该Q条数据流。
在一种可能的实现方式中,该路径信息还指示该Q条数据流在该第二网络设备上的出端口。
在一种可能的实现方式中,该Q条数据流包括第三数据流和第四数据流,该第三数据流的源地址信息和该第四数据流的源地址信息对应于该第二网络设备,该路径信息包括第三路径和第四路径,该第三路径和该第三数据流对应,该第四路径和该第四数据流对应,该第三路径和该第四路径不同。
在一种可能的实现方式中,该Q条数据流对应于多个人工智能AI集合通信任务中的一个任务。
需要说明的是,上述通信装置800的各单元的信息执行过程等内容,具体可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例还提供了一种通信装置900,参见图9所示,图9为本申请实施例提供的一种通信装置900的结构示意图。
可选地,该通信装置900执行附图3及相关实施例中第一网络设备的功能;其中,通信装置1000执行附图3及相关实施例中第二网络设备的功能。
可选地,该通信装置900执行附图3及相关实施例中第二网络设备的功能;其中,通信装置1000执行附图3及相关实施例中第一网络设备的功能。
附图9所示通信装置900包括存储器902和至少一个处理器901。
可选地,处理器901通过读取存储器902中保存的指令实现上述实施例中的方法,或者,处理器901也可以通过内部存储的指令实现上述实施例中的方法。在处理器901通过读取存储器902中保存的指令实现上述实施例中的方法的情况下,存储器902中保存实现本申请上述实施例提供的方法的指令。
可选地,至少一个处理器901是一个或多个CPU,或者是单核CPU,也可以是多核CPU。
进一步可选地,至少一个处理器901还可以用于执行前述图6所示实施例中处理单元602对应的实现过程,并实现相应的有益效果,此处不做赘述。
存储器902包括但不限于是RAM、ROM、EPROM、快闪存储器、或光存储器等。存储器902中保存有操作系统的指令。
存储器902中存储的程序指令被所述至少一个处理器901读取后,通信装置执行前述实施例中对应的操作。
可选地,附图9所示的通信装置还包括网络接口903。网络接口903可以是有线接口,例如FDDI,GE接口;网络接口903也可以是无线接口。网络接口903用于在附图3及相关实施例中执行数据的收发。
进一步可选地,网络接口903还可以用于执行前述图6所示实施例中收发单元601对应的实现过程,并实现相应的有益效果,此处不做赘述。
应理解,网络接口903具备接收数据和发送数据的功能,“接收数据”的功能和“发送数据”的功能可以集成在同一个收发接口中实现,或者,“接收数据”的功能和“发送数据”的功能可以分别在不同的接口中实现,此处不做限定。换言之,网络接口903可以包括一个或多个接口,用于实现“接收数据”的功能和“发送数据”的功能。
处理器901读取存储器902中的程序指令后,通信装置900能够执行的其他功能请参照前面各个方法实施例中的描述。
可选地,通信装置900还包括总线904,上述处理器901、存储器902通常通过总线904相互连接,也可以采用其他方式相互连接。
可选地,通信装置900还包括输入输出接口905,输入输出接口905用于与输入设备连接,接收用户、或者与通信装置900能够联动的其他设备通过输入设备输入的相关配置信息。输入设备包括但不限于键盘、触摸屏、麦克风等等。
本申请实施例提供的通信装置900用于执行上述各个方法实施例提供的通信装置(第一网络设备)执行的方法,并实现对应的有益效果。
例如,当通信装置900执行附图3及相关实施例中第一网络设备的功能的情况下;通信装置900获取包含有N个通信装置1000和P个第三网络设备之间的连接关系的第一拓扑信息,并且该通信装置900获取M条数据流的通信关系之后,该通信装置900根据该M条数据流的通信关系和该第一拓扑信息确定并向N个通信装置1000发送M个路径。此后,N个通信装置1000可以基于该M个路径分别向P个第三网络设备发送该M条数据流。换言之,通信装置900作为确定路径的设备,该通信装置900能够确定在N个通信装置1000和P个第三网络设备之间传输的M条数据流对应的M个路径。从而,通信装置900能够基于全局信息实现路径的确定,以避免路径冲突,提升数据流的转发效率。
又如,当通信装置900执行附图3及相关实施例中第二网络设备的功能的情况下;通信装置900向通信装置1000发送Q条数据流的通信关系的关系之后,该通信装置900接收来自通信装置1000的指示该通信装置900传输该Q条数据流时使用的Q个路径,并且,该通信装置900基于该Q个路径传输该Q条数据流。换言之,通信装置1000作为确定路径的设备,该通信装置1000能够确定在N个通信装置900和P个第三网络设备之间传输的M条数据流对应的M个路径。从而,通信装置1000能够基于全局信息实现路径的确定,以避免路径冲突,提升数据流的转发效率。
图9所示通信装置的具体实现方式,均可以参考前述的各个方法实施例中的叙述,此处不再一一赘述。
本申请实施例还提供了一种通信系统,该通信系统至少包括第一网络设备与N个第二网络设备。
可选地,该通信系统还包括P个第三网络设备。
可选地,该通信系统还包括K个第四网络设备。
应理解,在该通信系统中,各个网络设备还可以应用前述实施例所涉及的其它方法,并实现相应的技术效果,此处不做赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,该单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。

Claims (28)

  1. 一种路径确定的方法,其特征在于,包括:
    第一网络设备获取第一拓扑信息,所述第一拓扑信息包括N个第二网络设备和P个第三网络设备之间的连接关系,任一第二网络设备为任一第三网络设备的上游网络设备,所述N为大于或等于2的整数,所述P为大于或等于1的整数;
    所述第一网络设备获取M条数据流的通信关系,所述M条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,所述M为大于或等于2的整数,所述M条数据流分别通过所述N个第二网络设备向所述P个第三网络设备传输;
    所述第一网络设备根据所述M条数据流的通信关系和所述第一拓扑信息确定M个路径,所述M个路径分别与M条数据流对应,所述M个路径指示通过所述N个第二网络设备向所述P个第三网络设备传输所述M条数据流的路径;
    所述第一网络设备分别向所述N个第二网络设备发送所述M个路径。
  2. 根据权利要求1所述的方法,其特征在于,所述M条数据流通过所述P个第三网络设备向K个第四网络设备传输,K为大于或等于1的整数;
    所述M条数据流包括第一数据流和第二数据流,所述第一数据流的源地址信息与所述第二数据流的源地址信息对应于不同的第二网络设备,所述第一数据流的目的地址信息与所述第二数据流的目的地址信息对应于同一第四网络设备,所述M个路径包括第一路径和第二路径,所述第一路径与所述第一数据流对应,所述第二路径与所述第二数据流对应,所述第一路径与所述第二路径对应于不同的第三网络设备。
  3. 根据权利要求1或2所述的方法,其特征在于,所述M个路径还指示所述M条数据流在所述N个第二网络设备上的出端口。
  4. 根据权利要求1至3所述的方法,其特征在于,所述M条数据流通过所述P个第三网络设备向K个第四网络设备传输,K为正整数;所述第一网络设备根据所述M条数据流的通信关系和所述第一拓扑信息确定M个路径包括:
    所述第一网络设备根据所述M条数据流的通信关系和所述第一拓扑信息确定第一映射关系,所述第一映射关系用于指示所述M条数据流中每条数据流的源地址信息对应的第二网络设备与所述M条数据流中每条数据流的目标地址信息对应的第四网络设备之间的映射关系;
    所述第一网络设备根据所述第一映射关系确定所述M个路径。
  5. 根据权利要求4所述的方法,其特征在于,所述第一网络设备根据所述第一映射关系确定所述M个路径包括:
    所述第一网络设备根据所述第一映射关系确定第一排序信息,所述第一排序信息用于指示所述K个第四网络设备对应的第二网络设备的数量的排序;
    所述第一网络设备根据所述第一排序信息依次对所述N个第二网络设备的出端口进行遍历,得到第二映射关系;其中,所述第二映射关系用于指示所述N个第二网络设备的出端口与所述K个第四网络设备之间的映射关系;
    所述第一网络设备基于所述第二映射关系确定所述M个路径。
  6. 根据权利要求5所述的方法,其特征在于,所述第一网络设备根据所述第一排序信息依次对所述N个第二网络设备的出端口进行遍历,得到第二映射关系包括:
    所述第一网络设备根据所述第一排序信息依次对所述N个第二网络设备的出端口进行遍历,得到第三映射关系;其中,所述第三映射关系用于指示每个所述第四网络设备对应的所述第二网络设备的出端 口的可选数量;
    所述第一网络设备基于所述第三映射关系确定所述第二映射关系。
  7. 根据权利要求1至6任一项所述的方法,其特征在于,所述第一网络设备获取M条数据流的通信关系包括:
    所述第一网络设备分别接收来自所述N个第二网络设备的所述M条数据流的通信关系。
  8. 根据权利要求1至7任一项所述的方法,其特征在于,
    所述M条数据流对应于多个人工智能AI集合通信任务中的一个任务。
  9. 根据权利要求1至8任一项所述的方法,其特征在于,
    所述第一网络设备为控制器或所述P个第三网络设备中的一个网络设备。
  10. 一种路径确定的方法,其特征在于,包括:
    第二网络设备向第一网络设备发送Q条数据流的通信关系,所述Q条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,Q为大于或等于1的整数;
    所述第二网络设备接收来自第一网络设备的Q个路径,所述Q个路径指示所述第二网络设备传输所述Q条数据流时使用的路径;
    所述第二网络设备基于所述Q个路径传输所述Q条数据流。
  11. 根据权利要求10所述的方法,其特征在于,所述路径信息还指示所述Q条数据流在所述第二网络设备上的出端口。
  12. 根据权利要求10或11所述的方法,其特征在于,
    所述一个或多条数据流对应于多个人工智能AI集合通信任务中的一个任务。
  13. 一种通信装置,其特征在于,包括收发单元和处理单元;
    所述收发单元用于获取第一拓扑信息,所述第一拓扑信息包括N个第二网络设备和P个第三网络设备之间的连接关系,任一第二网络设备为任一第三网络设备的上游网络设备,所述N为大于或等于2的整数,所述P为大于或等于1的整数;
    所述收发单元还用于获取M条数据流的通信关系,所述M条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,所述M为大于或等于2的整数,所述M条数据流分别通过所述N个第二网络设备向所述P个第三网络设备传输;
    所述处理单元用于根据所述M条数据流的通信关系和所述第一拓扑信息确定M个路径,所述M个路径分别与M条数据流对应,所述M个路径指示通过所述N个第二网络设备向所述P个第三网络设备传输所述M条数据流的路径;
    所述收发单元还用于分别向所述N个第二网络设备发送所述M个路径。
  14. 根据权利要求13所述的装置,其特征在于,所述M条数据流通过所述P个第三网络设备向K个第四网络设备传输,K为大于或等于1的整数;
    所述M条数据流包括第一数据流和第二数据流,所述第一数据流的源地址信息与所述第二数据流的源地址信息对应于不同的第二网络设备,所述第一数据流的目的地址信息与所述第二数据流的目的地址信息对应于同一第四网络设备,所述M个路径包括第一路径和第二路径,所述第一路径与所述第一数据流对应,所述第二路径与所述第二数据流对应,所述第一路径与所述第二路径对应于不同的第三网络设备。
  15. 根据权利要求13或14所述的装置,其特征在于,所述M个路径还指示所述M条数据流在所述N个第二网络设备上的出端口。
  16. 根据权利要求13至15所述的装置,其特征在于,所述M条数据流通过所述P个第三网络设备向K个第四网络设备传输,K为正整数;所述处理单元具体用于:
    根据所述M条数据流的通信关系和所述第一拓扑信息确定第一映射关系,所述第一映射关系用于指示所述M条数据流中每条数据流的源地址信息对应的第二网络设备与所述M条数据流中每条数据流的目标地址信息对应的第四网络设备之间的映射关系;
    根据所述第一映射关系确定所述M个路径。
  17. 根据权利要求16所述的装置,其特征在于,所述处理单元具体用于:
    根据所述第一映射关系确定第一排序信息,所述第一排序信息用于指示所述K个第四网络设备对应的第二网络设备的数量的排序;
    根据所述第一排序信息依次对所述N个第二网络设备的出端口进行遍历,得到第二映射关系;其中,所述第二映射关系用于指示所述N个第二网络设备的出端口与所述K个第四网络设备之间的映射关系;
    基于所述第二映射关系确定所述M个路径。
  18. 根据权利要求17所述的装置,其特征在于,所述处理单元具体用于:
    根据所述第一排序信息依次对所述N个第二网络设备的出端口进行遍历,得到第三映射关系;其中,所述第三映射关系用于指示每个所述第四网络设备对应的所述第二网络设备的出端口的可选数量;
    基于所述第三映射关系确定所述第二映射关系。
  19. 根据权利要求13至18任一项所述的装置,其特征在于,所述收发单元具体用于分别接收来自所述N个第二网络设备的所述M条数据流的通信关系。
  20. 根据权利要求13至19任一项所述的装置,其特征在于,
    所述M条数据流对应于多个人工智能AI集合通信任务中的一个任务。
  21. 根据权利要求13至20任一项所述的装置,其特征在于,
    所述第一网络设备为控制器或所述P个第三网络设备中的一个网络设备。
  22. 一种通信装置,其特征在于,包括收发单元和处理单元;
    所述处理单元用于确定Q条数据流的通信关系,所述Q条数据流中的每条数据流的通信关系包括源地址信息和目的地址信息,Q为大于或等于1的整数;
    所述收发单元用于向第一网络设备发送Q条数据流的通信关系;
    所述收发单元还用于接收来自第一网络设备的Q个路径,所述Q个路径指示所述第二网络设备传输所述Q条数据流时使用的路径;
    所述收发单元还用于基于所述Q个路径传输所述Q条数据流。
  23. 根据权利要求22所述的装置,其特征在于,所述路径信息还指示所述Q条数据流在所述第二网络设备上的出端口。
  24. 根据权利要求22或23所述的装置,其特征在于,
    所述一个或多条数据流对应于多个人工智能AI集合通信任务中的一个任务。
  25. 一种通信装置,其特征在于,包括至少一个处理器,所述至少一个处理器与存储器耦合;
    所述存储器用于存储程序或指令;
    所述至少一个处理器用于执行所述程序或指令,以使所述通信装置实现如权利要求1至9中任一项所述的方法,或,以使所述通信装置实现如权利要求10至12中任一项所述的方法。
  26. 一种计算机可读存储介质,其特征在于,所述介质存储有指令,当所述指令被处理器执行时,实现权利要求1至12中任一项所述的方法。
  27. 一种计算机程序产品,其特征在于,包括指令,当所述指令在处理器上运行时,实现如权利要求1至12中任一项所述的方法。
  28. 一种通信系统,其特征在于,所述通信系统包括第一网络设备以及N个第二网络设备,其中,所述第一网络设备用于执行如权利要求1至9中任一项所述的方法,所述第二网络设备用于如权利要求10至12中任一项所述的方法。
PCT/CN2023/103818 2022-07-27 2023-06-29 一种路径确定的方法及相关设备 Ceased WO2024021990A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP23845203.1A EP4557688A4 (en) 2022-07-27 2023-06-29 METHOD FOR DETERMINING TRAIL AND ASSOCIATED DEVICE
US19/027,916 US20250168116A1 (en) 2022-07-27 2025-01-17 Path Determining Method and Related Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210891496.0A CN117527675A (zh) 2022-07-27 2022-07-27 一种路径确定的方法及相关设备
CN202210891496.0 2022-07-27

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US19/027,916 Continuation US20250168116A1 (en) 2022-07-27 2025-01-17 Path Determining Method and Related Device

Publications (1)

Publication Number Publication Date
WO2024021990A1 true WO2024021990A1 (zh) 2024-02-01

Family

ID=89705247

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/103818 Ceased WO2024021990A1 (zh) 2022-07-27 2023-06-29 一种路径确定的方法及相关设备

Country Status (4)

Country Link
US (1) US20250168116A1 (zh)
EP (1) EP4557688A4 (zh)
CN (2) CN117527675A (zh)
WO (1) WO2024021990A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120835027A (zh) * 2024-04-23 2025-10-24 华为技术有限公司 报文转发方法、通信节点、通信系统及相关产品
CN121194269A (zh) * 2024-06-20 2025-12-23 华为技术有限公司 一种路径信息的处理方法以及相关设备
CN121644443A (zh) * 2024-09-10 2026-03-10 华为技术有限公司 通告消息的处理方法、发送方法及装置
CN119583435A (zh) * 2024-11-29 2025-03-07 联想(北京)有限公司 一种流量转发调度方法及装置
CN119788597B (zh) * 2024-12-31 2026-02-03 中国电信股份有限公司技术创新中心 通信优化方法及装置、存储介质及电子设备

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601485A (zh) * 2015-02-12 2015-05-06 清华大学 网络流量的分配方法及实现网络流量分配的路由方法
CN104767694A (zh) * 2015-04-08 2015-07-08 大连理工大学 一种面向Fat-Tree数据中心网络架构的数据流转发方法
CN105634974A (zh) * 2015-12-31 2016-06-01 杭州华为数字技术有限公司 软件定义网络中的路由确定方法和装置
CN107579922A (zh) * 2017-09-08 2018-01-12 北京信息科技大学 网络负载均衡装置和方法
US20180176134A1 (en) * 2016-12-21 2018-06-21 Cisco Technology, Inc. MACHINE LEARNING-DERIVED ENTROPY PATH GRAPH FROM IN-SITU OAM (iOAM) DATA
US10148551B1 (en) * 2016-09-30 2018-12-04 Juniper Networks, Inc. Heuristic multiple paths computation for label switched paths
US10630579B1 (en) * 2018-01-16 2020-04-21 Amazon Technologies, Inc. Ensuring separate paths for network traffic between source devices and a destination device
WO2021017578A1 (zh) * 2019-07-31 2021-02-04 华为技术有限公司 报文发送方法、装置及存储介质
CN114024969A (zh) * 2020-07-17 2022-02-08 华为技术有限公司 一种负载均衡方法、装置和系统

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10291503B2 (en) * 2013-09-26 2019-05-14 Taiwan Semiconductor Manufacturing Co., Ltd. File block placement in a distributed network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104601485A (zh) * 2015-02-12 2015-05-06 清华大学 网络流量的分配方法及实现网络流量分配的路由方法
CN104767694A (zh) * 2015-04-08 2015-07-08 大连理工大学 一种面向Fat-Tree数据中心网络架构的数据流转发方法
CN105634974A (zh) * 2015-12-31 2016-06-01 杭州华为数字技术有限公司 软件定义网络中的路由确定方法和装置
US10148551B1 (en) * 2016-09-30 2018-12-04 Juniper Networks, Inc. Heuristic multiple paths computation for label switched paths
US20180176134A1 (en) * 2016-12-21 2018-06-21 Cisco Technology, Inc. MACHINE LEARNING-DERIVED ENTROPY PATH GRAPH FROM IN-SITU OAM (iOAM) DATA
CN107579922A (zh) * 2017-09-08 2018-01-12 北京信息科技大学 网络负载均衡装置和方法
US10630579B1 (en) * 2018-01-16 2020-04-21 Amazon Technologies, Inc. Ensuring separate paths for network traffic between source devices and a destination device
WO2021017578A1 (zh) * 2019-07-31 2021-02-04 华为技术有限公司 报文发送方法、装置及存储介质
CN114024969A (zh) * 2020-07-17 2022-02-08 华为技术有限公司 一种负载均衡方法、装置和系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4557688A4

Also Published As

Publication number Publication date
US20250168116A1 (en) 2025-05-22
CN117527675A (zh) 2024-02-06
CN121418335A (zh) 2026-01-27
EP4557688A1 (en) 2025-05-21
EP4557688A4 (en) 2025-11-05

Similar Documents

Publication Publication Date Title
Noormohammadpour et al. Datacenter traffic control: Understanding techniques and tradeoffs
WO2024021990A1 (zh) 一种路径确定的方法及相关设备
Wang et al. Adaptive path isolation for elephant and mice flows by exploiting path diversity in datacenters
Guo et al. On-line multicast scheduling with bounded congestion in fat-tree data center networks
CN113438163B (zh) 一种基于路径隔离的数据中心网络混合流路由方法及系统
Rojas-Cessa et al. Schemes for fast transmission of flows in data center networks
Wang et al. Freeway: Adaptively isolating the elephant and mice flows on different transmission paths
JP2022532729A (ja) スライスベースルーティング
Liu et al. Delay-optimized video traffic routing in software-defined interdatacenter networks
Wu et al. DARD: Distributed adaptive routing for datacenter networks
CN104767694A (zh) 一种面向Fat-Tree数据中心网络架构的数据流转发方法
CN117294643A (zh) 一种基于SDN架构的网络QoS保障路由方法
CN106059821A (zh) 一种基于sdn的数据中心业务服务质量保障方法
CN113746751A (zh) 一种通信方法及装置
CN107454017A (zh) 一种云数据中心网络中混合数据流协同调度方法
CN116684360B (zh) 一种确定性网络时延控制方法、系统及可读存储介质
Zhang et al. A stable matching based elephant flow scheduling algorithm in data center networks
CN106330545A (zh) 一种地震解释系统及基于该系统的数据传输调度方法
Nepolo et al. A predictive ECMP routing protocol for fat-tree enabled data centre networks
CN117459414A (zh) 一种业务路径的编排方法及相关设备
Huang et al. Weaver: Efficient coflow scheduling in heterogeneous parallel networks
Pang et al. Research on SDN-based data center network traffic management and optimization
CN108347378A (zh) 一种用于大电网的控制专用网络及动态路由方法
Avci et al. A content-based traffic engineering policy for Information-Centric Networks
Bouacherine et al. Parallel multi-path forwarding strategy for named data networking

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23845203

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023845203

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2023845203

Country of ref document: EP

Effective date: 20250214

NENP Non-entry into the national phase

Ref country code: DE

WWP Wipo information: published in national office

Ref document number: 2023845203

Country of ref document: EP