CN118819864B - Unified scheduling method and system for resources of multiple types of loads - Google Patents
- Publication number
- CN118819864B (application CN202411295538.XA)
- Authority
- CN
- China
- Prior art keywords
- server
- application
- target
- resource
- application task
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mathematical Physics (AREA)
- Computer And Data Communications (AREA)
- Multi Processors (AREA)
Abstract
The application discloses a unified scheduling method and system for resources of multiple types of loads, and relates to the field of data processing. The method comprises: receiving application tasks of different target workload types and the resource requirement information corresponding to the application task of each target workload type, wherein the application tasks at least comprise a graphics application session task; searching a cluster resource information table, according to the resource requirement information corresponding to the application task of the current target workload type, for candidate servers that satisfy that resource requirement information; selecting the server with the lowest load from the candidate servers; and sending the application task of the current target workload type and its corresponding resource requirement information to the selected server. The application can improve the efficiency with which a terminal device connects to a server and the overall utilization of server resources in the cluster.
Description
Technical Field
The application relates to the field of data processing, and in particular to a unified scheduling method and system for resources of multiple types of loads.
Background
When a user uses an application through a terminal device, a server often has to be started to perform the related operations.
At present, a user can use a terminal device to pick a server in a server cluster and attempt to connect to it in order to start a graphics application. If all resources of that server are occupied by other terminal devices, the attempt fails, and the terminal device then tries other servers until a connection to one of them succeeds and that server starts the graphics application for the terminal device.
However, in this manner the terminal device may need several connection attempts, so its connection time is too long and starting an application on a server is inefficient. In addition, different terminal devices may end up running their application tasks on the same server, so some servers are busy while others are idle, and the resource load of the servers in the cluster is unbalanced.
Disclosure of Invention
The application aims to provide a unified scheduling method and system for resources of multiple types of loads, which can improve the efficiency with which a terminal device connects to a server and the overall utilization of server resources in a cluster.
In order to achieve the above object, the present application provides the following solutions:
In a first aspect, the present application provides a method for uniformly scheduling resources of multiple types of loads, where the method for uniformly scheduling resources of multiple types of loads includes:
receiving application tasks of different target workload types and the resource requirement information corresponding to the application task of each target workload type, wherein the application tasks of the different target workload types at least comprise a graphics application session task;
searching a cluster resource information table, according to the resource requirement information corresponding to the application task of the current target workload type, for candidate servers that satisfy that resource requirement information, wherein the cluster resource information table comprises attribute information, used resource information, and idle resource information of each server in a cluster;
selecting the server with the lowest load from the candidate servers as a first target server; and
sending the application task of the current target workload type and the resource requirement information corresponding to that application task to the first target server, so that the first target server runs the application task of the current target workload type.
In a second aspect, the present application provides a unified scheduling system for resources of multiple types of loads, the system comprising:
a terminal device, a scheduling server, and a cluster comprising a plurality of servers, wherein the plurality of servers include a first target server;
the terminal device is configured to send, to the scheduling server, application tasks of different target workload types and the resource requirement information corresponding to the application task of each target workload type, wherein the application tasks of the different target workload types at least comprise a graphics application session task;
the scheduling server is configured to search a cluster resource information table, according to the resource requirement information corresponding to the application task of the current target workload type, for candidate servers that satisfy that resource requirement information, wherein the cluster resource information table comprises attribute information, used resource information, and idle resource information of each server in the cluster;
the scheduling server is further configured to select the server with the lowest load from the candidate servers as the first target server, and to send the application task of the current target workload type and the resource requirement information corresponding to that application task to the first target server;
the first target server is configured to, after receiving the application task of the current target workload type and the corresponding resource requirement information, run the application task of the current target workload type according to that resource requirement information.
According to the specific embodiment provided by the application, the application discloses the following technical effects:
The application provides a unified scheduling method and system for resources of multiple types of loads. The method comprises: receiving application tasks of different target workload types and the resource requirement information corresponding to the application task of each target workload type, wherein the application tasks at least comprise a graphics application session task; searching a cluster resource information table, according to the resource requirement information corresponding to the application task of the current target workload type, for candidate servers that satisfy that resource requirement information, wherein the cluster resource information table comprises attribute information, used resource information, and idle resource information of each server in the cluster; selecting the server with the lowest load from the candidate servers as a first target server; and sending the application task of the current target workload type and its corresponding resource requirement information to the first target server, so that the first target server runs the application task. In this way, when allocating a server to a terminal device, the scheduling server can look up directly in the cluster resource information table a server that satisfies the resource requirement information of the target job, and the terminal device can connect directly to that target server to have the target job processed. Unlike the prior art, the terminal device does not need to try the servers in the cluster one by one until a connection succeeds, which improves the efficiency with which the terminal device connects to a server.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for unified scheduling of resources for multiple types of loads according to an embodiment of the present application;
FIG. 2 is a flow chart of starting a graphics application session task in the related art, according to an embodiment of the present application;
FIG. 3 is a flow chart of starting a graphics application session task in the present disclosure, according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the functional modules of a unified resource scheduling system for multiple types of loads according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the functional modules of a unified resource scheduling device for multiple types of loads according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The foregoing objects, features, and advantages of the application will be more readily apparent from the following detailed description of the application when taken in conjunction with the accompanying drawings and detailed description.
FIG. 1 is a flowchart of a method for unified scheduling of resources for multiple types of loads according to an embodiment of the present application. The method includes the following steps S101 to S104:
In step S101, application tasks of different types of target workload types are received, and resource requirement information corresponding to the application tasks of the respective target workload types, where the application tasks of different types of target workload types at least include a graphics application session task.
The graphic application session task comprises a digital twin application and a graphic rendering application.
The application tasks of the target workload types may also include HPC parallel computing tasks, batch computing tasks, AI model training tasks, and AI model inference tasks.
The resource requirement information comprises the number of CPUs, the memory size, the operating system version, the GPU model, the number of GPUs, the GPU mode, and the graphics rendering library type.
That is, in addition to the usual number of CPUs, memory size, and operating system version, the resource information required by the target job includes the GPU model, the GPU count (expressed as the number of stream processors and the amount of display memory), the GPU mode (rendering/computing/balancing mode), and the graphics rendering library type, such as DirectX, OpenGL, or Vulkan.
The present disclosure gives examples of several application tasks and their corresponding resource requirement information, such as Siemens UG NX, Unreal Engine, PyTorch model training, and MATLAB parallel computing.
1) The application task is a graphics application session task (graphics rendering class). Taking a graphics session of the industrial design application Siemens UG NX as an example, the corresponding resource requirement information is as follows:
CPU model: "x86, amd64";
CPU core number: 4;
Memory: 32GB;
Operating system: "Windows Server 2016";
GPU model: "Quadro M4000";
GPU count: "gmem=8GB, cuda_core=1000";
GPU mode: "render";
Rendering library type: "DirectX 12".
2) The application task is a graphics application session task (digital twin class). Taking a graphics session of the game engine Unreal Engine as an example, the corresponding resource requirement information is as follows:
CPU model: "x86, amd64";
CPU core number: 8;
Memory: 128GB;
Operating system: "Ubuntu 22.04";
GPU model: "RTX A40";
GPU count: "gmem=48GB, cuda_core=10000";
GPU mode: "render";
Rendering library type: "Vulkan 1.3".
3) The application task is a parallel computing application. Taking the climate prediction scientific computing application WRF as an example, the corresponding resource requirement information is as follows:
CPU model: "aarch";
CPU core number: 256;
Memory: 512GB;
Operating system: "Kylin Linux Server V10 SP1".
4) The application task is a batch computing application. Taking the chip design EDA application Cadence Virtuoso as an example, the corresponding resource requirement information is as follows:
CPU model: "x86, amd64";
CPU core number: 64;
Memory: 256GB;
Operating system: "CentOS 7";
GPU model: "RTX A4000";
GPU count: "gmem=4GB, cuda_core=500";
GPU mode: "Balanced";
Rendering library type: "OpenGL 2.1".
5) The application task is AI model training. Taking the large model training framework DeepSpeed as an example, the corresponding resource requirement information is as follows:
CPU model: "aarch";
CPU core number: 8;
Memory: 1024GB;
Operating system: "Ubuntu 22.04";
GPU model: "NVIDIA A800";
GPU count: "gmem=640GB, cuda_core=100000";
GPU mode: "computing".
6) The application task is AI model inference. Taking the deep learning framework PyTorch as an example, the corresponding resource requirement information is as follows:
CPU model: "aarch";
CPU core number: 4;
Memory: 128GB;
Operating system: "Ubuntu 20.04";
GPU model: "Ascend 310B";
GPU count: "gmem=10GB, cuda_core=2000";
GPU mode: "computing".
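The six requirement sets above share one shape, which can be sketched as a single record type. This is only an illustration: the class name `ResourceRequirement` and all field names are assumptions, not identifiers from the disclosure; the values are copied from examples 1) and 3).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceRequirement:
    """Hypothetical record for one task's resource requirement information."""
    cpu_model: str
    cpu_cores: int
    memory_gb: int
    os_version: str
    # GPU-related fields are optional: example 3) (WRF) lists none of them.
    gpu_model: Optional[str] = None
    gpu_memory_gb: Optional[int] = None          # "gmem" in the examples
    gpu_stream_processors: Optional[int] = None  # "cuda_core" in the examples
    gpu_mode: Optional[str] = None
    render_library: Optional[str] = None

    def needs_gpu(self) -> bool:
        return self.gpu_model is not None


# Example 1): Siemens UG NX graphics session.
ug_nx = ResourceRequirement(
    cpu_model="x86, amd64", cpu_cores=4, memory_gb=32,
    os_version="Windows Server 2016", gpu_model="Quadro M4000",
    gpu_memory_gb=8, gpu_stream_processors=1000,
    gpu_mode="render", render_library="DirectX 12",
)

# Example 3): WRF parallel computing job, CPU only.
wrf = ResourceRequirement(
    cpu_model="aarch", cpu_cores=256, memory_gb=512,
    os_version="Kylin Linux Server V10 SP1",
)
```

Making the GPU fields optional lets one record type describe both graphics sessions and CPU-only HPC jobs, which is the point of treating every load type as the same kind of job.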
Existing HPC cluster resource scheduling software, such as IBM Spectrum LSF and Slurm, can only schedule parallel computing tasks and batch computing tasks; it cannot schedule digital twin application tasks, graphics rendering application tasks, AI model training tasks, or AI model inference tasks. Kubernetes (K8S), the open source management software most commonly used for container clusters, can only schedule and manage container-packaged computing tasks and AI model training or inference tasks in units of pods; it cannot schedule HPC parallel computing jobs, nor can it schedule and manage digital twin or graphics rendering application tasks. Cloud desktop system software, such as the industry-leading Citrix and VMware products, provides only management of cloud desktop sessions and remote graphics application sessions; it does not schedule cloud desktop or graphics application session tasks according to GPU resources, and it does not support management or resource scheduling of other types of computing tasks or of AI model training and inference tasks. The method and system of the present disclosure can uniformly schedule, in public cloud and private cloud server clusters, Windows and Linux digital twin applications, graphics rendering applications, AI model training tasks, and AI model inference tasks together with traditional parallel computing tasks and batch computing tasks. The disclosure further abstracts a digital twin or graphics rendering application task running in the cloud cluster as a load task on GPU resources, so that remote graphics session jobs are scheduled and managed together with the other load types.
In the present disclosure, digital twin application tasks, graphics rendering application tasks, AI model training tasks, AI model inference tasks, HPC parallel computing tasks, and batch computing tasks are all abstracted into "job" objects. A terminal device submits a job to the scheduling server either through the command line of the scheduling system on the scheduling server or by calling the scheduling system's API from a graphical interface. Command line parameters or a job description file describe the load type of the target job and the resource information it requires, which may include the operating system version and the resources required by the graphics application, mainly the GPU model, number of GPUs, GPU mode, and graphics rendering library type and version.
For example, a user may submit a job through the scheduling system's command line tool jsub, passing a job object description file to the scheduling system as a command line parameter: "jsub -jobfile myjob.json".
The resource requirements of the job and the application launch path and parameters are described in myjob.json, for example:
App_start_cmd="/opt/unreal-engine/Engine/Binaries/Linux/UnrealEditor –graphic"
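As an illustration of what such a job description file might contain, the sketch below builds one and serializes it as JSON. Only the key `App_start_cmd` and the file name myjob.json come from the text above; the keys `job_type` and `resources` and every field name under `resources` are hypothetical, and the launch argument is omitted so that only the executable path is shown. The resource values are those of example 2) (Unreal Engine).

```python
import json

# Hypothetical layout of myjob.json; only "App_start_cmd" is named in the source.
job = {
    "job_type": "graphics_session",  # hypothetical key and value
    "resources": {                   # hypothetical key and field names
        "cpu_model": "x86, amd64",
        "cpu_cores": 8,
        "memory_gb": 128,
        "os_version": "Ubuntu 22.04",
        "gpu_model": "RTX A40",
        "gpu_count": {"gmem_gb": 48, "cuda_core": 10000},
        "gpu_mode": "render",
        "render_library": "Vulkan 1.3",
    },
    "App_start_cmd": "/opt/unreal-engine/Engine/Binaries/Linux/UnrealEditor",
}

# Text that would be written to myjob.json and later read by the scheduler.
job_file_text = json.dumps(job, indent=2)
parsed = json.loads(job_file_text)
```

Keeping the load type, the resource requirements, and the launch command in one file means the scheduler needs no per-application logic to place or start a job.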
In step S102, candidate servers meeting the resource requirement information corresponding to the application task of the current target job load type are searched in a cluster resource information table according to the resource requirement information corresponding to the application task of the current target job load type, wherein the cluster resource information table comprises attribute information, used resource information and idle resource information of each server in a cluster.
Searching the cluster resource information table for candidate servers that satisfy the resource requirement information corresponding to the application task of the current target workload type can be understood as finding a suitable candidate server for each application task. A candidate server may serve multiple application tasks at the same time, provided that its idle resources satisfy the resource requirement information of all of those application tasks.
The attribute information of each server includes the operating system version, the GPU model, the GPU mode (rendering/computing/balancing mode), and the graphics rendering library type, such as DirectX, OpenGL, or Vulkan. The used resource information and idle resource information of each server may include the number of used CPUs, the amount of used memory, the number of used GPUs, the number of idle CPUs, the amount of idle memory, and the number of idle GPUs.
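The candidate search of step S102 can be sketched as a filter over the table. This is a minimal illustration under assumptions: the names `ServerRecord` and `find_candidates`, the field names, and the use of exact string equality for attribute matching are not taken from the disclosure, and the sample table is invented.

```python
from dataclasses import dataclass


@dataclass
class ServerRecord:
    """Hypothetical row of the cluster resource information table."""
    name: str
    # Attribute information
    os_version: str
    gpu_model: str
    gpu_mode: str
    render_library: str
    # Used resource information
    used_cpus: int
    used_memory_gb: int
    used_gpus: int
    # Idle resource information
    idle_cpus: int
    idle_memory_gb: int
    idle_gpus: int


def find_candidates(table, req):
    """Keep servers whose attributes match and whose idle resources cover req."""
    return [
        s for s in table
        if s.os_version == req["os_version"]
        and s.gpu_model == req["gpu_model"]
        and s.gpu_mode == req["gpu_mode"]
        and s.render_library == req["render_library"]
        and s.idle_cpus >= req["cpus"]
        and s.idle_memory_gb >= req["memory_gb"]
        and s.idle_gpus >= req["gpus"]
    ]


table = [
    ServerRecord("g1", "Ubuntu 22.04", "RTX A40", "render", "Vulkan 1.3",
                 24, 384, 3, 8, 128, 1),
    ServerRecord("g2", "Ubuntu 22.04", "RTX A40", "render", "Vulkan 1.3",
                 30, 480, 4, 2, 32, 0),
    ServerRecord("g3", "Windows Server 2016", "Quadro M4000", "render",
                 "DirectX 12", 0, 0, 0, 32, 512, 4),
]
req = {"os_version": "Ubuntu 22.04", "gpu_model": "RTX A40",
       "gpu_mode": "render", "render_library": "Vulkan 1.3",
       "cpus": 8, "memory_gb": 128, "gpus": 1}
candidates = [s.name for s in find_candidates(table, req)]
```

Here g2 is rejected for lack of idle resources and g3 for a mismatched operating system, so only g1 remains a candidate.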
In step S103, the server with the lowest load is selected as the first target server among the candidate servers.
In order to balance the utilization of the servers in the cluster, so as not to overload some servers and idle some servers, the server with the lowest load among the candidate servers is selected as the first target server in the present disclosure.
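Step S103 reduces to taking a minimum over the candidates. The text does not define how load is measured, so the sketch below assumes a hypothetical metric, the fraction of CPUs in use; the function names and sample data are likewise illustrative.

```python
def load(server):
    """Hypothetical load metric: fraction of the server's CPUs in use."""
    total = server["used_cpus"] + server["idle_cpus"]
    return server["used_cpus"] / total if total else 1.0


def select_first_target(candidates):
    """Return the candidate with the lowest load, or None if there is none."""
    return min(candidates, key=load) if candidates else None


candidates = [
    {"name": "g1", "used_cpus": 24, "idle_cpus": 8},   # load 0.75
    {"name": "g2", "used_cpus": 4, "idle_cpus": 28},   # load 0.125
    {"name": "g3", "used_cpus": 16, "idle_cpus": 16},  # load 0.5
]
target = select_first_target(candidates)
```

A real scheduler would likely combine CPU, memory, and GPU usage into the metric; any function that orders servers by how busy they are can be passed as the `key`.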
In step S104, the application task of the current target workload type and the resource requirement information corresponding to the application task of the current target workload type are sent to the first target server, so that the first target server runs the application task of the current target workload type.
After the first target server is selected for the current application task, the application task of the current target workload type and the resource requirement information corresponding to the application task of the current target workload type can be sent to the first target server, so that the first target server can operate the application task of the current target workload type based on the received information.
Take as an example the case where the application task of the current target workload type is a graphics application session task and the first target server is a graphics server. According to the resource requirement information of the graphics application and the resource load of each graphics server in the cluster (the cluster resource information table), the scheduling server selects the graphics server with the lowest load from the candidate servers as the first target server. If a candidate server currently satisfies the resource requirement information of the graphics application session task, the scheduling server immediately has the job start program on that graphics server launch the graphics application session task according to the application start command line and parameters in the job description file, and returns the graphics server's IP address together with the Windows graphics DISPLAY number or remote session number to the terminal device that submitted the application task, so that the terminal device can connect and use the session. If no idle graphics server is currently available, the graphics application session task is queued in the scheduling queue and is started once the required resources become available. That is, in the present disclosure, when the scheduling server already holds many jobs, an application task submitted by a terminal device can first be placed in the scheduling server's queue and then scheduled and dispatched by the scheduling server.
Because the target job can wait in the scheduling server's queue when server resources are insufficient, it will not fail to start or run slowly as a result of contending for the resources of the same server.
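The queueing behaviour can be sketched as a dispatch pass over a pending queue. Several details here are assumptions rather than part of the disclosure: resources are reduced to a GPU count, the server with the most idle GPUs stands in for "lowest load", and a later job is allowed to start ahead of an earlier job that does not fit.

```python
from collections import deque


def dispatch(queue, idle_gpus):
    """Start every queued job that fits on some server; keep the rest queued."""
    started = []
    waiting = deque()
    while queue:
        job = queue.popleft()
        fits = [name for name, idle in idle_gpus.items() if idle >= job["gpus"]]
        if fits:
            # Assumed proxy for "lowest load": the server with the most idle GPUs.
            target = max(fits, key=lambda name: idle_gpus[name])
            idle_gpus[target] -= job["gpus"]
            started.append((job["id"], target))
        else:
            waiting.append(job)  # stays queued until resources are released
    queue.extend(waiting)
    return started


idle_gpus = {"g1": 1, "g2": 2}
queue = deque([
    {"id": "a", "gpus": 2},
    {"id": "b", "gpus": 2},
    {"id": "c", "gpus": 1},
])
first_pass = dispatch(queue, idle_gpus)

# Job "a" finishes and releases its GPUs; the next pass can start "b".
idle_gpus["g2"] += 2
second_pass = dispatch(queue, idle_gpus)
```

Job "b" never contends with "a" for g2: it simply waits in the queue and starts once g2's GPUs are released.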
When a graphics application session task is started on a graphics server, the graphics rendering library is pre-installed, or switched to the version required by the application through system environment variables and the Windows registry, according to the type and version of the graphics rendering library that the task requires; the graphics application corresponding to the task is then started. Before the graphics application is started, the process tree of the graphics application session task is bound to the allocated GPU cards and CPU cores, according to the specific GPU card numbers and CPU core numbers that were allocated, through system APIs and an underlying resource isolation mechanism, for example cgroups on Linux and job objects on Windows.
In the related art, as shown in FIG. 2, when a user starts a digital twin application or a graphics rendering application from a terminal device, the user first logs in to a graphics server to obtain a graphical interface or a remote graphics session, and then starts the graphics application on the graphics desktop. If that server is occupied by another terminal device, the user has to log in to another server. For example, suppose there are 10 servers and the terminal device logs in to the 4th server; if the 4th server is occupied by another terminal device, the login fails and the terminal device has to try to connect to another server.
In the present disclosure, by contrast, digital twin applications, graphics rendering applications, AI model training and inference tasks, HPC parallel computing tasks, and batch computing tasks are all abstracted into "job" objects. A terminal device submits a job to the scheduling server either through the command line of the scheduling system on the scheduling server or by calling the scheduling system's API from a graphical interface. Command line parameters or a job description file describe the load type of the target job and the resource information it requires, which may include the operating system version and the resources required by the graphics application session task, mainly the GPU model, number of GPUs, GPU mode, and graphics rendering library type and version.
As shown in FIG. 3, taking a graphics application session task as the load type of the target job, the terminal device submits the graphics application session task to the scheduling server through the command line or the API. The scheduling server schedules and allocates resources on a graphics server, then starts the graphics application session task and returns the graphical interface or remote graphics session to the terminal device.
The application provides a unified resource scheduling method for multiple types of loads. The method comprises: receiving application tasks of different target workload types and the resource requirement information corresponding to the application task of each target workload type, wherein the application tasks at least comprise a graphics application session task; searching a cluster resource information table, according to the resource requirement information corresponding to the application task of the current target workload type, for candidate servers that satisfy that resource requirement information, wherein the cluster resource information table comprises attribute information, used resource information, and idle resource information of each server in the cluster; selecting the server with the lowest load from the candidate servers as a first target server; and sending the application task of the current target workload type and its corresponding resource requirement information to the first target server, so that the first target server runs the application task. In this way, when allocating a server to a terminal device, the scheduling server can look up directly in the cluster resource information table a server that satisfies the resource requirement information of the target job, and the terminal device can connect directly to that target server to have the target job processed. Unlike the prior art, the terminal device does not need to try the servers in the cluster one by one until a connection succeeds, which improves the efficiency with which the terminal device connects to a server.
New scheduling policies are designed for cross-type applications, especially graphics rendering application tasks. To maximize the utilization of server resources in the cluster while still ensuring the user experience of graphics rendering application tasks, the present disclosure adds a complementary scheduling policy, a time window scheduling policy, a resource preemption scheduling policy, and a session reuse scheduling policy to the scheduling system, as described in detail below.
1. Complementary scheduling policy
The application task of the current target workload type is the application task currently being processed among the received application tasks of different target workload types; it may be a single application task or a plurality of application tasks.
The following description takes the case where the application task of the current target workload type is a plurality of application tasks as an example.
In one implementation, the first target server allocated to the application task of the current target workload type is a server that is already running an application task but still has idle resources. The idle resources of that server can meet the resource requirement information corresponding to the application task of the current target workload type, and the application task already running on the first target server is of a different type from the application task of the current target workload type. Once the application task of the current target workload type is started, the first target server therefore runs application tasks of different types at the same time.
When the first target server has residual resources, a plurality of target resource demand information can be obtained from the received resource demand information corresponding to the application tasks of each target workload type, wherein the sum of the plurality of target resource demand information is smaller than or equal to the residual resources of the first target server, and the types of the application tasks running in the first target server are different from the types of the application tasks corresponding to each target resource demand information, so that the plurality of target resource demand information and the application tasks corresponding to each target resource demand information can be sent to the first target server, and the first target server can run the application tasks simultaneously.
In another implementation, when an idle server is detected in the cluster, that idle server serves as the first target server. A plurality of pieces of target resource requirement information may be obtained from the received resource requirement information corresponding to the application tasks of each target workload type, where the sum of the plurality of pieces of target resource requirement information is smaller than or equal to the resources of the first target server, and the application tasks corresponding to the pieces of target resource requirement information are of different types. The plurality of pieces of target resource requirement information and the corresponding application tasks may then be sent to the first target server, so that the first target server runs these application tasks simultaneously.
Specifically, an existing job scheduling system generally dispatches jobs to servers one by one in queue order, and when it dispatches jobs in batches, each batch contains jobs of the same application type; it does not consider combining jobs of different application types into one batch according to resource complementarity. In the present disclosure, when load jobs of different application types are waiting in the scheduling queue and their resource requirements are complementary, the scheduling program can dispatch two or more jobs of completely different application types to the same server for execution at one time, according to the resource configuration of the server. For example, suppose server A in the cluster is idle and is configured with 32 CPU cores and 2 GPU cards, and the queue holds a parallel computing job that needs 24 CPU cores, a graphics job that needs 2 CPU cores and 1 GPU card, and an AI model reasoning job that needs 6 CPU cores and 1 GPU card. The three jobs together need exactly 32 CPU cores and 2 GPU cards, so the scheduling system can dispatch all three to server A and start them at one time, which maximizes the resource utilization of server A and realizes "one machine, multiple purposes" for the cluster server.
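The server A example can be reproduced with a small sketch. The greedy first-fit selection in queue order, the `Job` fields and the function name `pick_complementary` are illustrative assumptions; the disclosure states only the conditions the selected jobs must satisfy (their total fits the server, and their types differ), not a particular selection algorithm.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Job:
    name: str
    job_type: str
    cpus: int
    gpus: int


def pick_complementary(queue: List[Job], free_cpus: int, free_gpus: int,
                       running_types: FrozenSet[str] = frozenset()) -> List[Job]:
    """Greedily pick queued jobs of distinct types that fit the free resources."""
    chosen: List[Job] = []
    seen = set(running_types)  # types already running on the server
    for job in queue:
        if job.job_type in seen:
            continue  # only combine jobs of different application types
        if job.cpus <= free_cpus and job.gpus <= free_gpus:
            chosen.append(job)
            seen.add(job.job_type)
            free_cpus -= job.cpus
            free_gpus -= job.gpus
    return chosen


queue = [
    Job("parallel-1", "parallel", cpus=24, gpus=0),
    Job("graphics-1", "graphics", cpus=2, gpus=1),
    Job("infer-1", "ai_inference", cpus=6, gpus=1),
]
# Idle server A: 32 CPU cores and 2 GPU cards.
batch = pick_complementary(queue, free_cpus=32, free_gpus=2)
```

All three jobs are selected, consuming exactly 32 CPU cores and 2 GPU cards. Passing `running_types` covers the case where the server is already running a job of some type.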
Because some graphics application session tasks heavily use GPUs for graphics rendering, but only use a small amount of CPU resources, the physical servers or virtual machines allocated by conventional methods often have wasted CPU computing resources. The method and the system can achieve the aim of 'one machine with multiple purposes', can simultaneously run parallel computing tasks, batch computing tasks, graphic application session tasks and AI model training and reasoning tasks in a mixed mode on the same server, and fully utilize CPU and GPU resources of the server.
An existing job scheduling system generally dispatches jobs to servers one by one in queue order and, when dispatching in batches, dispatches a batch of jobs of the same application type, without considering combining jobs of different application types into a batch according to resource complementarity. The present disclosure instead provides a complementary scheduling strategy under which a plurality of terminal devices can use the same server in the cluster at the same time. For example, the first target server may already be executing jobs for other terminal devices while still having idle resources; if those idle resources meet the resource requirement information of the target job, and the load type of the jobs already executing on the first target server differs from the load type of the target job, the first target server can be shared.
2. Time window scheduling strategy
In one embodiment, the step S102 includes the following sub-steps A1-A3:
A1, acquiring the running time corresponding to the application task of the current target workload type, and placing the running time into a corresponding target time window, wherein different time windows correspond to different running time periods, and the application tasks of different types of target workload types correspond to different running time periods.
A2, if the current time meets the time period requirement corresponding to the target time window, searching a candidate server meeting the resource requirement information corresponding to the application task of the current target work load type in the cluster resource information table according to the resource requirement information corresponding to the application task of the current target work load type.
A3, if the current time does not meet the time period requirement corresponding to the target time window, sending a refusing processing instruction to the terminal equipment corresponding to the application task of the current target work load type, wherein the refusing processing instruction indicates that the current time does not meet the running time corresponding to the application task of the current target work load type, or placing the application task of the current target work load type and the resource requirement information corresponding to the application task of the current target work load type into a corresponding waiting queue for queuing, and when the time period requirement corresponding to the target time window is detected to be met, searching a candidate server meeting the resource requirement information corresponding to the application task of the current target work load type in a cluster resource information table again according to the resource requirement information corresponding to the application task of the current target work load type.
A plurality of time windows are set in the scheduling server, such as a time window corresponding to the daytime working period, a time window corresponding to the night period and a time window corresponding to the weekend rest period. In the scheduling server, a running time window is designated for each type of application task. For example, graphic application session tasks that require user interaction are assigned to the time window corresponding to the daytime working period, and batch computing jobs or AI model training jobs are assigned to the time window corresponding to the night period or the weekend rest period. Such a strategy may be referred to as a time window scheduling strategy. When the daytime working period ends, the scheduling server saves a checkpoint of the graphic application environment running on the server, cleans up and releases its resources, and automatically prepares the running environment required by batch computing and AI model training so that computing jobs can use the resources at night. When the time window corresponding to the night period or the weekend rest period ends, the scheduling server saves the computing job environment, cleans up and releases its resources, and automatically restores the server to the graphic application environment for interactive use by staff. The time window scheduling strategy allows graphic application session tasks and batch computing tasks to share the same server by time division, so that the GPU and CPU resources of the server are fully utilized and "one machine, dual use" is achieved.
Digital twin applications and graphic rendering applications generally use a large amount of GPU resources while users are interacting with them, and the GPU resources are often idle after users leave work in the evening, but conventionally these idle resources cannot be used by AI model training and reasoning tasks. The present disclosure achieves "one machine, dual use": the GPU card is set to rendering mode in the daytime for users to interactively use digital twin and graphic rendering applications, and is automatically switched to computing mode at night for AI model training and reasoning tasks.
In summary, when allocating a server for the target job, the scheduling server first obtains the running time corresponding to the application task of the current target workload type and places it into the corresponding target time window, and then checks whether the current time meets the time period requirement corresponding to the target time window. If it does, the scheduling server searches the cluster resource information table, according to the resource requirement information corresponding to the application task of the current target workload type, for a candidate server meeting that resource requirement information. If it does not, the scheduling server either sends a refusing processing instruction to the terminal device corresponding to the application task of the current target workload type, the refusing processing instruction indicating that the current time does not meet the running time corresponding to that application task, or places the application task and its corresponding resource requirement information into the corresponding waiting queue and, once the time period requirement corresponding to the target time window is detected to be met, searches the cluster resource information table again for a candidate server. In this way, the servers are used in a time-shared manner by application tasks of different types.
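The admission decision of sub-steps A1 to A3 can be sketched as follows. The window boundaries (weekdays 08:00 to 18:00 as the daytime working period), the job type names and the return values are assumptions for illustration; the disclosure leaves the concrete window settings to the administrator.

```python
from datetime import datetime, time
from typing import List, Optional

# Assumed daytime working period: Monday to Friday, 08:00 to 18:00.
DAY_START, DAY_END = time(8, 0), time(18, 0)

# Assumed mapping of load type to time window (step A1).
WINDOW_OF = {
    "graphics_session": "day",
    "batch": "off_hours",
    "ai_training": "off_hours",
}


def in_daytime_window(now: datetime) -> bool:
    return now.weekday() < 5 and DAY_START <= now.time() < DAY_END


def window_open(job_type: str, now: datetime) -> bool:
    """True if the current time falls in the window assigned to this load type."""
    day = in_daytime_window(now)
    return day if WINDOW_OF[job_type] == "day" else not day


def admit(job_type: str, now: datetime,
          wait_queue: Optional[List[str]] = None) -> str:
    """Return 'search' (A2), or 'reject' / 'queued' (the two branches of A3)."""
    if window_open(job_type, now):
        return "search"  # proceed to look up candidate servers
    if wait_queue is None:
        return "reject"  # send a refusing processing instruction
    wait_queue.append(job_type)  # wait until the window opens
    return "queued"


wednesday_morning = datetime(2024, 9, 18, 10, 0)
wednesday_night = datetime(2024, 9, 18, 23, 0)
saturday_morning = datetime(2024, 9, 21, 10, 0)
```

A scheduler built this way would re-examine the waiting queue whenever a window boundary is crossed; that timer logic is omitted here.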
3. Resource preemptive scheduling policy
In one embodiment, the method for uniform scheduling of resources for multiple types of loads in the present disclosure further includes the following sub-steps B1-B5:
B1, if no candidate server meeting the resource requirement information corresponding to the application task of the current target workload type is found in the cluster resource information table, acquiring the priority of the application task of the current target workload type.
B2, detecting whether a second target server exists in each server of the cluster, wherein the priority of the load type of the running first application task in the second target server is lower than that of the application task of the current target job load type, and after the second target server releases the resources allocated for the running first application task, the idle resources in the second target server meet the resource demand information corresponding to the application task of the current target job load type.
B3, controlling the second target server to suspend or terminate the running first application task, so that the second target server releases the resources allocated to the first application task.
B4, sending the application task of the current target workload type and the resource requirement information corresponding to it to the second target server, so that the second target server runs the application task of the current target workload type.
B5, after detecting that the application task of the current target workload type has finished running, controlling the second target server to continue running the first application task.
In actual use, parallel computing and AI model training often occupy CPU and GPU server resources continuously for a long time, whereas a user who urgently needs to use a graphic application session task or run reasoning with an AI model must obtain the required resources and start the related application immediately. The scheduling server can handle this scenario with a preemptive job scheduling policy. Different priorities can be configured in the scheduling server for different load types; the higher the priority, the more urgent the application job. When an urgent high-priority task arrives, the scheduling server finds in the cluster a server running a preemptible low-priority job and suspends or terminates that job to release resources for the urgent graphic application session task. After the graphic application session task has finished, the scheduling system automatically restores the server environment and resumes the low-priority job from its checkpoint.
For example, the scheduling server may be configured with a plurality of queues that have different priorities. When an urgent task such as a graphic application session task arrives, it can be submitted to a high-priority preemptive queue of the scheduling server; the scheduling server then looks in the preemptible low-priority queues for a low-priority job and a server in the cluster that meet the requirements, and suspends or terminates the low-priority job to release resources for the urgent graphic application session task. After the graphic application session task has finished, the scheduling system automatically restores the server environment and resumes the low-priority job from its checkpoint.
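Sub-steps B1 to B5 can be sketched as below. The class and function names are hypothetical, only the suspend-and-resume variant is shown, and checkpointing is reduced to a `suspended` flag; a real implementation would also save and restore the job environment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RunningJob:
    name: str
    priority: int
    cpus: int
    gpus: int
    suspended: bool = False


@dataclass
class Node:
    name: str
    free_cpus: int
    free_gpus: int
    running: List[RunningJob] = field(default_factory=list)


def find_second_target(nodes: List[Node], priority: int, cpus: int,
                       gpus: int) -> Optional[Tuple[Node, RunningJob]]:
    """B2: find a node where suspending one lower-priority job frees enough resources."""
    for node in nodes:
        for job in node.running:
            if job.suspended or job.priority >= priority:
                continue
            if (node.free_cpus + job.cpus >= cpus
                    and node.free_gpus + job.gpus >= gpus):
                return node, job
    return None


def preempt(node: Node, victim: RunningJob, cpus: int, gpus: int) -> None:
    """B3 and B4: suspend the victim, then account for the urgent task's resources."""
    victim.suspended = True
    node.free_cpus += victim.cpus - cpus
    node.free_gpus += victim.gpus - gpus


def finish_and_resume(node: Node, victim: RunningJob, cpus: int, gpus: int) -> None:
    """B5: the urgent task is done; hand the resources back and resume the victim."""
    node.free_cpus += cpus - victim.cpus
    node.free_gpus += gpus - victim.gpus
    victim.suspended = False


training = RunningJob("train-1", priority=1, cpus=30, gpus=2)
node = Node("gpu-node-1", free_cpus=2, free_gpus=0, running=[training])
hit = find_second_target([node], priority=10, cpus=4, gpus=1)
```

In this example an urgent graphics session of priority 10 that needs 4 CPU cores and 1 GPU card can only run on `gpu-node-1` if the training job is suspended first.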
4. Session reuse scheduling policy
In one embodiment, when the application task of the target workload type is a graphics application session task, the method for uniformly scheduling resources of multiple types of loads in the present disclosure further includes the following sub-steps C1-C2:
C1, if no candidate server meeting the resource requirement information corresponding to the application task of the current target workload type is found in the cluster resource information table, detecting whether a third target server exists in the cluster, wherein the third target server is running other application tasks that belong to the same terminal device as the application task of the current target workload type, and the load types of the other application tasks are the same as the current target workload type;
C2, if so, sending the application task of the current target workload type and the resource requirement information corresponding to it to the third target server.
When the GPU resources of the servers in the cluster are tight, creating a separate session or assigning a separate GPU card for every graphic session task burdens the cluster's resources. The scheduling server can therefore provide a session reuse scheduling strategy: a plurality of graphic application session tasks of the same type started by the same terminal device are scheduled to the same session job and use the same GPU card. Because a terminal device generally operates only one application at a time during interactive use, the other graphic application session tasks of that terminal device occupy few GPU resources and do not affect the interactive performance of the graphic application session task currently being operated. Since session reuse is applied at the level of a single terminal device, it causes no resource conflicts or coordination work between terminal devices, preserves security isolation between terminal devices, and makes full use of GPU resources.
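The lookup in sub-steps C1 and C2 amounts to matching on terminal device and load type. The following sketch assumes the scheduling server keeps a list of active sessions; the `Session` record and the function name `find_third_target` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Session:
    terminal_id: str
    load_type: str
    server: str


def find_third_target(sessions: List[Session], terminal_id: str,
                      load_type: str) -> Optional[str]:
    """C1: find a server already running a same-type session of the same terminal."""
    for s in sessions:
        if s.terminal_id == terminal_id and s.load_type == load_type:
            return s.server
    return None


sessions = [
    Session("terminal-7", "graphics_session", "gpu-node-2"),
    Session("terminal-9", "graphics_session", "gpu-node-3"),
]
reuse_server = find_third_target(sessions, "terminal-7", "graphics_session")
```

Sessions of other terminal devices are never matched, which reflects the isolation between terminal devices described above.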
Because the resources of each server in the cluster are dynamically changed, the cluster resource information table also needs to be updated, and at this time, the unified scheduling method for the resources of the multiple types of loads further comprises the following sub-steps D1-D2:
D1, receiving cluster resource updating information sent by each server in the cluster, wherein the cluster resource updating information comprises the used resource information and idle resource information of each server at the current time point.
D2, updating the cluster resource information table according to the cluster resource updating information.
Each server in the cluster identifies its used resource information and idle resource information at the current time point and reports them to the scheduling server, which updates the cluster resource information table accordingly, so that the scheduling server can allocate servers to the target jobs of terminal devices more accurately.
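Sub-steps D1 and D2 can be sketched as a table update keyed by server name. The report layout and the rule of ignoring reports older than the stored entry are assumptions added for illustration; the disclosure states only that the table is updated from the reported information.

```python
from typing import Any, Dict


def apply_update(table: Dict[str, Dict[str, Any]], report: Dict[str, Any]) -> bool:
    """D2: store a server's latest used/idle figures; return False for stale reports."""
    entry = table.get(report["server"])
    # Assumed rule: a report older than the stored one is ignored.
    if entry is not None and entry["reported_at"] > report["reported_at"]:
        return False
    table[report["server"]] = {
        "used": dict(report["used"]),
        "idle": dict(report["idle"]),
        "reported_at": report["reported_at"],
    }
    return True


cluster_table: Dict[str, Dict[str, Any]] = {}
first = {"server": "node-1", "used": {"cpu": 8, "gpu": 1},
         "idle": {"cpu": 24, "gpu": 1}, "reported_at": 100}
stale = {"server": "node-1", "used": {"cpu": 0, "gpu": 0},
         "idle": {"cpu": 32, "gpu": 2}, "reported_at": 85}
accepted_first = apply_update(cluster_table, first)
accepted_stale = apply_update(cluster_table, stale)
```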
The present disclosure also provides a system for unified scheduling of resources for multiple types of loads, the system for unified scheduling of resources for multiple types of loads comprising:
The system comprises terminal equipment, a scheduling server and a cluster comprising a plurality of servers, wherein the plurality of servers comprise a first target server;
The terminal equipment is used for sending application tasks of different types of target job load types to the scheduling server and resource demand information corresponding to the application tasks of each target job load type, wherein the application tasks of the different types of target job load types at least comprise graphic application session tasks;
The scheduling server is used for searching candidate servers meeting the resource demand information corresponding to the application task of the current target job load type in the cluster resource information table according to the resource demand information corresponding to the application task of the current target job load type;
The scheduling server is used for selecting a server with the lowest load from the candidate servers as a first target server, and sending the application task of the current target job load type and the resource demand information corresponding to the application task of the current target job load type to the first target server;
The first target server is used for operating the application task of the current target work load type according to the resource requirement information corresponding to the application task of the current target work load type after receiving the application task of the current target work load type and the resource requirement information corresponding to the application task of the current target work load type.
In one embodiment, the first target server is a server that is already running an application task but still has idle resources, the idle resources can meet the resource requirement information corresponding to the application task of the current target workload type, and the application task already running on the first target server is of a different type from the application task of the current target workload type, so that once the application task of the current target workload type is started, application tasks of different types run on the first target server.
In one embodiment, the scheduling server is specifically configured to:
Acquiring the running time corresponding to the application task of the current target workload type, and placing the running time into a corresponding target time window, wherein different time windows correspond to different running time periods, and the application tasks of different types of target workload types correspond to different running time periods;
if the current time meets the time period requirement corresponding to the target time window, searching a candidate server meeting the resource requirement information corresponding to the application task of the current target work load type in the cluster resource information table according to the resource requirement information corresponding to the application task of the current target work load type;
If the current time does not meet the time period requirement corresponding to the target time window, a refusing processing instruction is sent to the terminal equipment corresponding to the application task of the current target work load type, the refusing processing instruction indicates that the current time does not meet the running time corresponding to the application task of the current target work load type, or the resource requirement information corresponding to the application task of the current target work load type and the application task of the current target work load type is stored in a cache, and when the requirement of the time period corresponding to the target time window is detected to be met, the candidate server meeting the resource requirement information corresponding to the application task of the current target work load type is searched again in a cluster resource information table according to the resource requirement information corresponding to the application task of the current target work load type.
In one embodiment, the scheduling server is further configured to: obtain the priority of the application task of the current target workload type if no candidate server meeting the resource requirement information corresponding to that application task is found in the cluster resource information table; detect whether a second target server exists among the servers of the cluster, wherein the priority of the load type of a first application task running in the second target server is lower than that of the application task of the current target job load type, and after the second target server releases the resources allocated to the running first application task, the idle resources in the second target server meet the resource requirement information corresponding to the application task of the current target job load type; control the second target server to suspend or terminate the running first application task, so that the second target server releases the resources allocated to the first application task; and send the application task of the current target job load type and the resource requirement information corresponding to it to the second target server, so that the second target server runs the application task of the current target job load type;
Or alternatively
The scheduling server is further configured to, when the application task of the target job load type is a graphic application session task, if a candidate server meeting resource requirement information corresponding to the application task of the current target job load type is not found in the cluster resource information table, detect whether a third target server exists in the cluster, where the third target server is running other application tasks belonging to the same terminal device as the application task of the current target job load type, and load types of the other application tasks are the same as the current target job load type, and if yes, send the application task of the current target job load type and the resource requirement information corresponding to the application task of the current target job load type to the third target server.
In existing public and private clouds, when users use digital twin and graphic rendering applications, the required graphics server resources are often allocated as whole GPU servers, or virtualization technology is used to partition a server into a plurality of virtual machines, and Windows and Linux virtual machines are statically assigned to different users for interactive use by means of GPU passthrough or vGPU technology. This easily leads to low GPU utilization on the server, and virtualization significantly degrades the interactive performance of graphics applications.
Some digital twin and graphics applications use GPUs heavily for graphics rendering but use only a small amount of CPU resources, so the physical servers or virtual machines allocated by conventional methods often have wasted CPU computing resources.
Some public and private clouds use containerization technology and Kubernetes (K8S) to solve the problem of dynamically matching Linux applications to resources. However, digital twin and graphic rendering applications mostly run on Windows operating systems, and Windows containers do not support applications with graphical interfaces, so Windows graphics applications can only use Windows physical servers or cloud desktop virtual machines in the cluster.
Digital twin and graphic rendering applications generally use a large amount of GPU resources only while users are interacting with them; the GPU resources are often idle after users leave work in the evening, yet AI model training and reasoning tasks cannot use them.
Existing HPC cluster resource scheduling software, such as IBM Spectrum LSF and Slurm, can only schedule parallel computing and batch computing tasks, and cannot schedule digital twin applications, graphic rendering applications, or AI model training and reasoning tasks. K8S, the open source management software most commonly used for container clusters, can only schedule and manage container-packaged computing tasks and AI model training and reasoning tasks by pod; it cannot schedule HPC parallel computing tasks, nor can it schedule and manage digital twin and graphics applications. Cloud desktop system software, such as the industry-leading Citrix and VMware products, only manages cloud desktop sessions and remote graphics application sessions; it does not schedule cloud desktop or graphics application sessions according to GPU resources, and it does not support management and resource scheduling of other types of computing tasks or AI model training and reasoning tasks.
The effect and advantage of this disclosure:
1. The method and the system can uniformly schedule Windows and Linux digital twin application, graphic rendering application and AI model training and reasoning tasks, and also conventional parallel computing tasks and batch processing computing tasks in server clusters of public cloud and private cloud.
2. The digital twin application or the graphic rendering application running in the cloud cluster is abstracted out as a load task of a GPU resource and used as a remote graphic session operation to be scheduled and managed together with other types of computing operation.
3. The jobs of the graphics class application can be queued in the resource scheduling system when the server GPU resources are insufficient, and the startup failure or slow running caused by competing for the same GPU resources can be avoided.
4. The digital twin and graphic rendering application can share the same GPU card with AI model training and reasoning tasks without depending on virtualization technology, and the performance is better.
5. "One machine, multiple purposes": parallel computing tasks, batch computing tasks, digital twin and graphic rendering applications, and AI model training and reasoning tasks can run concurrently in a mixed mode on the same server, making full use of the server's CPU and GPU resources.
6. "One machine, dual use": the GPU card is set to rendering mode in the daytime for users to interactively use digital twin and graphic rendering applications, and is automatically switched to computing mode at night for AI model training and reasoning tasks.
Fig. 4 is a functional module architecture diagram of a multi-type load resource unified scheduling system according to an embodiment of the present application, as shown in fig. 4, including a terminal device, a scheduling server, and a cluster including a plurality of servers, where the servers in the cluster may be application/computing servers, and the plurality of servers includes a first target server;
The scheduling command line program is a group of command line programs, including programs for job submission, job inquiry, job control and the like. A user at the terminal device can manually invoke these command lines in the operating system to submit, query and control jobs, and the command line program returns the job calculation result, or the remote interface connection information of the graphic application, to the terminal device. Before a job is submitted from the terminal device, the resource type, the resource quantity, the rendering library type and version, the application program starting path and the parameters required by the application program are written in a job description file, and the job description file is specified when the job is submitted. The scheduling command line program typically runs on the terminal device, but may also run on the scheduling server.
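As an illustration of the job description file, the sketch below parses a description into a typed record. The JSON format, every field name, and the example values (including the application path) are hypothetical; the disclosure lists only which items the file contains, not its syntax.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class JobDescription:
    resource_type: str        # e.g. "gpu" (hypothetical value)
    resource_count: int
    render_lib: str           # rendering library type
    render_lib_version: str
    app_path: str             # application program starting path
    app_args: List[str]       # parameters required by the application


def parse_job_description(text: str) -> JobDescription:
    """Parse a JSON job description; raises on missing fields or bad counts."""
    job = JobDescription(**json.loads(text))
    if job.resource_count <= 0:
        raise ValueError("resource_count must be positive")
    return job


example = """
{
  "resource_type": "gpu",
  "resource_count": 1,
  "render_lib": "OpenGL",
  "render_lib_version": "4.6",
  "app_path": "C:/apps/viewer/viewer.exe",
  "app_args": ["--scene", "plant.model"]
}
"""
job = parse_job_description(example)
```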
The scheduling engine program in the scheduling server receives requests such as job submission, job inquiry and job control submitted by the scheduling command line program, and returns job data or graphic interface connection information to the scheduling command line program. The scheduling engine program has a built-in configuration file in which an administrator can preset parameters related to the scheduling strategies, such as the complementary scheduling strategy switch, the time window setting for each load type, the queues of the preemptive scheduling strategy, and the session reuse scheduling strategy switch. The scheduling engine program receives, at regular intervals (for example every 15 seconds), the used resource information and idle resource information for the current time point reported by the resource monitoring program on each application/computing server, and schedules the target job to a suitable application/computing server according to the resource requirement information of the target job, the resource status of each application/computing server in the current cluster, and the scheduling strategy configured by the administrator. According to the resource devices of the application/computing server, the scheduling engine program designates for the target job the CPU and GPU device numbers and the resource quantities available on the server; this information is sent together with the job object over the network to the job starting program on the application/computing server, which completes the dispatch of the target job. The scheduling engine program runs on the scheduling server.
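The designation of CPU and GPU device numbers for a target job might look like the following sketch. Picking the lowest-numbered free devices and the `build_allocation` name are assumptions for illustration; the disclosure does not state how device numbers are chosen.

```python
from typing import Dict, List, Optional, Set


def take_lowest(free_ids: Set[int], count: int) -> Optional[List[int]]:
    """Return the `count` lowest free device numbers, or None if too few are free."""
    if count > len(free_ids):
        return None
    return sorted(free_ids)[:count]


def build_allocation(free_cpu_ids: Set[int], free_gpu_ids: Set[int],
                     cpus: int, gpus: int) -> Optional[Dict[str, List[int]]]:
    """Designate CPU and GPU device numbers to send along with the job object."""
    cpu_ids = take_lowest(free_cpu_ids, cpus)
    gpu_ids = take_lowest(free_gpu_ids, gpus)
    if cpu_ids is None or gpu_ids is None:
        return None
    return {"cpu_ids": cpu_ids, "gpu_ids": gpu_ids}


allocation = build_allocation({0, 1, 2, 3}, {0, 1}, cpus=2, gpus=1)
```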
The resource monitoring program runs on each application/computing server. It scans the server through operating system commands, system APIs, IPMI and other methods, automatically identifies the CPU, memory, GPU and other devices on the server and their resource usage, as well as the quantity of resources actually used by each job running on the server (that is, the attribute information, used resource information and idle resource information of each application/computing server in the above embodiments), and reports this information to the scheduling engine program on the scheduling server through network communication.
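On the receiving side, the periodic reports could be folded into the cluster resource information table roughly as sketched below. The `ResourceReport` and `ClusterResourceTable` names, the rule for ignoring out-of-order reports and the stale-server check are assumptions added for illustration; only the 15-second example interval comes from the text.

```python
from dataclasses import dataclass

REPORT_INTERVAL_S = 15  # the text gives "every 15 seconds" as an example


@dataclass
class ResourceReport:
    """One report from a resource monitoring program (assumed shape)."""
    server: str
    used: dict[str, int]    # used resource information
    idle: dict[str, int]    # idle resource information
    timestamp: float        # seconds, taken when the report was produced


class ClusterResourceTable:
    """Keeps the most recent report for every server in the cluster."""

    def __init__(self) -> None:
        self._rows: dict[str, ResourceReport] = {}

    def update(self, report: ResourceReport) -> None:
        current = self._rows.get(report.server)
        # Ignore a report that is older than the one already stored.
        if current is None or report.timestamp >= current.timestamp:
            self._rows[report.server] = report

    def idle(self, server: str) -> dict[str, int]:
        return dict(self._rows[server].idle)

    def stale_servers(self, now: float, missed: int = 3) -> list[str]:
        """Servers silent for more than `missed` intervals (assumed policy)."""
        limit = missed * REPORT_INTERVAL_S
        return sorted(s for s, r in self._rows.items() if now - r.timestamp > limit)


table = ClusterResourceTable()
table.update(ResourceReport("app-01", {"gpu": 1}, {"gpu": 3}, timestamp=100.0))
table.update(ResourceReport("app-01", {"gpu": 4}, {"gpu": 0}, timestamp=90.0))  # late, ignored
table.update(ResourceReport("app-02", {"gpu": 0}, {"gpu": 2}, timestamp=190.0))
```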
The job starting program runs on each application/computing server. After receiving the job object information (the resource requirement information of the target job in the above embodiments) sent by the scheduling engine program through network communication, it switches the rendering library of the current program running environment to the type and version required by the job, by setting environment variables and the registry of the operating system according to the rendering library type and version requirements in the job object information. If the system reports that a related dependency library is missing during the switch, the job starting program can automatically download and install it from the Internet. After the running environment is prepared, the job starting program starts the target job by executing the application start path with the start parameters in the job object, and calls an operating system API to set the process attributes of the application program, so that the devices and quantities of CPU, memory, GPU and the like that the target job can use are limited according to the resource quota in the job object. After the target job is started, the job starting program tracks and scans the process tree of each job, so that the job processes can subsequently be suspended, resumed or terminated according to job control instructions sent by the scheduling engine program.
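The environment preparation and launch steps can be sketched as below. This is not the mechanism the text describes: it substitutes the `CUDA_VISIBLE_DEVICES` environment variable (which limits the GPUs visible to CUDA programs) for the operating system process-attribute API, and the `RENDER_LIB` variable name, `build_environment` and `start_job` are hypothetical. Registry changes, dependency download, and suspend/resume of the process tree are platform specific and are left out.

```python
import os
import subprocess
import sys


def build_environment(render_lib: str, render_version: str, gpus: list[int]) -> dict[str, str]:
    """Copy the current environment and add the job-specific settings."""
    env = dict(os.environ)
    # Hypothetical variable a launcher could use to select the rendering library.
    env["RENDER_LIB"] = f"{render_lib}-{render_version}"
    # Restricts which GPUs a CUDA application can see (stand-in for the OS API).
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
    return env


def start_job(app_path: str, app_args: list[str], env: dict[str, str]) -> subprocess.Popen:
    """Start the application with its start parameters in the prepared environment."""
    return subprocess.Popen([app_path, *app_args], env=env,
                            stdout=subprocess.PIPE, text=True)


env = build_environment("OpenGL", "4.6", gpus=[0, 1])
# A stand-in "application": a child Python process that echoes what it was given.
child_code = "import os; print(os.environ['RENDER_LIB'], os.environ['CUDA_VISIBLE_DEVICES'])"
proc = start_job(sys.executable, ["-c", child_code], env)
output, _ = proc.communicate(timeout=30)
```

Passing the quota through the child's environment keeps the limits scoped to that job; other jobs started on the same server receive their own copy with different device numbers.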
In the related art, some public clouds and private clouds solve the problem of dynamically matching Linux applications to resources by using containerization technology and Kubernetes (K8S). However, digital twin applications and graphic rendering applications mostly run on the Windows operating system, and Windows containers do not support applications with a graphical interface, so Windows graphic applications can only use Windows physical servers or cloud desktop virtual machines in a cluster. The digital twin applications and graphic rendering applications can share the same GPU card with AI model training and inference tasks without depending on virtualization technology, and the performance is better.
Based on the same inventive concept, the embodiments of the application also provide a resource unified scheduling apparatus for multiple types of loads, which implements the above resource unified scheduling method for multiple types of loads. The solution provided by the apparatus is similar to that described in the above method, so for the specific limitations in the one or more apparatus embodiments provided below, reference may be made to the limitations of the resource unified scheduling method for multiple types of loads above, which are not repeated here.
In an exemplary embodiment, as shown in fig. 5, there is provided a resource unified scheduling apparatus for multiple types of loads, the apparatus including:
The receiving module 11 is configured to receive application tasks of different target workload types and the resource requirement information corresponding to the application task of each target workload type, where the application tasks of the different target workload types at least include a graphic application session task;
The searching module 12 is configured to search a cluster resource information table, according to the resource requirement information corresponding to the application task of the current target workload type, for candidate servers meeting that resource requirement information, where the cluster resource information table includes attribute information, used resource information and idle resource information of each server in a cluster;
The selecting module 13 is configured to select the server with the lowest load from the candidate servers as a first target server;
The sending module 14 is configured to send the application task of the current target workload type and the resource requirement information corresponding to it to the first target server, so that the first target server runs the application task of the current target workload type.
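The cooperation of the searching, selecting and sending modules can be sketched as follows. The `Server` type, the scalar `load` metric, the `send` callback and the local bookkeeping of idle resources after dispatch are assumptions for illustration; the text does not define how load is measured or how the table is stored.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Server:
    name: str
    idle: dict[str, int]   # idle resource information, e.g. {"cpu": 8, "gpu": 1}
    load: float            # assumed scalar load metric, lower means less loaded


def find_candidates(table: list[Server], demand: dict[str, int]) -> list[Server]:
    """Searching module: servers whose idle resources cover every demanded item."""
    return [s for s in table
            if all(s.idle.get(k, 0) >= v for k, v in demand.items())]


def select_lowest_load(candidates: list[Server]) -> Optional[Server]:
    """Selecting module: the least-loaded candidate, or None if there is none."""
    return min(candidates, key=lambda s: s.load, default=None)


def dispatch(table: list[Server], task: str, demand: dict[str, int],
             send: Callable[[str, str, dict[str, int]], None]) -> Optional[str]:
    """Sending module: forward the task and its demand to the first target server."""
    target = select_lowest_load(find_candidates(table, demand))
    if target is None:
        return None  # caller may fall back to other policies
    send(target.name, task, demand)
    for k, v in demand.items():
        target.idle[k] -= v  # local bookkeeping until the next resource report
    return target.name


sent: list[tuple[str, str, dict[str, int]]] = []
table = [
    Server("app-01", {"cpu": 16, "gpu": 0}, load=0.10),
    Server("app-02", {"cpu": 8, "gpu": 2}, load=0.60),
    Server("app-03", {"cpu": 8, "gpu": 1}, load=0.35),
]
chosen = dispatch(table, "graphics-session-1", {"cpu": 4, "gpu": 1},
                  lambda name, task, demand: sent.append((name, task, demand)))
```

Here `app-01` is the least loaded server overall but has no idle GPU, so it is filtered out before the load comparison and `app-03` is chosen.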
In an exemplary embodiment, a computer device, which may be a server or a terminal, is provided, and an internal structure thereof may be as shown in fig. 6. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data required by uniform scheduling of resources of multiple types of loads. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a method for unified scheduling of resources for multiple types of loads.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an exemplary embodiment, a computer device is also provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In an exemplary embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method embodiments described above.
In an exemplary embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are all information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data must comply with the relevant regulations.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in various forms such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), etc.
The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.
The principles and embodiments of the present application have been described herein with reference to specific examples, which are intended only to facilitate an understanding of the principles and concepts of the application; persons of ordinary skill in the art may vary the specific implementation and scope of application based on the teachings herein. In view of the foregoing, this description should not be construed as limiting the application.
Claims (9)
1. The unified scheduling method for the resources of the multi-type loads is characterized by comprising the following steps of:
receiving application tasks of different types of target workload types and resource demand information corresponding to the application tasks of the target workload types, wherein the application tasks of the different types of target workload types at least comprise graphic application session tasks;
Searching candidate servers meeting the resource demand information corresponding to the application task of the current target job load type in a cluster resource information table according to the resource demand information corresponding to the application task of the current target job load type, wherein the cluster resource information table comprises attribute information, used resource information and idle resource information of each server in a cluster;
Selecting a server with the lowest load from the candidate servers as a first target server;
Transmitting the application task of the current target workload type and the resource demand information corresponding to the application task of the current target workload type to the first target server so that the first target server runs the application task of the current target workload type;
When the application task of the target workload type is a graphic application session task, the method for uniformly scheduling resources of the multiple types of loads further comprises:
if no candidate server meeting the resource requirement information corresponding to the application task of the current target work load type is found in the cluster resource information table, detecting whether a third target server exists in the cluster, wherein other application tasks belonging to the same terminal equipment as the application task of the current target work load type are running in the third target server, and the load types of the other application tasks are the same as the current target work load type;
And if so, sending the application task of the current target workload type and the resource demand information corresponding to the application task of the current target workload type to the third target server.
2. The method for uniform scheduling of resources for multiple types of loads according to claim 1, wherein,
The first target server is a server running application tasks but having idle resources, the idle resources can meet resource requirement information corresponding to the application tasks of the current target job load type, the application tasks running by the first target server are different from the application tasks of the current target job load type, and after the application tasks of the current target job load type are run, the application tasks of different types are run in the first target server.
3. The method for uniform scheduling of resources for multiple types of loads according to claim 1, wherein searching the cluster resource information table, according to the resource requirement information corresponding to the application task of the current target job load type, for candidate servers meeting the resource requirement information corresponding to the application task of the current target job load type comprises:
acquiring the running time corresponding to the application task of the current target workload type, and placing the running time into a corresponding target time window, wherein different time windows correspond to different running time periods, and the application tasks of different types of target workload types correspond to different running time periods;
If the current time meets the time period requirement corresponding to the target time window, searching the candidate server meeting the resource requirement information corresponding to the application task of the current target work load type in the cluster resource information table according to the resource requirement information corresponding to the application task of the current target work load type;
If the current time does not meet the time period requirement corresponding to the target time window, a refusing processing instruction is sent to the terminal equipment corresponding to the application task of the current target work load type, the refusing processing instruction indicates that the current time does not meet the running time corresponding to the application task of the current target work load type, or the application task of the current target work load type and the resource requirement information corresponding to the application task of the current target work load type are put into a corresponding waiting queue to be queued, and when the requirement of the time period corresponding to the target time window is detected to be met, the candidate server meeting the resource requirement information corresponding to the application task of the current target work load type is searched again in the cluster resource information table according to the resource requirement information corresponding to the application task of the current target work load type.
4. The method for uniform scheduling of resources for multiple types of loads according to claim 1, further comprising:
If the candidate server meeting the resource demand information corresponding to the application task of the current target job load type is not found in the cluster resource information table, acquiring the priority of the application task of the current target job load type;
detecting whether a second target server exists in each server of the cluster, wherein the priority of a load type of a first application task running in the second target server is lower than that of an application task of the current target job load type, and after the second target server releases resources allocated for the first application task running, idle resources in the second target server meet resource demand information corresponding to the application task of the current target job load type;
Controlling the second target server to suspend or terminate the running first application task so as to enable the second target server to release the resources allocated for the first application task;
transmitting the application task of the current target workload type and the resource demand information corresponding to the application task of the current target workload type to the second target server so that the second target server runs the application task of the current target workload type;
And after detecting that the application task of the current target workload type is finished, controlling the second target server to continue to run the first application task.
5. The method for unified scheduling of resources for multiple types of loads according to any one of claims 1 to 4, wherein the method for unified scheduling of resources for multiple types of loads further comprises:
receiving cluster resource updating information sent by each server in a cluster, wherein the cluster resource updating information comprises used resource information and idle resource information of each server corresponding to a current time point;
And updating the cluster resource information table according to the cluster resource updating information.
6. A multi-type load resource unified scheduling system, characterized in that the multi-type load resource unified scheduling system comprises:
a terminal device, a scheduling server and a cluster comprising a plurality of servers, wherein the plurality of servers comprise a first target server;
The terminal equipment is used for sending application tasks of different types of target workload types to the scheduling server and resource demand information corresponding to the application tasks of the target workload types, wherein the application tasks of the different types of target workload types at least comprise graphic application session tasks;
the scheduling server is used for searching candidate servers meeting the resource demand information corresponding to the application task of the current target job load type in a cluster resource information table according to the resource demand information corresponding to the application task of the current target job load type, wherein the cluster resource information table comprises attribute information, used resource information and idle resource information of each server in a cluster;
The scheduling server is used for selecting a server with the lowest load from the candidate servers as a first target server, and sending the application task of the current target job load type and the resource demand information corresponding to the application task of the current target job load type to the first target server;
The first target server is configured to operate, after receiving the application task of the current target workload type and resource requirement information corresponding to the application task of the current target workload type, the application task of the current target workload type according to the resource requirement information corresponding to the application task of the current target workload type;
When the application task of the target workload type is a graphic application session task, the scheduling server is further configured to: if no candidate server meeting the resource requirement information corresponding to the application task of the current target workload type is found in the cluster resource information table, detect whether a third target server exists in the cluster, wherein other application tasks belonging to the same terminal device as the application task of the current target workload type are running in the third target server, and the load types of the other application tasks are the same as the current target workload type;
And if so, send the application task of the current target workload type and the resource requirement information corresponding to the application task of the current target workload type to the third target server.
7. The uniform resource scheduling system of multiple types of loads according to claim 6, wherein,
The first target server is a server running application tasks but having idle resources, the idle resources can meet resource requirement information corresponding to the application tasks of the current target job load type, the application tasks running by the first target server are different from the application tasks of the current target job load type, and after the application tasks of the current target job load type are run, the application tasks of different types are run in the first target server.
8. The unified scheduling system for resources for multiple types of loads according to claim 6, wherein the scheduling server is specifically configured to:
acquiring the running time corresponding to the application task of the current target workload type, and placing the running time into a corresponding target time window, wherein different time windows correspond to different running time periods, and the application tasks of different types of target workload types correspond to different running time periods;
If the current time meets the time period requirement corresponding to the target time window, searching a candidate server meeting the resource requirement information corresponding to the application task of the current target job load type in the cluster resource information table according to the resource requirement information corresponding to the application task of the current target job load type;
If the current time does not meet the time period requirement corresponding to the target time window, a refusing processing instruction is sent to the terminal equipment corresponding to the application task of the current target job load type, the refusing processing instruction indicates that the current time does not meet the running time corresponding to the application task of the current target job load type, or the resource requirement information corresponding to the application task of the current target job load type and the application task of the current target job load type is stored in a cache, and when the time period requirement corresponding to the target time window is detected to be met, the candidate server meeting the resource requirement information corresponding to the application task of the current target job load type is searched again in the cluster resource information table according to the resource requirement information corresponding to the application task of the current target job load type.
9. The system of claim 6, wherein the scheduling server is further configured to: obtain the priority of the application task of the current target workload type if no candidate server satisfying the resource requirement information corresponding to the application task of the current target workload type is found in the cluster resource information table; detect whether a second target server exists among the servers of the cluster, wherein the priority of the load type of a first application task running in the second target server is lower than the priority of the application task of the current target workload type, and after the second target server releases the resources allocated for the running first application task, the idle resources in the second target server meet the resource requirement information corresponding to the application task of the current target workload type; control the second target server to suspend or terminate the running first application task, so that the second target server releases the resources allocated for the first application task; and send the application task of the current target workload type and the resource requirement information corresponding to the application task of the current target workload type to the second target server, so that the second target server runs the application task of the current target workload type.
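For readers who want a concrete picture of the three fallback checks recited in the claims (time windows, the third target server for session reuse, and the second target server for preemption), a minimal sketch follows. It is not part of the claims. The window values, the convention that a larger number means a higher priority, and all type and function names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional

# Hypothetical time windows per load type: (start, end), end exclusive.
TIME_WINDOWS = {
    "graphics": (time(8, 0), time(20, 0)),
    "ai_training": (time(20, 0), time(8, 0)),  # wraps past midnight
}


def in_window(load_type: str, now: time) -> bool:
    """True if `now` falls inside the running period configured for the load type."""
    start, end = TIME_WINDOWS[load_type]
    if start <= end:
        return start <= now < end
    return now >= start or now < end


@dataclass
class RunningTask:
    server: str
    terminal: str
    load_type: str
    priority: int            # assumed: larger number means higher priority
    held: dict[str, int]     # resources allocated to this task


def find_reuse_server(running: list[RunningTask], terminal: str,
                      load_type: str) -> Optional[str]:
    """Third target server: already runs a task of the same terminal and load type."""
    for t in running:
        if t.terminal == terminal and t.load_type == load_type:
            return t.server
    return None


def find_preemptible(running: list[RunningTask], idle: dict[str, dict[str, int]],
                     demand: dict[str, int], priority: int) -> Optional[RunningTask]:
    """Second target server: a lower-priority task whose release frees enough resources."""
    for t in running:
        if t.priority >= priority:
            continue
        free = idle.get(t.server, {})
        if all(free.get(k, 0) + t.held.get(k, 0) >= v for k, v in demand.items()):
            return t
    return None


running = [
    RunningTask("app-01", "term-A", "graphics", priority=5, held={"gpu": 1}),
    RunningTask("app-02", "term-B", "ai_training", priority=1, held={"gpu": 2}),
]
idle = {"app-01": {"gpu": 0}, "app-02": {"gpu": 0}}
reuse = find_reuse_server(running, "term-A", "graphics")
victim = find_preemptible(running, idle, {"gpu": 2}, priority=5)
```

A scheduler built on these helpers would suspend or terminate `victim` before sending the new task, and could resume it once the higher-priority task finishes.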
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411295538.XA CN118819864B (en) | 2024-09-18 | 2024-09-18 | Unified scheduling method and system for resources of multiple types of loads |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118819864A (en) | 2024-10-22 |
| CN118819864B (en) | 2024-12-20 |
Family
ID=93066006
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411295538.XA Active CN118819864B (en) | 2024-09-18 | 2024-09-18 | Unified scheduling method and system for resources of multiple types of loads |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118819864B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119537032B (en) * | 2025-01-21 | 2025-05-20 | 北京亿安天下科技股份有限公司 | Large model reasoning scheduling method based on off-network computing power server |
| CN119718688A (en) * | 2025-02-28 | 2025-03-28 | 北京涵鑫盛科技有限公司 | Cluster load balancing processing method based on cloud computing |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111488210A (en) * | 2020-04-02 | 2020-08-04 | 腾讯科技(深圳)有限公司 | Task scheduling method and device based on cloud computing and computer equipment |
| CN116467053A (en) * | 2022-01-11 | 2023-07-21 | 中国移动通信有限公司研究院 | Resource scheduling method and device, equipment, storage medium |
| CN118394468A (en) * | 2024-04-16 | 2024-07-26 | 北京车智慧信息技术有限公司 | Task scheduling method, system and computing device |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110333939B (en) * | 2019-06-17 | 2023-11-14 | 腾讯科技(成都)有限公司 | Task hybrid scheduling method, device, scheduling server and resource server |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118819864A (en) | 2024-10-22 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |