WO2024254894A1

WO2024254894A1 - Single unit-based large-scale graph data processing system

Info

Publication number: WO2024254894A1
Application number: PCT/CN2023/101407
Authority: WO
Inventors: 朱筱可; 刘阳; 刘书昊; 樊文飞
Original assignee: Shenzhen Institute of Computing Sciences
Current assignee: Shenzhen Institute of Computing Sciences
Priority date: 2023-06-12
Filing date: 2023-06-20
Publication date: 2024-12-19
Anticipated expiration: 2025-12-12
Also published as: CN116680296B; CN116680296A

Abstract

The present application provides a single unit-based large-scale graph data processing system, comprising a data loading module, a data computing module, a data release module, a storage management module and a disk; the data loading module is used for acquiring sub-graphs in an active state from the disk, and transmitting the sub-graphs to the data computing module; the data computing module is used for updating the sub-graphs, and transmitting messages generated by the updating to the storage management module; the data computing module is further used for transmitting the sub-graphs to the data release module; the data release module is used for writing the sub-graphs into the disk; when the sub-graphs are written into the disk, the storage management module is used for setting the states of the sub-graphs to be convergent. Applying a sub-graph center-based computing model to a single unit system and establishing a set of unique pipeline processing architecture capable of overlapping data I/O and CPU operations reduce I/O costs of traditional vertex center computing models while increasing CPU utilization rate, and promote sequential access to the disk.

Description

A large-scale graph data processing system based on a single machine

Technical Field

本申请涉及数据处理技术领域，特别是一种基于单机的大规模图数据处理系统。The present application relates to the field of data processing technology, and in particular to a large-scale graph data processing system based on a single machine.

Background Art

近年来，由于图数据易于抽象真实世界中的实体与关系，它已经成为数据科学和工程领域备受重视的话题，并被广泛应用于社交网络分析、推荐系统、金融欺诈检测、药物发现等多个领域，同时，图数据具有很高的灵活性，许多原本使用矩阵、关系或其他数据结构进行建模的问题也可以转换为图数据处理，进一步凸显了图数据的重要性。随着社交媒体和移动互联网应用的增强，计算机系统产生或收集的抽象图数据规模正在飞速增长，这种量级上的增长对现代计算机系统的大规模数据存储、分析和挖掘能力提出了极其尖锐的挑战。In recent years, graph data has become a highly valued topic in the field of data science and engineering, as it is easy to abstract entities and relationships in the real world. It has been widely used in many fields such as social network analysis, recommendation systems, financial fraud detection, and drug discovery. At the same time, graph data has high flexibility, and many problems that were originally modeled using matrices, relationships, or other data structures can also be converted to graph data processing, further highlighting the importance of graph data. With the enhancement of social media and mobile Internet applications, the scale of abstract graph data generated or collected by computer systems is growing rapidly. This growth in magnitude has posed an extremely sharp challenge to the large-scale data storage, analysis, and mining capabilities of modern computer systems.

传统的大规模图计算系统使用数据划分的并行化方法，即整合多台计算机资源以完成图计算任务。尽管这些计算系统在大图处理领域扮演着重要角色，但由于高昂的维护和构建成本，只有少数拥有大规模计算机集群的公司能够进行大规模图计算，此外，分布式计算系统通常基于一个假设，即使用更多的计算节点会减少计算时间，但实际上这一假设并不总是成立，增加计算节点可能会导致更大的通信代价，从而无法显著提升系统性能。Traditional large-scale graph computing systems use a parallel approach to data partitioning, which is to integrate multiple computer resources to complete graph computing tasks. Although these computing systems play an important role in the field of large graph processing, due to the high maintenance and construction costs, only a few companies with large-scale computer clusters can perform large-scale graph computing. In addition, distributed computing systems are usually based on an assumption that using more computing nodes will reduce computing time, but in fact this assumption is not always true. Adding computing nodes may result in greater communication costs, which cannot significantly improve system performance.

针对大规模图分析的实际需求和资源受限的使用场景，一系列基于单机的大规模图处理系统被提出。这些系统利用外存作为内存拓展来处理大图，并采用基于顶点中心的计算模型(该模型将计算过程中的信息局限在节点之间传递)，以提升数据局部性，简化用户使用负担。尽管基于顶点中心的计算模型简单易懂，但也存在通信开销或I/O开销较高的问题。In response to the actual needs of large-scale graph analysis and resource-constrained usage scenarios, a series of large-scale graph processing systems based on single machines have been proposed. These systems use external memory as memory extension to process large graphs, and adopt a vertex-centric computing model (which limits the information in the computing process to be transmitted between nodes) to improve data locality and simplify the user's usage burden. Although the vertex-centric computing model is simple and easy to understand, it also has the problem of high communication overhead or I/O overhead.

发明内容Summary of the invention

鉴于上述问题，提出了本申请以便提供克服所述问题或者至少部分地解决所述问题的一种基于单机的大规模图数据处理系统，包括：In view of the above problems, the present application is proposed to provide a method for overcoming the above problems or at least partially solving the above problems. A large-scale graph data processing system based on a single machine to solve the above problem includes:

一种基于单机的大规模图数据处理系统，包括数据加载模块、数据计算模块、数据释放模块、存储管理模块和磁盘；所述磁盘存储有由若干子图构成的大规模图数据；所述存储管理模块存储有与每一所述子图对应的状态信息；初始状态下，所述子图的状态为活跃；A large-scale graph data processing system based on a single machine, comprising a data loading module, a data calculation module, a data release module, a storage management module and a disk; the disk stores large-scale graph data consisting of a plurality of subgraphs; the storage management module stores state information corresponding to each of the subgraphs; in an initial state, the state of the subgraph is active;

所述数据加载模块用于从所述磁盘获取状态为活跃的所述子图，并将所述子图传输至所述数据计算模块；The data loading module is used for acquiring the sub-graph in the active state from the disk, and transmitting the sub-graph to the data calculation module;

所述数据计算模块用于对所述子图进行更新，并将更新产生的消息传输至所述存储管理模块；The data calculation module is used to update the subgraph and transmit the message generated by the update to the storage management module;

当更新后的所述子图存在改变时，所述数据计算模块还用于将所述子图传输至所述数据释放模块；When there is a change in the updated sub-graph, the data calculation module is further used to transmit the sub-graph to the data release module;

当所述子图非当前轮更新中的最后一个时，所述数据释放模块用于将所述子图写入所述磁盘；When the sub-graph is not the last one in the current round of update, the data release module is used to write the sub-graph into the disk;

当所述子图被写入所述磁盘时，所述存储管理模块用于将所述子图的状态设置为收敛。When the subgraph is written to the disk, the storage management module is used to set the state of the subgraph to converged.

优选的，当更新后的所述子图不存在改变时，所述数据计算模块还用于将所述子图写入所述磁盘。Preferably, when there is no change in the updated sub-graph, the data calculation module is further used to write the sub-graph into the disk.

优选的，当所述子图为当前轮更新中的最后一个时，所述数据释放模块还用于将所述子图传输至所述数据加载模块。Preferably, when the sub-graph is the last one in the current round of updating, the data releasing module is further used to transmit the sub-graph to the data loading module.

优选的，当当前轮更新结束时，所述存储管理模块还用于将接收到消息的所述子图的状态设置为活跃。Preferably, when the current round of updating is finished, the storage management module is further used to set the state of the subgraph that has received the message to active.

优选的，当当前轮更新结束且所述存储管理模块中不存在消息缓存时，所述数据计算模块还用于对全部所述子图进行聚合，得到更新后的大规模图数据。Preferably, when the current round of updates is completed and there is no message cache in the storage management module, the data calculation module is also used to aggregate all the sub-graphs to obtain updated large-scale graph data.

优选的，所述存储管理模块包括消息存储单元和状态管理单元；所述状态管理单元存储有所述状态信息；Preferably, the storage management module includes a message storage unit and a state management unit; the state management unit stores the state information;

所述数据计算模块用于将更新产生的消息传输至所述消息存储单元；The data calculation module is used to transmit the updated message to the message storage unit;

当当前轮更新结束时，所述状态管理单元用于将接收到消息的所述子图的状态设置为活跃。When the current round of update is finished, the state management unit is used to update the subgraph received the message Set the status to Active.

优选的，所述数据计算模块包括聚合计算单元；Preferably, the data calculation module includes an aggregation calculation unit;

当当前轮更新结束且所述消息存储单元内不存在消息缓存时，所述聚合计算单元用于对全部所述子图进行聚合，得到更新后的大规模图数据。When the current round of updates is completed and there is no message cache in the message storage unit, the aggregation calculation unit is used to aggregate all the subgraphs to obtain updated large-scale graph data.

优选的，当所述子图被所述数据加载模块获取时，所述存储管理模块还用于将所述子图的状态设置为等待计算。Preferably, when the subgraph is acquired by the data loading module, the storage management module is further used to set the state of the subgraph to waiting for calculation.

优选的，当所述子图被传输至所述数据计算模块时，所述存储管理模块还用于将所述子图的状态设置为正在计算。Preferably, when the subgraph is transmitted to the data calculation module, the storage management module is further used to set the state of the subgraph to being calculated.

优选的，当所述子图被传输至所述数据释放模块时，所述存储管理模块还用于将所述子图的状态设置为释放中。Preferably, when the sub-graph is transmitted to the data release module, the storage management module is further used to set the state of the sub-graph to being released.

本申请具有以下优点：This application has the following advantages:

在本申请的实施例中，相对于现有基于单机的大规模图处理系统通信开销或I/O开销较高的问题，本申请提供了将基于子图中心的计算模型应用到单机系统并建立一套流水线处理架构的解决方案，具体为：“一种基于单机的大规模图数据处理系统，包括数据加载模块、数据计算模块、数据释放模块、存储管理模块和磁盘；所述磁盘存储有由若干子图构成的大规模图数据；所述存储管理模块存储有与每一所述子图对应的状态信息；初始状态下，所述子图的状态为活跃；所述数据加载模块用于从所述磁盘获取状态为活跃的所述子图，并将所述子图传输至所述数据计算模块；所述数据计算模块用于对所述子图进行更新，并将更新产生的消息传输至所述存储管理模块；当更新后的所述子图存在改变时，所述数据计算模块还用于将所述子图传输至所述数据释放模块；当所述子图非当前轮更新中的最后一个时，所述数据释放模块用于将所述子图写入所述磁盘；当所述子图被写入所述磁盘时，所述存储管理模块用于将所述子图的状态设置为收敛”。通过将基于子图中心的计算模型应用到单机系统，并且建立一套独特的流水线处理架构，该架构能够重叠数据I/O和CPU操作，从而降低传统的顶点中心计算模型的I/O成本同时提高CPU利用率，并促进对磁盘的顺序访问，此外，该架构采用共享内存数据结构进行消息传递和高效同步，能够将计算从内存管理和调度中分离出来，从而为优化提供了新的机会。In the embodiments of the present application, with respect to the problem of high communication overhead or I/O overhead of existing large-scale graph processing systems based on a single machine, the present application provides a solution for applying a sub-graph-centric computing model to a single machine system and establishing a pipeline processing architecture, specifically: "A large-scale graph data processing system based on a single machine, comprising a data loading module, a data computing module, a data release module, a storage management module and a disk; the disk stores large-scale graph data consisting of a number of sub-graphs; the storage management module stores status information corresponding to each of the sub-graphs; in an initial state, the sub-graph is active; The data loading module is used to obtain the subgraph with active status from the disk and transmit the subgraph to the data calculation module; the data calculation module is used to update the subgraph and transmit the message generated by the update to the storage management module; when the updated subgraph has changes, the data calculation module is also used to transmit the subgraph to the data release module; when the subgraph is not the last one in the current round of updates, the data release module is used to write the subgraph to the disk; when the subgraph is written to the disk, the storage management module is used to set the state of the subgraph to convergence. By applying the subgraph-centric computing model to a single-machine system and establishing a unique pipeline processing architecture, the architecture can overlap data I/O and CPU operations, thereby reducing the I/O cost of the traditional vertex-centric computing model while improving CPU utilization and promoting sequential access to the disk. In addition, the architecture uses shared memory data structures for message passing and efficient synchronization, which can separate computing from memory management and scheduling. out, thus providing new opportunities for optimization.

BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本申请的技术方案，下面将对本申请的描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本申请的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solution of the present application, the drawings required for use in the description of the present application will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application. For ordinary technicians in this field, other drawings can be obtained based on these drawings without paying any creative work.

图1是顶点中心模型和子图中心模型上的连通分量计算过程示意图；FIG1 is a schematic diagram of the connected component calculation process on the vertex center model and the subgraph center model;

图2是本申请一实施例提供的一种大规模图数据处理系统的处理架构示意图；FIG2 is a schematic diagram of a processing architecture of a large-scale graph data processing system provided by an embodiment of the present application;

图3是本申请一实施例提供的一种大规模图数据处理系统的状态管理及优化策略示意图。FIG3 is a schematic diagram of a state management and optimization strategy for a large-scale graph data processing system provided in an embodiment of the present application.

DETAILED DESCRIPTION

为使本申请的所述目的、特征和优点能够更加明显易懂，下面结合附图和具体实施方式对本申请做进一步详细的说明。显然，所描述的实施例是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。In order to make the objects, features and advantages of the present application more obvious and understandable, the present application is further described in detail below in conjunction with the accompanying drawings and specific implementation methods. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by ordinary technicians in the field without creative work are within the scope of protection of the present application.

发明人通过分析现有技术发现，基于顶点中心的计算模型无疑会因为顶点之间消息的传递带来额外的通信开销或I/O开销，如图1所示，在输入图G上完成连通分量的计算，采用基于子图中心的计算模型(该模型允许计算过程中的信息在子图内部自由传递)比采用基于顶点中心的计算模型明显拥有更少的计算步骤。By analyzing the prior art, the inventors found that the vertex-centric computing model will undoubtedly bring additional communication overhead or I/O overhead due to the transmission of messages between vertices. As shown in Figure 1, when calculating the connected components on the input graph G, the subgraph-centric computing model (which allows information in the calculation process to be freely transmitted within the subgraph) has significantly fewer computing steps than the vertex-centric computing model.

但是，传统的基于子图中心的计算模型是为多机系统设计的，还没有工作将基于子图中心的计算模型引入单机环境中，因此有一些问题尚未明确，例如：当通信代价潜在地转换为I/O代价之后，引入基于子图中心的计算模型，能否系统地降低核外图系统的I/O成本并提高多核并行性？传统的基于子图中心的计算模型要求对图执行更细粒度的划分来提升并行度，但是这样做的代价是更多冗余的控制信息，比如全局顶点ID到本地顶点ID的映射，这一问题在分布式环境下可以通过为每台计算节点分配足够多的内存解决，但是在单机多核环境下更细粒度的图划分会占据本就珍贵的内存资源。However, the traditional subgraph-centric computing model is designed for multi-machine systems, and no work has yet introduced the subgraph-centric computing model into a single-machine environment. Therefore, some issues remain unclear. For example, when the communication cost is potentially converted into I/O cost, can the introduction of the subgraph-centric computing model systematically reduce the I/O cost of the out-of-core graph system and improve multi-core parallelism? The traditional subgraph-centric computing model requires a finer-grained partitioning of the graph to improve parallelism, but this The cost is more redundant control information, such as the mapping of global vertex IDs to local vertex IDs. This problem can be solved in a distributed environment by allocating enough memory to each computing node, but in a single-machine multi-core environment, finer-grained graph partitioning will occupy the already precious memory resources.

发明人认为，将基于子图中心的计算模型扩展到单机系统将面临以下改进需求：当输入图超过内存容量时，单机系统需要借助辅助存储器(如硬盘、SSD等)作为内存扩展进行计算，因此需要合理管理图在内存和磁盘间的调度；传统的基于子图中心的计算模型通过计算机网络在计算单元之间传递消息以进行同步，但在共享内存的情况下，单机系统的同步逻辑发生了改变，因此，需要更有效地实现消息同步；基于子图中心的计算模型仅利用数据分区并行性，这可能导致在内存容量有限的情况下，CPU内核利用率不足或图碎片过多，因此，需要平衡考虑子图间的并行计算与子图内的并行计算；由于单机系统共享内存架构，工作迁移在单个机器的核心之间的成本很低，因此，需要采用灵活的资源调度来提高系统性能。The inventors believe that extending the subgraph-centric computing model to a stand-alone system will face the following improvement requirements: when the input graph exceeds the memory capacity, the stand-alone system needs to use auxiliary storage (such as hard disk, SSD, etc.) as memory extension for calculation, so it is necessary to reasonably manage the scheduling of the graph between memory and disk; the traditional subgraph-centric computing model transmits messages between computing units through a computer network for synchronization, but in the case of shared memory, the synchronization logic of the stand-alone system has changed, so it is necessary to implement message synchronization more effectively; the subgraph-centric computing model only utilizes data partition parallelism, which may lead to insufficient CPU core utilization or excessive graph fragmentation when the memory capacity is limited, so it is necessary to balance the parallel computing between subgraphs and the parallel computing within the subgraph; due to the shared memory architecture of the stand-alone system, the cost of work migration between the cores of a single machine is very low, so flexible resource scheduling is needed to improve system performance.

本实施例中，提供一种基于单机的大规模图数据处理系统，包括数据加载模块、数据计算模块、数据释放模块、存储管理模块和磁盘；所述磁盘存储有由若干子图构成的大规模图数据；所述存储管理模块存储有与每一所述子图对应的状态信息；初始状态下，所述子图的状态为活跃；In this embodiment, a large-scale graph data processing system based on a single machine is provided, comprising a data loading module, a data calculation module, a data release module, a storage management module and a disk; the disk stores large-scale graph data consisting of a plurality of subgraphs; the storage management module stores state information corresponding to each of the subgraphs; in an initial state, the state of the subgraph is active;

在本申请的实施例中，相对于现有基于单机的大规模图处理系统通信开销或I/O开销较高的问题，本申请将基于子图中心的计算模型应用到单机系统，并建立了一套流水线处理架构。参照图2，给定一个大图G(大图G最初存储在磁盘上)，所述流水线处理架构以图G的子图{F0,F1,F2,F3,...,Fn-1}作为最小输入输出单元，并以流水线迭代地对大图G进行更新，具体来说，该架构将子图Fi的核外处理分解为三个连续阶段：将Fi读入内存，计算并更新Fi，以及如果需要，将更新后的Fi写回外存，通过三个模块，即数据加载模块、数据计算模块和数据释放模块完成这些阶段，这些模块在流水线处理架构中，通过两个任务队列“输入队列”和“输出队列”异步工作。In the embodiment of the present application, compared with the existing large-scale graph processing system based on a single machine, In order to solve the problem of high pin or I/O overhead, the present application applies the subgraph-centric computing model to a single-machine system and establishes a pipeline processing architecture. Referring to Figure 2, given a large graph G (large graph G is initially stored on disk), the pipeline processing architecture uses the subgraph {F0, F1, F2, F3, ..., Fn-1} of graph G as the minimum input and output unit, and updates the large graph G iteratively with a pipeline. Specifically, the architecture decomposes the out-of-core processing of the subgraph Fi into three consecutive stages: reading Fi into memory, calculating and updating Fi, and if necessary, writing the updated Fi back to external memory. These stages are completed through three modules, namely, a data loading module, a data calculation module, and a data release module. In the pipeline processing architecture, these modules work asynchronously through two task queues "input queue" and "output queue".

所述流水线处理架构有效地重叠了子图I/O和CPU操作，在内存子图上进行计算，同时从磁盘加载挂起的子图，可以降低传统的顶点中心计算模型的I/O成本，同时通过减少空闲等待提高CPU的利用率，并且能够连续访问磁盘；此外，该架构采用共享内存数据结构进行消息传递和高效同步，能够将计算从内存管理和调度中分离出来，从而为优化提供了新的机会。The pipeline processing architecture effectively overlaps subgraph I/O and CPU operations, performs calculations on memory subgraphs while loading suspended subgraphs from disk, which can reduce the I/O cost of traditional vertex-centric computing models, while improving CPU utilization by reducing idle waits and enabling continuous access to disks; in addition, the architecture uses shared memory data structures for message passing and efficient synchronization, which can separate computing from memory management and scheduling, thereby providing new opportunities for optimization.

下面，将对本示例性实施例中一种基于单机的大规模图数据处理系统做进一步地说明。Next, a large-scale graph data processing system based on a single machine in this exemplary embodiment will be further described.

本实施例中，所述系统采用基于混合计算模型的APIs。该APIs采用统一的PIE+接口，这一套接口集成了顶点中心和子图中心的编程模型，用户不仅可以在基于子图中心的计算模型下并行化顺序图算法以简化并行编程(子图间并行)，还可以通过新的接口进一步探索基于顶点中心计算模型的子图内部的并行性。需要说明的是，所述混合模型同时支持“子图中心计算模型”的子图间并行和“顶点中心的计算模型”的子图内并行，在有限内存下，可以更好地利用多核资源，避免输入图的碎片化；此外，它还提供了一个统一的界面，用户可以从中选择最适合他们的应用程序和图形的界面。In this embodiment, the system adopts APIs based on a hybrid computing model. The APIs adopt a unified PIE+ interface, which integrates vertex-centric and subgraph-centric programming models. Users can not only parallelize sequential graph algorithms under the subgraph-centric computing model to simplify parallel programming (inter-subgraph parallelism), but also further explore the parallelism within the subgraph based on the vertex-centric computing model through new interfaces. It should be noted that the hybrid model supports both inter-subgraph parallelism of the "subgraph-centric computing model" and intra-subgraph parallelism of the "vertex-centric computing model". Under limited memory, it can better utilize multi-core resources and avoid fragmentation of the input graph; in addition, it provides a unified interface from which users can choose the interface that best suits their applications and graphics.

本实施例中，所述系统还包括调度器。所述调度器用于在线程池中跟踪并分配线程，其中每个线程对应于一个物理CPU核心，它决定将物理线程分配给虚拟工作线程，以便在子图上执行(并行)计算，它还进行主动调整以支持两级并行：当线程可用时，所述调度器通过消耗“输入队列”来将其分配给新的计算单元，以加快子图间的并行性，或通过正在运行的工作核心来改善子图内的并行性。In this embodiment, the system further includes a scheduler. The scheduler is used to track and allocate threads in a thread pool, where each thread corresponds to a physical CPU core. It decides to allocate physical threads to virtual worker threads to perform (parallel) calculations on subgraphs. It also actively adjusts to support two levels of parallelism: when threads are available, the scheduler allocates them to new computing units by consuming the "input queue" to speed up the parallelism between subgraphs, or through the running worker cores. To improve the parallelism within a subgraph.

本实施例中，所述存储管理模块包括消息存储单元；所述数据计算模块用于将更新产生的消息传输至所述消息存储单元。所述消息存储单元用于实现并行计算单元之间的消息同步。具体的，所述消息存储单元被实现为一种内存中的数据结构，为了提高空间效率，可以将其实现为一个紧凑的可变长度数组。需要说明的是，所述消息存储单元的空间复杂度与划分策略密切相关，如果有越多的边界顶点/边，那么所述消息存储单元消耗的空间就越大。与多机系统的消息传递策略相比，采用消息存储单元在共享内存环境中工作效率更高。In this embodiment, the storage management module includes a message storage unit; the data calculation module is used to transmit the updated messages to the message storage unit. The message storage unit is used to realize message synchronization between parallel computing units. Specifically, the message storage unit is implemented as a data structure in memory. In order to improve space efficiency, it can be implemented as a compact variable-length array. It should be noted that the space complexity of the message storage unit is closely related to the partitioning strategy. The more boundary vertices/edges there are, the larger the space consumed by the message storage unit. Compared with the message passing strategy of a multi-machine system, the use of a message storage unit is more efficient in a shared memory environment.

本实施例中，所述存储管理模块还包括状态管理单元；所述状态管理单元存储有所述状态信息，并可以在特定时刻更新所述状态信息。所述状态管理单元用于维护一个状态机来建模子图的状态。具体的，所述状态管理单元被实现为一个轻量级的数据结构，每个子图只维护几个状态，占用的内存空间可以忽略不计。In this embodiment, the storage management module further includes a state management unit; the state management unit stores the state information and can update the state information at a specific time. The state management unit is used to maintain a state machine to model the state of the subgraph. Specifically, the state management unit is implemented as a lightweight data structure, and each subgraph only maintains a few states, and the memory space occupied can be ignored.

采用所述状态管理单元记录子图的状态信息是一种低成本的收敛检测方法，只需要使用一个标记列表M来帮助跟踪计算单元之间的消息交换，以及一个轻量级状态机来对每个计算单元的工作进度建模。具体的，所述状态管理单元构建了一个标记列表M，每个子图对应一个标记，用于指示其是否在该轮迭代接收到有任何消息，如果一个子图至少有一个挂起的更新要从所述消息存储单元中提取，则其对应的M[i]为真，否则，M[i]为假。实际操作时，可以采用有限状态机对每个子图的进度进行建模，并使用标志M[i]来触发子图的状态转换。Using the state management unit to record the state information of the subgraph is a low-cost convergence detection method. It only requires a tag list M to help track the message exchange between computing units, and a lightweight state machine to model the work progress of each computing unit. Specifically, the state management unit constructs a tag list M, and each subgraph corresponds to a tag to indicate whether it has received any messages in this round of iteration. If a subgraph has at least one pending update to be extracted from the message storage unit, its corresponding M[i] is true, otherwise, M[i] is false. In actual operation, a finite state machine can be used to model the progress of each subgraph, and the flag M[i] can be used to trigger the state transition of the subgraph.

如图3所示，所述子图的状态包括“活跃”、“等待计算”、“正在计算”、“释放中”和“收敛”五种，在任何时刻，所述子图都处于五种状态中的一种，其中，前两种状态表示所述子图在磁盘上，剩下的状态表示所述子图在内存中。每个所述子图的初始状态都是“活跃”，意味着所述子图正在等待所述数据加载模块将其载入到内存；当所述子图被所述数据加载模块获取时，所述状态管理单元用于将所述子图的状态设置为“等待计算”，意味着所述子图已经驻留在内存中，所述子图等待被分配处理核心；当所述子图被传输至所述数据计算模块时，所述状态管理单元用于将所述子图的状态设置为“正在计算”，意味着所述子图正在被处理核心计算处理；当所述子图被传输至所述数据释放模块(也即所述子图在当前轮更新产生了需要发送给其他所述子图的消息)时，所述状态管理单元用于将所述子图的状态设置为“释放中”；当所述子图被写入所述磁盘时，所述状态管理单元用于将所述子图的状态设置为“收敛”；当当前轮更新结束时，所述状态管理单元还用于将参与下一轮更新的子图的状态设置为活跃，以便这部分子图开始下一轮的更新；当且仅当当前轮更新结束且所述消息存储单元中不存在消息缓存时，整个系统停止更新。As shown in Figure 3, the states of the subgraphs include "active", "waiting for calculation", "calculating", "releasing" and "converging". At any time, the subgraph is in one of the five states, where the first two states indicate that the subgraph is on the disk, and the remaining states indicate that the subgraph is in the memory. The initial state of each subgraph is "active", which means that the subgraph is waiting for the data loading module to load it into the memory; when the subgraph is obtained by the data loading module, the state management unit is used to set the state of the subgraph to "waiting for calculation", which means It means that the subgraph has been resident in the memory and the subgraph is waiting to be assigned a processing core; when the subgraph is transmitted to the data computing module, the state management unit is used to set the state of the subgraph to "computing", which means that the subgraph is being processed by the processing core; when the subgraph is transmitted to the data release module (that is, the subgraph generates a message that needs to be sent to other subgraphs in the current round of update), the state management unit is used to set the state of the subgraph to "releasing"; when the subgraph is written to the disk, the state management unit is used to set the state of the subgraph to "converging"; when the current round of update ends, the state management unit is also used to set the state of the subgraph participating in the next round of update to active, so that this part of the subgraph starts the next round of update; if and only if the current round of update ends and there is no message cache in the message storage unit, the entire system stops updating.

在一定条件下，所述系统可以跳过一轮计算中的某些状态而不影响正确性，也就是说，可以在状态转换中采取一些“捷径”，并减少不必要的计算和I/O。Under certain conditions, the system can skip certain states in a round of computation without affecting correctness, that is, it can take some "shortcuts" in state transitions and reduce unnecessary computation and I/O.

如图3所示，本实施例中，当当前轮更新结束时，所述存储管理模块用于将接收到消息的所述子图的状态设置为“活跃”(“捷径A”)。为了开始新一轮增量计算，状态为“收敛”的所述子图的状态需要被重置为“活跃”，如果此时一个子图对应的M[i]为真，可以让该子图保持在“收敛”状态，使其不参与下一轮的更新，由此可以全面跳过对无需更新的子图的处理，而不影响程序的正确性。“捷径A”经常在输入图没有被很好地连接，某一子图是“孤立的”的情况下被利用，并有效减少了I/O成本。As shown in FIG3 , in this embodiment, when the current round of updates is completed, the storage management module is used to set the state of the subgraph that received the message to "active" ("shortcut A"). In order to start a new round of incremental calculations, the state of the subgraph with a "convergence" state needs to be reset to "active". If M[i] corresponding to a subgraph is true at this time, the subgraph can be kept in the "convergence" state so that it does not participate in the next round of updates. This allows the processing of subgraphs that do not need to be updated to be skipped without affecting the correctness of the program. "Shortcut A" is often used when the input graph is not well connected and a subgraph is "isolated", and effectively reduces I/O costs.

本实施例中，当所述子图为当前轮更新中的最后一个时，所述数据释放模块用于将所述子图传输至所述数据加载模块(“捷径B”)。在全部所述子图完成当前轮更新后会开始新一轮的更新，如果一个子图仍然处于“释放中”状态，也就是说还没有完全保存到磁盘上，可以直接将该子图的状态设置为“等待计算”，由此可以在不经过磁盘的情况下启动对该子图的新一轮更新。“捷径B”可以在每一轮结束时被利用，并有效减少了I/O成本。In this embodiment, when the subgraph is the last one in the current round of updates, the data release module is used to transfer the subgraph to the data loading module ("shortcut B"). After all the subgraphs have completed the current round of updates, a new round of updates will begin. If a subgraph is still in the "releasing" state, that is, it has not been completely saved to the disk, the state of the subgraph can be directly set to "waiting for calculation", thereby starting a new round of updates to the subgraph without going through the disk. "Shortcut B" can be used at the end of each round and effectively reduces I/O costs.

本实施例中，当更新后的所述子图不存在改变时，所述数据计算模块用于将所述子图写入所述磁盘(“捷径C”)。当一个子图更新完成时，如果其相比于计算前并没有任何改变，可以跳过“释放中”状态，直接将其设置为“收敛”，由此可以有效减少冗余的磁盘写入。In this embodiment, when there is no change in the updated sub-graph, the data calculation module is used to write the sub-graph to the disk ("shortcut C"). There is no change compared to before calculation. You can skip the "releasing" state and directly set it to "converging", which can effectively reduce redundant disk writes.

本实施例中，所述数据计算模块包括聚合计算单元，当当前轮更新结束且所述消息存储单元中不存在消息缓存时，所述聚合计算单元用于调用预设的聚合函数对全部所述子图进行聚合，得到更新后的大规模图数据。In this embodiment, the data calculation module includes an aggregation calculation unit. When the current round of update is completed and there is no message cache in the message storage unit, the aggregation calculation unit is used to call a preset aggregation function to aggregate all the sub-graphs to obtain updated large-scale graph data.

尽管已描述了本申请实施例的优选实施例，但本领域内的技术人员一旦得知了基本创造性概念，则可对这些实施例做出另外的变更和修改。所以，所附权利要求意欲解释为包括优选实施例以及落入本申请实施例范围的所有变更和修改。Although the preferred embodiments of the present application have been described, those skilled in the art may make additional changes and modifications to these embodiments once they have learned the basic creative concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.

最后，还需要说明的是，在本文中，诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来，而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者终端设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。Finally, it should be noted that, in this article, relational terms such as first and second, etc. are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or terminal device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or terminal device. In the absence of further restrictions, the elements defined by the sentence "comprise a ..." do not exclude the existence of other identical elements in the process, method, article or terminal device including the elements.

以上对本申请所提供的一种基于单机的大规模图数据处理系统，进行了详细介绍，本文中应用了具体个例对本申请的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本申请的方法及其核心思想；同时，对于本领域的一般技术人员，依据本申请的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本申请的限制。 The above is a detailed introduction to a large-scale graph data processing system based on a single machine provided by the present application. This article uses specific examples to illustrate the principles and implementation methods of the present application. The description of the above embodiments is only used to help understand the method of the present application and its core idea; at the same time, for general technical personnel in this field, according to the idea of the present application, there will be changes in the specific implementation method and application scope. In summary, the content of this specification should not be understood as a limitation on the present application.

Claims

A large-scale graph data processing system based on a single machine, characterized by comprising a data loading module, a data calculation module, a data release module, a storage management module and a disk; the disk stores large-scale graph data consisting of a plurality of subgraphs; the storage management module stores state information corresponding to each of the subgraphs; in an initial state, the state of the subgraph is active;

The data loading module is used for acquiring the sub-graph in the active state from the disk, and transmitting the sub-graph to the data calculation module;

The data calculation module is used to update the subgraph and transmit the message generated by the update to the storage management module;

When there is a change in the updated sub-graph, the data calculation module is further used to transmit the sub-graph to the data release module;

When the sub-graph is not the last one in the current round of update, the data release module is used to write the sub-graph into the disk;

When the subgraph is written to the disk, the storage management module is used to set the state of the subgraph to converged.

The system according to claim 1 is characterized in that when there is no change in the updated sub-graph, the data calculation module is also used to write the sub-graph to the disk.

The system according to claim 1 is characterized in that when the sub-graph is the last one in the current round of updates, the data release module is also used to transfer the sub-graph to the data loading module.

The system according to claim 1 is characterized in that when the current round of update is completed, the storage management module is also used to set the state of the subgraph that received the message to active.

The system according to claim 1 is characterized in that when the current round of updates is completed and there is no message cache in the storage management module, the data calculation module is also used to aggregate all the sub-graphs to obtain updated large-scale graph data.

The system according to claim 1, characterized in that the storage management module comprises a message storage unit and a state management unit; the state management unit stores the state information;

The data calculation module is used to transmit the updated message to the message storage unit;

When the current round of update is finished, the state management unit is used to update the subgraph received the message Set the status to Active.

The system according to claim 6, characterized in that the data calculation module includes an aggregation calculation unit;

When the current round of updates is completed and there is no message cache in the message storage unit, the aggregation calculation unit is used to aggregate all the subgraphs to obtain updated large-scale graph data.

The system according to claim 1 is characterized in that when the subgraph is acquired by the data loading module, the storage management module is also used to set the state of the subgraph to waiting for calculation.

The system according to claim 1 is characterized in that when the subgraph is transmitted to the data computing module, the storage management module is also used to set the state of the subgraph to being calculated.

The system according to claim 1 is characterized in that when the sub-graph is transmitted to the data release module, the storage management module is also used to set the state of the sub-graph to being released.