WO2024255500A1 - Multi-threaded concurrency management method and related apparatus


Info

Publication number
WO2024255500A1
Authority
WO
WIPO (PCT)
Prior art keywords
thread
target
variable
sub
bit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/092938
Other languages
English (en)
French (fr)
Other versions
WO2024255500A8 (zh)
Inventor
李修昶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/466: Transaction processing

Definitions

  • The present application relates to the field of computer technology, and in particular to a multi-threaded concurrency management method and related apparatus.
  • A thread is a single sequential flow of control within a process and is the smallest unit of program execution.
  • Threads are the basic unit of independent execution and scheduling. Threads can execute concurrently: multiple threads within one process, threads in different processes, and threads on different cores of a multi-core computer system.
  • CAS: compare-and-swap.
  • A common application scenario for CAS operations is read-write locks and other functions with similar read/write semantics (such as pinbuffer/unpinbuffer). A read-write lock has two states: read-locked and write-locked.
  • A read-write lock generally uses a single variable for state control, and CAS operations keep that variable consistent across multiple threads.
  • The embodiments of the present application provide a multi-threaded concurrency management method and related apparatus, which apply to scenarios where multiple threads perform CAS operations concurrently and can improve the success rate of those CAS operations.
  • An embodiment of the present application provides a multi-threaded concurrency management method, which may include: dividing a target global variable into multiple sub-variable areas, where the target global variable is used to control multi-threaded access to target data and each sub-variable area includes one or more flag bits; dividing multiple threads into multiple thread groups, where the multiple threads apply for permission to read the target data; authorizing a target thread in a target thread group to modify the flag bit in a target sub-variable area, where the target thread group is one of the multiple thread groups, the target sub-variable area is one of the multiple sub-variable areas, and the thread groups correspond one-to-one to the sub-variable areas; and, if the target thread successfully modifies the flag bit, allowing the target thread to read the target data.
  • The multiple threads applying to read the shared data are divided into multiple thread groups so that each thread group corresponds to one sub-variable area and different thread groups correspond to different sub-variable areas. A thread in any group can then obtain permission to read the shared data by modifying the value of its own group's sub-variable area.
  • Threads in different thread groups modify different sub-variable areas independently, so each sub-variable area can be successfully modified by one thread. That is, in one round of concurrent modification, multiple threads can succeed.
  • The method further includes: determining the total number of threads that can currently read the target data according to the state of the flag bit in each sub-variable area.
  • The number of reading threads represented by each sub-variable area is determined from the state of the flag bit in that area, and the total number of threads that can currently read the shared data is the sum of these per-area numbers.
  • The lock attempt fails, and the thread goes to sleep and waits.
  • A thread that previously applied for a write lock and temporarily went to sleep can be woken up.
  • One thread in the target thread group is allowed to successfully modify the flag bit in the target sub-variable area.
  • Authorizing a target thread in a target thread group to modify a flag bit in a target sub-variable area includes:
  • authorizing the target thread in the target thread group to modify the flag bit in the target sub-variable area through a compare-and-swap (CAS) operation.
  • When a thread modifies the flag bit of a sub-variable area, it can first determine whether the target data is exclusively occupied. If no other thread exclusively occupies the target data, the thread can modify the flag bit through a CAS operation, which keeps the modification consistent and avoids conflicting updates.
  • Each sub-variable area further includes a control status bit, and the control status bit is used to indicate whether the target data is exclusively occupied; and the method further includes:
  • If the first thread successfully modifies the control status bit, the first thread is allowed to exclusively occupy the target data.
  • Each sub-variable area may also have a control status bit, which identifies whether the target data is exclusively occupied by a thread.
  • A thread that applies for exclusive use needs to modify the control status bit in every sub-variable area. When other threads subsequently apply to read the target data, each of them only needs to check the control status bit in the sub-variable area of its own thread group to determine whether the target data is exclusively occupied. If the thread applying for exclusive use successfully modifies the control status bits, it is allowed to exclusively occupy the target data.
  • The step of authorizing the first thread to modify the control status bit of each sub-variable area includes:
  • authorizing the first thread to modify the control status bit of each sub-variable area through a CAS operation.
  • The thread can modify the control status bit of each sub-variable area through a CAS operation, which keeps the modification consistent and avoids conflicting updates when multiple threads apply for exclusive use of the target data at the same time.
  • The global variable is a lock state variable, the flag bit is a read lock count bit, and threads in the multiple thread groups modify the flag bit in the sub-variable area to acquire or release a read lock.
  • The method of controlling multi-threaded access to shared data can be a read-write lock, in which case the global variable can be a lock state variable and the flag bit can be a read lock count bit. Multiple threads applying to read the target data and modifying the flag bit then corresponds to acquiring and releasing read locks.
  • Because the lock state variable is partitioned and the threads are grouped, multiple threads can modify the read lock count bits of different partitions in parallel and more than one of them can succeed, which improves the success rate of acquiring and releasing locks.
  • The global variable is a lock state variable, the control status bit is a write lock status bit, and threads in the multiple thread groups modify the control status bit in the sub-variable area to acquire or release a write lock.
  • The method of controlling multi-threaded access to shared data can be a read-write lock, in which case the global variable can be a lock state variable and the control status bit can be a write lock status bit. Multiple threads applying for exclusive use of the target data and modifying the write lock status bit then corresponds to acquiring and releasing the write lock.
  • An embodiment of the present application provides a multi-threaded concurrency management device, which may include: a first processing unit, configured to divide a target global variable into multiple sub-variable areas, where the target global variable is used to control multi-threaded access to target data and each sub-variable area includes one or more flag bits;
  • a second processing unit, configured to divide the multiple threads into multiple thread groups, where the multiple threads apply for permission to read the target data;
  • the target thread group is one of the multiple thread groups, the target sub-variable area is one of the multiple sub-variable areas, and the thread groups correspond one-to-one to the sub-variable areas;
  • a third processing unit, configured to allow the target thread to read the target data if the target thread successfully modifies the flag bit.
  • The device further includes:
  • a determination unit, configured to determine the total number of threads that can currently read the target data according to the state of the flag bit in each sub-variable area.
  • One thread in the target thread group is allowed to successfully modify the flag bit in the target sub-variable area.
  • The second processing unit is specifically configured to:
  • authorize the target thread in the target thread group to modify the flag bit in the target sub-variable area through a compare-and-swap (CAS) operation.
  • Each sub-variable area further includes a control status bit, and the control status bit is used to indicate whether the target data is exclusively occupied;
  • the second processing unit is further configured to authorize a first thread of the target thread group to modify the control status bit of each sub-variable area when the first thread applies for exclusive use of the target data;
  • the third processing unit is further configured to allow the first thread to exclusively occupy the target data if the first thread successfully modifies the control status bit.
  • The second processing unit is specifically configured to:
  • authorize the first thread to modify the control status bit of each sub-variable area through a CAS operation.
  • The global variable is a lock state variable, the flag bit is a read lock count bit, and threads in the multiple thread groups modify the flag bit in the sub-variable area to acquire or release a read lock.
  • The global variable is a lock state variable, the control status bit is a write lock status bit, and threads in the multiple thread groups modify the control status bit in the sub-variable area to acquire or release a write lock.
  • An embodiment of the present application provides a multi-threaded concurrency management device, including a processor configured to support the device in implementing the corresponding functions of the multi-threaded concurrency management method provided in the first aspect.
  • The device may also include a memory that is coupled to the processor and stores the program instructions and data necessary for the device.
  • The device may also include an interface circuit through which the device communicates with other devices or a communication network.
  • An embodiment of the present application provides a computer-readable storage medium for storing computer software instructions used by the apparatus that implements the multi-threaded concurrency management method provided in one or more implementations of the second aspect above, including a program designed for executing the above aspects.
  • An embodiment of the present application provides a computer program, which includes instructions.
  • When the computer program is executed by a computer, the computer can execute the process performed by the apparatus that implements the multi-threaded concurrency management method provided in one or more implementations of the second aspect above.
  • An embodiment of the present application provides an electronic device that includes a processor configured to support the electronic device in implementing the corresponding functions of the multi-threaded concurrency management method provided in the first aspect.
  • The electronic device may also include a memory that is coupled to the processor and stores the program instructions and data necessary for the electronic device.
  • The electronic device may also include a communication interface that enables the electronic device to communicate with other devices or a communication network.
  • An embodiment of the present application provides a chip system, which includes a processor for supporting a device in implementing the functions involved in the first aspect, for example, generating or processing the information involved in the multi-threaded concurrency management method.
  • The chip system also includes a memory, which stores the program instructions and data necessary for the device.
  • The chip system can consist of a chip, or it can include a chip and other discrete devices.
  • An embodiment of the present application provides a server, including a communication interface, a memory, and a processor that are coupled to one another; the communication interface is used for the server to communicate with other devices or a communication network, and the memory is used to store computer program code.
  • The computer program code includes computer instructions.
  • An embodiment of the present application provides a vehicle-mounted device, including a communication interface, a memory, and a processor that are coupled to one another; the communication interface is used for the vehicle-mounted device to communicate with other devices or a communication network, and the memory is used to store computer program code that includes computer instructions. When the processor reads the computer instructions from the memory, the vehicle-mounted device executes any possible implementation of the method in the first aspect.
  • FIG. 1 is a schematic diagram of the structure of a lock state variable.
  • FIG. 2 is a schematic diagram of a read-write lock acquisition process.
  • FIG. 3 is a schematic diagram of the result of multiple threads performing CAS operations simultaneously.
  • FIG. 4 is a schematic diagram of a system architecture to which a multi-threaded concurrency management method provided in an embodiment of the present application is applied.
  • FIG. 5 is a flow chart of a multi-threaded concurrency management method provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the structure of a lock state variable partition provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the result of multiple thread groups performing CAS operations simultaneously, provided in an embodiment of the present application.
  • FIG. 8 is a performance analysis perf flame graph provided in an embodiment of the present application.
  • FIG. 9 is another performance analysis perf flame graph provided in an embodiment of the present application.
  • FIG. 10 is another performance analysis perf flame graph provided in an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the structure of a multi-threaded concurrency management device provided in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the structure of another multi-threaded concurrency management device provided in an embodiment of the present application.
  • A component can be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer.
  • Both an application running on a computing device and the computing device itself can be components.
  • One or more components may reside in a process and/or a thread of execution, and a component may be located on one computer and/or distributed between two or more computers.
  • These components may execute from various computer-readable media that have various data structures stored on them.
  • Components may communicate through local and/or remote processes, for example based on a signal having one or more data packets (e.g., data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet, which interacts with other systems by means of the signal).
  • An atomic operation is one operation or a series of operations that cannot be interrupted: the thread scheduling mechanism will not interrupt it, and no context switch occurs while it runs.
  • Atomic operations mainly include simple addition and subtraction, compare-and-swap (CAS), and assignment.
  • The variable is divided into multiple regions, and the threads are grouped.
  • The count value of the corresponding target region in the variable can be modified through a CAS operation. Spreading the CAS operations of concurrent threads across multiple regions reduces CAS conflicts and thereby improves the CAS success rate.
  • A read-write lock includes a read lock and a write lock. Only one thread at a time can hold the lock in the write-locked state, but multiple threads can hold it in the read-locked state simultaneously. Therefore, when the read-write lock is write-locked, every thread that attempts to acquire it (for reading or for writing) is blocked until the lock is released. When the read-write lock is read-locked, other threads attempting to acquire a read lock obtain access, but a thread that wants the write lock must wait until all threads have released the lock.
  • The read-write lock generally uses a single variable for state control, and CAS operations keep that variable consistent across multiple threads.
  • The variable used to control the state of the read-write lock can be partitioned, and the threads that perform concurrent CAS operations can be grouped.
  • A thread modifies the count value of its group's target area in the variable through a CAS operation, while the other areas handle the concurrent CAS operations of threads in other groups, which improves the CAS success rate.
  • The compare-and-swap (CAS) operation takes three values: the address of the target variable to be modified, the expected original value of the variable, and the new value to be written.
  • During the operation, memory access is made exclusive at some granularity (usually a cache line). If the value of the target variable in memory (at the target variable address) equals the "original value" passed to the CAS operation, the target variable is changed to the "new value" passed to the CAS operation. If the value in memory differs from the "original value", the modification fails and the current value of the target variable is returned. This guarantees that the target variable is modified atomically in multi-threaded concurrent scenarios.
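  • The CAS contract described above can be sketched with C11 atomics. This is a minimal illustration, not code from the application: the helper name `cas_u64` and the 64-bit width are assumptions, while `atomic_compare_exchange_strong` is the standard C11 library call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the source text.
 * Tries to change *target from expected_old to new_value in one atomic step.
 * Returns true on success. On failure the variable is left unchanged and
 * *observed receives the value that was actually found in memory. */
static bool cas_u64(_Atomic uint64_t *target, uint64_t expected_old,
                    uint64_t new_value, uint64_t *observed)
{
    assert(target != NULL && observed != NULL);
    uint64_t expected = expected_old;
    bool ok = atomic_compare_exchange_strong(target, &expected, new_value);
    /* On failure the library call overwrites `expected` with the current value. */
    *observed = ok ? new_value : expected;
    return ok;
}
```

  • A caller that fails can use the observed value as the "original value" of its next attempt, which is the retry pattern the lock procedures in this document rely on.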
  • Concurrency means that multiple threads can each run part of their logic within the same period of time. A thread can start executing without waiting for the previous thread to complete.
  • "Plural" means two or more.
  • "And/or" describes an association between related objects and indicates that three relationships can exist.
  • "A and/or B" can mean: A exists alone, A and B both exist, or B exists alone.
  • The character "/" generally indicates an "or" relationship between the related objects.
  • Solution 1: A read-write lock is commonly implemented by controlling it through a lock state variable. See FIG. 1, which is a schematic diagram of the structure of a lock state variable. Taking the Linux rwlock as an example, the variable shown in FIG. 1 indicates that no thread currently holds the write lock and that 3 threads hold the read lock. When the variable is 0, the read-write lock is idle, that is, no thread holds either the read lock or the write lock. When the highest bit is 1, a thread holds the write lock; the highest bit is set back to 0 when that thread releases the write lock.
  • The threads are synchronized through CAS instructions, which ensures that only one thread can modify the variable at a time.
  • The write lock status bit does not have to be the highest bit, and not all bits in the variable have to be used for read-write lock counting: several bits can be reserved for other custom flags.
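  • The layout described for FIG. 1 can be decoded with a few bit masks. The sketch below assumes a 32-bit variable with the write lock status bit in bit 31 and the read lock count in bits 0 to 30; the text states only that the highest bit is the write lock bit, so the width and all names here are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 31 = write lock status bit, bits 0..30 = read lock count. */
#define WRITE_LOCK_BIT  UINT32_C(0x80000000)
#define READ_COUNT_MASK UINT32_C(0x7FFFFFFF)

/* Idle means no thread holds the read lock or the write lock. */
static bool is_idle(uint32_t state)          { return state == 0; }
static bool is_write_locked(uint32_t state)  { return (state & WRITE_LOCK_BIT) != 0; }
static uint32_t reader_count(uint32_t state) { return state & READ_COUNT_MASK; }

/* Builds a state value from its two fields (hypothetical helper). */
static uint32_t make_state(bool write_locked, uint32_t readers)
{
    assert(readers <= READ_COUNT_MASK);
    return (write_locked ? WRITE_LOCK_BIT : 0u) | readers;
}
```

  • Under these assumptions, the state shown in FIG. 1 (no writer, 3 readers) is `make_state(false, 3)`, which is the plain value 3.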
  • FIG. 2 is a schematic diagram of the process of acquiring a read-write lock.
  • A thread applies for a write lock (method 1 in FIG. 2):
  • The lock state is checked first to see whether a read lock or a write lock is already held. If the lock is held, the write lock application fails and must be repeated (the thread can retry immediately or wait to be woken by a releasing thread). If the lock is not held, the write lock status bit of the variable is set to 1 and a CAS atomic operation is performed on the variable.
  • If the modification succeeds, the write lock is acquired. If the modification fails, another thread has acquired the lock and changed the variable in the meantime, so the write lock attempt fails and the thread retries, waits, or takes another path.
  • A thread applies for a read lock (method 2 in FIG. 2):
  • The lock state is checked first to see whether a write lock is held. If a write lock is held, the read lock application fails and must be repeated (the thread can retry immediately or wait to be woken by a releasing thread). If no write lock is held, the read lock count of the variable is increased by 1 and a CAS atomic operation is performed on the variable.
  • If the modification succeeds, the read lock is acquired. If the modification fails, another thread has changed the variable in the meantime: when multiple threads acquire read locks concurrently, the success of one thread causes the others to fail, and they retry or take another path. Normally the read lock attempt is retried immediately using the latest value of the lock state variable.
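  • The two procedures of FIG. 2 can be sketched as follows for a single lock state variable. This is a simplified illustration under assumptions: a 32-bit variable with the write lock bit in bit 31, try-style functions that return `false` instead of sleeping, and release functions that the text does not spell out. All function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_LOCK_BIT  UINT32_C(0x80000000)   /* assumed: highest bit */
#define READ_COUNT_MASK UINT32_C(0x7FFFFFFF)   /* assumed: remaining bits */

/* Method 1: succeeds only if nobody holds the lock and the CAS wins. */
static bool try_write_lock(_Atomic uint32_t *state)
{
    assert(state != NULL);
    uint32_t old = atomic_load(state);
    if (old != 0)
        return false;                 /* read- or write-locked: caller retries or waits */
    return atomic_compare_exchange_strong(state, &old, WRITE_LOCK_BIT);
}

/* Method 2: retries immediately with the latest value after a failed CAS. */
static bool try_read_lock(_Atomic uint32_t *state)
{
    assert(state != NULL);
    uint32_t old = atomic_load(state);
    for (;;) {
        if (old & WRITE_LOCK_BIT)
            return false;             /* write-locked: caller retries or waits */
        assert((old & READ_COUNT_MASK) != READ_COUNT_MASK);   /* count would overflow */
        if (atomic_compare_exchange_strong(state, &old, old + 1))
            return true;
        /* CAS failed: `old` now holds the current value, so loop again. */
    }
}

/* Release paths (assumed; the text only says the write bit is set back to 0). */
static void read_unlock(_Atomic uint32_t *state)  { atomic_fetch_sub(state, 1); }
static void write_unlock(_Atomic uint32_t *state) { atomic_fetch_and(state, READ_COUNT_MASK); }
```

  • Every reader in this sketch performs its CAS on the same variable, so concurrent readers invalidate one another's "original value"; that contention is the problem the partitioned scheme addresses.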
  • FIG. 3 is a schematic diagram of the result of multiple threads performing CAS operations simultaneously. Taking 4 threads acquiring read locks at the same time as an example, the current value is read from the memory that stores the lock state variable, and the old value and the new value are the inputs of the CAS operation.
  • In the four rounds of concurrent CAS operations, 4 CAS operations succeed (OK) and 6 fail: 3 in the first round, 2 in the second round, and 1 in the third round.
  • Each CAS operation is relatively expensive: it requires synchronized memory access across multiple cores and across chips, which consumes memory bandwidth and computing resources.
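  • The counts in the FIG. 3 example follow from a simple model, sketched below as an illustration rather than a measurement: every pending thread issues one CAS per round against the same variable, exactly one of them succeeds, and the rest retry in the next round. The type and function names are hypothetical.

```c
#include <assert.h>

/* Totals for a contention scenario under the one-winner-per-round model. */
typedef struct {
    unsigned rounds;
    unsigned succeeded;
    unsigned failed;
} cas_tally;

/* All `threads` contend on ONE variable; each round has exactly one winner. */
static cas_tally tally_single_variable(unsigned threads)
{
    assert(threads <= 65536u);            /* keeps `failed` within 32 bits */
    cas_tally t = {0, 0, 0};
    for (unsigned pending = threads; pending > 0; --pending) {
        t.rounds += 1;
        t.succeeded += 1;                 /* one CAS wins this round */
        t.failed += pending - 1;          /* every other pending CAS fails */
    }
    return t;
}
```

  • For 4 threads this gives 4 rounds, 4 successes, and 3 + 2 + 1 = 6 failures, matching the figures above. In general the model yields n(n-1)/2 failed CAS operations for n contending threads.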
  • Solution 2: Other functions with similar read/write semantics (such as pinbuffer/unpinbuffer) control the shared state, exclusive state, and other status flags of data pages.
  • Sharing uses a share count, which is similar to acquiring and releasing a read lock.
  • pinbuffer can be understood simply as acquiring a read lock,
  • and unpinbuffer can be understood simply as releasing a read lock.
  • Kunpeng 2P+openGauss open source Gauss
  • The present application proposes a multi-threaded concurrency management method and related apparatus, which apply to scenarios where multiple threads perform CAS operations concurrently and can improve the success rate of those CAS operations.
  • The global variable that guards the shared data is divided into multiple areas, and the threads that apply to read the shared data are divided into multiple thread groups. Each thread group corresponds to one variable area and different thread groups correspond to different variable areas, so a thread in any group obtains permission to read the shared data by modifying the value of its own group's variable area.
  • Threads in different thread groups modify different variable areas independently, so the value of each variable area can be modified successfully on its own. Therefore, when multiple threads apply to read the shared data and their concurrent CAS operations modify different variable areas, several CAS operations can succeed at once, which improves the CAS success rate.
  • FIG. 4 is a schematic diagram of the system architecture to which a multi-threaded concurrency management method provided in the embodiment of the present application is applied.
  • The system architecture may include one or more central processing units (CPUs), one or more graphics processing units (GPUs), or one or more systems-on-chip (SoCs), and a memory.
  • The basic unit that a processor runs and schedules independently is a thread.
  • The system architecture shown in FIG. 4 includes multiple CPUs (such as CPU1 to CPU5), any one of which (such as CPU1) can execute the multi-threaded concurrency management method provided in the embodiment of the present application. First, the global variable used to control multi-threaded access to shared data is divided into multiple sub-variable areas, and the threads on the multiple CPUs that request to read the shared data are grouped.
  • Threads of different thread groups can then be authorized to modify the flag bits in different sub-variable areas, and if a thread successfully modifies its flag bit, CPU1 can allow it to read the shared data in the memory.
  • For example, when CPU1 runs or schedules thread 1 to apply to read the shared data in the memory (taking a read lock application as an example), thread 1 can only modify the flag bit of the first partition of the global variable (such as the lock state variable). After the modification succeeds, thread 1 can read the shared data. When CPU2 runs or schedules thread 2 to apply to read the shared data, thread 2 can only modify the flag bit of the second partition of the global variable. After the modification succeeds, thread 2 can read the shared data. Similarly, when CPU3 runs or schedules thread 3 and CPU4 runs or schedules thread 4 to apply to read the shared data, thread 3 and thread 4 modify the flag bits of the third and fourth partitions respectively and can read the shared data once the modification succeeds.
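  • The per-partition read path in this example can be sketched as below. The sketch assumes four 2-byte partitions packed into one 8-byte variable, a write lock bit in the top bit of each partition, and a thread-to-group mapping of `(thread_id - 1) % 4` so that threads 1 to 4 land on partitions 1 to 4. The text does not specify how threads are assigned to groups, so the mapping and all names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PARTITIONS  4u
#define PART_WRITE_BIT  UINT16_C(0x8000)   /* assumed control status bit */
#define PART_COUNT_MASK UINT16_C(0x7FFF)   /* assumed read lock count bits */

/* Hypothetical partitioned lock: 4 x 2 bytes, aligned as one 8-byte variable. */
typedef struct {
    _Alignas(8) _Atomic uint16_t part[NUM_PARTITIONS];
} part_lock;

static void part_lock_init(part_lock *l)
{
    assert(l != NULL);
    for (unsigned i = 0; i < NUM_PARTITIONS; ++i)
        atomic_init(&l->part[i], 0);
}

/* Assumed grouping rule: thread ids start at 1; thread 1 uses partition 1 (index 0). */
static unsigned group_of(unsigned thread_id)
{
    assert(thread_id >= 1);
    return (thread_id - 1) % NUM_PARTITIONS;
}

/* A reader touches only its own group's partition, so CAS operations issued by
 * threads of different groups never invalidate one another. */
static bool part_try_read_lock(part_lock *l, unsigned thread_id)
{
    _Atomic uint16_t *p = &l->part[group_of(thread_id)];
    uint16_t old = atomic_load(p);
    for (;;) {
        if (old & PART_WRITE_BIT)
            return false;                  /* exclusively occupied */
        assert((old & PART_COUNT_MASK) != PART_COUNT_MASK);
        if (atomic_compare_exchange_strong(p, &old, (uint16_t)(old + 1)))
            return true;
        /* Lost to a thread of the same group: `old` was refreshed, retry. */
    }
}

static void part_read_unlock(part_lock *l, unsigned thread_id)
{
    atomic_fetch_sub(&l->part[group_of(thread_id)], 1);
}
```

  • With one thread per group, as in the FIG. 4 example, no reader's CAS can be invalidated by another reader; only threads that share a group still compete with each other.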
  • The length of the global variable can be 1 byte to 16 bytes, or longer or shorter.
  • The supported range may differ between instruction set architectures.
  • FIG. 4 only takes a global variable whose length equals the cache line length as an example, which does not limit this application.
  • Each partition of the global variable can also include a control status bit (such as a write lock status bit).
  • When a thread applies for exclusive use of the shared data, the control status bit of each partition can be modified.
  • For example, for thread 5 the control status bit of each of the four partitions can be modified. After the modification succeeds, thread 5 can perform operations such as reading and writing on the shared data.
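  • One way an exclusive request could set the control status bit in every partition is sketched below. It is a try-style illustration under assumptions the text does not state: each partition must be completely idle (value 0) before its bit is set, and a failed attempt rolls back the bits it already set. The structure and names are hypothetical and repeat the assumed 4 x 2-byte layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PARTITIONS 4u
#define PART_WRITE_BIT UINT16_C(0x8000)    /* assumed control status bit */

typedef struct {
    _Alignas(8) _Atomic uint16_t part[NUM_PARTITIONS];
} part_lock;

static void part_lock_init(part_lock *l)
{
    assert(l != NULL);
    for (unsigned i = 0; i < NUM_PARTITIONS; ++i)
        atomic_init(&l->part[i], 0);
}

/* Sets the control status bit in every partition with one CAS per partition.
 * If any partition is busy (readers present or already exclusive), the bits
 * set so far are cleared again and the attempt fails. */
static bool part_try_write_lock(part_lock *l)
{
    assert(l != NULL);
    for (unsigned i = 0; i < NUM_PARTITIONS; ++i) {
        uint16_t expected = 0;
        if (!atomic_compare_exchange_strong(&l->part[i], &expected, PART_WRITE_BIT)) {
            while (i > 0)
                atomic_store(&l->part[--i], 0);   /* roll back partitions we own */
            return false;
        }
    }
    return true;
}

/* Clears the control status bit in every partition. */
static void part_write_unlock(part_lock *l)
{
    assert(l != NULL);
    for (unsigned i = 0; i < NUM_PARTITIONS; ++i)
        atomic_store(&l->part[i], 0);
}
```

  • Because the bit is present in every partition once the exclusive request succeeds, a reader only has to inspect its own group's partition to learn that the data is exclusively occupied.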
  • The embodiments of the present application can be applied to various computer system architectures.
  • The architecture in FIG. 4 above is only an exemplary implementation in the embodiments of the present application.
  • The architectures applicable to the embodiments of the present application include but are not limited to the above architecture.
  • The computer architecture can have more or fewer units/modules than those shown in the figure, can combine two or more units/modules, or can have a different configuration of units/modules.
  • The various units/modules can be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application-specific integrated circuits.
  • The multi-threaded concurrency management method provided in the embodiment of the present application can be executed by an electronic device.
  • An electronic device is a device that can be abstracted as a computer system. An electronic device that supports the above multi-threaded concurrency management function can also be referred to as a multi-threaded concurrency management device.
  • The multi-threaded concurrency management device can be a complete electronic device, such as a server, a vehicle-mounted computer, or a terminal device, where the terminal device can be a smart wearable device, a smartphone, a tablet computer, a laptop computer, a desktop computer, and so on. It can also be a system/device composed of multiple complete machines, or a part of an electronic device, such as a chip related to the multi-threaded concurrency management function, for example a processor or a system on a chip (SoC); this is not specifically limited in the embodiment of the present application. The multi-threaded concurrency management method provided in the embodiment of the present application can also be applied to the operating system, database, application software, middleware, or underlying software of the above electronic device.
  • FIG. 5 is a flow chart of a multi-threaded concurrent management method provided in an embodiment of the present application, the method including but not limited to the following steps:
  • the target global variable is used to control multi-threaded access to target data, which is a resource that can be shared by multiple threads, and can generally be referred to as shared data or shared resources.
  • the target data can be specific data, for example, a table that is shared and queried; the target data can also refer to a critical section, that is, a program fragment that accesses a shared resource (such as a shared device, a shared memory, etc.).
  • Specifically, a global variable (i.e., the target global variable) is divided into multiple sub-variable areas.
  • FIG. 6 is a structural schematic diagram of a lock state variable partition provided in an embodiment of the present application, wherein, taking the length of the lock state variable as 8 bytes as an example, the lock state variable is divided into 4 2-byte regions (including partition 1, partition 2, partition 3 and partition 4) in the figure, and each partition may include one or more read lock count bits.
  • When dividing the global variable into regions, the division can be equal (as shown in FIG. 6) or unequal; for example, the variable can be divided into four sub-variable regions with lengths of 1 byte, 1 byte, 2 bytes, and 4 bytes, respectively. This is not specifically limited here.
  • the length of the sub-variable region can be divided according to the number of threads in different thread groups. When the number of threads in a thread group is large, the length of the sub-variable region corresponding to the thread group can be longer; on the contrary, when the number of threads in a thread group is small, the length of the sub-variable region corresponding to the thread group can be shorter.
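The partitioning described above can be sketched in code. The following is a minimal Python illustration, not the application's implementation: the helper names (`partition_layout`, `read_area`, `write_area`) and the choice to place the first sub-variable area in the lowest-order bits are assumptions made for the example.

```python
def partition_layout(widths_in_bytes):
    """Return a (bit_offset, bit_width) pair for each sub-variable area.

    Assumption: the first area occupies the lowest-order bits.
    """
    layout, offset = [], 0
    for width in widths_in_bytes:
        layout.append((offset, width * 8))
        offset += width * 8
    return layout


def read_area(word, area):
    """Extract the value stored in one sub-variable area of the word."""
    offset, width = area
    return (word >> offset) & ((1 << width) - 1)


def write_area(word, area, value):
    """Return a copy of the word with one sub-variable area replaced."""
    offset, width = area
    mask = ((1 << width) - 1) << offset
    return (word & ~mask) | ((value << offset) & mask)


# Equal split from FIG. 6: an 8-byte variable as four 2-byte areas.
EQUAL = partition_layout([2, 2, 2, 2])
# Unequal split mentioned in the text: 1, 1, 2 and 4 bytes.
UNEQUAL = partition_layout([1, 1, 2, 4])
```

A wider area can count more concurrent readers, which is why a larger thread group would be given a longer area.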
  • To keep modifications of a sub-variable area's mark bit consistent, a thread generally modifies the mark bit through an atomic operation (such as a CAS operation). Because the smallest unit an atomic operation can currently act on is 1 byte, the minimum length of a single sub-variable area in the global variable should be 1 byte; the maximum length of the entire global variable is the largest number of bytes on which the computer can perform an atomic operation (such as a cache line).
  • If the unit on which atomic operations can act changes, the length range supported by the global variable and the sub-variable areas can change accordingly. That is, the 1-byte minimum length of a single sub-variable area and the cache-line maximum length of the entire global variable should not be construed as limiting the embodiments of the present application.
  • S503 Authorize a target thread in the target thread group to modify a flag bit in the target subvariable area.
  • Multiple threads applying to read the target data can be grouped so that the threads in the multiple thread groups modify the mark bits in the multiple sub-variable areas respectively, with the threads in the target thread group modifying the mark bits of the target sub-variable area.
  • One thread group corresponds to one sub-variable area, and different thread groups correspond to different sub-variable areas; that is, the thread groups correspond one-to-one with the sub-variable areas.
  • the multiple threads may need to read the target data in the process of executing a certain task, so they apply for permission to read the target data.
  • the multiple threads can not only be used to apply for permission to read the target data, but also can be used to complete other tasks.
  • When the target thread group among the multiple thread groups includes multiple threads, only one thread in the target thread group is allowed to successfully modify the mark bit in the sub-variable area corresponding to the target thread group (i.e., the target sub-variable area) at a time.
  • Because only one thread can successfully modify the mark bit, modifications of the sub-variable area's mark bit stay consistent and confusion is avoided.
  • Other threads in the same thread group that fail to modify the mark bit can try again to modify the mark bit of the sub-variable area corresponding to the thread group.
  • Threads can be grouped by thread ID; for example, threads 0 to 9 belong to group 1, threads 10 to 19 belong to group 2, threads 20 to 29 belong to group 3, and so on. Threads can also be grouped by a group identifier assigned when they are created; for example, threads created by CPU1 are identified as group 1, threads created by CPU2 as group 2, threads created by CPU3 as group 3, and so on.
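Both grouping rules reduce to a small mapping from a thread attribute to a group index. The sketch below is illustrative only: the function names, the zero-based indices, and the wrap-around behavior for IDs or CPU numbers beyond the last group are assumptions that the text does not specify.

```python
def group_by_thread_id(thread_id, threads_per_group=10, num_groups=4):
    """Map a thread ID to a zero-based group index by ID range.

    Threads 0-9 map to index 0 (group 1), 10-19 to index 1 (group 2), and so on.
    Assumption: IDs beyond the last range wrap around to the first group.
    """
    return (thread_id // threads_per_group) % num_groups


def group_by_creating_cpu(cpu_number, num_groups=4):
    """Map the CPU that created a thread to a zero-based group index.

    CPU1 maps to index 0 (group 1), CPU2 to index 1 (group 2), and so on.
    Assumption: CPU numbers start at 1 and wrap around past num_groups.
    """
    return (cpu_number - 1) % num_groups
```

The returned index can then select the sub-variable area that the thread is allowed to modify.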
  • If the thread successfully modifies the mark bit, its request to read the target data is granted; that is, the thread is allowed to read the target data.
  • For example, the threads in Group 1 can modify the mark bit (read lock count bit) in Partition 1, the threads in Group 2 can modify the mark bit in Partition 2, and so on.
  • When a thread in Group 1 applies to read the target data, it is equivalent to applying for a read lock.
  • The thread checks the lock state of the 2 bytes corresponding to its thread group (e.g., Partition 1) to determine that no thread holds a write lock, and then modifies the read lock count bits of those 2 bytes (e.g., an atomic add-1 operation). If the modification succeeds, the thread holds a read lock. The counterpart of applying to read is releasing. When a thread in Group 1 wants to release the target data, it is equivalent to releasing a read lock: the thread modifies the read lock count bits of the 2 bytes (Partition 1) corresponding to its thread group (an atomic subtract-1 operation). If the modification succeeds, the thread has released the read lock.
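The per-partition read lock steps can be sketched as follows. This is a hedged illustration rather than the application's code: Python has no hardware CAS, so a mutex inside the hypothetical `PartitionedLockWord` class emulates a compare-and-swap that touches a single 2-byte partition, and placing the write lock status bit in the top bit of each partition is an assumption.

```python
import threading

WRITE_BIT = 1 << 15          # assumed position of the write lock status bit
COUNT_MASK = WRITE_BIT - 1   # remaining 15 bits: read lock count


class PartitionedLockWord:
    """Lock state variable split into 2-byte partitions (CAS emulated by a mutex)."""

    def __init__(self, num_partitions=4):
        self._partitions = [0] * num_partitions
        self._guard = threading.Lock()

    def load(self, index):
        with self._guard:
            return self._partitions[index]

    def compare_and_swap(self, index, expected, new):
        """Store `new` only if the partition still holds `expected`."""
        with self._guard:
            if self._partitions[index] != expected:
                return False
            self._partitions[index] = new
            return True


def try_read_lock(word, group):
    """One attempt to add a read lock through the caller's own partition."""
    old = word.load(group)
    if old & WRITE_BIT:                      # a writer holds the lock
        return False
    if (old & COUNT_MASK) == COUNT_MASK:     # the counter would overflow
        return False
    return word.compare_and_swap(group, old, old + 1)


def read_unlock(word, group):
    """Release a read lock, retrying until the decrement is applied."""
    while True:
        old = word.load(group)
        if word.compare_and_swap(group, old, old - 1):
            return
```

Because each call compares only the caller's partition, a successful update by a thread in another group does not make this thread's CAS fail.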
  • each of the above sub-variable areas may also include a control status bit, and the control status bit is used to indicate whether the target data is exclusively owned.
  • the control status bit is a write lock status bit, which is used to indicate whether a thread holds a write lock.
  • each partition also includes a write lock status bit.
  • the above method may also include steps S505 and S506, wherein,
  • the thread needs to modify the control status bit in each sub-variable area. If the thread successfully modifies the control status bit, the request for exclusive use of the above-mentioned target data can be applied through this thread, that is, the thread is allowed to exclusively use the above-mentioned target data.
  • Take as an example a lock state variable that is 8 bytes long with four 2-byte sub-variable areas, where the control status bit is the write lock status bit.
  • When a thread in the above-mentioned multiple thread groups applies for exclusive use of the target data, it is equivalent to applying for a write lock.
  • The thread checks the lock state of all 8 bytes to determine that no thread holds a write lock or a read lock, and then modifies the write lock status bit in each 2-byte partition (such as an atomic add-1 operation). If the modification succeeds, the thread holds the write lock.
  • The counterpart of applying for exclusive control is releasing exclusive control.
  • When a thread wants to release its exclusive control over the target data, it is equivalent to releasing a write lock.
  • The thread modifies the write lock status bit in each 2-byte partition (such as an atomic subtract-1 operation). If the modification succeeds, the thread has released the write lock.
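A write lock attempt under this layout can be sketched as below. The sketch makes two assumptions that the text leaves open: the write lock status bit is the top bit of each 2-byte partition, and a single CAS over the whole 8-byte variable sets or clears the bit in every partition at once. The `AtomicWord` class is hypothetical and uses a mutex to emulate the CAS instruction.

```python
import threading

NUM_PARTITIONS = 4
PARTITION_BITS = 16
# Assumption: the top bit of every 2-byte partition is its write lock status bit.
ALL_WRITE_BITS = sum(
    1 << (i * PARTITION_BITS + PARTITION_BITS - 1) for i in range(NUM_PARTITIONS)
)


class AtomicWord:
    """8-byte lock state variable; a mutex emulates a hardware CAS."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def try_write_lock(word):
    """One attempt to add the write lock across all partitions."""
    if word.load() != 0:       # some partition shows a reader or a writer
        return False
    # A single CAS on the whole word sets the status bit in every partition.
    return word.compare_and_swap(0, ALL_WRITE_BITS)


def write_unlock(word):
    """Clear every status bit; returns False if the write lock was not held."""
    return word.compare_and_swap(ALL_WRITE_BITS, 0)
```

Setting the bit in every partition is what lets a later reader detect the writer by inspecting only its own partition.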
  • the above method may further include step S507 in addition to steps S501 to S506, wherein:
  • S507 Determine the total number of threads that can currently read the target data according to the status of the mark bit in each sub-variable area.
  • Take as an example a lock state variable that is 8 bytes long with four 2-byte sub-variable areas: if the read lock count bits of partition 1 indicate that 3 threads hold read locks, partition 2 indicates 5, partition 3 indicates 2, and partition 4 indicates 2, then the total number of threads currently holding read locks is 3 + 5 + 2 + 2 = 12.
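Step S507 amounts to summing the per-partition counts. A minimal sketch follows, assuming (as an illustration only) that the top bit of each 2-byte partition is the write lock status bit, that the remaining bits hold the read lock count, and that partition 1 sits in the lowest-order bits; `total_readers` is a hypothetical helper name.

```python
def total_readers(word, num_partitions=4, partition_bits=16):
    """Sum the read lock counts of all partitions in the lock state variable.

    Assumption: the top bit of each partition is the write lock status bit
    and the remaining bits hold that partition's read lock count.
    """
    count_mask = (1 << (partition_bits - 1)) - 1
    total = 0
    for index in range(num_partitions):
        partition = word >> (index * partition_bits)
        total += partition & count_mask
    return total


# Partitions 1 to 4 hold 3, 5, 2 and 2 readers respectively.
EXAMPLE_WORD = 3 | (5 << 16) | (2 << 32) | (2 << 48)
```

For `EXAMPLE_WORD` the sum is 3 + 5 + 2 + 2 = 12; a total of 0 is the condition under which a waiting writer can proceed.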
  • When the read-write lock is in the read lock state and another thread applies for a write lock, the write lock request fails and the thread may sleep temporarily.
  • After a thread applies for a write lock, the read-write lock usually blocks subsequent read lock requests, so that the lock does not stay in the read lock state for a long time and leave the thread requesting the write lock waiting.
  • When the total number of threads holding read locks drops to 0, the thread that previously failed to obtain the write lock and went to sleep can be awakened; its write lock request is then granted, allowing it to read, write, and perform other operations on the target data.
  • FIG. 7 is a schematic diagram of the result of grouped threads performing CAS operations simultaneously, provided by an embodiment of the present application. It takes as an example 4 threads adding read locks at the same time, divided into 2 groups of 2 threads each. The present value is read from the memory that stores the lock state variable, and the old value and the new value are the values passed in to the CAS operation.
  • There are 4 valid CAS operations and 2 invalid CAS operations; compared with the example in FIG. 3, the number of invalid operations drops from 6 to 2 and the number of concurrency rounds drops from 4 to 2.
  • When threads 1 and 2 in group 1 add read locks at the same time (the first round of concurrency), only one thread (for example, thread 1) can successfully modify the mark bit of the sub-variable area corresponding to the thread group; thread 2 can then immediately retry (the second round) based on the latest value of the sub-variable area, successfully modify the mark bit, and hold a read lock.
  • Likewise, when threads 3 and 4 in group 2 contend in the first round of concurrency, one of them (for example, thread 3) successfully modifies the mark bit and holds a read lock; the other thread then succeeds in the second round and also holds a read lock.
  • If each of the four threads is instead placed in its own thread group, each thread modifies the mark bit of a different sub-variable area.
  • These four threads can then all successfully modify their mark bits in a single round of concurrency; that is, the number of failed modification operations is 0.
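The CAS counts quoted for FIG. 3 and FIG. 7 can be reproduced with a small worst-case model. The sketch assumes lock-step rounds in which every pending thread of a group attempts its CAS at the same time and exactly one attempt per group succeeds; `simulate_rounds` is a hypothetical helper, not something defined in the application.

```python
def simulate_rounds(threads_per_group):
    """Count CAS outcomes when all pending threads retry in lock-step rounds.

    threads_per_group: number of contending threads in each thread group.
    Returns (successful_cas, failed_cas, rounds).
    """
    pending = list(threads_per_group)
    ok = failed = rounds = 0
    while any(pending):
        rounds += 1
        for group, count in enumerate(pending):
            if count:
                ok += 1              # exactly one CAS wins on this area
                failed += count - 1  # the others compared against a stale value
                pending[group] = count - 1
    return ok, failed, rounds
```

Under this model, `simulate_rounds([4])` gives 4 successes, 6 failures and 4 rounds (one shared variable), `simulate_rounds([2, 2])` gives 4 successes, 2 failures and 2 rounds (two groups of two), and `simulate_rounds([1, 1, 1, 1])` gives 4 successes with no failures in 1 round.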
  • The pinbuffer/unpinbuffer functions implement behavior similar to a read-write lock and control the shared, exclusive, and other status flags of data pages. Sharing uses a shared count (similar to adding and releasing read locks), and each time a thread queries a data page it needs to add a read lock to the index root directory page. In the Kunpeng 2P + openGauss (open-source Gauss) scenario, the volume of concurrent access to the same page is extremely large.
  • FIG. 8 is a perf flame graph for performance analysis provided by an embodiment of the present application. When the query performance of openGauss is 1.1 million queries per second (QPS), the atomic-instruction overhead of adding and releasing read locks already exceeds 50%, and the CPU is fully loaded.
  • After the method is applied, the performance of the computer system can be seen in FIG. 10, another perf flame graph for performance analysis provided by an embodiment of the present application: the query performance of openGauss improves to 1.56 million QPS, and the atomic-instruction overhead of locking and unlocking falls to 14%.
  • That is, the query performance of openGauss improves by 41%, and locking and unlocking performance improves by 5 times.
  • the present application proposes a multi-threaded concurrent management method, which divides the global variable used to control multi-threaded access to shared data into multiple areas, divides the multiple threads applying to read the shared data into multiple thread groups, and makes one thread group correspond to one variable area, and different thread groups correspond to different variable areas, so that when threads in different thread groups apply to read shared data, they can obtain the permission to read the shared data by modifying the value of the corresponding variable area.
  • the modification of the values of different variable areas between threads in different thread groups is independent, so that the value of each variable area can be modified successfully respectively, that is, in one round of concurrent modification, multiple threads can be modified successfully.
  • FIG. 11 is a schematic diagram of the structure of a multi-threaded concurrent management device provided in an embodiment of the present application.
  • the device 110 may include a first processing unit 1101, a second processing unit 1102, and a third processing unit 1103, and may also include a determination unit 1104.
  • the detailed description of each unit is as follows:
  • the first processing unit 1101 is used to divide the target global variable into a plurality of sub-variable areas; the target global variable is used to control multi-threaded access to target data, and each of the plurality of sub-variable areas includes one or more flag bits;
  • the second processing unit 1102 is used to divide the multiple threads into multiple thread groups; the multiple threads are used to apply for permission to read the target data;
  • The second processing unit 1102 is further used to authorize a target thread in a target thread group to modify the flag bit in a target sub-variable area, where the target thread group is one of the multiple thread groups, the target sub-variable area is one of the multiple sub-variable areas, and the thread groups correspond one-to-one with the sub-variable areas;
  • the third processing unit 1103 is configured to allow the target thread to read the target data if the target thread successfully modifies the mark bit.
  • the device further includes:
  • the determining unit 1104 is configured to determine the total number of threads that can currently read the target data according to the status of the flag bit in each sub-variable area.
  • When a target thread group among the multiple thread groups includes multiple threads, one thread in the target thread group is allowed to successfully modify a flag bit in the target sub-variable area corresponding to the target thread group.
  • the second processing unit 1102 is specifically configured to:
  • the target thread in the target thread group is authorized to modify the mark bit in the target sub-variable area through a compare and exchange (CAS) operation.
  • each of the sub-variable regions further includes a control status bit, and the control status bit is used to indicate whether the target data is exclusively occupied; the second processing unit 1102 is further used to authorize the first thread to modify the control status bit of each sub-variable region respectively when the first thread of the target thread group among the multiple thread groups applies for exclusive use of the target data;
  • the third processing unit 1103 is further configured to allow the first thread to exclusively occupy the target data if the first thread successfully modifies the control status bit.
  • the second processing unit 1102 is specifically configured to:
  • the first thread is authorized to modify the control status bit of each sub-variable area through a CAS operation.
  • In one possible implementation, the global variable is a lock state variable, the mark bit is a read lock count bit, and the threads in the multiple thread groups modify the mark bit in the sub-variable area to add or release a read lock.
  • In one possible implementation, the global variable is a lock state variable, the control status bit is a write lock status bit, and the threads in the multiple thread groups modify the control status bit in the sub-variable area to add or release a write lock.
  • FIG. 12 is a schematic diagram of the structure of another multi-threaded concurrent management device provided in an embodiment of the present application, wherein the device 120 includes at least one processor 1201, at least one memory 1202, and at least one communication interface 1203.
  • the device may also include common components such as an antenna, which will not be described in detail here.
  • Processor 1201 can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the above program.
  • the communication interface 1203 is used to communicate with other devices or communication networks, such as Ethernet, radio access network (RAN), core network, wireless local area networks (WLAN), etc.
  • the memory 1202 may be a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compressed optical disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory may exist independently and be connected to the processor through a bus. The memory may also be integrated with the processor.
  • the memory 1202 is used to store application code for executing the multi-thread concurrent management method described above, and the execution is controlled by the processor 1201.
  • the processor 1201 is used to execute the application code stored in the memory 1202.
  • the code stored in the memory 1202 can execute the multi-threaded concurrency management method provided in Figure 5 above, such as dividing the target global variable into multiple sub-variable areas; the target global variable is used to control multi-threaded access to target data, and each of the multiple sub-variable areas includes one or more flag bits; dividing multiple threads into multiple thread groups; the multiple threads are used to apply for permission to read the target data; authorizing the target thread in the target thread group to modify the flag bit in the target sub-variable area; the target thread group is one of the multiple thread groups, and the target sub-variable area is one of the multiple sub-variable areas, and one of the thread groups in the multiple thread groups corresponds to one of the multiple sub-variable areas; if the target thread successfully modifies the flag bit, the target thread is allowed to read the target data.
  • the multi-threaded concurrent management device described in this application is not limited to this.
  • the multi-threaded concurrent management device can be located in any electronic device, such as a server, a computer, a mobile phone, a tablet, and other devices.
  • the multi-threaded concurrent management device can specifically be a chip or a chipset or a circuit board equipped with a chip or a chipset.
  • the chip or chipset or the circuit board equipped with a chip or a chipset can work under the necessary software drive.
  • the multi-threaded concurrent management device can be:
  • the IC set may also include a storage component for storing data and computer programs;
  • An embodiment of the present application further provides a computer-readable storage medium, in which a computer program code is stored.
  • the embodiment of the present application also provides an electronic device, which can exist in the form of a chip product, and the electronic device includes a processor, and the processor is configured to support the electronic device to implement the corresponding functions of the method in any of the above embodiments.
  • the electronic device may also include a memory, which is coupled to the processor and stores the necessary program instructions and data of the electronic device.
  • the electronic device may also include a communication interface for the electronic device to communicate with other devices or a communication network.
  • the embodiment of the present application also provides a computer program product.
  • When the computer program product is run on a computer, the computer executes the method in any of the aforementioned embodiments.
  • the embodiment of the present application provides a chip system, which includes a processor for supporting a device to implement the functions involved in the first aspect, for example, generating or processing the information involved in the multi-threaded concurrent management method.
  • the chip system also includes a memory, which is used to store program instructions and data necessary for the device.
  • the chip system can be composed of a chip, or it can include a chip and other discrete devices.
  • the disclosed device can be implemented in other ways.
  • the device embodiments described above are only schematic, such as the division of the above-mentioned units, which is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, and the indirect coupling or communication connection of devices or units can be electrical or other forms.
  • the units described above as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present application.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions that enable a computer device (which can be a personal computer, a server, or a network device, and specifically can be a processor in the computer device) to perform all or part of the steps of the methods in the embodiments of the present application.
  • The aforementioned storage medium may include a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or any other medium that can store program code.

Abstract

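The present application discloses a multi-threaded concurrent management method and a related apparatus. The method may include: dividing a target global variable into multiple sub-variable areas, where the target global variable is used to control multi-threaded access to target data and each of the sub-variable areas includes one or more flag bits; dividing multiple threads into multiple thread groups, where the multiple threads are used to apply for permission to read the target data; authorizing a target thread in a target thread group to modify a flag bit in a target sub-variable area, where the thread groups correspond one-to-one with the sub-variable areas; and, if the target thread successfully modifies the flag bit, allowing the target thread to read the target data. The embodiments of the present application apply to scenarios with concurrent multi-threaded CAS operations and can improve the success rate of CAS operations.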

Description

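A multi-threaded concurrent management method and related apparatus
This application claims priority to Chinese patent application No. 202310717179.1, entitled "A multi-threaded concurrent management method and related apparatus", filed with the China National Intellectual Property Administration on June 15, 2023, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a multi-threaded concurrent management method and a related apparatus.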
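Background
A thread is a single sequential flow of control within a process and is the smallest unit of program execution. In operating systems that support threads, the thread is the basic unit of independent execution and independent scheduling. Threads can execute concurrently, including multiple threads within one process, threads in different processes, and, in a multi-core computer system, threads on different cores.
When multiple threads run concurrently, they may need to access the same shared data; that is, the shared data may be shared among different threads. To preserve the integrity of the shared data, threads that access it at the same time need to use atomic operations, such as the compare-and-swap (CAS) operation. As a result, concurrent threads cannot modify the shared data at the same time; a thread cannot read shared data that another thread is modifying; and multiple threads can read the shared data at the same time. A common application scenario for CAS operations is in read-write locks or other functions that implement similar read-write behavior (such as pinbuffer/unpinbuffer). A read-write lock has two states, read lock and write lock. When one thread holds the write lock, it can read and write the shared data, and no other thread can hold the write lock or a read lock. When one thread holds a read lock, it can read the shared data, and other threads can also hold read locks. A read-write lock generally uses a single variable for state control, and multiple threads can use CAS operations to keep that variable consistent.
However, as the number of cores keeps growing and single-core computing power keeps increasing, thread concurrency and concurrency speed rise further. In high-concurrency scenarios, concurrent multi-threaded CAS operations have a high failure rate, CAS retries are severe, and the overhead of CAS operations increases exponentially, becoming one of the bottlenecks that limit computer performance.
Therefore, how to provide a method that improves the success rate of concurrent multi-threaded CAS operations is a problem that urgently needs to be solved.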
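Summary
The embodiments of the present application provide a multi-threaded concurrent management method and a related apparatus, which apply to scenarios with concurrent multi-threaded CAS operations and can improve the success rate of CAS operations.
In a first aspect, an embodiment of the present application provides a multi-threaded concurrent management method, which may include: dividing a target global variable into multiple sub-variable areas, where the target global variable is used to control multi-threaded access to target data and each of the sub-variable areas includes one or more flag bits; dividing multiple threads into multiple thread groups, where the multiple threads are used to apply for permission to read the target data; authorizing a target thread in a target thread group to modify a flag bit in a target sub-variable area, where the target thread group is one of the multiple thread groups, the target sub-variable area is one of the multiple sub-variable areas, and the thread groups correspond one-to-one with the sub-variable areas; and, if the target thread successfully modifies the flag bit, allowing the target thread to read the target data.
In the prior art, when multiple threads apply to read shared data and concurrently modify the global variable that controls multi-threaded access to that data, only one thread at a time can successfully modify the value of the global variable. A large number of modification operations therefore fail and are retried, the modification success rate is low, the instruction overhead of the modification operations is high, and resources are seriously wasted. In the embodiments of the present application, the global variable that controls multi-threaded access to the shared data is divided into multiple sub-variable areas, and the threads applying to read the shared data are divided into multiple thread groups. One thread group corresponds to one sub-variable area, and different thread groups correspond to different sub-variable areas, so that threads in different thread groups can obtain permission to read the shared data by modifying the value of their corresponding sub-variable area. When applying to read the shared data, threads in different thread groups modify the values of different sub-variable areas independently, so the value of each sub-variable area can be successfully modified by one thread; that is, multiple threads can succeed in one round of concurrent modification. Therefore, when the method is applied to a scenario with concurrent multi-threaded CAS operations and multiple threads apply to read the shared data by concurrently performing CAS operations on the values of different sub-variable areas, multiple CAS operations can succeed, which improves the CAS success rate and reduces wasted resources.
In one possible implementation, the method further includes: determining, from the state of the flag bits in each sub-variable area, the total number of threads that can currently read the target data.
In the embodiments of the present application, the number of threads that can currently read the shared data as indicated by each sub-variable area is determined from the state of the flag bits in that area, and the total number of threads that can currently read the shared data is the sum of the numbers determined from all sub-variable areas. While any thread holds a read lock, another thread that applies for a write lock fails to obtain it and sleeps while waiting. When the total number of threads holding read locks is 0, the thread that previously applied for the write lock and went to sleep can be awakened.
In one possible implementation, when the target thread group includes multiple threads, one thread in the target thread group is allowed to successfully modify the flag bit in the target sub-variable area within one clock cycle.
In the embodiments of the present application, when multiple threads in the same thread group concurrently apply to read the target data, that is, concurrently attempt to modify the flag bit of the sub-variable area, only one thread is allowed to succeed. This keeps modifications of the flag bit in the sub-variable area consistent and avoids confusion.
In one possible implementation, authorizing the target thread in the target thread group to modify the flag bit in the target sub-variable area includes:
if no thread currently has exclusive use of the target data, authorizing the target thread in the target thread group to modify the flag bit in the target sub-variable area through a compare-and-swap (CAS) operation.
In the embodiments of the present application, when a thread modifies the flag bit of a sub-variable area, it can first determine whether the target data is already exclusively held. If no other thread has exclusive use of the target data, the thread can modify the flag bit through a CAS operation, which keeps the modification consistent and avoids confusion.
In one possible implementation, each sub-variable area further includes a control status bit, which indicates whether the target data is exclusively held; the method further includes:
when a first thread of the target thread group applies for exclusive use of the target data, authorizing the first thread to modify the control status bit of each sub-variable area respectively;
if the first thread successfully modifies the control status bits, allowing the first thread exclusive use of the target data.
In the embodiments of the present application, each sub-variable area can also contain a control status bit, which identifies whether the target data is already exclusively held by a thread. When a thread applies for exclusive use of the target data, it needs to modify the control status bit in every sub-variable area, so that any other thread that later applies to read the target data only needs to check the control status bit in the sub-variable area corresponding to its own thread group to determine whether the target data is exclusively held. If the thread applying for exclusive use successfully modifies the control status bits, it is allowed exclusive use of the target data.
In one possible implementation, authorizing the first thread to modify the control status bit of each sub-variable area respectively includes:
if no thread currently has exclusive use of, or is reading, the target data, authorizing the first thread to modify the control status bit of each sub-variable area through a CAS operation.
In the embodiments of the present application, when a thread modifies the control status bits of the sub-variable areas, it can first determine whether the target data is already exclusively held or being read. If no other thread has exclusive use of, or is reading, the target data, the thread can modify the control status bit of each sub-variable area through a CAS operation. This keeps the modification consistent and avoids confusion when multiple threads apply for exclusive use of the target data at the same time.
In one possible implementation, the global variable is a lock state variable, the flag bit is a read lock count bit, and modifications of the flag bit in a sub-variable area by threads in the multiple thread groups are used to add or release a read lock.
In the embodiments of the present application, multi-threaded access to shared data can be controlled by a read-write lock. In that case the global variable can be a lock state variable and the flag bit can be a read lock count bit, and the process in which multiple threads apply to read the target data and modify the flag bit is equivalent to adding and releasing read locks. With the lock state variable partitioned and the threads grouped, multiple threads can modify the read lock count bits of the partitions in parallel along multiple paths and several of them can succeed, which improves the success rate of adding and releasing read locks.
In one possible implementation, the global variable is a lock state variable, the control status bit is a write lock status bit, and modifications of the control status bit in a sub-variable area by threads in the multiple thread groups are used to add or release a write lock.
In the embodiments of the present application, multi-threaded access to shared data can be controlled by a read-write lock. In that case the global variable can be a lock state variable and the control status bit can be a write lock status bit, and the process in which a thread applies for exclusive use of the target data and modifies the write lock status bits is equivalent to adding and releasing a write lock. When a thread applies for exclusive use of the target data, it needs to modify the write lock status bit in every sub-variable area, so that any other thread that later applies to read the target data only needs to check the write lock status bit in the sub-variable area corresponding to its own thread group to determine whether the target data is exclusively held, that is, whether a thread holds the write lock.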
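In a second aspect, an embodiment of the present application provides a multi-threaded concurrent management apparatus, which may include: a first processing unit, configured to divide a target global variable into multiple sub-variable areas, where the target global variable is used to control multi-threaded access to target data and each of the sub-variable areas includes one or more flag bits;
a second processing unit, configured to divide multiple threads into multiple thread groups, where the multiple threads are used to apply for permission to read the target data;
and to authorize a target thread in a target thread group to modify a flag bit in a target sub-variable area, where the target thread group is one of the multiple thread groups, the target sub-variable area is one of the multiple sub-variable areas, and the thread groups correspond one-to-one with the sub-variable areas;
a third processing unit, configured to allow the target thread to read the target data if the target thread successfully modifies the flag bit.
In one possible implementation, the apparatus further includes:
a determining unit, configured to determine, from the state of the flag bits in each sub-variable area, the total number of threads that can currently read the target data.
In one possible implementation, when the target thread group includes multiple threads, one thread in the target thread group is allowed to successfully modify the flag bit in the target sub-variable area within one clock cycle.
In one possible implementation, the second processing unit is specifically configured to:
if no thread currently has exclusive use of the target data, authorize the target thread in the target thread group to modify the flag bit in the target sub-variable area through a compare-and-swap (CAS) operation.
In one possible implementation, each sub-variable area further includes a control status bit, which indicates whether the target data is exclusively held; the second processing unit is further configured to, when a first thread of the target thread group applies for exclusive use of the target data, authorize the first thread to modify the control status bit of each sub-variable area respectively;
the third processing unit is further configured to allow the first thread exclusive use of the target data if the first thread successfully modifies the control status bits.
In one possible implementation, the second processing unit is specifically configured to:
if no thread currently has exclusive use of, or is reading, the target data, authorize the first thread to modify the control status bit of each sub-variable area through a CAS operation.
In one possible implementation, the global variable is a lock state variable, the flag bit is a read lock count bit, and modifications of the flag bit in a sub-variable area by threads in the multiple thread groups are used to add or release a read lock.
In one possible implementation, the global variable is a lock state variable, the control status bit is a write lock status bit, and modifications of the control status bit in a sub-variable area by threads in the multiple thread groups are used to add or release a write lock.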
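In a third aspect, an embodiment of the present application provides a multi-threaded concurrent management apparatus that includes a processor configured to support the apparatus in implementing the corresponding functions of the multi-threaded concurrent management method provided in the first aspect. The apparatus may further include a memory, coupled to the processor, that stores the program instructions and data necessary for the apparatus. The apparatus may further include an interface circuit for communication between the apparatus and other apparatuses, other devices, or a communication network.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium for storing the computer software instructions used by the apparatus for implementing the multi-threaded concurrent management method provided in one or more implementations of the second aspect, including a program designed to execute the above aspects.
In a fifth aspect, an embodiment of the present application provides a computer program that includes instructions which, when the computer program is executed by a computer, enable the computer to perform the procedure executed by the apparatus for implementing the multi-threaded concurrent management method provided in one or more implementations of the second aspect.
In a sixth aspect, an embodiment of the present application provides an electronic device that includes a processor configured to support the electronic device in implementing the corresponding functions of the multi-threaded concurrent management method provided in the first aspect. The electronic device may further include a memory, coupled to the processor, that stores the program instructions and data necessary for the electronic device. The electronic device may further include a communication interface for communication between the electronic device and other devices or a communication network.
In a seventh aspect, an embodiment of the present application provides a chip system that includes a processor for supporting a device in implementing the functions involved in the first aspect, for example generating or processing the information involved in the multi-threaded concurrent management method. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the device. The chip system can consist of a chip, or can include a chip and other discrete components.
In an eighth aspect, an embodiment of the present application provides a server that includes a communication interface, a memory, and a processor. The communication interface and the memory are coupled to the processor; the communication interface is used for communication between the server and other devices or a communication network; the memory is used to store computer program code that includes computer instructions; and when the processor reads the computer instructions from the memory, the server is caused to perform any possible implementation of the first aspect.
In a ninth aspect, an embodiment of the present application provides a vehicle-mounted device that includes a communication interface, a memory, and a processor. The communication interface and the memory are coupled to the processor; the communication interface is used for communication between the vehicle-mounted device and other devices or a communication network; the memory is used to store computer program code that includes computer instructions; and when the processor reads the computer instructions from the memory, the vehicle-mounted device is caused to perform any possible implementation of the first aspect.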
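Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the background more clearly, the drawings used in the embodiments of the present application or in the background are described below.
FIG. 1 is a schematic diagram of the structure of a lock state variable;
FIG. 2 is a schematic flow chart of locking a read-write lock;
FIG. 3 is a schematic diagram of the result of multiple threads performing CAS operations simultaneously;
FIG. 4 is a schematic diagram of a system architecture to which a multi-threaded concurrent management method provided in an embodiment of the present application is applied;
FIG. 5 is a schematic flow chart of a multi-threaded concurrent management method provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of the structure of a lock state variable partition provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of the result of grouped threads performing CAS operations simultaneously, provided in an embodiment of the present application;
FIG. 8 is a perf flame graph for performance analysis provided in an embodiment of the present application;
FIG. 9 is another perf flame graph for performance analysis provided in an embodiment of the present application;
FIG. 10 is another perf flame graph for performance analysis provided in an embodiment of the present application;
FIG. 11 is a schematic diagram of the structure of a multi-threaded concurrent management apparatus provided in an embodiment of the present application;
FIG. 12 is a schematic diagram of the structure of another multi-threaded concurrent management apparatus provided in an embodiment of the present application.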
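Detailed Description
The embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
The terms "first", "second", "third", "fourth" and the like in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in one or more embodiments of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The terms "component", "module", "system" and the like used in this specification denote a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be components. One or more components may reside within a process and/or thread of execution, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer-readable media that have various data structures stored on them. Components may communicate through local and/or remote processes, for example according to a signal having one or more data packets (for example, data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems by way of the signal).
First, some terms used in the present application are explained to help those skilled in the art understand them.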
(1)原子操作,是指不可中断的一个或者一系列操作,也就是不会被线程调度机制打断的操作,运行期间不会有任何的上下文切换(context switch)。原子操作主要包括单纯加1减1操作、比较和交换(CAS)操作以及赋值操作。在多个线程同时访问共享数据时,执行CAS原子操作对读写锁或其他实现类似读写功能的函数的变量进行修改,可以确保在多线程并发场景下,该变量值修改的原子性,以免发生混乱。本申请实施例中,对变量划分多个区域,对多个线程进行分组,目标分组中的某个线程读取共享数据时,可以通过CAS操作修改变量中对应的目标区域的计数值,也即是,通过将并发线程的CAS操作分在多个区域进行,减少CAS操作的并发冲突,从而提高CAS操作的成功率。
(2)读写锁,包括读锁和写锁。其中,一次只有一个线程可以占有写锁状态的读写锁,但是可以有多个线程同时占有读锁状态的读写锁。因此,当读写锁是写锁状态时,在这个锁被解锁之前,所有试图对这个锁进行加锁(包括读锁和写锁)的线程都会被阻塞。而当读写锁在读锁状态时,其他试图进行加读锁的线程可以得到访问权,但是如果线程希望进行加写锁的话,它必须等到所有的线程释放锁。读写锁一般使用一个变量进行状态控制,多个线程之间可以通过CAS操作保证一致性。本申请实施例中,可以通过对用于控制读写锁状态的变量进行分区,并将并发CAS操作的多个线程分组,当目标分组中的某个线程申请读锁时(即读取共享数据时),可以通过CAS操作修改变量中对应的目标区域的计数值,其他区域用于处理其他分组的线程并发的CAS操作,以此提高CAS的成功率。
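(3) A compare-and-swap (CAS) operation takes three values: the address of the target variable to be modified, the original (expected) value of the variable, and the new value to be written. When a CAS operation executes, the memory access is placed under exclusive control at a certain granularity (usually within one cache line). If the value of the target variable in memory (at the target variable address) equals the "original value" passed to the CAS operation, the target variable in memory is changed to the "new value" passed to the CAS operation. If the value of the target variable in memory differs from the "original value" passed to the CAS operation, the modification fails and the current value of the target variable is returned to the caller. This ensures that the target variable is modified atomically in a multi-thread concurrency scenario.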
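The CAS contract described in (3) can be illustrated with a minimal sketch. Python exposes no user-level CAS instruction, so the class below only models the semantics: the name `AtomicWord` is hypothetical, and an ordinary lock stands in for the exclusive memory access that hardware provides.

```python
import threading


class AtomicWord:
    """Models one memory word that supports compare-and-swap (CAS)."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()  # stands in for hardware exclusivity

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        """Return (True, new) on success, or (False, current value) on failure."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True, new
            return False, self._value


word = AtomicWord(3)
first = word.compare_and_swap(3, 4)   # expected value matches: the swap happens
second = word.compare_and_swap(3, 5)  # stale expected value: the swap is rejected
```

The second call fails because the first call already changed the word, and it hands back the current value so that the caller can retry with fresh input.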
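(4) Concurrency means that, within a period of time, multiple threads may each be running part of their logic. The current thread may start executing regardless of whether the previously started thread has finished, without waiting for it to finish.
(5) In this application, "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate three cases: only A exists, both A and B exist, or only B exists. The character "/" generally indicates an "or" relationship between the associated objects before and after it.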
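First, the technical problem that this application specifically addresses is analyzed and stated. In the prior art, CAS operations are commonly used in read-write locks or in other functions that implement a similar read/write capability (such as pinbuffer/unpinbuffer); see Solution 1 and Solution 2 below:
Solution 1: A read-write lock is commonly implemented with a single lock state variable. Refer to FIG. 1, which is a schematic structural diagram of a lock state variable. Taking the Linux rwlock as an example, the variable shown in FIG. 1 indicates that no thread currently holds the write lock and that 3 threads hold the read lock. When the variable is 0, the read-write lock is idle, that is, no thread holds the lock (neither the read lock nor the write lock). When the most significant bit is 1, a thread holds the write lock, and that thread sets the most significant bit back to 0 when it releases the write lock. Each time a thread acquires the read lock, the variable is incremented by 1; each time a thread releases the read lock, the variable is decremented by 1. To keep modifications of the variable consistent across threads, the threads synchronize through the CAS instruction, which ensures that only one thread can modify the variable at a time. The write-lock status bit does not have to be the most significant bit, and not all bits of the variable have to be used for read-write lock counting; a few bits may be reserved for other customized flags.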
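To make the single-variable layout of Solution 1 concrete, the following sketch decodes a lock state value. It assumes a 32-bit variable whose most significant bit is the write-lock status bit and whose remaining bits count readers; the width and the helper name `describe` are assumptions for illustration, not details taken from FIG. 1.

```python
WRITE_BIT = 1 << 31          # assumed position of the write-lock status bit
READER_MASK = WRITE_BIT - 1  # the remaining low bits hold the reader count


def describe(state):
    """Decode a lock state variable into its two logical fields."""
    return {
        "write_locked": bool(state & WRITE_BIT),
        "readers": state & READER_MASK,
    }


state = 0
for _ in range(3):
    state += 1               # each read-lock acquisition adds 1 to the variable
three_readers = describe(state)
idle = describe(0)
writer_only = describe(WRITE_BIT)
```

Here `three_readers` corresponds to the state described for FIG. 1: no writer and three readers.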
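Solution 1 has the following disadvantage:
When a large number of threads concurrently perform read-only access to one shared resource, many read-lock requests are issued concurrently and the read-lock failure rate is high. Refer to FIG. 2, which is a schematic flowchart of locking a read-write lock. In Solution 1, when a thread requests the write lock (manner 1 in FIG. 2), it first checks the lock state to determine whether the lock is already read-locked or write-locked. If the lock is already held, the write-lock request fails and must be issued again (the thread may retry immediately, or wait to be woken by a release flow). If the lock is not held, the write-lock status bit of the variable must be set to 1, so the thread performs a CAS atomic operation on the variable. If the modification succeeds, the write lock is acquired. If the modification fails, another thread has locked and changed the variable in the meantime, the write-lock request fails, and the thread enters a retry, wait, or other flow. When a thread requests the read lock (manner 2 in FIG. 2), it first checks the lock state to determine whether the lock is write-locked. If the lock is write-locked, the read-lock request fails and must be issued again (the thread may retry immediately, or wait to be woken by a release flow). If the lock is not write-locked, the read-lock count of the variable must be incremented by 1, so the thread performs a CAS atomic operation on the variable. If the modification succeeds, the read lock is acquired. If the modification fails, another thread has changed the variable in the meantime; that is, when multiple threads request the read lock concurrently, another thread succeeded, which caused this thread to fail. The thread then enters a retry or other flow, and normally retries the read-lock flow immediately based on the latest value of the lock state variable.
It follows that when two threads request the read lock at the same time, the CAS operation of one of them fails; after checking the latest lock state, that thread immediately requests the read lock again if no write lock is held. Further, when a large number of threads concurrently perform read-only access to one shared resource, many read-lock requests are issued concurrently, the number of failed read-lock requests grows, and the failure rate rises sharply. Refer to FIG. 3, which is a schematic diagram of the result of multiple threads performing CAS operations at the same time. Taking 4 threads requesting the read lock at the same time as an example, the current value is read from the memory that stores the lock state variable, and the old value and the new value are the values passed to the CAS operation. Across the four rounds of concurrent CAS operations there are 4 effective CAS operations (marked OK) and 6 ineffective ones (marked failed); that is, 4 CAS operations succeed and 6 fail (3 in the first round, 2 in the second round, and 1 in the third round). Each CAS operation is costly: it also requires synchronized memory access across cores and across chips, and it consumes memory bandwidth and computing resources.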
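The 4-success, 6-failure count for FIG. 3 can be reproduced with a small model. The sketch assumes the worst case that the text describes: every thread still waiting issues one CAS per round on the same variable, and exactly one of them wins. The function name `contended_cas_rounds` is hypothetical.

```python
def contended_cas_rounds(num_threads):
    """Count rounds, successful CAS operations and failed CAS operations
    when all threads contend for a single lock state variable."""
    pending = num_threads
    rounds = successes = failures = 0
    while pending > 0:
        rounds += 1
        successes += 1           # exactly one CAS per round sees a fresh value
        failures += pending - 1  # every other pending thread must retry
        pending -= 1
    return rounds, successes, failures


four_threads = contended_cas_rounds(4)
```

Under this model the failure count grows as n(n-1)/2 with the number of threads n, which is why the failure rate rises so quickly under high concurrency.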
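Solution 2: Other functions that implement a similar read/write capability (such as pinbuffer/unpinbuffer) control shared access, exclusive access, and other status flags for a data page. Shared access uses a shared count, similar to acquiring and releasing a read lock: pinbuffer can be understood simply as acquiring a read lock, and unpinbuffer as releasing a read lock. Taking the Kunpeng 2P + openGauss (open-source Gauss) scenario as an example, when multiple threads run highly concurrent queries against the same table (shareable data), every query requires read-only access to the index root page. The CAS operations, which resemble read-lock acquisition, are therefore highly concurrent and suffer from the same problem as Solution 1: the CAS failure rate is high, failed CAS operations are retried many times, and the CAS instruction accounts for 51% of the CPU overhead of the entire server.
For this reason, this application proposes a multi-thread concurrency management method and a related apparatus, applied to scenarios with concurrent multi-thread CAS operations, which can raise the CAS success rate. Specifically, the global variable of the shared data is divided into multiple regions, and the threads that request to read the shared data are divided into multiple thread groups. One thread group corresponds to one variable region, and different thread groups correspond to different variable regions, so that a thread in any thread group can obtain permission to read the shared data by modifying the value of its corresponding variable region. When threads request to read the shared data, the modifications that threads of different thread groups make to different variable regions are independent of each other, so the value of each variable region can be modified successfully on its own. Therefore, when multiple threads request to read the shared data and their concurrent CAS operations modify the values of different variable regions, multiple CAS operations can succeed, which raises the CAS success rate.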
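To better understand the multi-thread concurrency management method provided in the embodiments of this application, the following describes the system architecture and/or application scenario of the method. It can be understood that the system architecture and application scenario described here are intended to explain the technical solutions of the embodiments more clearly and do not limit them.
For a system architecture to which the method is applicable, refer to FIG. 4, which is a schematic diagram of a system architecture to which a multi-thread concurrency management method according to an embodiment of this application is applied. The system architecture may include one or more central processing units (CPU), one or more graphics processing units (GPU), or one or more systems on chip (SoC), and also includes a memory. The basic unit that a processor runs and schedules independently is a thread. For example, the system architecture shown in FIG. 4 includes multiple CPUs (such as CPU1 to CPU5), and any one of them (for example, CPU1) may perform the method provided in the embodiments of this application. It first divides the global variable used to control multi-thread access to shared data into multiple sub-variable regions, and groups the threads on the CPUs that request to read the shared data. When threads on multiple CPUs concurrently read the shared data in the memory, threads of different thread groups may be authorized to modify the flag bits in different sub-variable regions; if a thread modifies the flag bits successfully, CPU1 may allow it to read the shared data in the memory. For example, when CPU1 independently runs and schedules thread 1 to request to read the shared data in the memory (taking a read-lock request as an example), thread 1 may modify only the flag bits of the first partition of the global variable (for example, a lock state variable), and after the modification succeeds, thread 1 may read the shared data. When CPU2 independently runs and schedules thread 2 to request to read the shared data, thread 2 may modify only the flag bits of the second partition of the global variable, and after the modification succeeds, thread 2 may read the shared data. Similarly, when CPU3 runs or schedules thread 3 and CPU4 runs or schedules thread 4 to request to read the shared data, thread 3 and thread 4 may modify the flag bits of the third and fourth partitions respectively, and after the modification succeeds, thread 3 and thread 4 may read the shared data. The length of the global variable may be 1 byte to 16 bytes, or even longer or shorter, and the range may differ between instruction set architectures. FIG. 4 merely takes a global variable whose length equals the cache line length as an example, which should not be construed as limiting this application. It should be noted that each partition of the global variable may further include a control status bit (such as a write-lock status bit). When a CPU runs or schedules a thread that requests exclusive access to the shared data, the control status bit of every partition may be modified. For example, when CPU5 in FIG. 4 runs or schedules thread 5 to take exclusive access to the shared data in the memory, the control status bit in each of the four partitions may be modified, and after the modification succeeds, thread 5 may read, write, or otherwise operate on the shared data.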
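It should be noted that the embodiments of this application can be applied to various computer system architectures. The architecture in FIG. 4 is only an example implementation, and the architectures to which the embodiments are applicable include but are not limited to it. It should be understood that a computer architecture may have more or fewer units/modules than shown in the figure, may combine two or more units/modules, or may have a different configuration of units/modules. The units/modules may be implemented in hardware, in software, or in a combination of hardware and software, including one or more signal processing circuits and/or application-specific integrated circuits.
It can be understood that the multi-thread concurrency management method provided in the embodiments of this application may be performed by an electronic device. An electronic device is a device that can be abstracted as a computer system, and an electronic device that supports the multi-thread concurrency management function may also be called a multi-thread concurrency management apparatus. The apparatus may be the complete electronic device, for example, a server, a vehicle-mounted computer, or a terminal device, where the terminal device may be a smart wearable device, a smartphone, a tablet computer, a notebook computer, a desktop computer, or the like. It may also be a system/apparatus made up of multiple complete devices, or a component of the electronic device, for example, a chip related to the multi-thread concurrency management function, such as a processor or a system on a chip (SoC). This is not specifically limited in the embodiments of this application. A system chip is also called a system on chip. It can be understood that the method may also be applied in the operating system, database, application software, middleware, or low-level software of the electronic device.
For ease of understanding, the following describes the technical solutions provided in this application with reference to more drawings.
In this application, unless otherwise specified, the same or similar parts of the embodiments or implementations may refer to each other. Across the embodiments of this application, and across the implementations/implementation methods within each embodiment, unless otherwise specified and unless there is a logical conflict, terms and/or descriptions are consistent and may be cross-referenced, and technical features of different embodiments and implementations may be combined according to their inherent logical relationships to form new embodiments, implementations, or implementation methods. The implementations of this application described below do not limit the protection scope of this application.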
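Refer to FIG. 5, which is a schematic flowchart of a multi-thread concurrency management method according to an embodiment of this application. The method includes but is not limited to the following steps:
S501: Divide a target global variable into multiple sub-variable regions.
Specifically, the target global variable is used to control multi-thread access to target data. The target data is a resource that can be shared by multiple threads and is generally called shared data or a shared resource. The target data may be concrete data, for example, a table that is queried by multiple threads; it may also be a critical section, that is, a program fragment that accesses a shared resource (for example, a shared device or a shared memory). To protect the integrity of the target data, a global variable (the target global variable) must be set to control multi-thread access to it. Dividing this global variable yields multiple sub-variable regions, each of which may include one or more flag bits; threads in different thread groups may subsequently modify the flag bits of different sub-variable regions. Taking a lock state variable as the global variable, the flag bits may be called read-lock count bits. For example, refer to FIG. 6, which is a schematic structural diagram of a partitioned lock state variable according to an embodiment of this application. With an 8-byte lock state variable as an example, the figure divides the variable into four 2-byte regions (partition 1, partition 2, partition 3, and partition 4), and each partition may include one or more read-lock count bits. Optionally, the global variable may be divided equally (as shown in FIG. 6) or unequally, for example, into four sub-variable regions of 1 byte, 1 byte, 2 bytes, and 4 bytes; this is not specifically limited here. Optionally, the length of each sub-variable region may be chosen according to the number of threads in the corresponding thread group: when a thread group has more threads, its sub-variable region may be longer; conversely, when a thread group has fewer threads, its sub-variable region may be shorter.
It should be noted that, to keep modifications of the flag bits of a sub-variable region consistent, threads generally modify the flag bits through atomic operations (such as CAS operations). Because the smallest unit on which an atomic operation can currently operate is 1 byte, the minimum length of a single sub-variable region in the global variable is 1 byte, and the maximum length of the whole global variable is the maximum number of bytes on which the computer can perform an atomic operation (such as a cache line). It can be understood that, as computer technology develops, if the minimum and maximum units of atomic operations change, the supported length ranges of the global variable and of the sub-variable regions in the embodiments of this application may change accordingly. That is, the current 1-byte minimum for a single sub-variable region and the one-cache-line maximum for the whole global variable should not be construed as limiting the embodiments of this application.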
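The equal and unequal divisions described for step S501 can be sketched as a simple layout calculation. The function name `split_regions` and the choice to place partition 1 at the lowest bit offset are assumptions for illustration; the description does not fix a bit order.

```python
def split_regions(widths_in_bytes):
    """Return one (bit_offset, bit_width) pair per sub-variable region."""
    regions = []
    offset = 0
    for width in widths_in_bytes:
        if width < 1:
            # 1 byte is the smallest unit an atomic operation can address
            raise ValueError("a sub-variable region must be at least 1 byte")
        regions.append((offset, width * 8))
        offset += width * 8
    return regions


equal_split = split_regions([2, 2, 2, 2])    # the 8-byte example of FIG. 6
unequal_split = split_regions([1, 1, 2, 4])  # the unequal example in the text
```

Both layouts cover the same 64 bits; only the boundaries between regions differ.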
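S502: Divide multiple threads into multiple thread groups.
S503: Authorize a target thread in a target thread group to modify the flag bits in a target sub-variable region.
Specifically, the threads that request to read the target data may be grouped so that the threads in the thread groups modify the flag bits in the respective sub-variable regions; a thread in the target thread group modifies the flag bits of the target sub-variable region. One thread group corresponds to one sub-variable region, and different thread groups correspond to different sub-variable regions; that is, the thread groups and the sub-variable regions are in one-to-one correspondence. For example, the threads may need to read the target data while performing some task, and therefore request permission to read it. In other words, the threads are not limited to requesting permission to read the target data and may also be used to complete other tasks.
Optionally, when the target thread group includes multiple threads, only one thread in the target thread group is allowed to successfully modify the flag bits in the sub-variable region corresponding to the target thread group (the target sub-variable region) within one clock cycle. That is, when multiple threads in the same thread group concurrently modify the flag bits of the sub-variable region corresponding to that group, only one of them succeeds, which keeps the modifications of that region's flag bits consistent and avoids corruption. The other threads in the same group, whose modifications failed, may retry modifying the flag bits of the sub-variable region corresponding to the group.
Optionally, threads may be grouped according to their IDs; for example, threads 0 to 9 belong to group 1, threads 10 to 19 belong to group 2, and threads 20 to 29 belong to group 3. Alternatively, a thread's group may be marked when the thread is created; for example, threads created by CPU1 are marked as group 1, threads created by CPU2 as group 2, and threads created by CPU3 as group 3.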
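Both grouping options can be written as small lookup functions. The sketch below uses the group size of 10 from the thread ID example; the function names and the CPU-to-group table are hypothetical.

```python
def group_by_thread_id(thread_id, threads_per_group=10):
    """Threads 0-9 map to group 1, threads 10-19 to group 2, and so on."""
    return thread_id // threads_per_group + 1


# Alternative: tag each thread with a group when it is created,
# keyed by the CPU that created it (illustrative table).
GROUP_OF_CREATOR = {"CPU1": 1, "CPU2": 2, "CPU3": 3}


def group_by_creator(cpu_name):
    return GROUP_OF_CREATOR[cpu_name]
```

Grouping by creating CPU has the side effect that threads sharing a sub-variable region tend to run on the same processor, whereas grouping by ID spreads threads without regard to placement.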
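S504: If the target thread modifies the flag bits successfully, allow the target thread to read the target data.
Specifically, if a thread in one of the thread groups successfully modifies the flag bits when it attempts to do so, its request to read the target data may be granted; that is, the thread is allowed to read the target data.
For ease of understanding, consider the example in FIG. 6, where the global variable (a lock state variable) is divided into 4 sub-variable regions. The threads may then be divided into 4 groups (group 1, group 2, group 3, and group 4). For example, threads in group 1 may modify the flag bits (read-lock count bits) in partition 1, threads in group 2 may modify the flag bits in partition 2, and so on. In other words, when a thread in group 1 requests to read the target data, which is equivalent to requesting the read lock, the thread uses its group number to check the lock state in the 2 bytes corresponding to its thread group (for example, partition 1), determines that no thread holds the write lock, and then modifies the read-lock count bits of those 2 bytes (for example, by an atomic add-1 operation). If the modification succeeds, the thread holds the read lock. Release is the counterpart of a read request: when a thread in group 1 releases the target data, which is equivalent to requesting to release the read lock, the thread uses its group number to modify the read-lock count bits of the 2 bytes corresponding to its thread group (partition 1) by an atomic subtract-1 operation. If the modification succeeds, the thread releases the read lock.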
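A minimal sketch of the per-partition read-lock path follows. It models the 8-byte variable as one integer and models a 2-byte CAS with a lock-protected method, since Python has no user-level CAS. The class name, the placement of the write-lock status bit at the top of each 2-byte region, and the zero-based group index are all assumptions; reader-count overflow is not handled.

```python
import threading

REGION_BITS = 16                     # each sub-variable region is 2 bytes
WRITE_FLAG = 1 << (REGION_BITS - 1)  # assumed position of the write-lock status bit
REGION_MASK = (1 << REGION_BITS) - 1


class PartitionedLockWord:
    """Models an 8-byte lock state variable with CAS on one 2-byte region."""

    def __init__(self):
        self.value = 0
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def load_region(self, group):
        with self._guard:
            return (self.value >> (group * REGION_BITS)) & REGION_MASK

    def cas_region(self, group, expected, new):
        shift = group * REGION_BITS
        with self._guard:
            if ((self.value >> shift) & REGION_MASK) != expected:
                return False         # another thread of this group got there first
            self.value = (self.value & ~(REGION_MASK << shift)) | (new << shift)
            return True


def try_read_lock(word, group):
    while True:
        old = word.load_region(group)
        if old & WRITE_FLAG:
            return False             # a writer holds the lock; caller retries or waits
        if word.cas_region(group, old, old + 1):
            return True              # reader count of this region incremented


def read_unlock(word, group):
    while True:
        old = word.load_region(group)
        if word.cas_region(group, old, old - 1):
            return
```

Because `cas_region` compares only the caller's own 2 bytes, a change made by a thread of another group never causes the comparison to fail, which is the property the partitioning relies on.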
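In a possible implementation, each sub-variable region may further include a control status bit, which indicates whether the target data is held exclusively. Again taking a lock state variable as the global variable, the control status bit is a write-lock status bit that indicates whether a thread holds the write lock. Refer to the partitioned lock state variable structure shown in FIG. 6: besides the read-lock count bits, each partition also includes a write-lock status bit. In addition to steps S501 to S504, the method may further include steps S505 and S506:
S505: When a first thread of the target thread group requests exclusive access to the target data, authorize the first thread to modify the control status bit of each sub-variable region.
S506: If the first thread modifies the control status bits successfully, allow the first thread to hold the target data exclusively.
Specifically, when a thread in one of the thread groups requests exclusive access to the target data, the thread must modify the control status bit in every sub-variable region. If the thread modifies the control status bits successfully, its request for exclusive access may be granted; that is, the thread is allowed to hold the target data exclusively. Continue with the example in FIG. 6, where the global variable is a lock state variable, the lock state variable is 8 bytes long, all 4 sub-variable regions are 2 bytes, and the control status bit is a write-lock status bit. When a thread in one of the thread groups requests exclusive access to the target data, which is equivalent to requesting the write lock, the thread must check the lock state of all 8 bytes, determine that no thread holds the write lock or the read lock, and then modify the write-lock status bit in each 2-byte partition (for example, by an atomic add-1 operation). If the modification succeeds, the thread holds the write lock. Releasing exclusive access is the counterpart of requesting it: when a thread releases its exclusive hold on the target data, which is equivalent to requesting to release the write lock, the thread modifies the write-lock status bit in each 2-byte partition (for example, by an atomic subtract-1 operation). If the modification succeeds, the thread releases the write lock.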
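One way to realize the write-lock path is a single CAS over the whole 8-byte variable, which checks that no reader or writer is present and sets all four write-lock status bits in one step. This is only one possible reading of steps S505 and S506; the description does not say whether the four bits are changed by one wide CAS or by several narrow ones. The class name and the bit positions are assumptions, and a lock again stands in for hardware atomicity.

```python
import threading

REGIONS = 4
REGION_BITS = 16
WRITE_FLAG = 1 << (REGION_BITS - 1)  # assumed write-lock status bit per region
ALL_WRITE_FLAGS = sum(WRITE_FLAG << (i * REGION_BITS) for i in range(REGIONS))


class LockWord:
    """Models the full 8-byte lock state variable with a whole-word CAS."""

    def __init__(self, value=0):
        self.value = value
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self.value != expected:
                return False
            self.value = new
            return True


def try_write_lock(word):
    # Succeeds only when the variable is 0: no reader in any region, no writer.
    return word.compare_and_swap(0, ALL_WRITE_FLAGS)


def write_unlock(word):
    # Clears the write-lock status bit of every region in one step.
    return word.compare_and_swap(ALL_WRITE_FLAGS, 0)
```

Setting the status bit in every region is what lets a reader detect the writer by inspecting only its own 2 bytes.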
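In a possible implementation, in addition to steps S501 to S506, the method may further include step S507:
S507: Determine, from the state of the flag bits in each sub-variable region, the total number of threads that can currently read the target data.
Specifically, when the total number of threads currently allowed to read the target data needs to be counted, it may be determined from the state of the flag bits of each sub-variable region. Continue with the example in FIG. 6, where the global variable is a lock state variable, the lock state variable is 8 bytes long, and all 4 sub-variable regions are 2 bytes. To determine the total number of threads currently holding the read lock, the numbers of read-lock holders indicated by the read-lock count bits of the sub-variable regions are added together. For example, if the read-lock count bits of partition 1 indicate that 3 threads hold the read lock, partition 2 indicates 5 threads, partition 3 indicates 2 threads, and partition 4 indicates 2 threads, the total number of threads holding the read lock is 12.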
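Step S507 amounts to summing the reader counts of all partitions. The sketch below packs the example counts into one 8-byte value and adds them back up; it assumes that the low 15 bits of each 2-byte region hold the count, and the helper names are hypothetical.

```python
REGION_BITS = 16
COUNT_MASK = (1 << (REGION_BITS - 1)) - 1  # assumed: low 15 bits count readers


def pack_counts(counts):
    """Build a lock state value from one reader count per region."""
    state = 0
    for index, count in enumerate(counts):
        state |= (count & COUNT_MASK) << (index * REGION_BITS)
    return state


def total_readers(state, regions=4):
    """Sum the read-lock count bits of every sub-variable region."""
    return sum((state >> (i * REGION_BITS)) & COUNT_MASK for i in range(regions))


example_state = pack_counts([3, 5, 2, 2])  # partitions 1 to 4 in the text
example_total = total_readers(example_state)
```

A total of 0 is the condition under which a waiting writer could be woken.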
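Optionally, in some scenarios, for example when the read-write lock is read-locked and another thread tries to acquire the write lock, the write-lock request fails and the requesting thread may sleep temporarily. After that thread has requested the write lock, the read-write lock usually blocks subsequent read-lock requests, so that the read-locked state is not held indefinitely and the thread requesting the write lock does not wait for a long time. When the last thread holding the read lock releases it, that is, when the total number of threads that can currently read the target data is determined to be 0, the thread that went to sleep after its write-lock request failed may be woken. Its write-lock request is then granted, so that it holds the write lock and is allowed to read, write, or otherwise operate on the target data.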
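It should be noted that in low-concurrency scenarios, where conflicts between read-lock acquisitions and releases are not significant, the embodiments of this application may change the performance of the computer system only slightly. In high-concurrency scenarios, however, where such conflicts between threads are significant, the more sub-variable regions can be modified independently during read-lock acquisition and release, the more threads can complete these operations successfully, and the more pronounced the performance gain of the computer system is. Refer to FIG. 7, which is a schematic diagram of the result of grouped threads performing CAS operations at the same time according to an embodiment of this application. Taking 4 threads requesting the read lock at the same time, divided into 2 groups of 2 threads each, as an example, the current value is read from the memory that stores the lock state variable, and the old value and the new value are the values passed to the CAS operation. Across the two rounds of concurrent CAS operations there are 4 effective CAS operations and 2 ineffective ones: the number of failures drops from 6 in the FIG. 3 example to 2, and the number of rounds drops from 4 to 2. When thread 1 and thread 2 in group 1 request the read lock at the same time (round 1), only one of them (for example, thread 1) succeeds in modifying the flag bits of the sub-variable region corresponding to the group; thread 2 can then immediately retry the read-lock flow based on the latest value of that sub-variable region (round 2), modify the flag bits successfully, and hold the read lock. Similarly, in round 1, one of thread 3 and thread 4 in group 2 (for example, thread 3) modifies the flag bits successfully and holds the read lock; the other thread succeeds in round 2 and also holds the read lock.
Further, if the 4 threads are divided into 4 groups of one thread each, so that the 4 threads are in different thread groups, then when they concurrently request the read lock, each modifies the flag bits of the sub-variable region corresponding to its own group. From the point of view of each sub-variable region, only one thread requests to modify the flag bits. The 4 threads can therefore all modify their flag bits successfully in a single round; that is, the number of failed modifications is 0.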
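The round and failure counts quoted for FIG. 3 and FIG. 7 follow from one simple model, sketched below: in each round every pending thread issues one CAS on its own group's sub-variable region, and exactly one thread per non-empty group succeeds. The function name `grouped_cas_rounds` is hypothetical, and the model ignores hardware effects such as cache line transfer between cores.

```python
def grouped_cas_rounds(group_sizes):
    """Return (rounds, successes, failures) for one burst of read-lock requests."""
    pending = list(group_sizes)
    rounds = successes = failures = 0
    while any(pending):
        rounds += 1
        for index, waiting in enumerate(pending):
            if waiting:
                successes += 1            # one winner per sub-variable region
                failures += waiting - 1   # the rest of the group retries
                pending[index] = waiting - 1
    return rounds, successes, failures


one_region = grouped_cas_rounds([4])             # single variable, as in FIG. 3
two_regions = grouped_cas_rounds([2, 2])         # two groups of two, as in FIG. 7
four_regions = grouped_cas_rounds([1, 1, 1, 1])  # one thread per group
```

With the same four threads, failures fall from 6 to 2 to 0 as the number of independent regions grows from 1 to 2 to 4.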
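When the embodiments of this application are applied to other functions that implement a similar read/write capability (such as pinbuffer/unpinbuffer), the performance of the computer system can also improve substantially. The pinbuffer/unpinbuffer functions implement a read/write-like capability and can control shared access, exclusive access, and other status flags for a data page. Shared access uses a shared count (similar to acquiring and releasing a read lock), and each time a thread queries a data page it must acquire a read lock on the index root page. In the Kunpeng 2P + openGauss (open-source Gauss) scenario, the volume of concurrent shared access to the same page is extremely high. For the performance of the computer system without the multi-thread concurrency management method of the embodiments of this application, refer to FIG. 8, which is a perf flame graph for performance profiling according to an embodiment of this application: when openGauss query performance is 1.10 million queries per second (QPS), the atomic instructions for acquiring and releasing the read lock already account for more than 50% of the overhead, and all CPUs are fully loaded.
For the performance of the computer system when the method is used with 2 sub-variable regions and 2 thread groups, so that the flag bits of the sub-variable regions are modified over 2 paths, refer to FIG. 9, which is another perf flame graph for performance profiling according to an embodiment of this application: openGauss query performance rises to 1.20 million QPS, and the overhead of the atomic instructions for acquiring and releasing the read lock falls to 39.5%.
Further, for the performance of the computer system with 6 sub-variable regions and 6 thread groups, so that the flag bits of the sub-variable regions are modified over 6 paths, refer to FIG. 10, which is another perf flame graph for performance profiling according to an embodiment of this application: openGauss query performance rises to 1.56 million QPS, and the overhead of the atomic instructions for acquiring and releasing the read lock falls to 14%. Compared with the performance shown in FIG. 8, openGauss query performance improves by 41%, and the performance of acquiring and releasing the read lock improves fivefold.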
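The throughput gains can be checked directly from the QPS figures quoted for FIG. 8 to FIG. 10. The helper below is a plain percentage calculation with a hypothetical name; it covers only the query throughput, since the text does not give the inputs behind the fivefold figure for read-lock operations.

```python
def relative_gain_percent(baseline, improved):
    """Percentage increase of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100


BASELINE_QPS = 1_100_000     # single lock state variable (FIG. 8)
TWO_REGION_QPS = 1_200_000   # 2 sub-variable regions (FIG. 9)
SIX_REGION_QPS = 1_560_000   # 6 sub-variable regions (FIG. 10)

gain_two = relative_gain_percent(BASELINE_QPS, TWO_REGION_QPS)
gain_six = relative_gain_percent(BASELINE_QPS, SIX_REGION_QPS)
```

The six-region result works out to roughly 41.8%, which matches the 41% stated in the text once the fraction is dropped, while the two-region result is roughly 9.1%.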
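In summary, this application proposes a multi-thread concurrency management method. The global variable used to control multi-thread access to shared data is divided into multiple regions, and the threads that request to read the shared data are divided into multiple thread groups. One thread group corresponds to one variable region, and different thread groups correspond to different variable regions, so that a thread in any thread group can obtain permission to read the shared data by modifying the value of its corresponding variable region. When threads request to read the shared data, the modifications that threads of different thread groups make to different variable regions are independent of each other, so the value of each variable region can be modified successfully on its own; that is, multiple threads can succeed within one round of concurrent modifications. Therefore, when the method is applied to a scenario with concurrent multi-thread CAS operations, and multiple threads request to read the shared data and concurrently modify the values of different variable regions through CAS operations, multiple CAS operations can succeed, which raises the CAS success rate.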
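The foregoing describes the methods of the embodiments of this application in detail. The following briefly describes the related apparatuses of the embodiments of this application.
Refer to FIG. 11, which is a schematic structural diagram of a multi-thread concurrency management apparatus according to an embodiment of this application. The apparatus 110 may include a first processing unit 1101, a second processing unit 1102, and a third processing unit 1103, and may further include a determining unit 1104. The units are described in detail as follows:
The first processing unit 1101 is configured to divide a target global variable into multiple sub-variable regions, where the target global variable is used to control multi-thread access to target data, and each of the sub-variable regions includes one or more flag bits.
The second processing unit 1102 is configured to divide multiple threads into multiple thread groups, where the threads are used to request permission to read the target data;
and to authorize a target thread in a target thread group to modify the flag bits in a target sub-variable region, where the target thread group is one of the thread groups, the target sub-variable region is one of the sub-variable regions, and the thread groups and the sub-variable regions are in one-to-one correspondence.
The third processing unit 1103 is configured to allow the target thread to read the target data if the target thread modifies the flag bits successfully.
In a possible implementation, the apparatus further includes:
the determining unit 1104, configured to determine, from the state of the flag bits in each sub-variable region, the total number of threads that can currently read the target data.
In a possible implementation, when the target thread group includes multiple threads, one thread in the target thread group is allowed to successfully modify the flag bits in the target sub-variable region corresponding to the target thread group within one clock cycle.
In a possible implementation, the second processing unit 1102 is specifically configured to:
if no thread currently holds the target data exclusively, authorize the target thread in the target thread group to modify the flag bits in the target sub-variable region through a compare-and-swap (CAS) operation.
In a possible implementation, each sub-variable region further includes a control status bit, which indicates whether the target data is held exclusively. The second processing unit 1102 is further configured to: when a first thread of the target thread group requests exclusive access to the target data, authorize the first thread to modify the control status bit of each sub-variable region.
The third processing unit 1103 is further configured to allow the first thread to hold the target data exclusively if the first thread modifies the control status bits successfully.
In a possible implementation, the second processing unit 1102 is specifically configured to:
if no thread currently holds the target data exclusively or is reading it, authorize the first thread to modify the control status bit of each sub-variable region through a CAS operation.
In a possible implementation, the global variable is a lock state variable, the flag bits are read-lock count bits, and the modifications that threads in the thread groups make to the flag bits in the sub-variable regions are used to acquire or release the read lock.
In a possible implementation, the global variable is a lock state variable, the control status bit is a write-lock status bit, and the modifications that threads in the thread groups make to the control status bits in the sub-variable regions are used to acquire or release the write lock.
It should be noted that, for the functions of the functional units/modules of the multi-thread concurrency management apparatus described in the embodiments of this application, reference may be made to the related descriptions in the foregoing method embodiments; details are not repeated here.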
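As shown in FIG. 12, which is a schematic structural diagram of another multi-thread concurrency management apparatus according to an embodiment of this application, the apparatus 120 includes at least one processor 1201, at least one memory 1202, and at least one communication interface 1203. In addition, the device may further include general-purpose components such as an antenna, which are not described in detail here.
The processor 1201 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control execution of the programs of the foregoing solutions.
The communication interface 1203 is configured to communicate with another device or a communication network, such as Ethernet, a radio access network (RAN), a core network, or a wireless local area network (WLAN).
The memory 1202 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including a compressed disc, a laser disc, an optical disc, a digital versatile disc, a Blu-ray disc, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through a bus, or may be integrated with the processor.
The memory 1202 is configured to store the application program code for performing the multi-thread concurrency management method described above, and execution is controlled by the processor 1201. The processor 1201 is configured to execute the application program code stored in the memory 1202.
The code stored in the memory 1202 can perform the multi-thread concurrency management method provided in FIG. 5, for example: dividing a target global variable into multiple sub-variable regions, where the target global variable is used to control multi-thread access to target data and each of the sub-variable regions includes one or more flag bits; dividing multiple threads into multiple thread groups, where the threads are used to request permission to read the target data; authorizing a target thread in a target thread group to modify the flag bits in a target sub-variable region, where the target thread group is one of the thread groups, the target sub-variable region is one of the sub-variable regions, and the thread groups and the sub-variable regions are in one-to-one correspondence; and, if the target thread modifies the flag bits successfully, allowing the target thread to read the target data.
It should be noted that, for the functions of the functional units of the multi-thread concurrency management apparatus 120 described in the embodiments of this application, reference may be made to the descriptions of steps S501 to S504 in the method embodiment of FIG. 5; details are not repeated here.
It should be noted that the multi-thread concurrency management apparatus described in this application is not limited to the foregoing. The apparatus may be located in any electronic device, such as a server, a computer, a mobile phone, or a tablet. Specifically, the apparatus may be a chip, a chipset, or a circuit board carrying a chip or chipset, which may operate under the necessary software drivers. For example, the multi-thread concurrency management apparatus may be:
(1) a stand-alone integrated circuit (IC), chip, chip system, or subsystem;
(2) a set of one or more ICs, which optionally may also include a storage component for storing data and computer programs;
(3) a module that can be embedded in another device;
(4) others.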
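An embodiment of this application further provides a computer-readable storage medium that stores computer program code. When the foregoing processor executes the computer program code, the computer is caused to perform the method in any of the foregoing embodiments.
An embodiment of this application further provides an electronic device, which may exist in the product form of a chip. The electronic device includes a processor configured to support the electronic device in implementing the corresponding functions of the method in any of the foregoing embodiments. The electronic device may further include a memory, which is coupled to the processor and stores the program instructions and data necessary for the electronic device. The electronic device may further include a communication interface for communication between the electronic device and another device or a communication network.
An embodiment of this application further provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the method in any of the foregoing embodiments.
An embodiment of this application provides a chip system that includes a processor, configured to support a device in implementing the functions involved in the first aspect, for example, generating or processing the information involved in the multi-thread concurrency management method. In a possible design, the chip system further includes a memory, configured to store the program instructions and data necessary for the device. The chip system may consist of chips, or may include chips and other discrete components.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, refer to the related descriptions of other embodiments.
It should be noted that, for brevity, the foregoing method embodiments are described as a series of action combinations. However, a person skilled in the art should know that this application is not limited by the described order of actions, because according to this application some steps may be performed in another order or at the same time. In addition, a person skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function; in an actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of this application.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and specifically may be a processor in the computer device) to perform all or some of the steps of the foregoing methods in the embodiments of this application. The foregoing storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
The foregoing embodiments are intended only to describe the technical solutions of this application, not to limit them. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.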

Claims (21)

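  1. A multi-thread concurrency management method, characterized in that the method comprises:
    dividing a target global variable into multiple sub-variable regions, wherein the target global variable is used to control multi-thread access to target data, and each of the multiple sub-variable regions comprises one or more flag bits;
    dividing multiple threads into multiple thread groups, wherein the multiple threads are used to request permission to read the target data;
    authorizing a target thread in a target thread group to modify the flag bits in a target sub-variable region, wherein the target thread group is one of the multiple thread groups, the target sub-variable region is one of the multiple sub-variable regions, and the thread groups and the sub-variable regions are in one-to-one correspondence; and
    if the target thread modifies the flag bits successfully, allowing the target thread to read the target data.
  2. The method according to claim 1, characterized in that the method further comprises:
    determining, from the state of the flag bits in each sub-variable region, the total number of threads that can currently read the target data.
  3. The method according to any one of claims 1 to 2, characterized in that, when the target thread group comprises multiple threads, one thread in the target thread group is allowed to successfully modify the flag bits in the target sub-variable region within one clock cycle.
  4. The method according to any one of claims 1 to 3, characterized in that the authorizing a target thread in a target thread group to modify the flag bits in a target sub-variable region comprises:
    if no thread currently holds the target data exclusively, authorizing the target thread in the target thread group to modify the flag bits in the target sub-variable region through a compare-and-swap (CAS) operation.
  5. The method according to any one of claims 1 to 4, characterized in that each sub-variable region further comprises a control status bit, the control status bit is used to indicate whether the target data is held exclusively, and the method further comprises:
    when a first thread of the target thread group requests exclusive access to the target data, authorizing the first thread to modify the control status bit of each sub-variable region; and
    if the first thread modifies the control status bits successfully, allowing the first thread to hold the target data exclusively.
  6. The method according to claim 5, characterized in that the authorizing the first thread to modify the control status bit of each sub-variable region comprises:
    if no thread currently holds the target data exclusively or is reading it, authorizing the first thread to modify the control status bit of each sub-variable region through a CAS operation.
  7. The method according to any one of claims 1 to 6, characterized in that the global variable is a lock state variable, the flag bits are read-lock count bits, and the modifications that threads in the multiple thread groups make to the flag bits in the sub-variable regions are used to acquire or release a read lock.
  8. The method according to any one of claims 5 to 6, characterized in that the global variable is a lock state variable, the control status bit is a write-lock status bit, and the modifications that threads in the multiple thread groups make to the control status bits in the sub-variable regions are used to acquire or release a write lock.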
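  9. A multi-thread concurrency management apparatus, characterized in that the apparatus comprises:
    a first processing unit, configured to divide a target global variable into multiple sub-variable regions, wherein the target global variable is used to control multi-thread access to target data, and each of the multiple sub-variable regions comprises one or more flag bits;
    a second processing unit, configured to divide multiple threads into multiple thread groups, wherein the multiple threads are used to request permission to read the target data;
    and to authorize a target thread in a target thread group to modify the flag bits in a target sub-variable region, wherein the target thread group is one of the multiple thread groups, the target sub-variable region is one of the multiple sub-variable regions, and the thread groups and the sub-variable regions are in one-to-one correspondence; and
    a third processing unit, configured to allow the target thread to read the target data if the target thread modifies the flag bits successfully.
  10. The apparatus according to claim 9, characterized in that the apparatus further comprises:
    a determining unit, configured to determine, from the state of the flag bits in each sub-variable region, the total number of threads that can currently read the target data.
  11. The apparatus according to any one of claims 9 to 10, characterized in that, when the target thread group comprises multiple threads, one thread in the target thread group is allowed to successfully modify the flag bits in the target sub-variable region within one clock cycle.
  12. The apparatus according to any one of claims 9 to 11, characterized in that the second processing unit is specifically configured to:
    if no thread currently holds the target data exclusively, authorize the target thread in the target thread group to modify the flag bits in the target sub-variable region through a compare-and-swap (CAS) operation.
  13. The apparatus according to any one of claims 9 to 12, characterized in that each sub-variable region further comprises a control status bit, and the control status bit is used to indicate whether the target data is held exclusively; the second processing unit is further configured to: when a first thread of the target thread group requests exclusive access to the target data, authorize the first thread to modify the control status bit of each sub-variable region; and
    the third processing unit is further configured to allow the first thread to hold the target data exclusively if the first thread modifies the control status bits successfully.
  14. The apparatus according to claim 13, characterized in that the second processing unit is specifically configured to:
    if no thread currently holds the target data exclusively or is reading it, authorize the first thread to modify the control status bit of each sub-variable region through a CAS operation.
  15. The apparatus according to any one of claims 9 to 14, characterized in that the global variable is a lock state variable, the flag bits are read-lock count bits, and the modifications that threads in the multiple thread groups make to the flag bits in the sub-variable regions are used to acquire or release a read lock.
  16. The apparatus according to any one of claims 13 to 14, characterized in that the global variable is a lock state variable, the control status bit is a write-lock status bit, and the modifications that threads in the multiple thread groups make to the control status bits in the sub-variable regions are used to acquire or release a write lock.
  17. A multi-thread concurrency management apparatus, characterized in that the apparatus comprises a processor and an interface circuit, wherein the interface circuit is configured to receive a signal from another communication apparatus and transmit it to the processor, or to send a signal from the processor to another communication apparatus, and the processor is configured to implement the method according to any one of claims 1 to 8 by using a logic circuit or by executing code instructions.
  18. A computer-readable storage medium, characterized in that the storage medium stores a computer program or instructions, and when the computer program or instructions are executed by a communication apparatus, the method according to any one of claims 1 to 8 is implemented.
  19. A computer program, characterized in that the computer program comprises instructions, and when the computer program is executed by a communication apparatus, the method according to any one of claims 1 to 8 is implemented.
  20. A server, characterized in that the server comprises a processor and a memory, wherein the memory is configured to store program code, and when the program code is executed by the processor, the server implements the method according to any one of claims 1 to 8.
  21. A chip system, characterized in that the chip system comprises a processor, configured to support a device in implementing the functions involved in the method according to any one of claims 1 to 8.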
PCT/CN2024/092938 2023-06-15 2024-05-13 Multi-thread concurrency management method and related apparatus Pending WO2024255500A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310717179.1A CN119149177A (zh) 2023-06-15 2023-06-15 Multi-thread concurrency management method and related apparatus
CN202310717179.1 2023-06-15

Publications (2)

Publication Number Publication Date
WO2024255500A1 true WO2024255500A1 (zh) 2024-12-19
WO2024255500A8 WO2024255500A8 (zh) 2025-10-23

Family

ID=93813887

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/092938 Pending WO2024255500A1 (zh) 2023-06-15 2024-05-13 Multi-thread concurrency management method and related apparatus

Country Status (2)

Country Link
CN (1) CN119149177A (zh)
WO (1) WO2024255500A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100332770A1 (en) * 2009-06-26 2010-12-30 David Dice Concurrency Control Using Slotted Read-Write Locks
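CN113835901A (zh) * 2013-10-15 2021-12-24 北京奥星贝斯科技有限公司 Read lock operation method, write lock operation method, and system
CN115756863A (zh) * 2022-11-30 2023-03-07 天翼电子商务有限公司 Method and system for managing multi-thread CAS operations in high-concurrency scenarios
CN116028189A (zh) * 2023-02-13 2023-04-28 广州文远知行科技有限公司 Multi-thread service exit method and apparatus, storage medium, and computer device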

Also Published As

Publication number Publication date
WO2024255500A8 (zh) 2025-10-23
CN119149177A (zh) 2024-12-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24822452

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2024822452

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2024822452

Country of ref document: EP

Effective date: 20260115
