WO2017190266A1 - Method for managing a translation lookaside buffer and multi-core processor - Google Patents
Method for managing a translation lookaside buffer and multi-core processor
- Publication number
- WO2017190266A1 (application PCT/CN2016/080867; CN2016080867W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- core
- tlb
- entry
- idle
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1027—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0842—Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1009—Address translation using page tables, e.g. page table structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
- G06F12/121—Replacement control using replacement algorithms
- G06F12/128—Replacement control using replacement algorithms adapted to multidimensional cache systems, e.g. set-associative, multicache, multiset or multilevel
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/68—Details of translation look-aside buffer [TLB]
- G06F2212/684—TLB miss handling
Definitions
- the present invention relates to the field of information technology and, more particularly, to a method of managing a Translation Lookaside Buffer (TLB) and a multi-core processor.
- TLB Translation Lookaside Buffer
- the user program generally runs in the virtual address space.
- the operating system (OS) and the Memory Management Unit (MMU) are responsible for converting the virtual address carried by a memory access request into the corresponding physical address.
- the virtual address includes a virtual page number (VPN) and an in-page offset.
- the physical address includes a physical frame number (PFN) and an in-page offset.
- the mapping between VPNs and PFNs is stored in the page table in the form of entries.
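The address split described above can be illustrated with a minimal sketch. The 4 KiB page size, the dictionary standing in for the page table, and the names `PAGE_SHIFT`, `split_virtual_address`, and `translate` are assumptions made for illustration; the patent does not specify them.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1    # low bits hold the in-page offset


def split_virtual_address(va):
    """Split a virtual address into (VPN, in-page offset)."""
    return va >> PAGE_SHIFT, va & PAGE_MASK


def translate(va, page_table):
    """Translate a virtual address using a VPN -> PFN mapping."""
    vpn, offset = split_virtual_address(va)
    pfn = page_table[vpn]            # hypothetical page table: a plain dict
    return (pfn << PAGE_SHIFT) | offset


page_table = {0x12345: 0x00ABC}      # one illustrative VPN -> PFN entry
physical = translate(0x12345678, page_table)
```

The offset passes through unchanged; only the page number is remapped, which is why a TLB only needs to cache VPN-to-PFN pairs.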
- Chip Multi-Processor also called multi-core processor
- Each processor core (hereinafter referred to as a core) contains a translation lookaside buffer (TLB), and the TLB stores some VPN-to-PFN translation entries.
- The working set of programs in the system continues to grow; that is, programs and the data they process keep getting larger.
- The TLB capacity of existing cores increasingly fails to meet this demand, so the TLB translation entry required by a request may be missing from the TLB (a TLB miss), and the TLB miss rate increases.
- If the currently required TLB translation entry is missing, the core usually has to obtain it from the in-memory page table through operating-system processing. This incurs a large delay and performance loss and reduces the efficiency of program execution.
- Embodiments of the present invention provide a method for managing a translation lookaside buffer and a multi-core processor. The method can expand the TLB capacity of a working core, thereby reducing the TLB miss rate and speeding up program execution.
- A method for managing a translation lookaside buffer (TLB) is provided. The method is applied to a multi-core processor that includes a first core, the first core includes a TLB, and the method includes:
- the first core receives a first address translation request and queries the TLB in the first core according to the first address translation request;
- when the first core determines that the first target TLB entry corresponding to the first address translation request is missing in the TLB in the first core, the first core obtains the first target TLB entry;
- when the first core determines that the entries in the TLB in the first core are full, the first core determines a second core from the cores in the idle state of the multi-core processor;
- the first core replaces a first entry in the TLB in the first core with the first target TLB entry and stores the first entry in the TLB in the second core.
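The steps above can be sketched in a few lines. This is only an illustrative model under stated assumptions: the `Core` class, the dictionary-based TLB, the choice of the oldest entry as the replaced entry, and taking the first idle core found are all hypothetical simplifications, and the capacity of the second core is ignored here.

```python
class Core:
    """Hypothetical model of a core with a small TLB (vpn -> pfn)."""

    def __init__(self, core_id, capacity, idle=False):
        self.core_id = core_id
        self.capacity = capacity
        self.idle = idle
        self.tlb = {}                # dicts keep insertion order


def handle_translation(first, vpn, page_table, cores):
    """Return (pfn, second_core_used_or_None) for one translation request."""
    if vpn in first.tlb:             # local TLB hit
        return first.tlb[vpn], None
    pfn = page_table[vpn]            # obtain the target entry (here: page table)
    second = None
    if len(first.tlb) >= first.capacity:
        idle = [c for c in cores if c.idle and c is not first]
        victim_vpn = next(iter(first.tlb))       # assumption: replace oldest entry
        victim_pfn = first.tlb.pop(victim_vpn)
        if idle:
            second = idle[0]                     # assumption: first idle core found
            second.tlb[victim_vpn] = victim_pfn  # keep replaced entry in idle core
    first.tlb[vpn] = pfn             # refill the first core's TLB
    return pfn, second


first = Core(0, capacity=2)
second = Core(1, capacity=2, idle=True)
page_table = {1: 10, 2: 20, 3: 30}
for v in (1, 2, 3):
    handle_translation(first, v, page_table, [first, second])
```

After the third request, the entry for VPN 1 has moved to the idle core instead of being lost, which is the capacity-extension effect the method aims for.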
- The first core may obtain the first target TLB entry from other cores in the multi-core processor. For example, the first core may broadcast a TLB query request to the other cores in the multi-core system, where the broadcast TLB query request carries the virtual address corresponding to the first target TLB entry. After receiving the broadcast address, each of the other cores searches its local TLB for the virtual address; if the TLB of one of these processor cores hits, that core can feed the first target TLB entry back to the first core.
- In this way, the first core can quickly obtain the first target TLB entry from another core without initiating a query request to the operating system and fetching the entry from memory, which saves time and improves application efficiency.
- The first core may also obtain the first target TLB entry from the page table in memory.
- For example, the first core sends a query request to the operating system; the query request carries the virtual address that caused the miss, and after processing by the operating system the first target TLB entry is obtained from the page table in memory. Embodiments of the invention are not limited thereto.
- The first core may also be referred to as a working core or a working node.
- When the entries in the local TLB are full, the first core needs to replace a valid TLB entry with the first target TLB entry.
- In this case, the first core attempts to acquire more TLB resources to save the replaced TLB entry. Therefore, the first core needs to determine the second core from the idle cores.
- By using the TLB resources of an idle core to save the replaced TLB entry, the embodiment of the present invention not only improves the utilization of idle-core TLB resources but also indirectly increases the capacity of the first core's TLB, reduces how often the first core has to obtain a target TLB entry from memory, and speeds up program execution.
- A core of the multi-core processor in the embodiment of the present invention may also be referred to as a node; a node here is equivalent to a core of the multi-core processor.
- The first core replaces the first entry, located at a first entry position in the TLB of the first core, with the first target TLB entry, and stores the first entry in the TLB of the second core.
- The first entry position may be any entry position in the TLB of the first core; that is, the first entry may be the first entry, the last entry, or an entry in the middle of the TLB of the first core. The embodiment of the present invention is not limited in this respect.
- In other words, the first core stores the first target TLB entry at the first entry position in its own TLB, and the first entry that previously occupied that position is stored in the TLB of the second core.
- The first core may be referred to as the master core of the second core.
- The second core may be referred to as a slave core (or standby core) of the first core.
- The master writes TLB entries to the TLB of the slave. This happens when a TLB entry in the master is replaced: because the TLB of the master is full, the first target TLB entry must be refilled at the first entry position after it is obtained. That is, the first entry at that position is replaced with the first target TLB entry, and the first entry is then stored in the TLB of the slave (for example, the second core).
- The write position is saved by the slave.
- A rotation writing mechanism may be adopted: if the slave TLB has N entries, writing starts from entry 0 and proceeds through entry N-1, that is, until the slave TLB is full. How the TLB is handled when the entries of the slave (for example, the second core) are full is described in detail below.
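The write-position bookkeeping described above might look like the following sketch. The `SlaveTLB` class and its method names are hypothetical; the only behavior taken from the text is that the slave keeps the write position and that entries are written from position 0 up to N-1 until the TLB is full.

```python
class SlaveTLB:
    """Hypothetical slave-side TLB that tracks its own write position."""

    def __init__(self, n):
        self.entries = [None] * n    # N entry slots
        self.write_pos = 0           # write position saved by the slave

    def is_full(self):
        return self.write_pos >= len(self.entries)

    def write(self, entry):
        """Store one replaced entry from the master; return False when full."""
        if self.is_full():
            return False             # master must handle the full case
        self.entries[self.write_pos] = entry
        self.write_pos += 1
        return True


slave = SlaveTLB(2)
results = [slave.write((1, 10)), slave.write((2, 20)), slave.write((3, 30))]
```

Returning `False` on a full TLB leaves the policy for that case to the caller, since the text treats it as a separate step.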
- In the embodiment of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the first core obtains the first target TLB entry, replaces the first entry in its TLB with it, and stores the first entry in the TLB of the second core. Because the replaced entry is kept in the TLB of the second core, the embodiment expands the TLB capacity of the working core by using the TLB resources of an idle core, which can reduce the TLB miss rate and speed up program execution.
- The working core, that is, the master (for example, the first core), writes the replaced TLB entry to the TLB of an idle node, that is, the slave (for example, the second core).
- When the working core needs these replaced TLB entries again, it does not have to re-acquire them through the operating system. It can obtain them by directly accessing the slave TLB, which greatly reduces the TLB refill delay and speeds up program execution.
- An idle core can share its TLB resources with only one working core, whereas a working core can acquire the TLB resources of multiple idle cores to store TLB entries.
- The determining, by the first core, of the second core from the cores in the idle state of the multi-core processor includes:
- the first core sends a status query request to each of the other cores in the multi-core processor, where the status query request is used to query whether each core is in an idle state;
- the first core receives a response message sent by each of the other cores, where the response message indicates whether that core is in an idle state;
- the first core selects, according to the response messages, one of the cores in the idle state as the second core.
- By using the TLB resources of an idle core to save the replaced TLB entry, the embodiment of the present invention not only improves the utilization of idle-core TLB resources but also indirectly increases the capacity of the first core's TLB, reduces how often the first core has to obtain a target TLB entry from memory, and speeds up program execution.
- The selecting, by the first core, of one of the cores in the idle state as the second core includes:
- the first core determines an idle core list according to the response messages, where the idle core list includes the cores of the multi-core processor, other than the first core, that are in an idle state;
- the first core selects from the idle core list the idle core with the smallest communication overhead with the first core as the second core.
- Because the first core selects the idle core with the smallest communication overhead as the second core and stores the replaced TLB entry there, the communication overhead is minimized, and the first core can later look up the TLB entry quickly, which improves program execution efficiency.
- Alternatively, a processor core may be selected from the idle cores as the second core according to the degree of congestion of the Network on Chip (NoC) routes, that is, based on the degree of network congestion.
- NoC Network on Chip
- Selecting the idle core with the smallest communication overhead with the first core as the second core includes:
- the first core uses the idle core in the idle core list that has the smallest number of communication hops to the first core as the second core; or
- the first core uses the idle core in the idle core list that has the smallest physical distance from the first core as the second core.
- The core with the smallest communication overhead may also be determined in other ways, and the embodiment of the present invention is not limited thereto.
- For example, the first core (which may be referred to here as the requesting node) broadcasts a status query request (which may also be referred to as a TLB query request) to the other cores (also referred to as other nodes) in the multi-core processor; the status query request is used to query whether each core is in an idle state. After receiving the status query request, each core sends a response message to the first core (the requesting node), indicating whether that core is in an idle state.
- The first core obtains an idle core list based on the response messages.
- If the idle core list is empty, TLB resource acquisition is abandoned, and the TLB entry missing in the first core (that is, the first target TLB entry) is read in the existing manner.
- If the idle core list is not empty, TLB resource acquisition proceeds: the first core selects, according to the communication overhead between itself and each idle core in the idle core list, the idle core with the smallest communication overhead, and sends a TLB sharing request to that idle core. If that idle core has already been shared by another node or has become active by this time, it sends failure feedback to the requesting node, the requesting node removes it from the idle core list, and the above process is repeated. If the idle core is still idle at this time, it is determined to be the second core.
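The acquisition loop described above can be sketched as follows. The function name `acquire_slave`, the `responses` and `hops` dictionaries, and the `try_share` callback (standing in for the TLB sharing request and its success or failure feedback) are assumptions for illustration; hop count is used here as one of the overhead measures the text mentions.

```python
def acquire_slave(first_id, responses, hops, try_share):
    """Pick the idle core with the fewest hops that accepts a sharing request.

    responses: core_id -> True if that core reported the idle state
    hops:      core_id -> communication hops to the first core (assumed known)
    try_share: callable(core_id) -> True on success, False on failure feedback
    """
    idle_list = [c for c, idle in responses.items() if idle and c != first_id]
    while idle_list:
        candidate = min(idle_list, key=lambda c: hops[c])
        if try_share(candidate):
            return candidate         # this core becomes the second core
        idle_list.remove(candidate)  # failure feedback: drop it and retry
    return None                      # empty list: fall back to the existing path


responses = {1: True, 2: False, 3: True, 4: True}
hops = {1: 1, 2: 1, 3: 2, 4: 3}
already_shared = {1}                 # core 1 was taken by another master meanwhile
chosen = acquire_slave(0, responses, hops, lambda c: c not in already_shared)
```

Core 1 is nearest but rejects the request, so the loop removes it and settles on core 3, the next nearest idle core.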
- The method further includes:
- the identifier of the second core is recorded in the TLB standby core list in the first core.
- The standby core list may also be referred to as the slave core list.
- The second core thus becomes a slave core (which may also be referred to as a standby core) of the first core.
- The slave core (second core) writes the identifier (for example, the number) of the first core into its master core number register. The first core becomes the master core of the second core and adds the identifier (for example, the number) of the slave core (second core) to its standby core list.
- After the first target TLB entry is acquired, the second core is determined and recorded in the standby core list of the first core. The first core can then read and write the TLB of the second core according to the standby core list, which increases the TLB capacity of the first core, reduces its TLB miss rate, and speeds up program execution.
- The method further includes:
- the first core receives a second address translation request and queries the TLB in the first core according to the second address translation request;
- when the second target TLB entry corresponding to the second address translation request is missing in the TLB in the first core, the first core queries the TLB in the second core;
- when the second target TLB entry is found in the TLB in the second core, the first core replaces a second entry in the TLB in the first core with the second target TLB entry and stores the second entry in the TLB in the second core.
- The first core stores the second entry at the position in the second core originally occupied by the second target TLB entry. That is, the first core swaps the storage locations of the second entry and the second target TLB entry.
- The master core reads TLB entries from the slave core.
- This happens when a lookup misses in the local TLB of the master (first core): if the local TLB does not contain the second target TLB entry, the master reads the TLB entries in the slaves.
- The master sends a TLB read request to all slaves according to the Slave List (the slave core list).
- Each slave looks up its own TLB. If the second target TLB entry misses, the slave returns miss feedback; if it hits, the slave returns hit feedback together with the contents of the hit TLB entry.
- The master collects all the feedback. If all of it is miss feedback, the master sends a TLB miss request to the operating system; if a slave hits, the master uses the hit TLB entry for the refill. When an entry is replaced during the refill, the replaced entry is written to the slave that hit.
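The read path described above can be modeled in a short sketch. The function name, the dictionary-based TLBs, and the choice of the oldest master entry as the replaced entry are assumptions; on a miss in every slave the sketch simply reads the page table and leaves the refill out.

```python
def lookup_with_slaves(master_tlb, capacity, slave_tlbs, vpn, page_table):
    """Return (pfn, outcome) for a lookup that may fall through to the slaves."""
    if vpn in master_tlb:
        return master_tlb[vpn], "local hit"
    for slave in slave_tlbs:                 # models a read request to each slave
        if vpn in slave:
            pfn = slave.pop(vpn)             # hit feedback with the entry contents
            if len(master_tlb) >= capacity:
                victim = next(iter(master_tlb))         # assumption: oldest entry
                slave[victim] = master_tlb.pop(victim)  # swap into the hit slave
            master_tlb[vpn] = pfn            # refill the master with the hit entry
            return pfn, "slave hit"
    # All slaves reported a miss: fall back to the OS and page table path.
    return page_table[vpn], "all miss"


master = {1: 10, 2: 20}
slaves = [{3: 30}]
first_result = lookup_with_slaves(master, 2, slaves, 3, {4: 40})
second_result = lookup_with_slaves(master, 2, slaves, 4, {4: 40})
```

The swap keeps the total number of cached entries constant: the entry leaving the master takes the slot freed by the entry that was just promoted.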
- In the embodiment of the present invention, the working core has read and write permission on the TLB of the idle core and saves replaced TLB entries by using the TLB resources of the idle core, which improves the utilization of idle-core TLB resources and further increases the capacity of the first core's TLB.
- When a lookup misses in the first core, the target entry can be obtained by reading the TLB of the slave core, which reduces how often the first core has to obtain a target TLB entry from memory, speeds up program execution, and improves program execution efficiency.
- When a slave TLB is full, all of its entries are entries replaced from the master that the master may still use. The master therefore tries to obtain more slaves to save the replaced entries.
- The method further includes:
- the first core receives a third address translation request and queries the TLB in the first core according to the third address translation request;
- when the entries in the TLB of the first core and in the TLB of the second core are full, the first core determines a third core from the cores in the idle state of the multi-core processor;
- the first core replaces a third entry in the TLB in the first core with the third target TLB entry and stores the third entry in the TLB in the third core.
- That is, when the TLB of the master core (first core) is already full and the TLBs of all slave cores of the master core (for example, a single slave core, the second core) are also full, the master core (first core) determines a third core from the other idle cores in the multi-core processor to store the third entry that the third target TLB entry replaced in the TLB of the first core.
- By acquiring a new slave core to save the replaced TLB entry, the embodiment of the present invention further expands the TLB capacity of the first core. When the first core looks up the replaced TLB entry again, it can read the entry directly from the new slave core instead of obtaining it from memory. The embodiment can therefore speed up program execution and improve program execution efficiency.
- The method further includes:
- the first core sends a TLB release instruction to the cores recorded in the TLB standby core list, where the TLB release instruction instructs the cores recorded in the standby core list to release TLB sharing.
- When the master becomes idle, all TLB resources that it has acquired are released.
- The master sends a TLB release request to all slaves according to the Slave List, so that the TLB resources of these idle cores can be used by other working cores.
- The master core and the slave cores are then all idle cores that can be used by other working cores.
- By sending a release instruction to all slave cores, the master releases the TLB resources it acquired from them. Releasing the TLB resources of the slave cores (idle cores) in this way avoids wasting resources, and the released TLB resources can be used by other working cores to increase their TLB capacity and speed up the execution of their programs.
- After sharing is released, the slave core may or may not delete the TLB entries stored in its TLB. For example, once sharing has been released, all entries in its TLB can be deleted when the core becomes the slave core of another working core, so that it can store the entries replaced by that other working core. Alternatively, when the former slave core itself enters the working state after sharing is released, the previously stored TLB entries may be retained for its own lookups.
- The method further includes:
- the first core receives a TLB unshare request sent by the second core, where the TLB unshare request carries the identifier of the second core;
- the first core deletes the identifier of the second core from the TLB standby core list.
- The second core sends the TLB unshare request to the first core so that the first core releases it. The second core can then use its own TLB resources, which avoids affecting the service processing of the second core.
- The second core can also become a master core itself and use the TLB resources of other idle cores.
- The method further includes:
- the first core determines a fourth core from the cores in the idle state of the multi-core processor;
- the first core copies all entries in the TLB of the second core into the TLB of the fourth core.
- The TLB of the new slave core (the fourth core) is used to store all entries from the TLB of the released slave core (that is, the second core). If the first core needs the entries that were held in the second core in subsequent lookups, it does not have to re-acquire them from memory through the operating system. It can obtain them by directly accessing the TLB of the slave core (fourth core), which greatly reduces the TLB refill delay and speeds up program execution.
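The unshare and migration steps can be sketched together as follows. The function name `handle_unshare`, the list and dictionary data structures, and the policy of taking the first available idle core as the fourth core are assumptions; if no idle core is available, this sketch simply drops the entries.

```python
def handle_unshare(standby_list, slave_tlbs, leaving_id, idle_ids):
    """Process a TLB unshare request from core `leaving_id`.

    standby_list: the master's list of slave core identifiers
    slave_tlbs:   core_id -> dict(vpn -> pfn) of entries held for the master
    idle_ids:     identifiers of cores currently idle and not yet shared
    Returns the identifier of the new slave (fourth core), or None.
    """
    entries = slave_tlbs.pop(leaving_id)     # entries held by the leaving slave
    standby_list.remove(leaving_id)          # delete its identifier from the list
    if not idle_ids:
        return None                          # assumption: entries are dropped
    fourth = idle_ids[0]                     # assumption: first idle core is chosen
    slave_tlbs[fourth] = dict(entries)       # copy all entries to the new slave
    standby_list.append(fourth)              # record the new slave's identifier
    return fourth


standby = [2]
tlbs = {2: {1: 10, 5: 50}}
fourth = handle_unshare(standby, tlbs, 2, [4])
```

Copying before the old slave resumes its own work means later lookups by the master still find these entries in a slave TLB rather than in memory.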
- A multi-core processor is provided that is capable of implementing the first aspect and any of its implementations. The multi-core processor includes a first core, and the first core includes a translation lookaside buffer (TLB).
- The first core is configured to: receive a first address translation request; query the TLB in the first core according to the first address translation request; obtain the first target TLB entry when determining that the first target TLB entry corresponding to the first address translation request is missing in the TLB in the first core; determine a second core from the cores in the idle state of the multi-core processor when determining that the entries in the TLB in the first core are full; and replace a first entry in the TLB in the first core with the first target TLB entry.
- The second core is configured to store the first entry in the TLB in the second core.
- In the embodiment of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the first core obtains the first target TLB entry, replaces the first entry in its TLB with it, and stores the first entry in the TLB of the second core. Because the replaced entry is kept in the TLB of the second core, the embodiment can increase the TLB capacity of the working core by using the TLB resources of an idle core, thereby reducing the TLB miss rate and speeding up program execution.
- The working core, that is, the master (for example, the first core), writes the replaced TLB entry to the TLB of an idle node, that is, the slave (for example, the second core).
- When the working core needs these entries again, it does not have to re-acquire them through the operating system. It can obtain them by directly accessing the slave TLB, which greatly reduces the TLB refill delay and speeds up program execution.
- An idle core can share its TLB resources with only one working core, whereas a working core can acquire the TLB resources of multiple idle cores to store TLB entries.
- When acquiring the first target TLB entry, the first core 1110 is specifically configured to:
- obtain the first target TLB entry from other cores in the multi-core system.
- For example, the first core may broadcast a TLB query request to the other cores in the multi-core system, carrying the virtual address that caused the miss, that is, the virtual address corresponding to the first target TLB entry. After receiving the broadcast address, each of the other cores searches its local TLB for the virtual address; if the TLB of one of these processor cores hits, that core can feed the first target TLB entry back to the first core.
- In this way, the first core can quickly obtain the first target TLB entry from another core without initiating a query request to the operating system and fetching the entry from memory, which saves time and improves application efficiency.
- The first core may also obtain the first target TLB entry from the page table in memory.
- For example, the first core sends a query request to the operating system; the query request carries the virtual address that caused the miss, and after processing by the operating system the first target TLB entry is obtained from the page table in memory. Embodiments of the invention are not limited thereto.
- When determining the second core from the cores in the idle state of the multi-core processor, the first core is specifically configured to:
- send a status query request to each of the other cores in the multi-core processor, where the status query request is used to query whether each core is in an idle state;
- receive a response message sent by each of the other cores, where the response message indicates whether that core is in an idle state;
- select, according to the response messages, one of the cores in the idle state as the second core.
- By using the TLB resources of an idle core to save the replaced TLB entry, the embodiment of the present invention not only improves the utilization of idle-core TLB resources but also indirectly increases the capacity of the first core's TLB, reduces how often the first core has to obtain a target TLB entry from memory, and speeds up program execution.
- When selecting, according to the response messages, one of the cores in the idle state as the second core, the first core is specifically configured to:
- determine an idle core list according to the response messages, where the idle core list includes the cores of the multi-core processor, other than the first core, that are in an idle state;
- select from the idle core list the idle core with the smallest communication overhead with the first core as the second core.
- Because the first core selects the idle core with the smallest communication overhead as the second core and stores the replaced TLB entry there, the communication overhead is minimized, and the first core can later look up the TLB entry quickly, which improves program execution efficiency.
- The first core is specifically configured to:
- use the idle core in the idle core list that has the smallest number of communication hops to the first core as the second core; or
- use the idle core in the idle core list that has the smallest physical distance from the first core as the second core.
- The first core is further configured to record the identifier of the second core in the TLB standby core list in the first core.
- After the first target TLB entry is acquired, the second core is determined and recorded in the standby core list of the first core. The first core can then read and write the TLB of the second core according to the standby core list, which increases the TLB capacity of the first core, reduces its TLB miss rate, and speeds up program execution.
- The first core is further configured to:
- receive a second address translation request, query the TLB in the first core according to the second address translation request, and replace a second entry in the TLB in the first core with the second target TLB entry.
- The second core is further configured to store the second entry in the TLB in the second core.
- In the embodiment of the present invention, the working core has read and write permission on the TLB of the idle core and saves replaced TLB entries by using the TLB resources of the idle core, which improves the utilization of idle-core TLB resources and further increases the capacity of the first core's TLB.
- When a lookup misses in the first core, the target entry can be obtained by reading the TLB of the slave core, which reduces how often the first core has to obtain a target TLB entry from memory, speeds up program execution, and improves program execution efficiency.
- The first core is further configured to:
- receive a third address translation request, query the TLB in the first core according to the third address translation request, determine a third core from the cores in the idle state of the multi-core processor, and replace a third entry in the TLB in the first core with the third target TLB entry.
- The third core is configured to store the third entry in the TLB in the third core.
- By acquiring a new slave core to save the replaced TLB entry, the embodiment of the present invention further expands the TLB capacity of the first core. When the first core looks up the replaced TLB entry again, it can read the entry directly from the new slave core instead of obtaining it from memory. The embodiment can therefore speed up program execution and improve program execution efficiency.
- The first core is further configured to:
- send a TLB release instruction to the cores recorded in the TLB standby core list, where the TLB release instruction instructs the cores recorded in the standby core list to release TLB sharing.
- By sending a release instruction to all slave cores, the master releases the TLB resources it acquired from them. Releasing the TLB resources of the slave cores (idle cores) in this way avoids wasting resources, and the released TLB resources can be used by other working cores to increase their TLB capacity and speed up the execution of their programs.
- the second core is used to send a TLB unshare request to the first core, where the TLB unshare request carries the identifier of the second core;
- the first core is further configured to receive the TLB unshare request, and delete the identifier of the second core in the TLB standby core list.
- the second core sends a TLB unshare request to the first core so that the first core releases the second core; the second core can then use its own TLB resources, which avoids affecting service processing on the second core.
- the second core can also become a master core itself and use the TLB resources of other idle cores.
- the first core is further configured to determine a fourth core from the cores of the multi-core processor that are in an idle state;
- the TLB of the fourth core is used to store all entries in the TLB of the second core.
- in the embodiment of the present invention, the first core acquires a new slave core (the fourth core) to save all entries in the TLB of the released second core. If the first core later needs to query entries that were in the second core, it does not have to re-acquire them from memory through the operating system; it can obtain them by directly accessing the TLB of the slave core (the fourth core), which greatly reduces TLB refill latency and speeds up program execution.
- FIG. 1 is a block diagram showing the structure of a multi-core processor in accordance with one embodiment of the present invention.
- FIG. 2 is a schematic flow diagram of a method of managing a translation lookaside buffer (TLB) in accordance with one embodiment of the present invention.
- FIG. 3 is a schematic diagram of a standby core list vector in accordance with one embodiment of the present invention.
- FIG. 4 is a schematic diagram of a standby core list vector in accordance with another embodiment of the present invention.
- FIG. 5 is a schematic diagram of a method of managing a TLB in accordance with one embodiment of the present invention.
- FIG. 6 is a schematic diagram of a method of managing a TLB according to another embodiment of the present invention.
- FIG. 7 is a schematic diagram of a method of managing a TLB according to another embodiment of the present invention.
- FIG. 8 is a schematic diagram of a method of managing a TLB according to another embodiment of the present invention.
- FIG. 9 is a schematic diagram of a method of managing a TLB according to another embodiment of the present invention.
- FIG. 10 is a schematic diagram of a method of managing a TLB according to another embodiment of the present invention.
- FIG. 11 is a schematic block diagram of a multi-core processor in accordance with one embodiment of the present invention.
- the technical solution of the present invention can be run on a hardware device including, for example, a CPU, a memory management unit (MMU), and a memory.
- the operating system run by the hardware device can be any operating system that implements service processing through threads or processes (including multiple threads), such as Linux systems, Unix systems, Windows systems, Android systems, OS systems, and so on.
- multi-core processor refers to a processor including a plurality of processor cores, which may be embodied as an on-chip multi-core processor or an on-board multi-core processing system.
- an on-chip multi-core processor integrates a plurality of processor cores (Core) on one chip, with the processor cores interconnected through a network on chip (NoC).
- an on-board multi-core processing system is a processing system in which each of a plurality of processor cores is packaged as a processor and the processors are integrated on a circuit board.
- a "processor core", also called a kernel or a core, is the most important component of a CPU (Central Processing Unit). It is manufactured from monocrystalline silicon by a given production process, and all of the CPU's computation, command reception, command storage, and data processing are performed by the processor core.
- the term "multiprocessor core" means that at least two processor cores are included, and it covers the scope of both multi-core (Multi-Core) and many-core (Many Core) as those terms are used in the prior art.
- the TLB may also be referred to as a page table buffer; it stores some page table entries, that is, entries for translating virtual addresses into physical addresses.
- the TLB is used for translation between virtual addresses and physical addresses and provides a cache for looking up physical addresses, which can effectively reduce the time a core spends finding a physical address.
- a "master" (master core) is a core that is in a working state and is able to use the TLB resources of other idle cores to manage TLB entries.
- a "slave" (slave core) is a core that is in an idle state and can share its own TLB resources with the master core.
- a "TLB miss" indicates that the TLB of the core has no TLB entry corresponding to the address translation request.
- a "TLB hit" indicates that the TLB of the core has a TLB entry corresponding to the address translation request.
- "TLB replacement" indicates that a TLB entry in the master core and the TLB entry in the slave core that corresponds to the address translation request swap positions. For example, as shown in FIG. 6, the TLB entry in the slave core that corresponds to the address translation request, that is, the "hit" TLB entry in the slave core's TLB, exchanges positions with the "replaced" TLB entry in the master core's TLB.
- "first", "second", "third", and "fourth" are used only to distinguish the cores and should not limit the scope of protection of the present invention.
- for example, the first core could equally be called the second core, and the second core could be called the first core, and so on.
- the fourth core may be the same core as the second core or the third core, or may be different cores according to the actual situation, which is not limited by the embodiment of the present invention.
- the TLB resources of idle cores are dynamically allocated to a working core that is executing a task (which may also be referred to as a core in the working state or a working node); this expands the TLB capacity of the working core, reduces TLB misses, and ultimately speeds up program execution.
- a working core can obtain the TLB resources of one or more idle cores to satisfy its TLB access needs.
- the working core that obtains TLB resources is called the master, and the idle core that provides TLB resources is called the slave.
- frequently accessed TLB entries are located in the primary core, and infrequently accessed TLB entries are located in the secondary core.
- the "core" of the multi-core processor in the embodiment of the present invention may also be referred to as a "node"; that is, a "node" herein is equivalent to a "core" of the multi-core processor.
- the multi-core processor in the embodiment of the present invention includes a plurality of cores, which may equally be described as including a plurality of nodes; this is not limited by the embodiment of the present invention.
- the multi-core processor in the embodiments of the present invention may include at least two cores.
- it may include two cores, four cores, eight cores, 16 cores, 32 cores, and the like, and the embodiment of the present invention is not limited thereto.
- the basic structure of the multi-core processor of the embodiment of the present invention is described below with reference to FIG. 1.
- the multi-core processor shown in FIG. 1 includes 16 cores, Core0-Core15, and each core includes:
- a cache module, which includes a level 1 cache (L1) and a level 2 cache (L2);
- a processing block in each core contains a TLB.
- the cores are connected through an on-chip network and communicate with each other through on-chip network interfaces.
- one communication between two horizontally or vertically adjacent cores is called a hop.
- for example, communication between Core1 and Core3 requires at least two hops, along the path Core1-Core2-Core3.
- FIG. 2 is a schematic flow chart of a method of managing a TLB in accordance with one embodiment of the present invention.
- the method shown in FIG. 2 can be performed by the first core.
- the method 200 shown in FIG. 2 is applied to a multi-core processor including a first core, and the first core includes a TLB. It should be understood that the first core may be any core of the multi-core processor; for example, in the multi-core processor 100 shown in FIG. 1, the first core may be any one of Core0-Core15, which is not limited by the embodiment of the present invention.
- the method 200 shown in FIG. 2 includes:
- the first core receives the first address translation request, and queries the TLB in the first core according to the first address translation request.
- after receiving the first address translation request, the first core queries whether the TLB in the first core has a TLB entry corresponding to the first address translation request.
- when the first core determines that the first target TLB entry corresponding to the first address translation request is missing from the TLB in the first core, the first core obtains the first target TLB entry.
- the first core may obtain the first target TLB entry from other cores in the multi-core processor. For example, the first core may broadcast a TLB query request to the other cores in the multi-core system, carrying the virtual address corresponding to the first target TLB entry in the broadcast request. After receiving the broadcast, each of the other cores looks up the virtual address in its local TLB; if the TLB of one of the processor cores hits, that core can feed the first target TLB entry back to the first core.
- in this way the first core can quickly obtain the first target TLB entry from another core without sending a query request to the operating system and fetching the first target TLB entry from memory, which saves time and improves application efficiency.
- the first core may also obtain the first target TLB entry from the page table of the memory.
- for example, the first core sends a query request to the operating system carrying the virtual address that caused the miss, and the operating system processes the request and obtains the first target TLB entry from the page table in memory. Embodiments of the invention are not limited thereto.
- the first core determines the second core from the core in the idle state of the multi-core processor.
- when a TLB miss occurs in the first core (which may also be referred to as a working core or a working node) and the entries in its local TLB are full, the first core needs to replace a valid TLB entry with the first target TLB entry.
- in this case the first core attempts to acquire more TLB resources to save the replaced TLB entry; therefore, the first core needs to determine the second core from the idle cores.
- the embodiment of the present invention saves the replaced TLB entry by using the TLB resources of the idle core, which not only improves the utilization of the idle core's TLB resources but also indirectly increases the TLB capacity available to the first core, reduces how often the first core has to obtain a target TLB entry from memory, and speeds up program execution.
- the first core determines the second core from the core in the idle state of the multi-core processor, including:
- the first core sends a status query request to each of the other cores in the multi-core processor, and the status query request is used to query whether each core is in an idle state;
- the first core receives a response message sent by each core of the other cores, where the response message is used to indicate whether each core is in an idle state;
- the first core selects a core from the core in an idle state as the second core according to the response message.
- the first core selects a core from the core in an idle state as the second core according to the response message, and includes:
- the idle core list includes a core in the multi-core processor that is in an idle state other than the first core;
- an idle core with the smallest communication overhead with the first core is selected as the second core.
- the first core selects the idle core that has the smallest communication overhead with the first core as the second core and stores the replaced TLB entry in that core, which minimizes communication overhead.
- as a result, the first core can query the TLB entry quickly when it needs it again, improving the execution efficiency of the program.
- the communication overhead may be determined, for example, by the degree of congestion of the network-on-chip (NoC) routes; in that case a processor core is selected from the idle cores as the second core based on the degree of network congestion.
- alternatively, the first core uses the idle core in the idle core list that has the smallest number of communication hops to the first core as the second core;
- or the first core uses the idle core in the idle core list that has the smallest physical distance from the first core as the second core.
- the core with the smallest communication overhead may be determined in other manners, and the embodiment of the present invention is not limited thereto.
- for example, the first core is Core5 and the idle cores included in the idle core list are Core7, Core11, and Core14.
- the minimum number of communication hops from Core5 to Core7 is 2, along the communication path Core5-Core6-Core7; the minimum from Core5 to Core11 is 3, for example along Core5-Core6-Core7-Core11; the minimum from Core5 to Core14 is 3, for example along Core5-Core9-Core13-Core14.
- therefore, Core5 selects Core7, which has the smallest number of communication hops, as the second core.
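- the hop-count selection in this example can be sketched as follows. This is only an illustration, not the patented implementation: it assumes that the 16 cores of FIG. 1 form a 4x4 mesh numbered row by row, and the function names `hops` and `pick_second_core` as well as the tie-breaking rule are hypothetical.

```python
def hops(a, b, width=4):
    """Minimum hops between cores a and b on a mesh that is `width` cores wide.

    Assumes row-major numbering (Core0..Core3 in the first row, and so on).
    """
    row_a, col_a = divmod(a, width)
    row_b, col_b = divmod(b, width)
    return abs(row_a - row_b) + abs(col_a - col_b)


def pick_second_core(first_core, idle_cores, width=4):
    """Return the idle core with the fewest hops to first_core, or None if none is idle."""
    if not idle_cores:
        return None
    # Ties are broken by the lower core number (an arbitrary choice for this sketch).
    return min(idle_cores, key=lambda core: (hops(first_core, core, width), core))


# Core7 is 2 hops from Core5; Core11 and Core14 are 3 hops away.
print(pick_second_core(5, [7, 11, 14]))  # prints 7
```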
- the first core (which may be referred to herein as a requesting node) broadcasts a status query request (also referred to as a TLB query request) to the other cores (also referred to as other nodes) in the multi-core processor; the status query request is used to query whether each core is in an idle state. After receiving the status query request, each core sends a response message to the first core (the requesting node), and the response message indicates whether that core is in an idle state.
- the first core builds an idle core list from the response messages. If the idle core list is empty, TLB resource acquisition ends, and the TLB entry that is missing in the first core (that is, the first target TLB entry) is read in the existing manner.
- if the idle core list is not empty, TLB resource acquisition proceeds: the first core selects the idle core with the smallest communication overhead, based on the communication overhead between itself and each idle core in the idle core list, and sends a TLB sharing request to that idle core. If the idle core has meanwhile been shared by another node or has become active, it sends failure feedback to the requesting node; the requesting node removes it from the idle core list and repeats the above process. If the idle core is still idle, it is determined to be the second core.
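- the acquisition loop described above (pick the cheapest idle core, send a sharing request, and retry on failure feedback) might look like the sketch below. The callables `overhead` and `send_share_request` are hypothetical stand-ins for the on-chip messaging, which the text does not specify in detail.

```python
def acquire_slave(requester, idle_cores, overhead, send_share_request):
    """Try idle cores in order of increasing overhead until one accepts sharing.

    overhead(requester, core) returns a comparable cost.
    send_share_request(core, requester) returns True if the core is still idle
    and accepts, or False for failure feedback (already shared, or now active).
    """
    candidates = list(idle_cores)          # work on a copy of the idle core list
    while candidates:
        best = min(candidates, key=lambda core: overhead(requester, core))
        if send_share_request(best, requester):
            return best                    # this core becomes the slave
        candidates.remove(best)            # failure feedback: drop it and retry
    return None                            # list exhausted: use the existing miss handling


# Hypothetical demo: Core7 is the cheapest but was just claimed by another master.
already_shared = {7}
overheads = {7: 2, 11: 3, 14: 3}
chosen = acquire_slave(
    requester=5,
    idle_cores=[7, 11, 14],
    overhead=lambda req, core: overheads[core],
    send_share_request=lambda core, req: core not in already_shared,
)
print(chosen)  # prints 11
```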
- the first core replaces the first entry in the TLB in the first core with the first target TLB entry, and stores the first entry in the TLB in the second core.
- specifically, the first core replaces the first entry, located at the first entry position in the TLB of the first core, with the first target TLB entry, and stores the first entry in the TLB of the second core.
- the first entry position may be the position of any entry in the TLB of the master core, which is not limited by the embodiment of the present invention.
- in other words, the first entry may be any TLB entry in the TLB of the first core, for example the first entry, the last entry, or an entry in the middle of that TLB.
- that is, the first core stores the first target TLB entry at the first entry position of its TLB and stores the first entry that was replaced from that position in the TLB of the second core.
- the first core may be referred to as a primary core of the second core
- the second core may be referred to as a secondary core (or a standby core) of the first core
- the master writes replaced TLB entries into the TLB of the slave. When a TLB entry in the master is replaced, as shown in FIG. 2, the TLB of the master core is already full when the first target TLB entry is obtained; therefore, the first target TLB entry is refilled at the first entry position, that is, the first entry at that position is replaced by the first target TLB entry, and the first entry is then stored in the TLB of the slave core (Slave), for example the second core.
- the write position for replaced entries is kept by the slave.
- a rotation (sequential) writing mechanism may be adopted: if the TLB of the slave core has N entries, writing starts at the first entry and proceeds in order from entry 0 to entry N-1, at which point the slave's TLB is full. How the TLB is handled when the entries of the slave core (for example, the second core) are full is described in detail below.
- in the embodiment of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the first core, after obtaining the first target TLB entry, replaces the first entry in its TLB with it and stores the first entry in the TLB of the second core. Because the replaced entry is kept in the TLB of the second core, the embodiment can use the TLB resources of an idle core to increase the TLB capacity of the working core, thereby reducing the TLB miss rate and speeding up program execution.
- the working core, i.e., the master (e.g., the first core), writes the replaced TLB entries into the TLB of the idle node, i.e., the slave (e.g., the second core).
- when the working core needs these replaced TLB entries again, it does not have to re-acquire them through the operating system; it can obtain them by directly accessing the Slave's TLB, which greatly reduces TLB refill latency and speeds up program execution.
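- as a rough illustration of the replacement step and the slave-side write position, the sketch below models a TLB entry as a `(vpn, pfn)` tuple (virtual page number, physical frame number). The class `SlaveTLB`, the function `refill`, and this entry layout are assumptions made for the example only.

```python
class SlaveTLB:
    """Idle core's TLB used as overflow storage for a master's replaced entries."""

    def __init__(self, capacity):
        self.entries = [None] * capacity
        self.write_pos = 0                 # next write position, kept by the slave

    def is_full(self):
        return self.write_pos >= len(self.entries)

    def store(self, entry):
        """Write a replaced entry at positions 0..N-1 in order; False once full."""
        if self.is_full():
            return False
        self.entries[self.write_pos] = entry
        self.write_pos += 1
        return True

    def lookup(self, vpn):
        """Return the stored (vpn, pfn) entry for vpn, or None on a miss."""
        for entry in self.entries:
            if entry is not None and entry[0] == vpn:
                return entry
        return None


def refill(master_tlb, position, new_entry, slave):
    """Put new_entry at `position` in the master and move the old entry to the slave."""
    replaced = master_tlb[position]
    master_tlb[position] = new_entry
    return slave.store(replaced)


master_tlb = [(0x1, 0xA), (0x2, 0xB)]      # a full two-entry master TLB
slave = SlaveTLB(capacity=2)
refill(master_tlb, 0, (0x3, 0xC), slave)
print(master_tlb[0], slave.lookup(0x1))    # prints (3, 12) (1, 10)
```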
- an idle core can share its TLB resources with only one working core, whereas a working core can acquire the TLB resources of multiple idle cores to store TLB entries.
- the method may further include:
- the identity of the second core is recorded in the TLB backup core list in the first core.
- the list of alternate cores here can also be referred to as a list of slave cores.
- the second core is determined while the first target TLB entry is being acquired, and the identifier of the second core is recorded in the standby core list of the first core.
- the first core can then read and write the TLB in the second core according to the standby core list, which increases the TLB capacity available to the first core, reduces the TLB miss rate of the first core, and speeds up program execution.
- after the first core determines the second core from the idle core list, the second core becomes the slave core (which may also be referred to as a standby core) of the first core.
- the slave core (second core) writes the identifier (e.g., number) of the first core into its master core number register, and the first core becomes the master core of the second core and adds the identifier (e.g., number) of the slave core (second core) to its standby core list.
- the identifiers of all the slave cores may be recorded in the standby core list.
- all the slaves of the current master may also be recorded in a vector form.
- the vector recorded in the standby core list may be 4 bits.
- the first to fourth bits represent the first core to the fourth core, namely Core0-core3.
- the first core in the embodiment of the present invention may be Core3 in FIG.
- the vector indicating the standby core list of the first core may be 0100, where 0 means that the corresponding core is not a slave core of the first core and 1 means that it is; according to this vector, the second of the four cores is the slave core of the first core.
- the alternate core list may be a vector of 16 bits, wherein the 16 bits from the left to the right represent Core0-Core15, respectively.
- the vector of the standby core list of the first core (here Core5) may be, for example, 0000001000000000; since the seventh bit is 1, Core6, the core corresponding to the seventh bit, is the slave core of Core5.
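- the vector form of the standby core list can be sketched as below, with the leftmost bit standing for Core0 as in the text. The helper names are hypothetical, and a string of "0"/"1" characters is used purely for readability; a hardware register would hold the same bits.

```python
def slave_vector(slaves, num_cores):
    """Encode a set of slave core numbers as a bit string; leftmost bit is Core0."""
    return "".join("1" if core in slaves else "0" for core in range(num_cores))


def slaves_from_vector(vector):
    """Decode a bit string back into the list of slave core numbers."""
    return [core for core, bit in enumerate(vector) if bit == "1"]


print(slave_vector({1}, 4))    # prints 0100
print(slave_vector({6}, 16))   # prints 0000001000000000
print(slaves_from_vector("0000001000000000"))  # prints [6]
```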
- the method may further include:
- the first core receives the second address translation request, and queries the TLB in the first core according to the second address translation request;
- the first core replaces the second entry in the TLB in the first core with the second target TLB entry, and stores the second entry in the TLB within the second core.
- the first core stores the second entry in the location of the original second target TLB entry in the second core. That is, the first core swaps the storage location of the second entry with the second target TLB entry.
- the above takes the case where the first core reads the second target TLB entry from the TLB of the second core as an example.
- in practice, the first core may send the request for reading the second target TLB entry to all of its slave cores, and the second target TLB entry may be located in the TLB of another slave core; the process of reading and replacing the corresponding entry is similar to the foregoing description and is not repeated here.
- the above process occurs when the local TLB of the master (first core) misses: if the local TLB does not have the second target TLB entry, the master reads the TLB entries in the slaves, for example as shown in FIG. 5.
- the Master sends a TLB read request to all slaves according to the Slave List (also known as the slave core list). After receiving the TLB read request, each Slave queries its own TLB; if the second target TLB entry is missing, the Slave returns miss feedback, and if it hits, the Slave returns hit feedback together with the contents of the hit TLB entry.
- if all slaves miss, the Master sends a TLB query request to the operating system and the missing TLB entry is obtained from memory; if a Slave hits, the Master uses the hit TLB entry for refilling. If an entry is replaced during the refill, the replaced entry is written to the Slave that hit.
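- the lookup order described above (local TLB, then the slaves in the Slave List, then memory) can be sketched as follows. TLBs and the page table are modeled as plain dictionaries from virtual page number to physical frame number, the master's TLB is assumed to be full, and the choice of `victim_vpn` is left to the caller; all of these are simplifying assumptions for illustration.

```python
def translate(vpn, master_tlb, slaves, page_table, victim_vpn):
    """Resolve vpn and report where the translation came from.

    master_tlb: dict vpn -> pfn for the master (assumed full).
    slaves: dict slave_id -> dict vpn -> pfn, one per core in the Slave List.
    victim_vpn: master entry to replace if a slave hits.
    """
    if vpn in master_tlb:                       # local TLB hit
        return master_tlb[vpn], "master"
    for slave_id, slave_tlb in slaves.items():  # TLB read request to every slave
        if vpn in slave_tlb:
            pfn = slave_tlb.pop(vpn)            # hit feedback carries the entry
            # Refill the master; the replaced entry goes to the slave that hit,
            # so the two entries effectively swap places.
            slave_tlb[victim_vpn] = master_tlb.pop(victim_vpn)
            master_tlb[vpn] = pfn
            return pfn, "slave %d" % slave_id
    # All slaves missed: fall back to the page table in memory. The refill on
    # this path follows the replacement step described earlier and is omitted.
    return page_table[vpn], "memory"


master = {1: 10, 2: 20}
slaves = {7: {3: 30}}
page_table = {4: 40}
print(translate(3, master, slaves, page_table, victim_vpn=1))  # prints (30, 'slave 7')
print(master, slaves)  # prints {2: 20, 3: 30} {7: {1: 10}}
```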
- in the embodiment of the present invention, the working core has read/write permission on the TLB of the idle core and uses the TLB resources of the idle core to save replaced TLB entries, which improves the utilization of the idle core's TLB resources and further increases the TLB capacity available to the first core.
- the target entry can then be obtained by reading the TLB of the slave core, which reduces how often the first core has to obtain a target TLB entry from memory, speeds up program execution, and improves execution efficiency.
- a rotation (sequential) writing mechanism may be adopted for the slave: if the TLB of the slave core (for example, the second core) has N entries, writing starts at the first entry and proceeds in order from entry 0 to entry N-1.
- once all N entries have been written, the slave is called a full slave (Full Slave) and cannot hold any further replaced entries. The Full Slave sends a write overflow request to the Master; after receiving the request, the Master records the Full Slave in its Full Slave list.
- the method may further include:
- the first core receives the third address translation request, and queries the TLB in the first core according to the third address translation request;
- the first core determines the third core from the cores of the multi-core processor that are in an idle state;
- the first core replaces the third entry in the TLB in the first core with the third target TLB entry, and stores the third entry in the TLB in the third core.
- the embodiment of the present invention acquires a new slave core to save the replaced TLB entry, further expanding the TLB capacity available to the first core; when the first core later queries the replaced TLB entry, it can read the entry directly from the new slave core instead of obtaining it from memory. Therefore, the embodiment of the present invention can speed up program execution and improve execution efficiency.
- for example, the entries in the TLB of the master core (first core) are already full, and the TLBs of all slave cores of the master core (e.g., its only slave core, the second core) are also full.
- in this case the master core (first core) determines a third core from the other idle cores in the multi-core processor, and the third core is used to store the third entry that the third target TLB entry replaces in the TLB of the first core.
- the process of determining the third core can refer to the process of determining the second core, and details are not described herein again.
- in other words, when all slaves of the Master are Full Slaves, continuing to write replaced entries into the TLB of a Full Slave would overwrite previously replaced entries. To avoid this, the Master repeats the Slave acquisition process described above, obtains a new slave, shown as Slave1 in the figure (for example, the third core), and writes the replaced entry (for example, the third entry) into Slave1.
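- one way to picture the Full Slave bookkeeping is the sketch below: the Master writes a replaced entry to a slave that is not in the Full Slave list and acquires a new slave only when every current slave is full. The function name `write_replaced`, the dictionary model of the slaves, and the `acquire_new_slave` callable are hypothetical, and a capacity of at least one entry per slave is assumed.

```python
def write_replaced(entry, slaves, full_slaves, capacity, acquire_new_slave):
    """Store a replaced entry and return the id of the slave that took it.

    slaves: dict slave_id -> list of entries held by that slave.
    full_slaves: set of slave ids that have reported write overflow.
    acquire_new_slave(): returns the id of a newly acquired idle core, or None.
    Returns None if no slave has room and no new slave can be acquired.
    """
    target = next((sid for sid in slaves if sid not in full_slaves), None)
    if target is None:                       # every current slave is a Full Slave
        target = acquire_new_slave()
        if target is None:
            return None
        slaves[target] = []
    slaves[target].append(entry)
    if len(slaves[target]) >= capacity:      # the slave reports write overflow
        full_slaves.add(target)
    return target


slaves, full_slaves = {7: []}, set()
new_ids = iter([11])                         # only one more idle core is available
acquire = lambda: next(new_ids, None)
placed = [write_replaced(e, slaves, full_slaves, 2, acquire) for e in "abcde"]
print(placed)  # prints [7, 7, 11, 11, None]
```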
- the above describes how the master core determines a slave core and uses the slave core's resources to store TLB entries. After the master core changes from the working state to the idle state, it no longer uses these TLB resources, so the TLB resources obtained from the slave cores need to be released.
- the method further includes:
- the first core sends a TLB release instruction to the cores recorded in the TLB standby core list, where the TLB release instruction instructs the cores recorded in the standby core list to release TLB sharing.
- that is, when the Master becomes idle, all TLB resources that it has acquired are released.
- the Master sends a TLB release request to all slaves according to the Slave List, so that the TLB resources of these idle cores can be used by other working cores. For example, as shown in FIG. 8, after the master releases all acquired TLB resources, both the master core and the slave cores become idle cores and can be used by other working cores.
- by sending a release instruction to all slave cores, the master releases the TLB resources it acquired from the slave cores (idle cores).
- releasing these TLB resources avoids wasting them: the released TLB resources of the slave cores can be used by other working cores, which increases the TLB capacity of those cores and speeds up their program execution.
- after sharing is released, the slave core may or may not delete the TLB entries stored in it. For example, if the slave core subsequently becomes the slave core of another working core, all entries in its TLB may be deleted so that it can store replaced entries for that other working core. Alternatively, if the slave core itself enters the working state after sharing is released, the previously stored TLB entries may be kept for its own lookups.
- a Slave (e.g., the second core) that needs its own TLB resources again sends a TLB unshare request to the Master, and the Master removes it from the Slave List after receiving the request.
- for example, the master core releases Slave1 and deletes it from the slave core list.
- the method further includes:
- the first core receives the TLB unshare request sent by the second core, where the TLB unshare request carries the identifier of the second core;
- the first core deletes the identifier of the second core from the TLB standby core list.
- the second core sends a TLB unshare request to the first core so that the first core releases the second core; the second core can then use its own TLB resources, which avoids affecting service processing on the second core.
- the second core can also become a master core itself and use the TLB resources of other idle cores.
- before the first core deletes the identifier of the second core from the TLB standby core list, the method further includes:
- the first core determines a fourth core from the cores of the multi-core processor that are in an idle state;
- the first core copies all entries in the TLB of the second core into the TLB of the fourth core.
- that is, before the master core (e.g., the first core) deletes the slave core (e.g., the second core) from its standby core list, it determines a new slave core (e.g., the fourth core), and the TLB of the new slave core is used to store all entries in the TLB of the slave core being deleted (i.e., the second core). If entries that were in the second core are needed in subsequent queries, they do not have to be re-obtained through the operating system; they can be obtained by directly accessing the TLB of the new slave core (the fourth core), which greatly reduces TLB refill latency and speeds up program execution.
- the process of determining the fourth core can refer to the process of determining the second core, and details are not described herein again.
- for example, when the master core releases Slave1 and deletes it from the slave core list, it also determines a new slave core, such as Slave2, stores all entries from the TLB of Slave1 in the TLB of Slave2, and records Slave2 in the slave core list.
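- the hand-over from a departing slave to a newly determined one could be sketched as below. The function `handle_unshare`, the dictionary `slave_tlbs` (the entries the master has placed in each slave), and the `acquire_new_slave` callable are hypothetical names used only for this illustration.

```python
def handle_unshare(leaving_id, slave_list, slave_tlbs, acquire_new_slave):
    """Process a TLB unshare request from slave `leaving_id`.

    Copies the leaving slave's entries to a new slave when one can be acquired,
    then removes the leaving slave from the slave list. Returns the new slave's
    id, or None if no idle core was available (the entries are then given up).
    """
    new_id = acquire_new_slave()
    if new_id is not None:
        slave_tlbs[new_id] = list(slave_tlbs[leaving_id])  # copy all entries
        slave_list.append(new_id)                          # record the new slave
    slave_list.remove(leaving_id)                          # release the old slave
    del slave_tlbs[leaving_id]       # the master no longer reads this slave's TLB
    return new_id


slave_list = [7]
slave_tlbs = {7: [(1, 10), (2, 20)]}
print(handle_unshare(7, slave_list, slave_tlbs, lambda: 11))  # prints 11
print(slave_list, slave_tlbs)  # prints [11] {11: [(1, 10), (2, 20)]}
```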
- in the embodiment of the present invention, the first core acquires a new slave core (the fourth core) to save all entries in the TLB of the released second core. If the first core later needs to query entries that were in the second core, it does not have to re-acquire them from memory through the operating system; it can obtain them by directly accessing the TLB of the slave core (the fourth core), which greatly reduces TLB refill latency and speeds up program execution.
- each core of the multi-core processor in the embodiment of the present invention is provided with flag registers, which record the status flag of the core, the master core (Master) flag, and the slave core (Slave) flag;
- the status flag indicates the operating state and the sharing state of the core; the operating state is either an idle state or a working state, and the sharing state is a master core state, a slave core state, or a non-sharing state; the master core state indicates that the core is in the working state and uses the TLB resources of other idle cores, and the slave core state indicates that the core is in an idle state and shares its TLB resources with a master core;
- the master core flag indicates, when the core is a master core, an idle core list, a slave core list (also referred to as a standby core list), and a full slave core (Full Slave) list; the idle core list includes a vector representing all idle cores, the slave core list includes a vector representing all slave cores of the core, and the full slave core list includes a vector representing all full slave cores of the core;
- the slave core flag indicates, when the core is a slave core, a master core number and a write position for replaced entries; the master core number is the identifier of the unique master core of the core, and the write position is the position in the TLB of the slave core at which a replaced entry is written.
- to implement the TLB resource sharing method, some registers have to be added to each node to hold flag bits: a status flag, a Master flag, and a Slave flag. Because any node can become a Master or a Slave, all three flags are provided in every node, for example as shown in Table 1.
- Run status bit: distinguishes idle nodes from working nodes, so a 1-bit register is sufficient.
- Idle node / Slave / Full Slave lists: their functions have been described in detail above.
- each list is implemented as a vector whose width equals the number of nodes in the system; each bit in the vector corresponds to one node.
- for example, in the Slave list, a bit value of 0 indicates that the corresponding node is not a slave, and a value of 1 indicates that it is a slave.
- Master number: a Slave can have only one Master.
- the number of that Master is recorded in the Master number field.
- the corresponding Master can be notified according to the Master number.
- the binary bit width of the Master number can be ⌈log2(number of nodes)⌉, that is, the smallest integer greater than or equal to log2(number of nodes). For example, when the multi-core processor includes 8 cores (8 nodes), the Master number needs ⌈log2 8⌉ = 3 bits; when it includes 12 cores (12 nodes), the Master number needs ⌈log2 12⌉ = 4 bits.
- Write position: maintained by the Slave; indicates the position at which an entry replaced out of the Master is written. Its binary bit width can be ⌈log2(number of TLB entries)⌉, that is, the smallest integer greater than or equal to log2(number of TLB entries). For example, when the TLB in the Slave has 64 entries, the write position needs ⌈log2 64⌉ = 6 bits.
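The field widths and the list vectors described for Table 1 can be checked with a few lines of Python. This is only an illustrative sketch: the function names `field_width`, `slave_list_vector`, and `is_slave` are hypothetical and do not come from the patent, and the vector is modelled as a plain integer with one bit per node.

```python
def field_width(count: int) -> int:
    """Bits needed to encode `count` distinct values: ceil(log2(count))."""
    if count < 2:
        raise ValueError("need at least two distinct values")
    # (count - 1).bit_length() equals ceil(log2(count)) without floating point.
    return (count - 1).bit_length()


def slave_list_vector(slave_ids, num_nodes: int) -> int:
    """Pack slave node numbers into a vector with one bit per node."""
    vector = 0
    for node in slave_ids:
        if not 0 <= node < num_nodes:
            raise ValueError(f"node {node} is outside 0..{num_nodes - 1}")
        vector |= 1 << node  # bit value 1 marks the node as a slave
    return vector


def is_slave(vector: int, node: int) -> bool:
    """Return True if the bit for `node` is set in the vector."""
    return (vector >> node) & 1 == 1
```

With these helpers, `field_width(8)` gives 3 bits and `field_width(12)` gives 4 bits for the Master number, and `field_width(64)` gives 6 bits for the write position of a 64-entry TLB.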
- setting these flag bits enables the master core to read and write the TLB resources of its slave cores, which expands the TLB capacity of the master core and reduces how often the first core has to obtain a target TLB entry from memory, speeding up program execution and improving execution efficiency.
- FIG. 11 is a schematic block diagram of a multi-core processor 1100 in accordance with one embodiment of the present invention.
- the multi-core processor 1100 includes a first core 1110 and a second core 1120.
- optionally, the multi-core processor 1100 may further include a third core 1130 and a fourth core 1140.
- each core of the multi-core processor 1100 includes a processing module; a cache module, for example one comprising a level 1 cache (L1) and a level 2 cache (L2); and a network-on-chip interface (Switch).
- the processing module in each core contains a TLB. The cores are connected through a network on chip and communicate with one another through their network-on-chip interfaces.
- the multi-core processor 1100 in the embodiment of the present invention may further include more cores.
- the multi-core processor 1100 may include 8 cores, 10 cores, 16 cores, 32 cores, and so on, which is not limited in the embodiment of the present invention.
- the first core may be any core of the multi-core processor.
- the first core is described as the first core in FIG. 1, but the embodiment of the present invention is not limited to this.
- the second core may be any core of the multi-core processor other than the first core; that is, the second core need not be directly connected to the first core.
- for ease of illustration, the second core and the first core are shown as directly connected, but the embodiment of the present invention is not limited thereto.
- likewise, the third core and the fourth core are shown in FIG. 11 as directly connected to the first core or the second core for ease of illustration, but the embodiment of the present invention is not limited thereto.
- the multi-core processor shown in FIG. 11 corresponds to the method embodiments of FIGS. 1 to 10, and the multi-core processor 1100 of FIG. 11 can implement the processes of the methods of FIGS. 1 to 10. To avoid repetition, detailed descriptions are omitted here as appropriate.
- the first core 1110 is configured to receive a first address translation request, and query the TLB in the first core according to the first address translation request;
- the second core 1120 is configured to store the first entry in the TLB in the second core.
- in the embodiment of the present invention, the first core may be called the master core of the second core, and the second core may be called the slave core of the first core.
- in the embodiment of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the first target TLB entry, once obtained, replaces the first entry in the TLB of the first core, and the first entry is stored in the TLB of the second core. Because the replaced entry is kept in the TLB of the second core, the embodiment uses the TLB resources of an idle core to increase the TLB capacity of the working core, thereby reducing the TLB miss rate and speeding up program execution.
- the working core, i.e., the Master (e.g., the first core), writes the replaced TLB entry into the TLB of an idle node, i.e., the Slave (e.g., the second core).
- when the working core needs these replaced TLB entries again, it does not have to re-obtain them through the operating system; it can obtain them by directly accessing the Slave's TLB, which greatly reduces the TLB refill delay and speeds up program execution.
- an idle core can share its TLB resources with only one working core, whereas a working core can acquire the TLB resources of multiple idle cores to store TLB entries.
- the first core 1110 when acquiring the first target TLB entry, is specifically configured to:
- the first target TLB entry is obtained from other cores in the multi-core system.
- the first core may broadcast a TLB query request to the other cores in the multi-core system, carrying the virtual address that caused the miss, that is, the virtual address corresponding to the first target TLB entry. After receiving the broadcast address, each of the other cores looks up the virtual address in its local TLB. If the TLB in one of the processor cores hits, that core can return the first target TLB entry to the first core.
- in this way, the first core can quickly obtain the first target TLB entry from another core, without sending a query request to the operating system and fetching the entry from memory, which saves time and improves application efficiency.
- the first core may also obtain the first target TLB entry from the page table in memory.
- for example, the first core sends the operating system a query request that carries the virtual address that caused the miss, and the first target TLB entry is obtained from the page table in memory after operating system processing. Embodiments of the invention are not limited thereto.
- the first core 1110 is specifically configured to:
- the status query request is used to query whether each core is in an idle state
- the response message is used to indicate whether each core is in an idle state
- one core is selected as the second core from the core in the idle state.
- the embodiment of the present invention holds the replaced TLB entry in the TLB resources of an idle core, which not only improves the utilization of the idle core's TLB resources but also indirectly increases the TLB capacity of the first core, reducing how often the first core has to obtain a target TLB entry from memory and speeding up program execution.
- the first core 1110 is specifically configured to:
- determine an idle core list according to the response message, where the idle core list includes the cores of the multi-core processor, other than the first core, that are in the idle state;
- select, from the idle core list, the idle core with the smallest communication overhead to the first core as the second core.
- the first core selects the idle core with the smallest communication overhead to the first core as the second core and stores the replaced TLB entry in the second core, which minimizes the communication overhead.
- because the communication overhead is small, the first core can quickly query the TLB entry, improving program execution efficiency.
- the first core 1110 is specifically configured to:
- take the idle core in the idle core list with the fewest communication hops to the first core as the second core; or
- take the idle core in the idle core list with the smallest physical distance to the first core as the second core.
- the first core 1110 is further configured to record the identifier of the second core in a TLB backup core list in the first core.
- once the first target TLB entry has been obtained, the second core is determined and recorded in the standby core list of the first core. Using the standby core list, the first core can read and write the TLB of the second core, which increases the TLB capacity of the first core, reduces its TLB miss rate, and speeds up program execution.
- the first core 1110 is further configured to:
- the second entry in the TLB in the first core is replaced with the second target TLB entry.
- the second core 1120 is further configured to store the second entry in the TLB in the second core.
- in the embodiment of the present invention, the working core has read/write permission for the TLB of the idle core.
- holding replaced TLB entries in the TLB resources of the idle core not only improves the utilization of the idle core's TLB resources but also further increases the TLB capacity of the first core.
- when an entry is missing from the TLB of the first core, the target entry can be obtained by reading the TLB of the slave core, which reduces how often the first core has to obtain a target TLB entry from memory, speeds up program execution, and improves execution efficiency.
- the first core 1110 is further configured to:
- the third core is configured to store the third entry in a TLB within the third core.
- once the current slave core is full, the embodiment of the present invention acquires a new slave core to hold replaced TLB entries, further expanding the capacity of the first core. When the first core queries a replaced TLB entry again, it can read the entry directly from the new slave core instead of obtaining it from memory. The embodiment of the present invention can therefore speed up program execution and improve execution efficiency.
- the first core is further configured to:
- a TLB release instruction is sent to the cores recorded in the TLB standby core list; the TLB release instruction instructs the cores recorded in the standby core list to stop TLB sharing.
- the acquired TLB resources of the slave cores are released by sending a release instruction to all the slave cores.
- this frees the TLB resources of these slave cores (idle cores) and avoids wasting resources; the freed TLB resources can then be used by other working cores, increasing their TLB capacity and speeding up their program execution.
- the second core is configured to send a TLB unshare request to the first core.
- the TLB unshare request carries the identifier of the second core;
- the first core is further configured to receive the TLB unshare request and delete the identifier of the second core from the TLB standby core list.
- the second core sends a TLB unshare request to the first core so that the first core releases the second core; the second core can then use its own TLB resources, so the services it is processing are not affected.
- the second core can also become a master core itself and use the TLB resources of other idle cores.
- the first core is further configured to determine a fourth core 1140 from a core in which the multi-core processor is in an idle state;
- the TLB of the fourth core is used to store all entries in the TLB of the second core.
- in the embodiment of the present invention, the first core acquires a new slave core (the fourth core) to hold all entries from the TLB of the released second core. If the first core needs those entries again in later queries, it does not have to re-obtain them from memory through the operating system; it can obtain them by directly accessing the TLB of the slave core (the fourth core), which greatly reduces the TLB refill delay and speeds up program execution.
- each core of the multi-core processor in the embodiment of the present invention is provided with flag registers, which record the core's status flag, master core (Master) flag, and slave core (Slave) flag;
- the status flag indicates the operating state and the sharing state of the core. The operating state is either the idle state or the working state. The sharing state is the master core state, the slave core state, or the non-participating state. The master core state indicates that the core is in the working state and uses the TLB resources of idle cores;
- the slave core state indicates that the core is in the idle state and shares its TLB resources with a master core;
- the master core flag indicates, when the core is a master core, an idle core list, a slave core list (also called a standby core list), and a full slave core (Full Slave) list. The idle core list contains a vector representing all idle cores, the slave core list contains a vector representing all slave cores of the core, and the full slave core list contains a vector representing all slave cores of the core whose TLBs are full;
- the slave core flag indicates, when the core is a slave core, a master core number and the write position of a replaced entry. The master core number is the identifier of the core's unique master core, and the write position is the position in the slave core's TLB at which the replaced entry is written.
- to implement the TLB resource sharing method proposed by the present invention, some registers have to be added to each node to hold flag bits: a status flag, a Master flag, and a Slave flag. Because any node can become a Master or a Slave, all three flags are provided in every node, for example as shown in Table 1 above; details are not repeated here.
- setting these flag bits enables the master core to read and write the TLB resources of its slave cores, which expands the TLB capacity of the master core and reduces how often the first core has to obtain a target TLB entry from memory, speeding up program execution and improving execution efficiency.
- the terms “system” and “network” are used interchangeably herein.
- the term “and/or” herein merely describes an association between associated objects and indicates that three relationships are possible; for example, “A and/or B” may mean that A exists alone, that both A and B exist, or that B exists alone.
- the character "/" herein generally indicates an "or" relationship between the objects before and after it.
- “B corresponding to A” means that B is associated with A and that B can be determined according to A.
- determining B according to A does not mean that B is determined only according to A; B may also be determined according to A and/or other information.
- the disclosed systems, devices, and methods may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of units is only a logical function division.
- multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, or an electrical, mechanical or other form of connection.
- the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present invention.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- Computer readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one location to another.
- a storage medium may be any available media that can be accessed by a computer.
- computer readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures.
- any connection may suitably be termed a computer readable medium.
- for example, if the software is transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave,
- then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of the medium.
- disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
- Combinations should also be included within the scope of the computer readable media.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
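A method for managing a translation lookaside buffer (TLB) and a multi-core processor. The method is applied to a multi-core processor that includes a first core, and the first core contains a TLB. The method includes: the first core receives a first address translation request and queries the TLB in the first core according to the first address translation request (210); when the first core determines that a first target TLB entry corresponding to the first address translation request is missing from the TLB in the first core, it obtains the first target TLB entry (220); when it determines that the entry storage of the TLB in the first core is full, it determines a second core from the cores of the multi-core processor that are in the idle state (230); the first target TLB entry replaces a first entry in the TLB in the first core, and the first entry is stored in the TLB in the second core (240). By using the TLB resources of idle cores to expand the TLB capacity of a working core, the method can reduce the TLB miss rate and speed up program execution.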
Description
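The present invention relates to the field of information technology, and more specifically, to a method for managing a translation lookaside buffer (TLB) and a multi-core processor.
User programs generally run in a virtual address space. While a user program executes, the operating system (OS) and the memory management unit (MMU) are responsible for translating the virtual address carried in a memory access request into the physical address of the corresponding physical memory space. A virtual address consists of a virtual page number (VPN) and a page offset, and a physical address consists of a physical frame number (PFN) and a page offset. When a virtual address is translated into a physical address, the page offset stays unchanged, and the virtual page number is converted into the physical frame number through a mapping. This mapping is generally stored as entries in a page table in memory. One address translation requires operating system handling and an access to the page table in memory, which causes considerable latency and performance loss. For this reason, each processor core (hereinafter "core") in a chip multi-processor (CMP), also called a multi-core processor, holds a translation lookaside buffer (TLB) that stores some VPN-to-PFN translation entries.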
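The VPN-to-PFN translation described above can be illustrated with a small Python sketch. The 4 KiB page size and the names `OFFSET_BITS`, `PAGE_SIZE`, and `translate_address` are assumptions made for this example; the patent does not fix a page size.

```python
OFFSET_BITS = 12              # assumption: 4 KiB pages
PAGE_SIZE = 1 << OFFSET_BITS


def translate_address(vaddr: int, vpn_to_pfn: dict) -> int:
    """Translate a virtual address using a VPN -> PFN mapping.

    The page offset is copied unchanged; only the page number is mapped.
    A KeyError models a missing mapping.
    """
    vpn = vaddr >> OFFSET_BITS          # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)    # offset within the page
    pfn = vpn_to_pfn[vpn]               # physical frame number
    return (pfn << OFFSET_BITS) | offset
```

For instance, with the mapping `{0x12: 0x7}`, the virtual address `0x12345` has VPN `0x12` and offset `0x345`, so it translates to `0x7345`.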
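As technology advances, the working sets of programs in multi-core systems keep growing, that is, there are more and more programs and more and more data to process. The storage space of the TLB in a core is limited, however, so as applications and data grow, the entries that the TLB in a core can hold are increasingly insufficient. The TLB translation entry that is currently needed may then be missing from the TLB (a TLB miss), so the TLB miss rate rises. When the needed TLB translation entry is missing, the core usually has to obtain it through operating system handling and an access to the page table in memory, which causes considerable latency and performance loss and lowers program execution efficiency.
How to reduce the TLB miss rate and speed up program execution has therefore become a problem that urgently needs to be solved.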
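Summary of the Invention
Embodiments of the present invention provide a method for managing a translation lookaside buffer and a multi-core processor. The method can expand the TLB capacity of a core in the working state, thereby reducing the TLB miss rate and speeding up program execution.
According to a first aspect, a method for managing a translation lookaside buffer (TLB) is provided. The method is applied to a multi-core processor that includes a first core, and the first core contains a TLB. The method includes:
The first core receives a first address translation request and queries the TLB in the first core according to the first address translation request.
When the first core determines that a first target TLB entry corresponding to the first address translation request is missing from the TLB in the first core, the first core obtains the first target TLB entry.
When determining that the entry storage of the TLB in the first core is full, the first core determines a second core from the cores of the multi-core processor that are in the idle state.
The first core replaces a first entry in the TLB in the first core with the first target TLB entry, and stores the first entry in the TLB in the second core.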
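Specifically, the first core may obtain the first target TLB entry from other cores of the multi-core processor. For example, the first core may broadcast a TLB query request to the other cores in the multi-core system, carrying in the request the virtual address that caused the miss, that is, the virtual address corresponding to the first target TLB entry. After receiving the broadcast address, each of the other cores looks up this virtual address in its local TLB. If the TLB in one of the processor cores hits, that core can return the first target TLB entry to the first core.
In this way, the first core can quickly obtain the first target TLB entry from another core, without sending a query request to the operating system and fetching the entry from memory. This saves time and improves application efficiency.
Alternatively, the first core may obtain the first target TLB entry from the page table in memory. For example, the first core sends the operating system a query request that carries the virtual address that caused the miss, and the entry is obtained from the page table in memory after operating system processing. The embodiments of the present invention are not limited thereto.
In other words, when a TLB miss occurs in the first core (which may also be called the working core or working node) and the entry storage of its local TLB is full, the first core has to replace a valid TLB entry with the first target TLB entry in order to store the obtained entry. In this case, the first core tries to acquire more TLB resources to hold the replaced TLB entry, so it needs to determine a second core from the idle cores.
Therefore, by using the TLB resources of an idle core to hold the replaced TLB entry, the embodiments of the present invention not only improve the utilization of the idle core's TLB resources but also indirectly increase the TLB capacity of the first core. This reduces how often the first core has to obtain a target TLB entry from memory and speeds up program execution.
It should be understood that the cores of the multi-core processor in the embodiments of the present invention may also be called nodes; a node herein is equivalent to a core of the multi-core processor.
For example, the first core replaces the first entry at a first entry position in its TLB with the first target TLB entry, and stores the first entry in the TLB of the second core.
It should be understood that the first entry position may be any entry position in the TLB of the master core, which is not limited in the embodiments of the present invention. In other words, the first entry may be any TLB entry in the TLB of the first core, for example the first entry, the last entry, or an entry in the middle.
That is, when the TLB of the first core is full and the first target TLB entry is missing, the first core stores the first target TLB entry at the first entry position of its TLB and stores the first entry displaced from that position in the TLB of the second core.
In the embodiments of the present invention, the first core may be called the master core of the second core, and the second core may be called the slave core (or standby core) of the first core.
It should be understood that the master core (Master) writes TLB entries into the TLB of the slave core (Slave). This happens when a TLB entry in the Master is replaced: after the Master obtains the first target TLB entry, the master core's TLB is already full, so the first target TLB entry has to be refilled into the first entry position, that is, it displaces the first entry at that position, and the first entry is then stored in the TLB of the Slave (for example, the second core).
When a replaced entry is written into the Slave's TLB, the Slave keeps track of the write position. The embodiments of the present invention may use a sequential write mechanism: if the slave core's TLB has N entries, writing starts at the first entry, that is, at position 0, and proceeds in order up to entry N-1, until the TLB is full. How the TLB is handled once the entry storage of the slave core (for example, the second core) is full is described in detail below.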
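The sequential write mechanism can be sketched as follows. The class `SlaveTLB` and its members are hypothetical names for illustration; the sketch only covers writing positions 0 to N-1 in order and reporting when the TLB is full, not what happens afterwards.

```python
class SlaveTLB:
    """Slave-side TLB that records its own write position (illustrative)."""

    def __init__(self, num_entries: int):
        self.entries = [None] * num_entries
        self.write_pos = 0  # next position to write, maintained by the slave

    def is_full(self) -> bool:
        return self.write_pos == len(self.entries)

    def store(self, entry) -> bool:
        """Write a replaced entry at the current position; False if full."""
        if self.is_full():
            return False
        self.entries[self.write_pos] = entry
        self.write_pos += 1  # advance from 0 up to N-1 in order
        return True
```

A slave with two entries accepts two `store` calls and rejects the third, at which point the master would treat it as a full slave.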
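Therefore, in the embodiments of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the first target TLB entry, once obtained, replaces the first entry in the TLB of the first core, and the first entry is stored in the TLB of the second core. Because the replaced entry is kept in the TLB of the second core, the embodiments use the TLB resources of an idle core to expand the TLB capacity of the working core, which reduces the TLB miss rate and speeds up program execution.
The working core, that is, the Master (for example, the first core), writes the replaced TLB entries into the TLB of an idle node, that is, the Slave (for example, the second core). When the working core needs these replaced entries again, it does not have to re-obtain them through the operating system; it can obtain them by directly accessing the Slave's TLB, which greatly reduces the TLB refill latency and speeds up program execution.
It should be noted that, in the embodiments of the present invention, an idle core can share its TLB resources with only one working core, whereas a working core can acquire the TLB resources of multiple idle cores to store TLB entries.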
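Further, in an implementation of the first aspect, the first core determining a second core from the cores of the multi-core processor that are in the idle state includes:
The first core sends a status query request to each of the other cores of the multi-core processor, where the status query request is used to query whether that core is in the idle state;
The first core receives a response message from each of the other cores, where the response message indicates whether that core is in the idle state;
The first core selects, according to the response messages, one of the cores in the idle state as the second core.
Therefore, by using the TLB resources of an idle core to hold the replaced TLB entry, the embodiments of the present invention not only improve the utilization of the idle core's TLB resources but also indirectly increase the TLB capacity of the first core. This reduces how often the first core has to obtain a target TLB entry from memory and speeds up program execution.
Further, in another implementation of the first aspect, the first core selecting, according to the response messages, one of the cores in the idle state as the second core includes:
The first core determines an idle core list according to the response messages, where the idle core list includes the cores of the multi-core processor, other than the first core, that are in the idle state;
From the idle core list, the idle core with the smallest communication overhead to the first core is selected as the second core.
Therefore, in the embodiments of the present invention, the first core selects the idle core with the smallest communication overhead as the second core and stores the replaced TLB entry in it, which minimizes communication overhead. When an entry in the TLB of the second core has to be queried, the small communication overhead lets the first core find that entry quickly, improving program execution efficiency.
A person skilled in the art will understand that selecting the second core by smallest communication overhead to the first core is an idealized approach. In a concrete implementation, judging the communication overhead would have to take into account the congestion of the routes in the network on chip (NoC) of the multiprocessor chip, and a processor core would be selected from the idle cores as the second core based on that congestion. To simplify the selection of the second core, the following implementation may be used:
Further, in another implementation of the first aspect, selecting from the idle core list the idle core with the smallest communication overhead to the first core as the second core includes:
The first core takes the idle core in the idle core list with the fewest communication hops to the first core as the second core; or
The first core takes the idle core in the idle core list with the smallest physical distance to the first core as the second core.
Other ways of determining the core with the smallest communication overhead may also be used in the embodiments of the present invention, which are not limited thereto.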
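Selection by fewest communication hops can be sketched in Python under some simplifying assumptions that are not stated in the patent: the cores sit on a 2D mesh, they are numbered row by row, and the hop count between two cores is their Manhattan distance. The names `hop_count` and `nearest_idle_core` are hypothetical.

```python
def hop_count(a: int, b: int, width: int = 4) -> int:
    """Hops between cores a and b on a mesh that is `width` cores wide.

    Assumes row-by-row numbering, so core n is at column n % width
    and row n // width.
    """
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)


def nearest_idle_core(first_core: int, idle_cores, width: int = 4):
    """Pick the idle core with the fewest hops; None if the list is empty."""
    if not idle_cores:
        return None
    # Ties are broken by the lower core number (an arbitrary choice here).
    return min(idle_cores, key=lambda c: (hop_count(first_core, c, width), c))
```

On a 4x4 mesh, core 0 is one hop from core 4 and three hops from core 6, so with idle cores 15, 6, and 4 the sketch picks core 4.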
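For example, the first core (here also called the requesting node) broadcasts a status query request (which may also be called a TLB query request) to the other cores (also called other nodes) of the multi-core processor; the status query request is used to query whether each core is in the idle state. After receiving the status query request, each core sends the first core (the requesting node) a response message indicating whether it is in the idle state. From the response messages the first core obtains the idle core list. If the idle core list is empty, the acquisition of TLB resources ends, and the TLB entry missing from the first core (that is, the first target TLB entry) is read from memory in the conventional way. If the idle core list is not empty, TLB resources are acquired: the first core selects the idle core in the list with the smallest communication overhead and sends it a TLB sharing request. If by then that idle core has already been shared by another node or has switched to the working state, it sends failure feedback to the requesting node, the requesting node removes it from the idle node list, and the process above is repeated. If the idle core is still idle, it is determined to be the second core.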
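The acquisition procedure above, including the retry after failure feedback, can be sketched as a loop. Everything here is illustrative: `acquire_slave`, `status`, `cost`, and `try_share` are hypothetical names, the response messages are modelled as a dictionary, and the TLB sharing request is modelled as a callback that returns `False` on failure feedback.

```python
def acquire_slave(first_core, status, cost, try_share):
    """Return the core chosen as slave, or None if no idle core accepts.

    status:    dict core_id -> True if the core reported the idle state
    cost:      function (first_core, core_id) -> communication overhead
    try_share: function core_id -> True if the sharing request succeeds
    """
    idle = [c for c, is_idle in status.items() if is_idle and c != first_core]
    while idle:
        candidate = min(idle, key=lambda c: cost(first_core, c))
        if try_share(candidate):
            return candidate      # candidate becomes the second core
        idle.remove(candidate)    # failure feedback: drop it and retry
    return None                   # empty list: fall back to the page table
```

If the cheapest idle core has meanwhile been taken by another node, the loop removes it and moves on to the next cheapest candidate.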
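Therefore, in the embodiments of the present invention, the first core selects the idle core with the smallest communication overhead as the second core and stores the replaced TLB entry in it, which minimizes communication overhead. When an entry in the TLB of the second core has to be queried, the small communication overhead lets the first core find that entry quickly, improving program execution efficiency.
Optionally, in another implementation of the first aspect, after the second core is determined, the method further includes:
Recording the identifier of the second core in a TLB standby core list in the first core.
Here the standby core list may also be called the slave core list.
Specifically, after the first core determines the second core from the idle core list, the second core becomes the slave core (Slave), also called the standby core, of the first core. The slave core (the second core) writes the identifier (for example, the number) of the first core into its master core (Master) number register; the first core becomes the master core (Master) of the second core and adds the identifier (for example, the number) of the slave core (the second core) to its standby core list.
Therefore, in the embodiments of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the second core is determined once the first target TLB entry has been obtained and is recorded in the standby core list of the first core. Using the standby core list, the first core can read and write the TLB of the second core, which increases the TLB capacity of the first core, reduces its TLB miss rate, and speeds up program execution.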
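Optionally, in another implementation of the first aspect, after the first core determines the second core from the cores of the multi-core processor that are in the idle state, the method further includes:
The first core receives a second address translation request and queries the TLB in the first core according to the second address translation request;
When the first core determines that a second target TLB entry corresponding to the second address translation request is missing from the TLB in the first core, the first core queries the TLB in the second core;
When the second target TLB entry is found in the TLB in the second core, the second target TLB entry replaces a second entry in the TLB in the first core, and the second entry is stored in the TLB in the second core.
For example, the first core stores the second entry at the position in the second core previously occupied by the second target TLB entry. That is, the second entry and the second target TLB entry swap storage positions.
In other words, when an entry is missing from the master core's local TLB, the master core reads the TLB entries in its slave cores.
This process takes place when the Master (the first core) has a local TLB miss, that is, when the second target TLB entry is not in its local TLB; the Master then reads the TLB entries in the Slaves. According to the standby core list (Slave List), also called the slave core list, the Master sends a TLB read request to all of its Slaves. After receiving the TLB read request, each Slave queries its local TLB. If the second target TLB entry misses, the Slave returns miss feedback; if it hits, the Slave returns hit feedback together with the content of the hit TLB entry. After collecting all the feedback, the Master sends a TLB miss request to the operating system if every response is a miss; if a Slave hits, the Master refills its TLB with the hit entry. If the refill replaces an entry, the replaced entry is written to the slave core that hit.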
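The read path just described can be sketched as a single lookup function. This is a simplified model, not the patented hardware: TLBs are plain dictionaries from VPN to PFN, the slaves are queried one after another instead of in parallel, the victim is simply the oldest entry in the master, and the names `translate_vpn`, `master`, `slaves`, and `capacity` are hypothetical.

```python
def translate_vpn(vpn, master, slaves, page_table, capacity):
    """Return (pfn, source) for a VPN.

    master:     dict vpn -> pfn, the master core's local TLB
    slaves:     dict slave_id -> dict vpn -> pfn, TLBs of the slave cores
    page_table: dict vpn -> pfn, stands in for the OS page-table walk
    capacity:   number of entries the master TLB can hold
    """
    if vpn in master:                      # local TLB hit
        return master[vpn], "master"
    for slave_id, slave_tlb in slaves.items():
        if vpn in slave_tlb:               # hit feedback from this slave
            pfn = slave_tlb.pop(vpn)
            if len(master) >= capacity:
                # Assumed victim choice: the oldest entry in the master.
                victim_vpn = next(iter(master))
                # The replaced entry goes to the slave that hit (a swap).
                slave_tlb[victim_vpn] = master.pop(victim_vpn)
            master[vpn] = pfn              # refill with the hit entry
            return pfn, f"slave {slave_id}"
    # Every slave missed: ask the OS (refill on this path is omitted here).
    return page_table[vpn], "page table"
```

After a slave hit, the requested entry and the victim have effectively swapped places between the master's TLB and the slave's TLB.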
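Therefore, in the embodiments of the present invention, the working core has read/write access to the TLB of an idle core. Using the TLB resources of the idle core to hold replaced TLB entries not only improves the utilization of the idle core's TLB resources but further increases the TLB capacity of the first core. When an entry is missing from the TLB of the first core, the target entry can be obtained by reading the TLB of the slave core, which reduces how often the first core has to obtain a target TLB entry from memory, speeds up program execution, and improves execution efficiency.
When a valid TLB entry in the Master is replaced and all Slaves in the standby core list are Full Slaves, the TLBs of the Full Slaves hold only entries replaced out of the Master, and the Master may use these entries later. The Master therefore wants to acquire more Slaves to hold replaced entries.
Accordingly, in another implementation of the first aspect, the method further includes:
The first core receives a third address translation request and queries the TLB in the first core according to the third address translation request;
When the first core determines that a third target TLB entry corresponding to the third address translation request is missing from the TLB in the first core, the first core queries the TLB in the second core;
When it is determined that the third target TLB entry is missing from the TLB in the second core, the third target TLB entry is obtained;
When determining that the entry storage of both the TLB in the first core and the TLB in the second core is full, the first core determines a third core from the cores of the multi-core processor that are in the idle state;
The first core replaces a third entry in the TLB in the first core with the third target TLB entry, and stores the third entry in the TLB in the third core.
Specifically, when the TLB of the master core (the first core) is full and the TLBs of all of its slave cores (here there is only one slave core, the second core) are also full, the master core (the first core) determines a third core from the other idle cores of the multi-core processor to store the third entry that the third target TLB entry displaces from the TLB of the first core.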
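How a replaced entry finds a home when the existing slaves are full can be sketched as below. The function `place_victim` and its parameters are hypothetical; slave TLBs are modelled as lists, and the new slave is simply the first core in the idle list rather than the one with the smallest communication overhead.

```python
def place_victim(victim, slaves, slave_capacity, idle_cores):
    """Store a replaced entry and return the id of the slave that holds it.

    slaves:     dict slave_id -> list of entries held for the master
    idle_cores: list of idle core ids that could become new slaves
    Returns None when every slave is full and no idle core is left.
    """
    for slave_id, entries in slaves.items():
        if len(entries) < slave_capacity:   # a slave that is not yet full
            entries.append(victim)
            return slave_id
    if not idle_cores:
        return None
    new_slave = idle_cores.pop(0)           # acquire one more slave
    slaves[new_slave] = [victim]
    return new_slave
```

In the scenario of the text, the second core is full, so the third entry lands in the newly acquired third core.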
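Therefore, in the embodiments of the present invention, once the current slave core is full, a new slave core is acquired to hold replaced TLB entries, which further expands the capacity of the first core. When the first core queries a replaced TLB entry again, it can read the entry directly from the new slave core instead of obtaining it from memory. The embodiments of the present invention can therefore speed up program execution and improve execution efficiency.
Optionally, in another implementation of the first aspect, after the first core switches from the working state to the idle state, the method further includes:
The first core sends a TLB release instruction to the cores recorded in the TLB standby core list, where the TLB release instruction instructs the cores recorded in the standby core list to stop TLB sharing.
In other words, when the Master becomes idle, it releases all the TLB resources it has acquired. The Master sends a TLB release request to all Slaves according to the Slave List, so that the TLB resources of these idle cores can be used by other working cores. After the Master releases all acquired TLB resources, the master core and its slave cores are all idle cores and can be used by other working cores.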
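The release step can be sketched with a minimal core model. The class `Core` and the functions `attach` and `release_all` are hypothetical; the release instruction is modelled as clearing the slave's master number.

```python
class Core:
    """Minimal per-core bookkeeping (illustrative only)."""

    def __init__(self, core_id: int):
        self.core_id = core_id
        self.master_id = None      # set while this core serves as a slave
        self.slave_list = set()    # slave core ids while this core is a master


def attach(master: Core, slave: Core) -> None:
    """Record a master/slave relationship on both sides."""
    slave.master_id = master.core_id
    master.slave_list.add(slave.core_id)


def release_all(master: Core, cores: dict) -> list:
    """Model the TLB release instruction sent to every recorded slave."""
    released = sorted(master.slave_list)
    for slave_id in released:
        cores[slave_id].master_id = None   # slave stops sharing its TLB
    master.slave_list.clear()
    return released
```

After `release_all`, the former slaves have no master recorded and can be acquired by another working core.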
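Therefore, after the first core switches to the idle state, the embodiments of the present invention release the acquired TLB resources of the slave cores by sending a release instruction to all slave cores. This frees the TLB resources of these slave cores (idle cores) and avoids wasting resources; the freed TLB resources can then be used by other working cores, increasing their TLB capacity and speeding up their program execution.
After a slave core stops TLB sharing, it may or may not delete the TLB entries it stores, which is not limited in the embodiments of the present invention. For example, after sharing ends and the slave core becomes the slave core of another working core, all entries in its TLB may be deleted so that the other working core can store its replaced entries there. As another example, after sharing ends and the slave core switches to the working state, it may keep the previously stored TLB entries for its own lookups.
Optionally, in another implementation of the first aspect, after the second core switches from the idle state to the working state, the method further includes:
The first core receives a TLB unshare request sent by the second core, where the TLB unshare request carries the identifier of the second core;
The first core deletes the identifier of the second core from the TLB standby core list.
Therefore, in the embodiments of the present invention, after the second core changes from the idle state to the working state, it sends a TLB unshare request to the first core so that the first core releases it. The second core can then use its own TLB resources, so the services it is processing are not affected. The second core can also become a master core itself and use the TLB resources of other idle cores.
Further, in another implementation of the first aspect, before the first core deletes the identifier of the second core from the TLB standby core list, the method further includes:
The first core determines a fourth core from the cores of the multi-core processor that are in the idle state;
The first core copies all entries in the TLB of the second core to the TLB of the fourth core.
That is, before the master core (for example, the first core) deletes a slave core (for example, the second core) from the standby core list, it acquires another core of the multi-core processor as a new slave core (for example, the fourth core), whose TLB stores all entries from the TLB of the deleted slave core (the second core). If entries of the second core are needed in later queries, they do not have to be re-obtained through the operating system; they can be obtained by directly accessing the TLB of the slave core (the fourth core), which greatly reduces the TLB refill latency and speeds up program execution.
Therefore, in the embodiments of the present invention, the first core acquires a new slave core (the fourth core) to hold all entries from the TLB of the released second core. If the first core needs those entries again in later queries, it does not have to re-obtain them from memory through the operating system; it can obtain them by directly accessing the TLB of the slave core (the fourth core), which greatly reduces the TLB refill latency and speeds up program execution.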
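The master-side handling of a TLB unshare request, with the optional copy to a fourth core, can be sketched as follows. The function `handle_unshare` and its parameters are hypothetical, slave TLBs are modelled as lists, and the new slave is simply the first core in the idle list.

```python
def handle_unshare(slave_list, tlbs, leaving, idle_cores):
    """Process an unshare request from slave `leaving`.

    slave_list: list of slave core ids kept by the master
    tlbs:       dict core_id -> list of entries held for the master
    idle_cores: list of idle core ids available as a replacement
    Returns the id of the new slave, or None if none was available.
    """
    new_slave = None
    if idle_cores:
        new_slave = idle_cores.pop(0)
        # Copy every entry before the leaving slave is dropped from the list.
        tlbs[new_slave] = list(tlbs[leaving])
        slave_list.append(new_slave)
    slave_list.remove(leaving)   # delete the identifier of the leaving slave
    return new_slave
```

When no idle core is available the sketch falls back to the basic behaviour of only deleting the identifier, in which case later misses on those entries go through the operating system again.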
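According to a second aspect, a multi-core processor is provided that can implement the first aspect and any of its implementations. The multi-core processor includes a first core, and the first core includes a translation lookaside buffer (TLB).
The first core is configured to: receive a first address translation request; query the TLB in the first core according to the first address translation request; obtain a first target TLB entry when determining that the first target TLB entry corresponding to the first address translation request is missing from the TLB in the first core; determine a second core from the cores of the multi-core processor that are in the idle state when determining that the entry storage of the TLB in the first core is full; and replace a first entry in the TLB in the first core with the first target TLB entry;
The second core is configured to store the first entry in the TLB in the second core.
Therefore, in the embodiments of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the first target TLB entry, once obtained, replaces the first entry in the TLB of the first core, and the first entry is stored in the TLB of the second core. Because the replaced entry is kept in the TLB of the second core, the embodiments use the TLB resources of an idle core to expand the TLB capacity of the working core, which reduces the TLB miss rate and speeds up program execution.
The working core, that is, the Master (for example, the first core), writes the replaced TLB entries into the TLB of an idle node, that is, the Slave (for example, the second core). When the working core needs these replaced entries again, it does not have to re-obtain them through the operating system; it can obtain them by directly accessing the Slave's TLB, which greatly reduces the TLB refill latency and speeds up program execution.
It should be noted that, in the embodiments of the present invention, an idle core can share its TLB resources with only one working core, whereas a working core can acquire the TLB resources of multiple idle cores to store TLB entries.
Optionally, in another implementation of the second aspect, when obtaining the first target TLB entry, the first core 1110 is specifically configured to:
Obtain the first target TLB entry from the page table in memory,
or obtain the first target TLB entry from other cores in the multi-core system.
For example, the first core may broadcast a TLB query request to the other cores in the multi-core system, carrying in the request the virtual address that caused the miss, that is, the virtual address corresponding to the first target TLB entry. After receiving the broadcast address, each of the other cores looks up this virtual address in its local TLB. If the TLB in one of the processor cores hits, that core can return the first target TLB entry to the first core.
In this way, the first core can quickly obtain the first target TLB entry from another core, without sending a query request to the operating system and fetching the entry from memory. This saves time and improves application efficiency.
Alternatively, the first core may obtain the first target TLB entry from the page table in memory. For example, the first core sends the operating system a query request that carries the virtual address that caused the miss, and the entry is obtained from the page table in memory after operating system processing. The embodiments of the present invention are not limited thereto.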
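Further, in another implementation of the second aspect, when determining the second core from the cores of the multi-core processor that are in the idle state, the first core is specifically configured to:
Send a status query request to each of the other cores of the multi-core processor, where the status query request is used to query whether that core is in the idle state;
Receive a response message from each core, where the response message indicates whether that core is in the idle state;
And select, according to the response messages, one of the cores in the idle state as the second core.
Therefore, by using the TLB resources of an idle core to hold the replaced TLB entry, the embodiments of the present invention not only improve the utilization of the idle core's TLB resources but also indirectly increase the TLB capacity of the first core. This reduces how often the first core has to obtain a target TLB entry from memory and speeds up program execution.
Further, in another implementation of the second aspect, when selecting, according to the response messages, one of the cores in the idle state as the second core, the first core is specifically configured to:
Determine an idle core list according to the response messages, where the idle core list includes the cores of the multi-core processor, other than the first core, that are in the idle state;
Select, from the idle core list, the idle core with the smallest communication overhead to the first core as the second core.
Therefore, in the embodiments of the present invention, the first core selects the idle core with the smallest communication overhead as the second core and stores the replaced TLB entry in it, which minimizes communication overhead. When an entry in the TLB of the second core has to be queried, the small communication overhead lets the first core find that entry quickly, improving program execution efficiency.
Further, in another implementation of the second aspect, when selecting the idle core with the smallest communication overhead to the first core as the second core, the first core is specifically configured to:
Take the idle core in the idle core list with the fewest communication hops to the first core as the second core; or
Take the idle core in the idle core list with the smallest physical distance to the first core as the second core.
Optionally, in another implementation of the second aspect, after the second core is determined, the first core is further configured to record the identifier of the second core in a TLB standby core list in the first core.
Therefore, in the embodiments of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the second core is determined once the first target TLB entry has been obtained and is recorded in the standby core list of the first core. Using the standby core list, the first core can read and write the TLB of the second core, which increases the TLB capacity of the first core, reduces its TLB miss rate, and speeds up program execution.
Optionally, in another implementation of the second aspect, the first core is further configured to:
Receive a second address translation request and query the TLB in the first core according to the second address translation request;
Query the TLB in the second core when determining that a second target TLB entry corresponding to the second address translation request is missing from the TLB in the first core;
Replace a second entry in the TLB in the first core with the second target TLB entry when the second target TLB entry is found in the TLB in the second core,
The second core is further configured to store the second entry in the TLB in the second core.
Therefore, in the embodiments of the present invention, the working core has read/write access to the TLB of an idle core. Using the TLB resources of the idle core to hold replaced TLB entries not only improves the utilization of the idle core's TLB resources but further increases the TLB capacity of the first core. When an entry is missing from the TLB of the first core, the target entry can be obtained by reading the TLB of the slave core, which reduces how often the first core has to obtain a target TLB entry from memory, speeds up program execution, and improves execution efficiency.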
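Optionally, in another implementation of the second aspect, the first core is further configured to:
Receive a third address translation request and query the TLB in the first core according to the third address translation request;
Query the TLB in the second core when determining that a third target TLB entry corresponding to the third address translation request is missing from the TLB in the first core;
Obtain the third target TLB entry when determining that the third target TLB entry is missing from the TLB in the second core;
Determine a third core from the cores of the multi-core processor that are in the idle state when determining that the entry storage of both the TLB in the first core and the TLB in the second core is full;
Replace a third entry in the TLB in the first core with the third target TLB entry,
The third core is configured to store the third entry in the TLB in the third core.
Therefore, in the embodiments of the present invention, once the current slave core is full, a new slave core is acquired to hold replaced TLB entries, which further expands the capacity of the first core. When the first core queries a replaced TLB entry again, it can read the entry directly from the new slave core instead of obtaining it from memory. The embodiments of the present invention can therefore speed up program execution and improve execution efficiency.
Optionally, in another implementation of the second aspect, after the first core switches from the working state to the idle state, the first core is further configured to:
Send a TLB release instruction to the cores recorded in the TLB standby core list, where the TLB release instruction instructs the cores recorded in the standby core list to stop TLB sharing.
Therefore, after the first core switches to the idle state, the embodiments of the present invention release the acquired TLB resources of the slave cores by sending a release instruction to all slave cores. This frees the TLB resources of these slave cores (idle cores) and avoids wasting resources; the freed TLB resources can then be used by other working cores, increasing their TLB capacity and speeding up their program execution.
Optionally, in another implementation of the second aspect, after the second core switches from the idle state to the working state,
The second core is configured to send a TLB unshare request to the first core, where the TLB unshare request carries the identifier of the second core;
The first core is further configured to receive the TLB unshare request and delete the identifier of the second core from the TLB standby core list.
Therefore, in the embodiments of the present invention, after the second core changes from the idle state to the working state, it sends a TLB unshare request to the first core so that the first core releases it. The second core can then use its own TLB resources, so the services it is processing are not affected. The second core can also become a master core itself and use the TLB resources of other idle cores.
Optionally, in another implementation of the second aspect, the first core is further configured to determine a fourth core from the cores of the multi-core processor that are in the idle state;
The TLB of the fourth core is used to store all entries in the TLB of the second core.
Therefore, in the embodiments of the present invention, the first core acquires a new slave core (the fourth core) to hold all entries from the TLB of the released second core. If the first core needs those entries again in later queries, it does not have to re-obtain them from memory through the operating system; it can obtain them by directly accessing the TLB of the slave core (the fourth core), which greatly reduces the TLB refill latency and speeds up program execution.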
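To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly introduced below.
FIG. 1 is a schematic structural diagram of a multi-core processor according to an embodiment of the present invention.
FIG. 2 is a schematic flowchart of a method for managing a translation lookaside buffer (TLB) according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a standby core list vector according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a standby core list vector according to another embodiment of the present invention.
FIG. 5 is a schematic diagram of a method for managing a translation lookaside buffer (TLB) according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of a method for managing a translation lookaside buffer (TLB) according to another embodiment of the present invention.
FIG. 7 is a schematic diagram of a method for managing a translation lookaside buffer (TLB) according to another embodiment of the present invention.
FIG. 8 is a schematic diagram of a method for managing a translation lookaside buffer (TLB) according to another embodiment of the present invention.
FIG. 9 is a schematic diagram of a method for managing a translation lookaside buffer (TLB) according to another embodiment of the present invention.
FIG. 10 is a schematic diagram of a method for managing a translation lookaside buffer (TLB) according to another embodiment of the present invention.
FIG. 11 is a schematic block diagram of a multi-core processor according to an embodiment of the present invention.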
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明的一部分实施例,而不是全部实施例。
本发明的技术方案,可以运行在包括例如,CPU、存储器管理单元(MMU,Memory Management Unit)、内存的硬件设备上,该硬件设备所运行的操作系统可以是各种通过线程或进程(包括多个线程)实现业务处理的操作系统,例如,Linux系统、Unix系统、Windows系统、Android系统、OS系统等。
应理解,以上列举的实时操作系统仅是示例性说明,本发明并未特别限定,只要本发明实施例的硬件设备具有多核处理器即可,本发明实施例并不限于此。
为了方便理解本发明实施例,首先在此对本发明实施例描述中的一些术语定义如下:
在本发明实施例中,术语“多核处理器”指的是包含了多个处理器核(Core)的处理器,具体可以表现为片上多核处理器,或者板上多核处理系统。其中,片上多核处理器是多个处理器核(Core)通过片上网络(Network On Chip,NOC)将各个处理器核互连并集成在一个芯片(Chip)上的处理器(Processor),板上多核处理系统指的是多个处理器核中每个核分别封装(Package)为处理器,并集成在电路板上所构成的处理系统。
在本发明实施例中,术语“处理器核”是“处理器核心”的简称,又称为内核或核,是CPU(Central Processing Unit)最重要的组成部分,它是由单晶硅以一定的生产工艺制造出来的,CPU所有的计算、接收命令或存储命令、处理数据都由处理器核执行。术语中“多处理器核”指的是包含至少两个处理器核,“多处理器核”涵盖了现有技术中的多核(Multi-Core),以及众核(Many Core)所应用的范围。
在本发明实施例中,TLB也可以称为页表缓冲,里面存放的是一些页表文件即虚拟地址到物理地址的转换表项。TLB可以用于虚拟地址与物理地址之间的交互,提供一个寻找物理地址的缓存区,能够有效减少内核寻找物理地址所消耗时间。
在本发明实施例中,“主核(Master)”表示处于工作状态且能够使用其他空闲核的TLB资源管理TLB表项的核。“从核(Slave)”表示处于空闲状态且能为主核共享自身TLB资源的核。
在本发明实施例中,“TLB缺失”表示核的TLB中不存在地址转换请求所对应的TLB表项。“TLB命中”表示核中的TLB存在地址转换请求所对应的TLB表项。
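"TLB replacement" means that a TLB entry in the master core and the TLB entry in a slave core that corresponds to the address translation request swap positions. For example, as shown in FIG. 6, the "hit" TLB entry in the TLB of the slave core, that is, the entry corresponding to the address translation request, swaps positions with the "replaced" TLB entry in the TLB of the master core.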
还应理解,在本发明实施例中,第一、第二、第三和第四只是为了区分核,而不应该对本发明的保护范围构成任何限定,例如,第一核也可以称为第二核,第二核可以称为第一核等。
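It should be noted that, in the embodiments of the present invention, depending on the actual situation, the fourth core may be the same core as the second core or the third core, or may be a different core; this is not limited in the embodiments of the present invention.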
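In a multi-core system, a large number of processor cores is typically used to improve parallel processing performance. However, some parts of an application have low parallelism, so some processor cores in the system have no task to execute and become idle cores (also referred to as cores in the idle state, or idle nodes). The embodiments of the present invention dynamically allocate the TLB resources in idle cores to working cores that are executing tasks (also referred to as cores in the working state, or working nodes), which enlarges the TLB capacity of the working cores, reduces TLB misses, and ultimately speeds up program execution. A working core may obtain the TLB resources of one or more idle cores to meet its TLB access requirements. A working core that obtains TLB resources is called a master core (Master), and an idle core that provides TLB resources is called a slave core (Slave). While the TLB resources are in use, frequently accessed TLB entries reside in the master core, and infrequently accessed TLB entries reside in the slave cores.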
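It should be understood that a "core" included in the multi-core processor in the embodiments of the present invention may also be referred to as a "node"; that is, a "node" in this document is equivalent to a "core" of the multi-core processor. Saying that the multi-core processor includes a plurality of cores is the same as saying that it includes a plurality of nodes; this is not limited in the embodiments of the present invention.
It should be understood that the multi-core processor in the embodiments of the present invention may include at least two cores, for example 2, 4, 8, 16, or 32 cores; the embodiments of the present invention are not limited thereto. The basic structure of the multi-core processor in the embodiments of the present invention is described below with reference to FIG. 1.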
如图1所示的多核处理器包括16个核(Core),分别为Core0-Core15,每个核包括:
处理模块;
缓存模块,例如缓存模块包括一级缓存(L1)和二级缓存(L2);
和片上网络接口(Switch)。
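The processing module in each core contains a TLB. The cores are connected through a network on chip and communicate with each other through their network-on-chip interfaces. Communication between two horizontally or vertically adjacent cores over the link between them is referred to as one hop. For example, a communication path from Core1 to Core3 requires at least two hops, namely Core1-Core2-Core3.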
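The hop-count notion can be illustrated with a minimal sketch. It assumes that the 16 cores of FIG. 1 form a 4x4 mesh numbered row by row, which is inferred from the example paths given later in the text (for instance Core5-Core9-Core13-Core14) rather than stated explicitly; the function name `hop_count` and the `width` parameter are hypothetical.

```python
def hop_count(core_a: int, core_b: int, width: int = 4) -> int:
    """Minimum number of hops between two cores on a 2D mesh.

    Assumption: cores are numbered row by row on a mesh that is
    `width` cores wide, and links exist only between horizontal
    or vertical neighbours.
    """
    row_a, col_a = divmod(core_a, width)
    row_b, col_b = divmod(core_b, width)
    # Each hop moves one step horizontally or vertically, so the
    # minimum hop count is the Manhattan distance.
    return abs(row_a - row_b) + abs(col_a - col_b)
```

Under this assumed layout, adjacent cores are one hop apart and diagonal neighbours are two hops apart.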
下面将结合图2至图10详细描述本发明实施例的管理TLB的方法。
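FIG. 2 is a schematic flowchart of a method for managing a TLB according to an embodiment of the present invention. The method shown in FIG. 2 may be performed by a first core. Specifically, the method 200 shown in FIG. 2 is applied to a multi-core processor that includes a first core, and the first core contains a TLB. It should be understood that the first core may be any core in the multi-core processor; for example, for the multi-core processor 100 shown in FIG. 1, the first core may be any one of Core0 to Core15. This is not limited in the embodiments of the present invention. Specifically, the method 200 shown in FIG. 2 includes: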
210,第一核接收第一地址转换请求,根据该第一地址转换请求查询该第一核内的TLB;
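In other words, after receiving the first address translation request, the first core checks whether the TLB in the first core contains a TLB entry corresponding to the first address translation request.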
220,在该第一核内的TLB中,该第一核确定和该第一地址转换请求对应的第一目标TLB表项缺失时,获取该第一目标TLB表项;
具体地,该第一核可以是从多核处理器中的其他核中获得该第一目标TLB表项,例如,第一核可以向多核系统中的其他核广播TLB查询请求,并在该广播的TLB查询请求中携带导致缺失的虚拟地址,即第一目标TLB表项对应的虚拟地址,其他的核收到广播地址后,在本地的TLB中查找此虚拟地址,如果其中一个处理器核内的TLB命中,则可以将该第一目标TLB表项反馈给该第一核。
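In this way, the first core can quickly obtain the first target TLB entry from another core, without sending a query request to the operating system and fetching the first target TLB entry from memory. This saves time and improves application efficiency.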
另外,第一核也可以是从内存的页表中获得该第一目标TLB表项,例如,第一核向操作系统发送查询请求,查询请求中携带导致缺失的虚拟地址,经过操作系统处理从内存的页表中获得该第一目标TLB表项。本发明实施例并不限于此。
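The two ways of obtaining the missing entry can be sketched as follows. This is only an illustration: TLBs and the page table are modelled as dictionaries from virtual page number to physical page number, and the name `fetch_missing_entry` and the returned source labels are hypothetical.

```python
def fetch_missing_entry(vpn, other_core_tlbs, page_table):
    """Obtain the translation for `vpn` after a local TLB miss.

    Assumption: each TLB and the page table are modelled as dicts
    mapping a virtual page number to a physical page number.
    """
    # Broadcast the faulting virtual address to the other cores;
    # the first core whose TLB hits returns the entry.
    for tlb in other_core_tlbs:
        if vpn in tlb:
            return tlb[vpn], "remote_tlb"
    # No other core holds the entry: fall back to the page table
    # in memory (via the operating system in the text).
    return page_table[vpn], "page_table"
```

The sketch tries the other cores first because the text presents that path as the faster one; the source allows either path.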
230,在判断该第一核内的TLB中的表项存储已满时,该第一核从该多核处理器中处于空闲状态的核中确定第二核。
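That is, when a TLB miss occurs in the first core (also referred to as the working core or working node) and the local TLB is full, the first core needs to replace a valid TLB entry with the first target TLB entry in order to store the obtained first target TLB entry. In this case, the first core tries to obtain more TLB resources to hold the replaced TLB entry, and therefore needs to determine a second core from the idle cores.
Therefore, in this embodiment of the present invention, the replaced TLB entry is saved using the TLB resources of an idle core. This not only improves the utilization of the idle cores' TLB resources but also indirectly increases the TLB capacity of the first core, reduces how often the first core has to obtain a target TLB entry from memory, and speeds up program execution.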
具体地,作为另一实施例,该第一核从该多核处理器中处于空闲状态的核中确定第二核,包括:
该第一核向该多核处理器中其他核中每个核发送状态查询请求,该状态查询请求用于查询该每个核是否处于空闲状态;
该第一核接收该其他核中每个核发送的响应消息,该响应消息用于指示该每个核是否处于空闲状态;
该第一核根据该响应消息,从处于空闲状态的核中选择一个核作为该第二核。
进一步地,该第一核根据该响应消息,从处于空闲状态的核中选择一个核作为该第二核,包括:
该第一核根据该响应消息确定空闲核列表,该空闲核列表中包括该多核处理器中除该第一核外的其他核中处于空闲状态的核;
在该空闲核列表中,选择与该第一核的通信开销最小的空闲核作为该第二核。
因此,本发明实施例中第一核选择与该第一核通信开销最小的空闲核作为第二核,并将替换下的TLB表项存储在第二核中,最大限度的降低通信开销。并且在需要查询第二核内的TLB的表项时,由于通信开销很小,所以第一核能够快速的查询到该TLB表项,提高程序的执行效率。
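A person skilled in the art will understand that selecting the second core on the principle of minimum communication overhead with the first core is an ideal approach. In a concrete implementation, communication overhead would have to be judged in combination with the congestion of routes in the network on chip (NoC) of the multiprocessor chip, and a processor core would be selected from the idle cores as the second core based on that congestion. To simplify the selection of the second core, the following implementations may be used: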
(1)该第一核将该空闲核列表中与该第一核的通信跳数最小的空闲核作为该第二核;或者,
(2)该第一核将该空闲核列表中与该第一核的物理距离最小的空闲核作为该第二核。
其中,本发明实施例中还可以采用其他的方式确定通信开销最小的核,本发明实施例并不限于此。
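For example, taking the multi-core processor in FIG. 1 and implementation (1) above, assume that the first core is Core5 and that the idle core list includes Core7, Core11, and Core14. The minimum number of hops from Core5 to Core7 is 2, with the communication path Core5-Core6-Core7. The minimum number of hops from Core5 to Core11 is 3; one such minimum-hop path is Core5-Core6-Core7-Core11. The minimum number of hops from Core5 to Core14 is 3; one such minimum-hop path is Core5-Core9-Core13-Core14. Therefore, Core5 selects Core7, which has the smallest hop count, as the second core.
For implementation (2) above, the layout of the multi-core processor in FIG. 1 shows that, among Core7, Core11, and Core14, Core7 is physically closest to Core5. Therefore, Core5 selects Core7 as the second core.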
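Implementation (1) can be sketched as a minimum over the idle core list. The sketch assumes a 4x4 mesh numbered row by row, which matches the example paths in the text but is not stated there; `select_second_core` is a hypothetical name, and breaking ties by the lower core number is an added assumption.

```python
def select_second_core(first_core, idle_cores, width=4):
    """Pick the idle core with the fewest hops to `first_core`.

    Assumptions: row-by-row core numbering on a `width`-wide mesh;
    ties are broken by the lower core number.
    Returns None when the idle core list is empty.
    """
    def hops(core):
        row_a, col_a = divmod(first_core, width)
        row_b, col_b = divmod(core, width)
        return abs(row_a - row_b) + abs(col_a - col_b)

    if not idle_cores:
        return None
    return min(idle_cores, key=lambda core: (hops(core), core))
```

With Core5 as the first core and Core7, Core11, and Core14 idle, the sketch selects Core7, in line with the example.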
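Specifically, the first core (here also called the requesting node) broadcasts a status query request (also called a TLB query request) to the other cores (also called other nodes) in the multi-core processor; the status query request is used to query whether each core is in the idle state. After receiving the status query request, each core sends a response message to the first core (the requesting node) indicating whether that core is in the idle state. From the response messages, the first core obtains the idle core list. If the idle core list is empty, acquisition of TLB resources ends, and the TLB entry missing from the first core (that is, the first target TLB entry) is read from memory in the existing manner. If the idle core list is not empty, TLB resources are acquired: based on the communication overhead between itself and each idle core in the list, the first core selects the idle core with the smallest overhead and sends it a TLB sharing request. If that idle core is already shared by another node or has switched to the working state, it sends failure feedback to the requesting node, the requesting node removes it from the idle core list, and the process is repeated. If that idle core is still idle, it is determined to be the second core.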
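The acquisition procedure, including the retry after failure feedback, can be sketched as a loop. The names `acquire_slave`, `cost`, and `send_share_request` are hypothetical stand-ins for the hardware messages described in the text.

```python
def acquire_slave(idle_cores, cost, send_share_request):
    """Try idle cores in order of increasing communication cost.

    `cost(core)` returns the communication overhead to a core and
    `send_share_request(core)` returns True on success or False on
    failure feedback; both are placeholders for the on-chip
    protocol. Returns the acquired core, or None when no idle core
    can be obtained.
    """
    remaining = set(idle_cores)
    while remaining:
        candidate = min(remaining, key=lambda core: (cost(core), core))
        if send_share_request(candidate):
            return candidate          # this core becomes the slave
        # Failure feedback: the core is already shared or is now
        # working, so drop it from the idle list and try again.
        remaining.discard(candidate)
    return None                       # fall back to reading memory
```

A `None` result corresponds to the case where the missing entry is simply read from memory in the existing manner.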
因此,本发明实施例中第一核选择与该第一核通信开销最小的空闲核作为第二核,并将替换下的TLB表项存储在第二核中,最大限度的降低通信开销。并且在需要查询第二核内的TLB的表项时,由于通信开销很小,所以第一核能够快速的查询到该TLB表项,提高程序的执行效率。
240,该第一核将该第一目标TLB表项替换掉该第一核内的TLB中的第一表项,并将该第一表项存储在该第二核内的TLB中。
例如,该第一核将第一目标TLB表项替换掉该第一核的TLB中的第一表项位置中的第一表项,并将该第一表项存储在第二核的TLB中。
应理解,第一表项位置可以为主核的TLB中任意一个表项位置,本发明实施例并不对此做限定。换句话说,第一表项可以为第一核的TLB中的任意一个TLB表项,例如,第一表项可以是第一核的TLB中的第一个表项、最后一个表项或者中间的一个表项,本发明实施例并不对此做限定。
也就是说,在第一核的TLB表项已存储满,且第一目标TLB表项缺失时,该第一核需要将第一目标TLB表项存储在该第一核的TLB的第一表项位置,并将由第一表项位置替换下来的第一表项存储在第二核的TLB中。
在本发明实施例中,可以将第一核称为第二核的主核,可以将第二核称为第一核的从核(或备用核)。
应理解,主核(Master)将TLB表项写入从核(Slave)的TLB中。这个过程发生在Master中的TLB表项被替换的情况下,如图2所示,当Master中获得第一目标TLB表项后,由于主核的TLB表项已存储满,因此,需要将该第一目标TLB表项重填到第一表项位置,也即用第一目标TLB表项将第一表项位置中的第一表项替换下来,然后将第一表项存储在从核(Slave)(例如,第二核)的TLB中。
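When a replaced entry is written into the TLB of the Slave, the Slave maintains the write position. A sequential write mechanism may be used in the embodiments of the present invention: if the Slave's TLB has N entries, writing starts from the first entry, at position 0, and proceeds in order to position N-1, at which point the TLB is full. How the TLB is handled once the entries of a slave core (for example, the second core) are full is described in detail below.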
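Therefore, in this embodiment of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the first core obtains the first target TLB entry, uses it to replace the first entry in its TLB, and stores the replaced first entry in the TLB of the second core. By using the TLB resources of an idle core in this way, the embodiment enlarges the TLB capacity of the working core, which reduces the TLB miss rate and speeds up program execution.
The working core, that is, the master core (Master) (for example, the first core), writes replaced TLB entries into the TLB of an idle node, that is, the slave core (Slave) (for example, the second core). When the working core needs these replaced entries again, it does not have to obtain them again through the operating system; it can get them by directly accessing the Slave's TLB. This greatly reduces TLB refill latency and speeds up program execution.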
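Step 240 can be sketched as a refill that spills the displaced entry into the slave. TLBs are modelled as dictionaries; `install_with_spill`, `capacity`, and `victim_vpn` are hypothetical names. The text leaves the choice of the replaced entry open, so the default victim here (the oldest inserted key) is only an assumption.

```python
def install_with_spill(master_tlb, capacity, slave_tlb, vpn, ppn, victim_vpn=None):
    """Refill the master TLB, spilling a displaced entry to the slave.

    Assumption: both TLBs are dicts from virtual page number to
    physical page number, and the master holds at most `capacity`
    entries.
    """
    if len(master_tlb) >= capacity:
        if victim_vpn is None:
            # Any entry may be replaced; pick the oldest key here.
            victim_vpn = next(iter(master_tlb))
        # The displaced entry is kept in the slave instead of lost.
        slave_tlb[victim_vpn] = master_tlb.pop(victim_vpn)
    master_tlb[vpn] = ppn
```

A later miss on the displaced page can then be served from the slave's TLB rather than from the page table.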
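It should be noted that, in the embodiments of the present invention, an idle core can share its TLB resources with only one working core, whereas a working core may obtain the TLB resources of multiple idle cores to store TLB entries.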
可选地,作为另一实施例,在确定该第二核之后,该方法还可以包括:
将该第二核的标识记录在该第一核内的TLB备用核列表中。
这里备用核列表也可以称为从核列表。
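Therefore, in this embodiment of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the first core determines the second core after obtaining the first target TLB entry and records the second core in its spare core list. Using the spare core list, the first core can read and write the TLB in the second core, which increases the effective TLB capacity of the first core, reduces its TLB miss rate, and speeds up program execution.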
在第一核从空闲核列表中确定出第二核后,第二核成为第一核的从核(Slave)(也可以称为备用核)。该从核(第二核)将第一核的标识(例如,编号等)写入该从核的主核(Master)编号寄存器中,第一核成为第二核的主核(Master),并将该从核(第二核)的标识(例如编号等)加入其备用核列表中。
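For example, in the embodiments of the present invention, the identifiers of all slave cores may be recorded in the spare core list. Alternatively, all Slaves of the current Master may be recorded as a vector. For example, as shown in FIG. 3, if the multi-core processor has four cores in total, the vector recorded in the spare core list may have four bits, where the first to the fourth bits represent the first to the fourth cores, that is, Core0 to Core3. For example, the first core may be Core3 in FIG. 3, and the vector representing its spare core list may be 0100, where 0 indicates that a core is not a slave of the first core and 1 indicates that it is. According to this vector, the second of the four cores (Core1) is a slave of the first core.
As another example, as shown in FIG. 4, for the multi-core processor in FIG. 1, the spare core list may be a 16-bit vector whose bits, from left to right, represent Core0 to Core15. If the first core is Core5, the vector of its spare core list may be, for example, 0000001000000000; because the 7th bit is 1, the core corresponding to the 7th bit, Core6, is a slave of Core5.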
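The vector form of the spare core list can be sketched with one bit per core, written left to right from Core0. The helper names `slave_vector` and `slaves_from_vector` are hypothetical, and the vector is kept as a string of 0s and 1s purely for readability; a hardware register would hold raw bits.

```python
def slave_vector(slaves, num_cores):
    """Encode a set of slave core numbers as a left-to-right bit string.

    The leftmost character stands for Core0, the next for Core1,
    and so on; '1' marks a slave of this master.
    """
    return "".join("1" if core in slaves else "0" for core in range(num_cores))


def slaves_from_vector(vector):
    """Decode the bit string back into a list of slave core numbers."""
    return [core for core, bit in enumerate(vector) if bit == "1"]
```

For a 16-core processor in which Core6 is the only slave, the encoded vector has a single 1 in the 7th position.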
在实际应用中,主核获取到地址转换请求后,当主核本地的TLB表项缺失时,主核会读取从核中的TLB表项。下面描述主核读取从核中的TLB表项的具体过程。相应的作为另一实施例,在该第一核从该多核处理器处于空闲状态的核中确定第二核之后,该方法还可以包括:
该第一核接收第二地址转换请求,根据该第二地址转换请求查询该第一核内的TLB;
在该第一核内的TLB中,该第一核确定和该第二地址转换请求对应的第二目标TLB表项缺失时,查询该第二核内的TLB;
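when the second target TLB entry is found in the TLB in the second core, replacing a second entry in the TLB in the first core with the second target TLB entry, and storing the second entry in the TLB in the second core.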
例如,第一核将第二表项存储在第二核中原第二目标TLB表项的位置。也就是说,第一核将第二表项与第二目标TLB表项互换存储位置。
应理解,本实施例仅以第一核读取第二核的TLB中的第二目标表项为例进行说明,当第一核具有多个从核时,第一核可以向所有的从核发送读取第二目标表项的请求,第二目标表项也可以位于其他从核的TLB中,相应的读取与替换表项的过程与上述描述类似,这里不再赘述。
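The foregoing process takes place when a local TLB miss occurs in the Master (the first core), that is, when the second target TLB entry is not in the local TLB; the Master then reads TLB entries from its Slaves. For example, as shown in FIG. 5, the Master sends a TLB read request to all Slaves according to the spare core list (Slave List, also called the slave core list). After receiving the TLB read request, each Slave queries its local TLB. If the second target TLB entry is missing, the Slave returns miss feedback; if it hits, the Slave returns hit feedback together with the content of the hit TLB entry. After collecting all feedback, if every response is a miss, the Master sends a TLB query request to the operating system and obtains the missing TLB entry from memory; if a Slave hits, the Master refills its TLB with the hit entry. If an entry is replaced during the refill, the replaced entry is written to the Slave that hit.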
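The read path can be sketched as a lookup that tries the local TLB, then every slave, and swaps entries on a slave hit. The sketch assumes the master TLB is full, so a refill always displaces `victim_vpn`; the function name `lookup` and the dictionary model of a TLB are hypothetical.

```python
def lookup(vpn, master_tlb, slave_tlbs, victim_vpn):
    """Return (ppn, source), where source is 'master', 'slave', or None.

    Assumptions: TLBs are dicts from virtual to physical page
    number, and the master TLB is full, so a refill from a slave
    displaces the entry keyed by `victim_vpn`.
    """
    if vpn in master_tlb:
        return master_tlb[vpn], "master"
    for slave_tlb in slave_tlbs:      # read request to every slave
        if vpn in slave_tlb:          # hit feedback carries the entry
            ppn = slave_tlb.pop(vpn)
            # The displaced master entry is written to the slave
            # that hit, so the two entries swap places.
            slave_tlb[victim_vpn] = master_tlb.pop(victim_vpn)
            master_tlb[vpn] = ppn
            return ppn, "slave"
    # Every slave reported a miss: the caller must ask the
    # operating system for the entry from memory.
    return None, None
```

Because the entries swap places, the total number of cached translations stays the same while the recently used one moves into the master.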
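Therefore, in this embodiment of the present invention, the working core has read/write permission on the TLB of the idle core. Saving replaced TLB entries in the TLB resources of idle cores not only improves the utilization of those resources but also further increases the TLB capacity of the first core. When an entry is missing from the TLB in the first core, the target entry can be obtained by reading the TLB in a slave core, which reduces how often the first core has to obtain a target TLB entry from memory, speeds up program execution, and improves execution efficiency.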
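As described above, when a replaced entry is written into the TLB of the Slave, the Slave maintains the write position. A sequential write mechanism may be used in the embodiments of the present invention: if the TLB of the slave core (for example, the second core) has N entries, writing starts from the first entry, at position 0, and proceeds in order to position N-1.
Once the entry at position N-1 has also been written, the TLB of the slave core is full, which means that everything stored in it is an entry replaced out of the Master; such a Slave may be called a full slave core (Full Slave). The Slave can no longer store further replaced entries, so the Full Slave sends a write overflow request to the Master, and after receiving the request the Master records the Full Slave in its Full Slave list.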
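The slave-side write position and the overflow notification can be sketched as follows. The class name `SlaveTLB` and the convention of signalling the write overflow request through a boolean return value are illustrative assumptions.

```python
class SlaveTLB:
    """Slave-side storage for entries replaced out of the master."""

    def __init__(self, num_entries):
        self.entries = [None] * num_entries
        self.write_pos = 0            # maintained by the slave itself

    def write(self, entry):
        """Store one replaced entry at the current write position.

        Returns True when this write filled position N-1, which
        stands in for the write overflow request sent to the master.
        """
        if self.write_pos >= len(self.entries):
            raise RuntimeError("slave TLB is full; a new slave is needed")
        self.entries[self.write_pos] = entry
        self.write_pos += 1
        return self.write_pos == len(self.entries)
```

A master could add the slave to its Full Slave list whenever `write` returns True and acquire a further slave before the next spill.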
这时当Master替换下表项后,需要获取其他的从核用以存储该替换下来的表项。
即在Master中的有效TLB表项被替换,并且备用核列表中的所有Slave都是Full Slave的情况时,由于Full Slave的TLB中保存的都是Master的被替换表项,这些表项后续可能会被Master用到,因此,Master希望获取更多的Slave来保存被替换表项。
相应地,作为另一实施例,该方法还可以包括:
该第一核接收第三地址转换请求,根据该第三地址转换请求查询该第一核内的TLB;
在该第一核内的TLB中,该第一核确定和该第三地址转换请求对应的第三目标TLB表项缺失时,查询该第二核内的TLB;
当在该第二核内的TLB中确定第三目标TLB表项缺失时,获取该第三目标TLB表项;
在判断该第一核内的TLB中的表项以及第二核内的TLB中的表项存储都已满时,该第一核从该多核处理器处于空闲状态的核中确定第三核;
该第一核将该第三目标TLB表项替换掉该第一核内的TLB中的第三表项,并将该第三表项存储在该第三核内的TLB中。
因此,本发明实施例在当前的从核存储已全满后,会获取新的从核来保存替换下的TLB表项,进一步扩大了该第一核的容量,当该第一核再次查询该替换下来的TLB表项时,该第一核可以直接从该新的从核中读取,无需该核通过内存来获取。因此,本发明实施例能够加快程序的执行,提高程序的执行效率。
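Specifically, when the TLB of the master core (the first core) is full and the TLBs of all slave cores of the master core (here, for example, there is only one slave core, the second core) are also full, the master core (the first core) determines a third core from the other idle cores in the multi-core processor. The third core is used to store the third entry that is displaced from the TLB of the first core by the third target TLB entry.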
确定第三核的过程可以参照确定第二核的过程,此处不再赘述。
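For example, as shown in FIG. 7, when all Slaves that the Master already has are Full Slaves, continuing to write replaced entries into the TLB of a Full Slave would overwrite previously replaced entries. To avoid this, the Master repeats the Slave acquisition process described above, obtains the new Slave 1 in the figure (for example, the third core), and writes the replaced entry (for example, the third entry) into Slave 1.
The foregoing describes how the master core determines slave cores and uses their resources to store TLB entries. After the master core changes from the working state to the idle state, it no longer uses TLB resources and therefore needs to release the slave core resources obtained above.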
相应地,作为另一实施例,当该第一核由工作状态转换到空闲状态后,该方法还包括:
该第一核向该TLB备用核列表中记录的核发送TLB释放指令,该TLB释放指令用于指示该备用核列表中记录的核解除TLB共享。
也就是说,Master变成空闲状态:释放所有已经获取的TLB资源。Master根据Slave List向所有Slave发送TLB释放请求,这样这些空闲的核的TLB资源可以被其他工作的核使用。例如,如图8所示,Master释放所有已经获取的TLB资源后,该主核和从核均成为空闲的核,可以被其他工作的核使用。
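The release step can be sketched as clearing the master's slave list and the master number held by each slave. The dictionary `master_id_reg`, mapping a slave core number to its master's number, is a hypothetical model of the per-core master number register.

```python
def release_all(slave_list, master_id_reg):
    """Release every slave of a master that has become idle.

    `slave_list` is the master's set of slave core numbers;
    `master_id_reg` maps each slave core number to the number of
    its master (an assumed model of the per-core register).
    """
    for slave in sorted(slave_list):
        # The release request tells the slave to stop sharing.
        master_id_reg.pop(slave, None)
    slave_list.clear()
```

Whether a released slave also discards the entries it stored is left open by the text, so the sketch does not touch the slave's TLB contents.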
因此,本发明实施例在第一核转换到空闲状态后,通过向所有的从核发送释放指令释放已获取的从核的TLB资源,通过这种方式,使得这些从核(空闲的核)的TLB资源释放,避免资源的浪费,进而该释放的核的TLB资源能够被其他的工作的核使用,提升其他工作核的容量,加快工作的核的程序执行。
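After a slave core stops sharing its TLB, it may or may not delete the TLB entries it stores; this is not limited in the embodiments of the present invention. For example, if after unsharing the core becomes the slave of another working core, all entries in its TLB may be deleted so that the other working core can store its replaced entries there. As another example, if after unsharing the core enters the working state, it may keep the previously stored TLB entries for its own lookups.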
相类似地,当Slave(例如,第二核)变成运行状态:Slave会向Master发送TLB解除共享请求,Master接收到请求之后将其从Slave List中去除。例如,如图9所示,当Slave1变成运行状态后,主核会释放该Slave1,并从从核列表中删除。
相应地,作为另一实施例,该第二核由空闲状态转换到工作状态后,该方法还包括:
该第一核接收该第二核发送的TLB解除共享请求,该TLB解除共享请求中携带有该第二核的标识;
在该TLB备用核列表中,该第一核将该第二核的标识删除。
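Therefore, in this embodiment of the present invention, after the second core changes from the idle state to the working state, the second core sends a TLB unshare request to the first core so that the first core releases the second core. The second core can then use its own TLB resources, and the service processing it is performing is not affected. In addition, the second core may itself become a master core and use the TLB resources of other idle cores.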
进一步地,作为另一实施例,在该TLB备用核列表中,该第一核将该第二核的标识删除之前,该方法还包括:
该第一核从该多核处理器处于空闲状态的核中确定第四核;
该第一核将该第二核的TLB中的所有表项拷贝到该第四核的TLB中。
也就是说,在主核(例如,第一核)从备用核列表中删除从核(例如,第二核)之前,会从多核处理器中重新获取一个核(新的从核),该新的从核的TLB用于存储删除的从核(即第二核)的TLB中所有的表项。这样在后续查询中如果需要查询第二核中的表项,不需要通过操作系统来重新获取这些表项,通过直接访问从核(第四核)的TLB就可以获取这些表项,因此大大降低了TLB重填的时延,加快了程序的执行。
确定第四核的过程可以参照确定第二核的过程,此处不再赘述。
例如,如图10所示,当Slave1变成运行状态后,主核会释放该Slave1,并从从核列表中删除。同时,主核会重新确定一个从核,例如Slave 2,将Slave 1的TLB中的所有表项存储在Slave 2的TLB中,并在从核列表中记录该Slave 2。
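The hand-over to a replacement slave can be sketched as copy-then-remove. The name `handle_unshare` and the dictionary `tlbs`, which maps a core number to that core's TLB contents, are hypothetical.

```python
def handle_unshare(slave_list, tlbs, leaving, new_slave=None):
    """Process a TLB unshare request from the slave `leaving`.

    `tlbs` maps a core number to that core's TLB contents (a dict).
    When `new_slave` is given, all entries of the leaving slave are
    copied there before the leaving slave is removed from the list.
    """
    if new_slave is not None:
        # Copy first, so the spilled entries stay reachable.
        tlbs[new_slave] = dict(tlbs[leaving])
        slave_list.add(new_slave)
    slave_list.discard(leaving)
```

Copying before removal matters here: once the leaving core is out of the list, the master no longer queries it.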
因此,本发明实施例通过第一核获取新的从核(第四核),来保存释放的第二核内的TLB中的所有表项,在该第一核在后续查询中如果需要再次查询第二核中的表项,不需要通过操作系统从内存中重新获取这些表项,而可以直接通过访问从核(第四核)的TLB就可以获取这些表项,因此大大降低了TLB重填的时延,加快了程序的执行。
需要说明的是,本发明实施例中的该多核处理器中的每一个核均设置有标志位寄存器,该标志位寄存器用于记录该核的状态标志位、主核(Master)标志位和从核(Slave)标志位;
该状态标志位用于表示该核的运行状态和共享状态,该运行状态包括空闲状态或工作状态,该共享状态包括主核状态、从核状态或不参与共享状态,该主核状态表示该核处于工作状态且使用其他空闲核的TLB资源管理TLB表项,该从核状态表示该核处于空闲状态且为主核共享TLB资源;
该主核标志位用于表示在该核为主核时的空闲核列表、从核(Slave)列表(也称为备用核列表)和全满的从核(Full Slave)列表,该空闲核列表包括用于表示所有空闲核的向量,该从核列表包括用于表示该核的所有从核的向量,该全满的从核列表包括用于表示该核的所有满从核的向量;
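The slave core flag indicates, when the core is a slave core, the master core (Master) number and the write position for replaced entries. The master core number contains the identifier of the core's only master core, and the write position for replaced entries is the position in the slave core at which an entry replaced out of the master core is to be written.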
具体而言,为了实现本发明提出的TLB资源共享方法,需要在每个节点中增加一些寄存器来保存标志位,包括状态标志位,Master标志位和Slave标志位。由于每个节点都可能成为Master或Slave,所以每个节点中都配备了这三种标志位。例如如表1所示。
表1每个节点中的标志位的描述
下面对各个标志位进行详细的阐述:
运行状态位:区分空闲/工作节点,所以可以只需要1bit位宽的寄存器。
Sharing status bits: distinguish Master, Slave, and non-participating nodes, so a 2-bit register is needed.
空闲节点/Slave/Full Slave列表:功能在上文中已经有过详细描述。在寄存器实现中采用向量的方式,向量的宽度等于系统中节点的数量,向量中每一比特(bit)对应一个节点。以Slave列表为例,向量中某一比特为0表示对应节点不是Slave,为1表示对应节点是Slave。
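Master number: a Slave can have only one Master, and the Master number of a Slave records the number of that Master. When the Slave switches to the running state, it can notify the corresponding Master according to the Master number. The binary width of the Master number may be ⌈log2(number of nodes)⌉, that is, the smallest integer greater than or equal to log2(number of nodes). For example, when the multi-core processor includes 8 cores (8 nodes), the Master number requires ⌈log2(8)⌉ = 3 bits. As another example, when the multi-core processor includes 12 cores (12 nodes), the Master number requires ⌈log2(12)⌉ = 4 bits.
Write position: maintained by the Slave, it indicates where an entry replaced out of the Master is to be written when it is received. Its binary width may be ⌈log2(number of TLB entries)⌉, that is, the smallest integer greater than or equal to log2(number of TLB entries). For example, when the TLB in the Slave has 64 entries, the write position requires ⌈log2(64)⌉ = 6 bits.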
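The register widths described above can be checked with a short calculation. The function names `bits_needed` and `flag_register_widths` and the dictionary keys are hypothetical; the widths follow the descriptions of the flag bits in the text.

```python
def bits_needed(count):
    """Smallest integer >= log2(count), computed without floating point."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1).bit_length()


def flag_register_widths(num_cores, tlb_entries):
    """Bit widths of the per-core flag registers described above."""
    return {
        "running_state": 1,               # idle / working
        "sharing_state": 2,               # Master / Slave / not sharing
        "idle_list": num_cores,           # one bit per node
        "slave_list": num_cores,
        "full_slave_list": num_cores,
        "master_number": bits_needed(num_cores),
        "write_position": bits_needed(tlb_entries),
    }
```

Using integer `bit_length` avoids rounding problems that a floating-point logarithm could introduce for large counts.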
因此,本发明实施例通过标志位的设置,实现了主核对从核内的TLB资源进行读写,进而扩大了主核的TLB的容量,也即减低了第一核通过内存来获取目标TLB表项的发生,能够加快程序的执行,提高程序的执行效率。
上文中结合图1至图10详细描述本发明实施例的管理TLB的方法。下面将结合图11详细描述本发明实施例的多核处理器。
图11是根据本发明一个实施例的多核处理器1100的示意框图。
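As shown in FIG. 11, the multi-core processor 1100 includes a first core 1110 and a second core 1120. Optionally, it may further include a third core 1130 and, optionally, a fourth core 1140. Similar to FIG. 1, each core in the multi-core processor 1100 includes a processing module; a cache module, for example including a level-1 cache (L1) and a level-2 cache (L2); and a network-on-chip interface (Switch). The processing module in each core contains a TLB. The cores are connected through the network on chip and communicate with each other through the network-on-chip interfaces.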
应理解,本发明实施例中的多核处理器1100还可以包括更多的核,例如,多核处理器1100可以包括8个核、10核、16个核、32个核等,本发明实施例并不对此做限定。
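It should also be understood that, in the embodiments of the present invention, the first core may be any core in the multi-core processor. For ease of illustration, FIG. 11 uses the first of the cores as the first core, but the embodiments of the present invention are not limited thereto. It should also be understood that, in practice, the second core may be any core in the multi-core processor other than the first core, which means that the second core is not necessarily directly connected to the first core. For ease of illustration, FIG. 11 shows the second core directly connected to the first core, but the embodiments of the present invention are not limited thereto. Similarly, the third core and the fourth core described below are shown in FIG. 11 as directly connected to the first core or the second core only for ease of illustration, and the embodiments of the present invention are not limited thereto.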
应理解图11所示的多核处理器与图1至图10方法实施例相对应,图11中的多核处理器1100能够实现图1至图10涉及的方法的各个过程,为了避免重复,本发明实施例适当省略详细描述。
具体而言,该第一核1110用于接收第一地址转换请求,根据该第一地址转换请求查询该第一核内的TLB;
在该第一核内的TLB中,确定和该第一地址转换请求对应的第一目标TLB表项缺失时,获取该第一目标TLB表项;
在判断该第一核内的TLB中的表项存储已满时,从该多核处理器中处于空闲状态的核中确定第二核;将该第一目标TLB表项替换掉该第一核内的TLB中的第一表项。
The second core 1120 is configured to store the first entry in the TLB in the second core.
应理解,本发明实施例中的第一核可以称为第二核的主核,第二核可以称为第一核的从核。
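Therefore, in this embodiment of the present invention, when the TLB of the first core is full and the first target TLB entry is missing, the first core obtains the first target TLB entry, uses it to replace the first entry in its TLB, and stores the replaced first entry in the TLB of the second core. By using the TLB resources of an idle core in this way, the embodiment enlarges the TLB capacity of the working core, which reduces the TLB miss rate and speeds up program execution.
The working core, that is, the master core (Master) (for example, the first core), writes replaced TLB entries into the TLB of an idle node, that is, the slave core (Slave) (for example, the second core). When the working core needs these replaced entries again, it does not have to obtain them again through the operating system; it can get them by directly accessing the Slave's TLB. This greatly reduces TLB refill latency and speeds up program execution.
It should be noted that, in the embodiments of the present invention, an idle core can share its TLB resources with only one working core, whereas a working core may obtain the TLB resources of multiple idle cores to store TLB entries.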
可选地,在获取该第一目标TLB表项时,该第一核1110具体用于:
从内存的页表中获取该第一目标TLB表项,
或者,从该多核系统中的其他核获取该第一目标TLB表项。
例如,第一核可以向多核系统中的其他核广播TLB查询请求,并在该广播的TLB查询请求中携带导致缺失的虚拟地址,即第一目标TLB表项对应的虚拟地址,其他的核收到广播地址后,在本地的TLB中查找此虚拟地址,如果其中一个处理器核内的TLB命中,则可以将该第一目标TLB表项反馈给该第一核。
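In this way, the first core can quickly obtain the first target TLB entry from another core, without sending a query request to the operating system and fetching the first target TLB entry from memory. This saves time and improves application efficiency.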
另外,第一核也可以是从内存的页表中获得该第一目标TLB表项,例如,第一核向操作系统发送查询请求,查询请求中携带导致缺失的虚拟地址,经过操作系统处理从内存的页表中获得该第一目标TLB表项。本发明实施例并不限于此。
进一步地,在从该多核处理器中处于空闲状态的核中确定第二核时,该第一核1110具体用于:
向该多核处理器中其他核中每个核发送状态查询请求,该状态查询请求用于查询该每个核是否处于空闲状态;
接收该每个核发送的响应消息,该响应消息用于指示该每个核是否处于空闲状态;
并根据该响应消息,从处于空闲状态的核中选择一个核作为该第二核。
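Therefore, in this embodiment of the present invention, the replaced TLB entry is saved using the TLB resources of an idle core. This not only improves the utilization of the idle cores' TLB resources but also indirectly increases the TLB capacity of the first core, reduces how often the first core has to obtain a target TLB entry from memory, and speeds up program execution.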
进一步地,在根据该响应消息,从处于空闲状态的核中选择一个核作为该第二核时,该第一核1110具体用于:
根据该响应消息确定空闲核列表,该空闲核列表中包括该多核处理器中除该第一核外的其他核中处于空闲状态的核;
在该空闲核列表中,选择与该第一核的通信开销最小的空闲核作为该第二核。
因此,本发明实施例中第一核选择与该第一核通信开销最小的空闲核作为第二核,并将替换下的TLB表项存储在第二核中,最大限度的降低通信开销。并且在需要查询第二核内的TLB的表项时,由于通信开销很小,所以第一核能够快速的查询到该TLB表项,提高程序的执行效率。
进一步地,在选择与该第一核的通信开销最小的空闲核作为该第二核时,该第一核1110具体用于:
将该空闲核列表中与该第一核的通信跳数最小的空闲核作为该第二核;或者,
将该空闲核列表中与该第一核的物理距离最小的空闲核作为该第二核。
可选地,在确定该第二核之后,该第一核1110还用于将该第二核的标识记录在该第一核内的TLB备用核列表中。
因此,本发明实施例,在第一核的TLB表项存储已满且第一目标TLB表项缺失,在获取到该第一目标TLB表项确定第二核,并将第二核记录在该第一核的备用核列表中,这样第一核根据备用核列表可以对备用核列表中的第二核内的TLB进行读写,进而提升第一核的TLB的容量,从而能够降低第一核的TLB缺失率,加快程序的执行。
可选地,该第一核1110还用于:
接收第二地址转换请求,根据该第二地址转换请求查询该第一核内的TLB;
在该第一核内的TLB中,确定和该第二地址转换请求对应的第二目标TLB表项缺失时,查询该第二核内的TLB;
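when the second target TLB entry is found in the TLB in the second core, replace a second entry in the TLB in the first core with the second target TLB entry;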
该第二核1120还用于将该第二表项存储在该第二核内的TLB中。
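Therefore, in this embodiment of the present invention, the working core has read/write permission on the TLB of the idle core. Saving replaced TLB entries in the TLB resources of idle cores not only improves the utilization of those resources but also further increases the TLB capacity of the first core. When an entry is missing from the TLB in the first core, the target entry can be obtained by reading the TLB in a slave core, which reduces how often the first core has to obtain a target TLB entry from memory, speeds up program execution, and improves execution efficiency.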
可选地,该第一核1110还用于:
接收第三地址转换请求,根据该第三地址转换请求查询该第一核内的TLB;
在该第一核内的TLB中,确定和该第三地址转换请求对应的第三目标TLB表项缺失时,查询该第二核内的TLB;
当在该第二核内的TLB中确定第三目标TLB表项缺失时,获取该第三目标TLB表项;
在判断该第一核内的TLB中的表项以及第二核内的TLB中的表项存储都已满时,从该多核处理器处于空闲状态的核中确定第三核1130;
将该第三目标TLB表项替换掉该第一核内的TLB中的第三表项,
该第三核用于将该第三表项存储在该第三核内的TLB中。
因此,本发明实施例在当前的从核存储已全满后,会获取新的从核来保存替换下的TLB表项,进一步扩大了该第一核的容量,当该第一核再次查询该替换下来的TLB表项时,该第一核可以直接从该新的从核中读取,无需该核通过内存来获取。因此,本发明实施例能够加快程序的执行,提高程序的执行效率。
可选地,当该第一核由工作状态转换到空闲状态后,该第一核还用于:
向该TLB备用核列表中记录的核发送TLB释放指令,该TLB释放指令用于指示该备用核列表中记录的核解除TLB共享。
因此,本发明实施例在第一核转换到空闲状态后,通过向所有的从核发送释放指令释放已获取的从核的TLB资源,通过这种方式,使得这些从核(空闲的核)的TLB资源释放,避免资源的浪费,进而该释放的核的TLB资源能够被其他的工作的核使用,提升其他工作核的容量,加快工作的核的程序执行。
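Optionally, after the second core switches from the idle state to the working state,
the second core is configured to send a TLB unshare request to the first core, where the TLB unshare request carries the identifier of the second core;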
该第一核还用于接收该TLB解除共享请求,在该TLB备用核列表中,将该第二核的标识删除。
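Therefore, in this embodiment of the present invention, after the second core changes from the idle state to the working state, the second core sends a TLB unshare request to the first core so that the first core releases the second core. The second core can then use its own TLB resources, and the service processing it is performing is not affected. In addition, the second core may itself become a master core and use the TLB resources of other idle cores.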
可选地,该第一核还用于从该多核处理器处于空闲状态的核中确定第四核1140;
该第四核的TLB用于存储该第二核的TLB中的所有表项。
因此,本发明实施例通过第一核获取新的从核(第四核),来保存释放的第二核内的TLB中的所有表项,在该第一核在后续查询中如果再次需要查询第二核中的表项,不需要通过操作系统从内存中重新获取这些表项,而可以直接通过访问从核(第四核)的TLB就可以获取这些表项,因此大大降低了TLB重填的时延,加快了程序的执行。
需要说明的是,本发明实施例中的该多核处理器中的每一个核均设置有标志位寄存器,该标志位寄存器用于记录该核的状态标志位、主核(Master)标志位和从核(Slave)标志位;
该状态标志位用于表示该核的运行状态和共享状态,该运行状态包括空闲状态或工作状态,该共享状态包括主核状态、从核状态或不参与共享状态,该主核状态表示该核处于工作状态且使用其他空闲核的TLB资源管理TLB表项,该从核状态表示该核处于空闲状态且为主核共享TLB资源;
该主核标志位用于表示在该核为主核时的空闲核列表、从核(Slave)列表(也称为备用核列表)和全满的从核(Full Slave)列表,该空闲核列表包括用于表示所有空闲核的向量,该从核列表包括用于表示该核的所有从核的向量,该全满的从核列表包括用于表示该核的所有满从核的向量;
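The slave core flag indicates, when the core is a slave core, the master core (Master) number and the write position for replaced entries. The master core number contains the identifier of the core's only master core, and the write position for replaced entries is the position in the slave core at which an entry replaced out of the master core is to be written.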
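Specifically, to implement the TLB resource sharing method proposed in the present invention, registers need to be added to each node to hold flag bits, including the status flag, the Master flag, and the Slave flag. Because every node may become a Master or a Slave, every node is equipped with all three kinds of flag bits, for example as shown in Table 1 above; details are not repeated here.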
因此,本发明实施例通过标志位的设置,实现了主核对从核内的TLB资源进行读写,进而扩大了主核的TLB的容量,也即减低了第一核通过内存来获取目标TLB表项的发生,能够加快程序的执行,提高程序的执行效率。
应理解,说明书中提到的“一个实施例”或“一实施例”意味着与实施例有关的特定特征、结构或特性包括在本发明的至少一个实施例中。因此,在整个说明书各处出现的“在一个实施例中”或“在一实施例中”未必一定指相同的实施例。此外,这些特定的特征、结构或特性可以任意适合的方式结合在一个或多个实施例中。应理解,在本发明的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本发明实施例的实施过程构成任何限定。
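In addition, the terms "system" and "network" are often used interchangeably in this document. The term "and/or" in this document merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A alone, both A and B, and B alone. In addition, the character "/" in this document generally indicates an "or" relationship between the associated objects before and after it.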
应理解,在本发明实施例中,“与A相应的B”表示B与A相关联,根据A可以确定B。但还应理解,根据A确定B并不意味着仅仅根据A确定B,还可以根据A和/或其它信息确定B。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口、装置或单元的间接耦合或通信连接,也可以是电的,机械的或其它的形式连接。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本发明实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以是两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
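From the foregoing description of the implementations, a person skilled in the art can clearly understand that the present invention may be implemented by hardware, firmware, or a combination thereof. When implemented in software, the foregoing functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that a computer can access. By way of example and not limitation, computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection may appropriately be regarded as a computer-readable medium. For example, if software is transmitted from a website, a server, or another remote source using a coaxial cable, an optical fiber cable, a twisted pair, a digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, optical fiber cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of the medium. As used in the present invention, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically and discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.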
总之,以上所述仅为本发明技术方案的较佳实施例而已,并非用于限定本发明的保护范围。凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。
Claims (22)
- 一种管理转址旁路缓存TLB的方法,其特征在于,应用于多核处理器,所述多核处理器包括第一核,所述第一核内包含一个TLB,所述方法包括:所述第一核接收第一地址转换请求,根据所述第一地址转换请求查询所述第一核内的TLB;在所述第一核内的TLB中,所述第一核确定和所述第一地址转换请求对应的第一目标TLB表项缺失时,获取所述第一目标TLB表项;在判断所述第一核内的TLB中的表项存储已满时,所述第一核从所述多核处理器中处于空闲状态的核中确定第二核;所述第一核将所述第一目标TLB表项替换掉所述第一核内的TLB中的第一表项,并将所述第一表项存储在所述第二核内的TLB中。
- 根据权利要求1所述的方法,其特征在于,所述第一核从所述多核处理器中处于空闲状态的核中确定第二核,包括:所述第一核向所述多核处理器中其他核中每个核发送状态查询请求,所述状态查询请求用于查询所述每个核是否处于空闲状态;所述第一核接收所述其他核中每个核发送的响应消息,所述响应消息用于指示所述每个核是否处于空闲状态;所述第一核根据所述响应消息,从处于空闲状态的核中选择一个核作为所述第二核。
- 根据权利要求2所述的方法,其特征在于,所述第一核根据所述响应消息,从处于空闲状态的核中选择一个核作为所述第二核,包括:所述第一核根据所述响应消息确定空闲核列表,所述空闲核列表中包括所述多核处理器中除所述第一核外的其他核中处于空闲状态的核;在所述空闲核列表中,选择与所述第一核的通信开销最小的空闲核作为所述第二核。
- 根据权利要求3所述的方法,其特征在于,在所述空闲核列表中,选择与所述第一核的通信开销最小的空闲核作为所述第二核,包括:所述第一核将所述空闲核列表中与所述第一核的通信跳数最小的空闲核作为所述第二核;或者,所述第一核将所述空闲核列表中与所述第一核的物理距离最小的空闲核作为所述第二核。
- 根据权利要求1至4中任一所述的方法,其特征在于,在确定所述第二核之后,所述方法还包括:将所述第二核的标识记录在所述第一核内的TLB备用核列表中。
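- The method according to any one of claims 1 to 5, wherein after the first core determines the second core from the cores in the idle state in the multi-core processor, the method further comprises: receiving, by the first core, a second address translation request, and querying the TLB in the first core according to the second address translation request; when the first core determines that a second target TLB entry corresponding to the second address translation request is missing from the TLB in the first core, querying the TLB in the second core; and when the second target TLB entry is found in the TLB in the second core, replacing a second entry in the TLB in the first core with the second target TLB entry, and storing the second entry in the TLB in the second core.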
- 根据权利要求1至5中任一项所述的方法,其特征在于,所述方法还包括:所述第一核接收第三地址转换请求,根据所述第三地址转换请求查询所述第一核内的TLB;在所述第一核内的TLB中,所述第一核确定和所述第三地址转换请求对应的第三目标TLB表项缺失时,查询所述第二核内的TLB;当在所述第二核内的TLB中确定第三目标TLB表项缺失时,获取所述第三目标TLB表项;在判断所述第一核内的TLB中的表项以及第二核内的TLB中的表项存储都已满时,所述第一核从所述多核处理器处于空闲状态的核中确定第三核;所述第一核将所述第三目标TLB表项替换掉所述第一核内的TLB中的第三表项,并将所述第三表项存储在所述第三核内的TLB中。
- 根据权利要求5至7中任一项所述的方法,其特征在于,当所述第一核由工作状态转换到空闲状态后,所述方法还包括:所述第一核向所述TLB备用核列表中记录的核发送TLB释放指令,所述TLB释放指令用于指示所述备用核列表中记录的核解除TLB共享。
- 根据权利要求5至7中任一项所述的方法,其特征在于,当所述第二核由空闲状态转换到工作状态后,所述方法还包括:所述第一核接收所述第二核发送的TLB解除共享请求,所述TLB解除共享请求中携带有所述第二核的标识;在所述TLB备用核列表中,所述第一核将所述第二核的标识删除。
- 根据权利要求9所述的方法,其特征在于,在所述TLB备用核列表中,所述第一核将所述第二核的标识删除之前,所述方法还包括:所述第一核从所述多核处理器处于空闲状态的核中确定第四核;所述第一核将所述第二核内的TLB中的所有表项拷贝到所述第四核内的TLB中。
- 根据权利要求1至10中任一项所述的方法,其特征在于,所述获取所述第一目标TLB表项,包括:所述第一核从内存的页表中获取所述第一目标TLB表项,或者,所述第一核从所述多核系统中的其他核获取所述第一目标TLB表项。
- 一种多核处理器,其特征在于,所述多核处理器包括第一核,所述第一核内包括一个转址旁路缓存TLB,所述第一核用于接收第一地址转换请求,根据所述第一地址转换请求查询所述第一核内的TLB,在所述第一核内的TLB中,确定和所述第一地址转换请求对应的第一目标TLB表项缺失时,获取所述第一目标TLB表项, 在判断所述第一核内的TLB中的表项存储已满时,从所述多核处理器中处于空闲状态的核中确定第二核,将所述第一目标TLB表项替换掉所述第一核内的TLB中的第一表项;所述第二核用于将所述第一表项存储在所述第二核内的TLB中。
- 根据权利要求12所述的多核处理器,其特征在于,在从所述多核处理器中处于空闲状态的核中确定第二核时,所述第一核具体用于:向所述多核处理器中其他核中每个核发送状态查询请求,所述状态查询请求用于查询所述每个核是否处于空闲状态;接收所述每个核发送的响应消息,所述响应消息用于指示所述每个核是否处于空闲状态;并根据所述响应消息,从处于空闲状态的核中选择一个核作为所述第二核。
- 根据权利要求13所述的多核处理器,其特征在于,在根据所述响应消息,从处于空闲状态的核中选择一个核作为所述第二核时,所述第一核具体用于:根据所述响应消息确定空闲核列表,所述空闲核列表中包括所述多核处理器中除所述第一核外的其他核中处于空闲状态的核;在所述空闲核列表中,选择与所述第一核的通信开销最小的空闲核作为所述第二核。
- 根据权利要求14所述的多核处理器,其特征在于,在选择与所述第一核的通信开销最小的空闲核作为所述第二核时,所述第一核具体用于:将所述空闲核列表中与所述第一核的通信跳数最小的空闲核作为所述第二核;或者,将所述空闲核列表中与所述第一核的物理距离最小的空闲核作为所述第二核。
- 根据权利要求12至15中任一所述的多核处理器,其特征在于,在确定所述第二核之后,所述第一核还用于将所述第二核的标识记录在所述第一核内的TLB备用核列表中。
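- The multi-core processor according to claim 16, wherein the first core is further configured to: receive a second address translation request, and query the TLB in the first core according to the second address translation request; when determining that a second target TLB entry corresponding to the second address translation request is missing from the TLB in the first core, query the TLB in the second core; and when the second target TLB entry is found in the TLB in the second core, replace a second entry in the TLB in the first core with the second target TLB entry; and the second core is further configured to store the second entry in the TLB in the second core.
- The multi-core processor according to claim 16, wherein the first core is further configured to: receive a third address translation request, and query the TLB in the first core according to the third address translation request; when determining that a third target TLB entry corresponding to the third address translation request is missing from the TLB in the first core, query the TLB in the second core; when determining that the third target TLB entry is missing from the TLB in the second core, obtain the third target TLB entry; when determining that the entries in the TLB in the first core and the entries in the TLB in the second core are both full, determine a third core from the cores in the idle state in the multi-core processor; and replace a third entry in the TLB in the first core with the third target TLB entry; and the third core is configured to store the third entry in the TLB in the third core.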
- 根据权利要求16至18中任一项所述的多核处理器,其特征在于,当所述第一核由工作状态转换到空闲状态后,所述第一核还用于:向所述TLB备用核列表中记录的核发送TLB释放指令,所述TLB释放指令用于指示所述备用核列表中记录的核解除TLB共享。
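- The multi-core processor according to any one of claims 16 to 18, wherein after the second core switches from the idle state to the working state, the second core is further configured to send a TLB unshare request to the first core, the TLB unshare request carrying the identifier of the second core; and the first core is further configured to receive the TLB unshare request and delete the identifier of the second core from the TLB spare core list.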
- 根据权利要求20所述的多核处理器,其特征在于,当所述第二核由空闲状态转换到工作状态后,所述第一核还用于从所述多核处理器处于空闲状态的核中确定第四核,并将所述第二核内的TLB中的所有表项拷贝到所述第四核内的TLB中。
- 根据权利要求12至21中任一项所述的多核处理器,其特征在于,在获取所述第一目标TLB表项时,所述第一核具体用于:从内存的页表中获取所述第一目标TLB表项,或者,从所述多核系统中的其他核获取所述第一目标TLB表项。
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/080867 WO2017190266A1 (zh) | 2016-05-03 | 2016-05-03 | 管理转址旁路缓存的方法和多核处理器 |
| CN201680057517.1A CN108139966B (zh) | 2016-05-03 | 2016-05-03 | 管理转址旁路缓存的方法和多核处理器 |
| EP16900788.7A EP3441884B1 (en) | 2016-05-03 | 2016-05-03 | Method for managing translation lookaside buffer and multi-core processor |
| US16/178,676 US10795826B2 (en) | 2016-05-03 | 2018-11-02 | Translation lookaside buffer management method and multi-core processor |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/080867 WO2017190266A1 (zh) | 2016-05-03 | 2016-05-03 | 管理转址旁路缓存的方法和多核处理器 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/178,676 Continuation US10795826B2 (en) | 2016-05-03 | 2018-11-02 | Translation lookaside buffer management method and multi-core processor |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017190266A1 true WO2017190266A1 (zh) | 2017-11-09 |
Family
ID=60202654
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2016/080867 Ceased WO2017190266A1 (zh) | 2016-05-03 | 2016-05-03 | 管理转址旁路缓存的方法和多核处理器 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US10795826B2 (zh) |
| EP (1) | EP3441884B1 (zh) |
| CN (1) | CN108139966B (zh) |
| WO (1) | WO2017190266A1 (zh) |
Families Citing this family (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11321240B2 (en) * | 2018-06-08 | 2022-05-03 | International Business Machines Corporation | MMIO addressing using a translation lookaside buffer |
| US10929302B2 (en) | 2018-06-08 | 2021-02-23 | International Business Machines Corporation | MMIO addressing using a translation table |
| US10740523B1 (en) * | 2018-07-12 | 2020-08-11 | Xilinx, Inc. | Systems and methods for providing defect recovery in an integrated circuit |
| CN111382115B (zh) * | 2018-12-28 | 2022-04-15 | 北京灵汐科技有限公司 | 一种用于片上网络的路径创建方法、装置及电子设备 |
| CN112147931B (zh) * | 2020-09-22 | 2022-06-24 | 哲库科技(北京)有限公司 | 一种信号处理器的控制方法、装置、设备以及存储介质 |
| CN112597075B (zh) * | 2020-12-28 | 2023-02-17 | 成都海光集成电路设计有限公司 | 用于路由器的缓存分配方法、片上网络及电子设备 |
| CN112965921B (zh) * | 2021-02-07 | 2024-04-02 | 中国人民解放军军事科学院国防科技创新研究院 | 一种多任务gpu中tlb管理方法及系统 |
| CN115098410B (zh) * | 2022-06-24 | 2025-08-29 | 海光信息技术股份有限公司 | 处理器、用于处理器的数据处理方法及电子设备 |
| CN117472845B (zh) * | 2023-12-27 | 2024-03-19 | 南京翼辉信息技术有限公司 | 一种多核网络共享系统及其控制方法 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6105113A (en) * | 1997-08-21 | 2000-08-15 | Silicon Graphics, Inc. | System and method for maintaining translation look-aside buffer (TLB) consistency |
| CN1848095A (zh) * | 2004-12-29 | 2006-10-18 | 英特尔公司 | 在多核心/多线程处理器中高速缓存的公平共享 |
| CN103119570A (zh) * | 2010-09-24 | 2013-05-22 | 英特尔公司 | 用于实现微页表的装置、方法和系统 |
| CN104346294A (zh) * | 2013-07-31 | 2015-02-11 | 华为技术有限公司 | 基于多级缓存的数据读/写方法、装置和计算机系统 |
| CN105095094A (zh) * | 2014-05-06 | 2015-11-25 | 华为技术有限公司 | 内存管理方法和设备 |
Family Cites Families (37)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH07104842B2 (ja) * | 1989-03-03 | 1995-11-13 | 日本電気株式会社 | 外部記憶装置の割込み制御方式 |
| US6922755B1 (en) * | 2000-02-18 | 2005-07-26 | International Business Machines Corporation | Directory tree multinode computer system |
| US7149218B2 (en) * | 2001-12-05 | 2006-12-12 | International Business Machines Corporation | Cache line cut through of limited life data in a data processing system |
| US7076609B2 (en) * | 2002-09-20 | 2006-07-11 | Intel Corporation | Cache sharing for a chip multiprocessor or multiprocessing system |
| US20040117587A1 (en) | 2002-12-12 | 2004-06-17 | International Business Machines Corp. | Hardware managed virtual-to-physical address translation mechanism |
| US20050027960A1 (en) * | 2003-07-31 | 2005-02-03 | International Business Machines Corporation | Translation look-aside buffer sharing among logical partitions |
| US20070094450A1 (en) * | 2005-10-26 | 2007-04-26 | International Business Machines Corporation | Multi-level cache architecture having a selective victim cache |
| GB0603552D0 (en) * | 2006-02-22 | 2006-04-05 | Advanced Risc Mach Ltd | Cache management within a data processing apparatus |
| US8533395B2 (en) * | 2006-02-24 | 2013-09-10 | Micron Technology, Inc. | Moveable locked lines in a multi-level cache |
| US8180967B2 (en) * | 2006-03-30 | 2012-05-15 | Intel Corporation | Transactional memory virtualization |
| US7552288B2 (en) * | 2006-08-14 | 2009-06-23 | Intel Corporation | Selectively inclusive cache architecture |
| US7596662B2 (en) * | 2006-08-31 | 2009-09-29 | Intel Corporation | Selective storage of data in levels of a cache memory |
| US7774549B2 (en) * | 2006-10-11 | 2010-08-10 | Mips Technologies, Inc. | Horizontally-shared cache victims in multiple core processors |
| US8161242B2 (en) * | 2008-08-01 | 2012-04-17 | International Business Machines Corporation | Adaptive spill-receive mechanism for lateral caches |
| US8516221B2 (en) * | 2008-10-31 | 2013-08-20 | Hewlett-Packard Development Company, L.P. | On-the fly TLB coalescing |
| US20100146209A1 (en) * | 2008-12-05 | 2010-06-10 | Intellectual Ventures Management, Llc | Method and apparatus for combining independent data caches |
| US8364898B2 (en) * | 2009-01-23 | 2013-01-29 | International Business Machines Corporation | Optimizing a cache back invalidation policy |
| US8407421B2 (en) * | 2009-12-16 | 2013-03-26 | Intel Corporation | Cache spill management techniques using cache spill prediction |
| US9405700B2 (en) * | 2010-11-04 | 2016-08-02 | Sonics, Inc. | Methods and apparatus for virtualization in an integrated circuit |
| US20120151232A1 (en) * | 2010-12-12 | 2012-06-14 | Fish Iii Russell Hamilton | CPU in Memory Cache Architecture |
| US8904068B2 (en) * | 2012-05-09 | 2014-12-02 | Nvidia Corporation | Virtual memory structure for coprocessors having memory allocation limitations |
| US9081706B2 (en) * | 2012-05-10 | 2015-07-14 | Oracle International Corporation | Using a shared last-level TLB to reduce address-translation latency |
| US9110718B2 (en) * | 2012-09-24 | 2015-08-18 | Oracle International Corporation | Supporting targeted stores in a shared-memory multiprocessor system |
| US10310973B2 (en) | 2012-10-25 | 2019-06-04 | Nvidia Corporation | Efficient memory virtualization in multi-threaded processing units |
| US9348752B1 (en) * | 2012-12-19 | 2016-05-24 | Amazon Technologies, Inc. | Cached data replication for cache recovery |
| US9372803B2 (en) * | 2012-12-20 | 2016-06-21 | Advanced Micro Devices, Inc. | Method and system for shutting down active core based caches |
| US9021207B2 (en) * | 2012-12-20 | 2015-04-28 | Advanced Micro Devices, Inc. | Management of cache size |
| US9418010B2 (en) * | 2013-04-17 | 2016-08-16 | Apple Inc. | Global maintenance command protocol in a cache coherent system |
| US9710380B2 (en) * | 2013-08-29 | 2017-07-18 | Intel Corporation | Managing shared cache by multi-core processor |
| US9529730B2 (en) * | 2014-04-28 | 2016-12-27 | Apple Inc. | Methods for cache line eviction |
| US20150378900A1 (en) * | 2014-06-27 | 2015-12-31 | International Business Machines Corporation | Co-processor memory accesses in a transactional memory |
| WO2016019566A1 (zh) * | 2014-08-08 | 2016-02-11 | 华为技术有限公司 | 内存管理方法、装置和系统、以及片上网络 |
| US9875186B2 (en) * | 2015-07-08 | 2018-01-23 | Futurewei Technologies, Inc. | System and method for data caching in processing nodes of a massively parallel processing (MPP) database system |
| US10453169B2 (en) * | 2016-03-28 | 2019-10-22 | Intel Corporation | Method and apparatus for multi format lossless compression |
| US20170300427A1 (en) * | 2016-04-18 | 2017-10-19 | Mediatek Inc. | Multi-processor system with cache sharing and associated cache sharing method |
| US10877886B2 (en) * | 2018-03-29 | 2020-12-29 | Intel Corporation | Storing cache lines in dedicated cache of an idle core |
| US20190041895A1 (en) * | 2018-04-12 | 2019-02-07 | Yingyu Miao | Single clock source for a multiple die package |
-
2016
- 2016-05-03 WO PCT/CN2016/080867 patent/WO2017190266A1/zh not_active Ceased
- 2016-05-03 CN CN201680057517.1A patent/CN108139966B/zh active Active
- 2016-05-03 EP EP16900788.7A patent/EP3441884B1/en active Active
-
2018
- 2018-11-02 US US16/178,676 patent/US10795826B2/en active Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3441884A4 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111124954A (zh) * | 2019-11-12 | 2020-05-08 | 上海高性能集成电路设计中心 | 一种两级转换旁路缓冲的管理装置及方法 |
| CN114579301A (zh) * | 2022-02-15 | 2022-06-03 | 无锡江南计算技术研究所 | 面向国产异构众核加速计算核心局部存储的管理方法 |
| CN114840445A (zh) * | 2022-03-02 | 2022-08-02 | 阿里巴巴(中国)有限公司 | 内存访问方法和装置 |
| CN114840445B (zh) * | 2022-03-02 | 2025-01-28 | 阿里巴巴(中国)有限公司 | 内存访问方法和装置 |
Also Published As
| Publication number | Publication date |
|---|---|
| US20190073315A1 (en) | 2019-03-07 |
| EP3441884A1 (en) | 2019-02-13 |
| CN108139966B (zh) | 2020-12-22 |
| CN108139966A (zh) | 2018-06-08 |
| EP3441884B1 (en) | 2021-09-01 |
| US10795826B2 (en) | 2020-10-06 |
| EP3441884A4 (en) | 2019-04-10 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 2016900788; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2016900788; Country of ref document: EP; Effective date: 20181107 |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16900788; Country of ref document: EP; Kind code of ref document: A1 |
