WO2006112111A1 - Cache memory system and control method thereof - Google Patents
- Publication number
- WO2006112111A1 (application PCT/JP2006/302141)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- command
- address
- cache
- unit
- cache memory
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/12—Replacement control
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0893—Caches characterised by their organisation or structure
Definitions
- The present invention relates to a cache memory system and its control method, and more particularly to a technique for improving the controllability of a cache memory system by software.
- A small-capacity, high-speed cache memory, such as a static random access memory (SRAM), is placed in or near the microprocessor and stores part of the data, which speeds up the microprocessor's memory accesses.
- Patent Document 1 Japanese Patent Application Laid-Open No. 7-295882
- The present invention has been made in view of the above problems, and its purpose is to provide a cache memory system with a configuration suited to actively accepting and processing control from software.
- A cache memory system according to the present invention comprises a cache memory provided between a processor and a memory, and transfer and attribute control means for controlling the cache memory. The transfer and attribute control means includes a command entry unit, to which the processor, by executing a predetermined instruction, gives a command indicating a transfer or attribute operation on cache data together with an address specifying the target of the operation; and an operation request unit that requests the cache memory to perform the operation indicated by the command.
- The processor may further give the command entry unit an address range corresponding to the command, and the operation request unit may then request the cache memory to perform the operation sequentially on the plurality of addresses belonging to that address range.
- The cache memory holds, for each cache entry (the management unit of cache data): a tag indicating the high-order portion of the memory address of the cache data held in that entry; a valid flag indicating whether the cache entry is valid; a dirty flag indicating whether the cache entry has been written to; and a weak flag indicating that the access order of the cache entry is to be treated as older than that of any other cache entry.
- In response to a request, the cache memory may execute one of six operations (fill, touch, write back, invalidate, write back and invalidate, and aging). These include a write-back-and-invalidate operation which, if a cache entry hits the specified address, saves that entry's data to the memory and resets its dirty and valid flags when the dirty flag is set, and resets only the valid flag when the dirty flag is already reset; and an aging operation which, if a cache entry hits the specified address, sets the weak flag of that cache entry.
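A minimal Python sketch of the two operations whose behavior is spelled out above, write back and invalidate, and aging. It models one set as a list of hypothetical `Entry` records (line data omitted) and records write-backs in a list instead of touching real memory; the names `Entry`, `find_hit`, and `memory_log` are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical model of one cache entry's tag and flags (line data omitted)."""
    tag: int
    valid: bool = True
    dirty: bool = False
    weak: bool = False


def find_hit(cache_set: List[Entry], tag: int) -> Optional[Entry]:
    """Return the valid entry whose tag matches, or None on a miss."""
    for entry in cache_set:
        if entry.valid and entry.tag == tag:
            return entry
    return None


def write_back_and_invalidate(cache_set: List[Entry], tag: int, memory_log: List[int]) -> None:
    entry = find_hit(cache_set, tag)
    if entry is None:
        return                          # no hit: nothing to do
    if entry.dirty:
        memory_log.append(entry.tag)    # stands in for saving the line to memory
        entry.dirty = False
    entry.valid = False                 # the valid flag is reset in both cases


def aging(cache_set: List[Entry], tag: int) -> None:
    entry = find_hit(cache_set, tag)
    if entry is not None:
        entry.weak = True               # entry becomes the first replacement candidate
```

A clean hit is simply invalidated without a memory write, which is where the saving in bus transactions comes from.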
- These six operations are effective in improving the cache hit rate, reducing unnecessary bus transactions, and balancing (temporally distributing) bus transactions, which makes the system well suited to improving cache efficiency by actively accepting control from software.
- The transfer and attribute control means may further include an address adjustment unit that adjusts the start and end of the address range so that each points to the first data of a cache entry, the management unit of cache data in the cache memory.
- The operation request unit may then request the cache memory to perform the operation sequentially on a plurality of addresses included in the adjusted address range.
- Because the address adjustment unit aligns the start and end of the address range to the first data of cache entries, software does not need to manage the cache entry (line) size or boundaries, which reduces the burden of cache management.
- The transfer and attribute control means may further include a command holding unit that holds a plurality of commands, each with its corresponding address range, and a command selection unit that selects one of the held commands; the operation request unit then sequentially requests the operation indicated by the selected command on the plurality of addresses belonging to that command's address range.
- The command selection unit may select another command before all requests for the currently selected command have been issued. When the original command is selected again, the operation request unit may resume sequentially requesting the operation on the addresses for which it has not yet been requested.
- Because the transfer and attribute control means can hold and process a plurality of commands, it is suitable, for example, when the processor performs multitask processing and each task issues its own command.
- The transfer and attribute control means may further include an execution determination unit that determines whether the processor has executed a specific instruction on a predicted address defined for the next scheduled request, and an effective address generation unit that, when the determination is positive, generates an effective address by adding a predetermined offset to, or subtracting it from, the predicted address. The operation request unit then requests the operation on the generated effective address.
- Alternatively, the transfer and attribute control means may further include a command holding unit that holds a plurality of commands, each with its related address range. In that case, the execution determination unit determines, for each held command, whether the processor has executed the specific instruction on the predicted address corresponding to that command; a command selection unit selects one of the commands for which the determination is positive; the effective address generation unit generates an effective address by adding a predetermined value to, or subtracting it from, the predicted address corresponding to the selected command; and the operation request unit requests the operation indicated by the selected command on the generated effective address.
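The Python sketch below shows one possible reading of the execution determination and effective address generation described above. It assumes a hypothetical `LinkedCommand` record whose predicted address advances by the command's increment value after each match, an exact-match test on the accessed address, and a signed `offset` (positive to add, negative to subtract). The patent text does not fix these details, so all names and the matching rule are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LinkedCommand:
    predicted: int   # address the specific instruction is expected to access next
    end: int         # last address of the command's range (inclusive)
    increment: int   # step between successive predicted addresses
    offset: int      # signed: > 0 adds to, < 0 subtracts from the predicted address
    primitive: str   # operation primitive to request, e.g. "fill"


def on_specific_instruction(cmd: LinkedCommand, accessed: int) -> Optional[Tuple[int, str]]:
    """Called when the processor executes the specific instruction on `accessed`.

    Returns (effective_address, primitive) to request, or None if no request."""
    if cmd.predicted > cmd.end or accessed != cmd.predicted:
        return None                          # negative determination
    effective = cmd.predicted + cmd.offset   # effective address generation
    cmd.predicted += cmd.increment           # predict the next access
    return effective, cmd.primitive
```

For example, with `predicted=0x1000`, `increment=0x80`, and `offset=0x100`, an access to 0x1000 would request a fill at 0x1100, and the next expected access becomes 0x1080.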
- The transfer and attribute control means may further include an address output unit that sequentially outputs addresses, each specifying a cache entry, the management unit of cache data in the cache memory. The operation request unit then requests the cache memory to perform a sequential operation on one or more cache entries including the cache entry specified by the output address, and the cache memory executes the sequential operation in response to the request.
- The sequential operation may be a write-back operation.
- The processor may give the command entry unit a single command indicating an operation on a single address, an instruction-linked command indicating that an operation is to be performed on a plurality of addresses included in an address range in synchronization with a specific instruction executed by the processor, an area command indicating that the operation is to be performed on such addresses asynchronously with that instruction, and an auto cleaner command indicating that cache data is to be written back sequentially. The operation request unit may request the cache memory to perform the operation for each command according to a preset priority.
- The preset priority may be the order of the commands.
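As a rough illustration of priority-based request issue, the Python sketch below assumes that each command type has its own queue of pending requests and that the preset priority is the order in which the command types are listed (single, instruction-linked, area, auto cleaner). The queue names and the `arbitrate` function are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Assumed priority: the order in which the command types are listed.
PRIORITY = ("single", "instruction_linked", "area", "auto_cleaner")


def arbitrate(pending: Dict[str, Deque]) -> Optional[Tuple[str, object]]:
    """Pop and return (source, request) from the highest-priority non-empty queue."""
    for source in PRIORITY:
        queue = pending.get(source)
        if queue:
            return source, queue.popleft()
    return None
```

A fixed order keeps the arbiter trivial; lower-priority sources such as the auto cleaner only get a turn when nothing more urgent is pending.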
- The present invention can be realized not only as such a cache memory system but also as a control method of the cache memory system whose steps are the processes executed by the characteristic means included in the cache memory system.
- With the cache memory system of the present invention, software can request transfer and attribute operations on cache data by having the processor execute the predetermined instruction, which yields a cache memory system with a configuration well suited to accepting and processing control from software.
- FIG. 1 is a block diagram showing an example of the overall configuration of a computer system including a processor, a cache memory, a memory, and a TAC according to an embodiment of the present invention.
- FIG. 2 is a block diagram showing a configuration example of a cache memory.
- FIG. 3 is a diagram showing an example of updating of the use flag.
- FIG. 4 (a) is a diagram showing how a cache entry is replaced when it is assumed that there is no weak flag.
- FIG. 4 (b) is an explanatory diagram showing the role of the weak flag W in the replacement process.
- FIG. 5 is a flowchart showing an example of operation primitive processing in a cache memory.
- FIG. 6 is a flow chart showing an example of the auto cleaner processing in the cache memory.
- FIG. 7 is a view showing a configuration example of a cache entry according to a modification.
- FIG. 8 is a diagram showing an example of an interface between a cache memory and a TAC.
- FIG. 9 is a block diagram showing a configuration example of a TAC.
- FIG. 10 (a) is a diagram showing an example of an instruction for writing a command in the operation primitive register.
- FIG. 10 (b) is a diagram showing an example of a command.
- FIG. 11 (a) is a diagram showing an example of an instruction to write a start address in a start address register
- FIG. 11 (b) is a diagram showing an example of an instruction to write a size in a size register
- FIG. 11 (c) is a diagram showing an example of an instruction for writing a command in a command register
- FIG. 11 (d) is a diagram showing an example of a command.
- FIG. 12 (a) is a diagram showing an example of an instruction to write a command to the TAC control register.
- FIG. 12 (b) shows an example of the command.
- FIG. 13 is a conceptual diagram for explaining the contents of address adjustment.
- FIG. 14 is a block diagram showing a configuration of a command holding unit.
- FIG. 15 is a flowchart showing an example of area command control processing in the area command control unit.
- FIG. 16 is a flow chart showing an example of instruction interlocked command control processing in the instruction interlocked command control unit.
- FIG. 17 is a flowchart showing an example of auto cleaner control processing in the auto cleaner control unit.
- FIG. 18 is a flowchart showing an example of operation request processing in the operation request unit.
- FIG. 1 is a block diagram showing an overall configuration of a computer system including a processor 1, a memory 2, a cache memory 3 and a transfer and attribute controller (TAC) 4 in the embodiment of the present invention.
- The cache memory 3 and the TAC 4 in this embodiment correspond to the cache memory system of the present invention.
- When the processor 1 executes a predetermined instruction, the TAC 4 is given a command indicating a transfer or attribute operation on cache data and an address specifying the target of the operation, and requests the cache memory 3 to perform the operation indicated by the command.
- The cache memory 3 caches data in response to memory accesses from the processor 1, as a conventional general cache memory does. In addition, while no memory access from the processor 1 is being processed, it executes six types of transfer and attribute operations on cache data, as well as the auto cleaner, according to requests from the TAC 4. These six operations are called operation primitives. The operation primitives and the auto cleaner are described in detail later.
- With this configuration, the cache memory system of the present invention actively accepts control from software, which is useful when trying to overcome, with the aid of software, the limit on cache efficiency reached by autonomous hardware control.
- FIG. 2 is a block diagram showing a configuration example of the cache memory 3.
- The cache memory 3 includes an address register 20, a memory I/F 21, a demultiplexer 22, a decoder 30, four ways 31a to 31d (hereinafter abbreviated as ways 0 to 3), four comparators 32a to 32d, four AND circuits 33a to 33d, an OR circuit 34, selectors 35 and 36, a demultiplexer 37, and a control unit 38.
- The demultiplexer 22 preferentially selects the access address to the memory 2 given by the processor 1, and selects the address given by the TAC 4 when there is no memory access from the processor 1.
- Address register 20 is a register that holds the selected access address.
- The access address is 32 bits wide.
- From the most significant bit, the access address consists of a 21-bit tag address, a 4-bit set index (SI in the figure), and a 5-bit word index (WI in the figure).
- The tag address points to an area of the memory (whose size is the number of sets × the block size) that is mapped to one way. The size of this area is 2 KB, determined by the address bits A10 to A0 below the tag address, and is also the size of one way.
- The set index (SI) points to one of the sets spanning ways 0 to 3. Since the set index is 4 bits, there are 16 sets.
- The cache entry specified by the tag address and the set index is the unit of replacement; the data held in it in the cache memory is called line data, or a line.
- The size of the line data is determined by the address bits below the set index, that is, 128 bytes. Assuming that one word is 4 bytes, one line holds 32 words.
- The word index (WI) indicates one of the words constituting the line data.
- The two least significant bits (A1, A0) in the address register 20 are ignored during word access.
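The bit layout above can be checked with a short Python sketch. Only the field widths (21-bit tag, 4-bit set index, 5-bit word index, 2 ignored byte bits) come from the text; the function and constant names are illustrative.

```python
TAG_BITS, SET_BITS, WORD_BITS, BYTE_BITS = 21, 4, 5, 2

LINE_BYTES = 1 << (WORD_BITS + BYTE_BITS)  # 128 bytes = 32 words of 4 bytes
WAY_BYTES = LINE_BYTES << SET_BITS         # 16 sets x 128 bytes = 2 KB


def split_address(addr: int) -> tuple:
    """Split a 32-bit access address into (tag, set_index, word_index)."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("address must fit in 32 bits")
    word_index = (addr >> BYTE_BITS) & ((1 << WORD_BITS) - 1)              # A6..A2
    set_index = (addr >> (BYTE_BITS + WORD_BITS)) & ((1 << SET_BITS) - 1)  # A10..A7
    tag = addr >> (BYTE_BITS + WORD_BITS + SET_BITS)                       # A31..A11
    return tag, set_index, word_index  # A1..A0 are ignored for word access
```

The widths add up to 21 + 4 + 5 + 2 = 32 bits, and the bits below the tag (A10 to A0) span exactly one 2 KB way.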
- The memory I/F 21 is an interface through which the cache memory 3 accesses the memory 2, for example to write data back from the cache memory 3 to the memory 2 and to load data from the memory 2 into the cache memory 3.
- The decoder 30 decodes the 4 bits of the set index and selects one of the 16 sets spanning ways 0 to 3.
- Ways 0 to 3 are four ways with the same configuration, with a total capacity of 4 × 2 KB.
- Each way has 16 cache entries.
- One cache entry has a valid flag V, a 21-bit tag, 128 bytes of line data, a weak flag W, a use flag U, and a dirty flag D.
- The tag is a copy of the 21-bit tag address.
- The line data is a copy of the 128 bytes of data in the block specified by the tag address and the set index.
- The valid flag V indicates whether the data of the cache entry is valid.
- The weak flag W indicates whether the access from the processor is a low-frequency access; for replacement control in the cache memory, it forces the access order of the cache entry to be treated as the oldest. In other words, it indicates that the cache entry is to be treated as if it had been accessed earlier than any other cache entry.
- A cache entry whose weak flag W is set is the strongest replacement candidate and is evicted before any other cache entry.
- The use flag U indicates whether the cache entry has been accessed, and is used in place of the access-order data between cache entries in the LRU method. More precisely, a use flag U of 1 means there has been an access, and 0 means there has not. However, if the use flags of all four ways in a set would all become 1, the use flags of the ways other than the one being set are reset to 0. In other words, the use flag U indicates one of two relative states of access time: old or new. A cache entry whose use flag U is 1 has been accessed more recently than a cache entry whose use flag is 0.
- The dirty flag D indicates whether the cache entry has been written by the processor, that is, whether it must be written back to the memory because the data cached in the line may differ from the data in the memory as a result of the write.
- The comparator 32a compares the tag address in the address register 20 with the tag of way 0 among the four tags included in the set selected by the set index.
- The comparators 32b to 32d are the same except that they correspond to the ways 31b to 31d.
- The AND circuit 33a takes the AND of the valid flag and the comparison result of the comparator 32a; let h0 be this result. When h0 is 1, line data corresponding to the tag address and the set index in the address register 20 exists, that is, a hit has occurred in way 0. When h0 is 0, a miss has occurred. The AND circuits 33b to 33d are the same except that they correspond to the ways 31b to 31d; their results h1 to h3 indicate whether ways 1 to 3 hit or missed.
- The OR circuit 34 takes the OR of the comparison results h0 to h3; let this result be hit. hit indicates whether the cache memory has hit.
- the selector 35 selects the line data of the hit way among the line data of ways 0 to 3 in the selected set.
- the selector 36 selects one word indicated in the word index from the 32 words of line data selected by the selector 35.
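The hit path formed by the comparators, AND circuits, OR circuit, and selectors can be modeled in a few lines of Python. The `Way` record and `read_word` function are hypothetical names for this sketch, and flags other than V are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

WORDS_PER_LINE = 32


@dataclass
class Way:
    """One cache entry of a set (hypothetical model; flags other than V omitted)."""
    valid: bool = False
    tag: int = 0
    line: List[int] = field(default_factory=lambda: [0] * WORDS_PER_LINE)


def read_word(cache_set: List[Way], tag: int, word_index: int) -> Tuple[bool, Optional[int]]:
    # Comparators 32a-32d and AND circuits 33a-33d: h[i] = V_i AND (tag_i == tag)
    h = [way.valid and way.tag == tag for way in cache_set]
    hit = any(h)                            # OR circuit 34
    if not hit:
        return False, None
    line = cache_set[h.index(True)].line    # selector 35: line data of the hit way
    return True, line[word_index]           # selector 36: one word of that line
```

An entry whose tag matches but whose valid flag is 0 produces a miss, exactly as the AND with V requires.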
- Write data may be handled in word units.
- the control unit 38 controls the entire cache memory 3.
- FIG. 3 shows an example of updating of the use flag by the control unit 38.
- The upper, middle, and lower parts of the figure each show the four cache entries constituting set N, which spans ways 0 to 3.
- The 1 or 0 at the right end of each of the four cache entries is the value of its use flag.
- These four use flags U are denoted U0 to U3.
- The control unit 38 determines the cache entry to be replaced based on the use flags and performs the replacement. For example, in the upper part of FIG. 3, the control unit 38 takes way 1 or way 3 as the replacement candidate and decides which of the two to replace; in the middle part of FIG. 3, it determines way 3 as the replacement target.
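The use flag rule can be sketched in Python as follows. The functions operate on a list of the four use flags U0 to U3; treating every way whose flag is 0 as a replacement candidate follows from the description of U above, while the function names are hypothetical.

```python
from typing import List


def touch_use_flags(use: List[int], way: int) -> List[int]:
    """Return the use flags U0..U3 after an access to `way`."""
    new = list(use)
    new[way] = 1
    if all(new):
        # Setting this flag would make all flags 1: keep it and clear the others.
        new = [1 if i == way else 0 for i in range(len(new))]
    return new


def replacement_candidates(use: List[int]) -> List[int]:
    """Ways whose use flag is 0, i.e. accessed less recently than the others."""
    return [i for i, u in enumerate(use) if u == 0]
```

Because the reset keeps at least one flag at 0, there is always at least one candidate, so a single bit per entry stands in for a full LRU ordering.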
- FIG. 4 (a) is a comparative example assuming that the weak flag does not exist, and shows how cache entries are replaced. As in FIG. 3, the figure shows the four cache entries constituting set N, which spans ways 0 to 3; the 1 or 0 at the right end of each entry is the value of its use flag. Data A, B, C, and D are frequently accessed data, and only data E is infrequently accessed.
- When the processor 1 accesses data E in the first-stage state of FIG. 4 (a), a cache miss occurs.
- Through such cache misses, the cache entry holding the frequently accessed data A, for example, is replaced with the frequently accessed data D, producing the fourth-stage state.
- Meanwhile, the infrequently used data E is not selected for replacement and remains in the cache memory.
- FIG. 4 (b) is an explanatory view showing the role of the weak flag W in the replacement process.
- When the processor 1 accesses data E in the first-stage state of FIG. 4 (b) (the same as the first stage of FIG. 4 (a)), a cache miss occurs.
- The processor 1 sets the weak flag W to 1 in the cache entry of data E.
- As a result, the cache entry of data E is evicted first at the next cache miss, producing the second-stage state.
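A minimal Python sketch of victim selection with the weak flag. It assumes that any entry with W set is chosen before the use flags are consulted and that ties are broken by the lowest way number; the text does not specify a tie-break, so that part is an assumption, as is the function name.

```python
from typing import List


def choose_victim(weak: List[bool], use: List[int]) -> int:
    """Return the way to replace in one 4-way set."""
    for way, w in enumerate(weak):
        if w:
            return way      # weak entries are evicted before any other entry
    for way, u in enumerate(use):
        if u == 0:
            return way      # otherwise fall back to a way whose use flag is 0
    return 0                # not reached while the use flags keep at least one 0
```

For example, if ways 0 to 3 hold A, B, C, and E with use flags `[1, 0, 1, 1]` and only E is weak, the sketch evicts E (way 3) even though way 1 has a use flag of 0; without the weak flag it would evict way 1.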
- An operation primitive is an operation on a single address specified by the TAC 4.
- FIG. 5 is a flowchart showing an example of operation primitive processing in the cache memory 3. This processing starts when the TAC 4 gives a designation I of an operation primitive and an address A specifying the operation target, and executes the designated operation primitive.
- If the control unit 38 is processing a memory access from the processor 1, it waits until that processing is completed (S101: YES); while no memory access from the processor 1 is being performed, the demultiplexer 22 selects the address given by the TAC 4 (S102).
- The auto cleaner performs the following operation on a single address specified by the TAC 4.
- For each cache entry of the set indicated by the address (specifically, referring to FIG. 2, each of the four cache entries belonging to the set indicated by the set index SI included in the address), if both the dirty flag D and the weak flag W are set, that cache entry is written back.
- This operation is useful for balancing (temporally distributing) bus transactions.
- FIG. 6 is a flowchart showing an example of auto cleaner processing in the cache memory 3. This processing starts when the TAC 4 gives an auto cleaner designation I and an address A specifying the operation target.
- If the control unit 38 is processing a memory access from the processor 1, it waits until that processing is completed (S201: YES); while no memory access from the processor 1 is being performed, the demultiplexer 22 selects the address given by the TAC 4 (S202).
- The following is repeated for each cache entry belonging to the set designated by the set index included in the address (S203 to S207).
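The auto cleaner's per-set check can be sketched in Python as below. The `Flags` record and the `write_back` callback are hypothetical stand-ins for the cache entry flags and the memory I/F, the set index is extracted with the 32-bit layout described earlier, and clearing D after the write-back is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

SETS, WAYS = 16, 4


@dataclass
class Flags:
    dirty: bool = False
    weak: bool = False


def auto_clean(cache: List[List[Flags]], address: int,
               write_back: Callable[[int, int], None]) -> None:
    """Write back every entry of the addressed set whose D and W flags are both set."""
    set_index = (address >> 7) & (SETS - 1)   # SI = A10..A7
    for way, flags in enumerate(cache[set_index]):
        if flags.dirty and flags.weak:
            write_back(set_index, way)
            flags.dirty = False               # assumed: a write-back clears D
```

Writing back weak, dirty entries ahead of time means their later eviction needs no bus transaction, which is how this spreads bus traffic over time.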
- the cache memory of the present invention is not limited to the configuration of the above embodiment, but various modifications are possible. Hereinafter, some of the modifications will be described.
- The above embodiment describes a 4-way set-associative cache memory as an example, but the number of ways may be any number. Likewise, the number of sets is 16 in the above embodiment, but it may be any number.
- Although a set-associative cache memory has been described as an example, a fully associative or direct-mapped cache memory may be used instead.
- The line has been described as the replacement unit of cache data, but a subline, each of the four parts obtained by dividing a line, may be used as the replacement unit instead. In that case, each cache entry holds four valid flags and four dirty flags.
- FIG. 7 is a view showing an example of the configuration of a cache entry in that case. Besides a quarter of a line, a subline may be 1/2, 1/8, or 1/16 of a line; each cache entry then holds as many valid flags and dirty flags as there are sublines. Whether the replacement unit is a line or a subline may be switched by an instruction from the TAC 4.
- FIG. 8 is a diagram showing an example of the interface (the signals transmitted and received) between the cache memory 3 and the TAC 4. This example takes the above modifications into account.
- The 32-bit address specifies the operation target. The request, request acceptance, and execution completion signals form a handshake for issuing requests. Fill, Touch, Write Back, Invalidate, Write Back/Invalidate, Aging, and Auto Cleaner specify the requested operation.
- The 3-bit active way signal specifies, for each way, whether that way is made active.
- The refill unit specifies whether the replacement unit is a line or a subline.
- FIG. 9 is a block diagram showing a configuration example of the TAC 4.
- The TAC 4 requests the cache memory 3 to perform operations according to commands given by the processor 1, and includes a command entry unit 40, an area command control unit 41, an instruction-linked command control unit 42, an auto cleaner control unit 43, and an operation request unit 44.
- The commands given to the TAC 4 by the processor 1 are: a single command indicating an operation on a single address; an instruction-linked command indicating that an operation is to be performed on a plurality of addresses included in an address range in synchronization with a specific instruction executed by the processor 1; an area command indicating that the operation is to be performed on a plurality of addresses included in the address range asynchronously with the specific instruction executed by the processor 1; and an auto cleaner command indicating that cache data is to be written back sequentially.
- The command entry unit 40 is a group of registers to which commands and addresses are written when the processor 1 executes a predetermined instruction, and comprises an operation primitive register 401, a start address register 402, a size register 403, a command register 404, and a TAC control register 405. These registers are directly accessible from the processor 1, for example by being assigned to predetermined memory addresses, and hold the contents written by the processor 1.
- The area command control unit 41 is a functional block that holds up to four area commands and generates requests according to the held commands.
- The area command control unit 41 includes an address adjustment unit 411, a command holding unit 412, and a command selection unit 413.
- The instruction-linked command control unit 42 is a functional block that holds up to four instruction-linked commands and generates requests according to the held commands in synchronization with a specific instruction executed by the processor 1. It includes an address adjustment unit 421, a command holding unit 422, an execution determination unit 423, a command selection unit 424, and an effective address generation unit 425.
- The auto cleaner control unit 43 is a functional block that generates auto cleaner requests, and includes a landing address output unit 431.
- The single command is written to and held in the operation primitive register 401.
- FIG. 10 (a) shows an example of an instruction for writing a single command in the operation primitive register 401.
- This instruction is a normal transfer instruction (mov instruction), and specifies a command as a source operand and an operation primitive register (PR) 401 as a destination operand.
- FIG. 10 (b) shows an example of the command format.
- This command format consists of the operation target address and the specification of operation primitives.
- Command entry unit 40 outputs, to operation request unit 44, a request corresponding to the single command held in operation primitive register 401.
- FIG. 11 (a) shows an example of an instruction to write the start address in the start address register (SAR) 402.
- This instruction is also a normal transfer instruction as in FIG. 10 (a).
- The start address indicates the start address of the operation target of the command.
- FIG. 11 (b) shows an example of an instruction to write the size in the size register (SR) 403.
- This instruction is also a normal transfer instruction.
- The size indicates the size of the operation target.
- The size may be expressed in a predetermined unit, such as a number of bytes or a number of lines (the number of cache entries).
- FIG. 11 (c) shows an example of an instruction for writing a command in the command register (CR) 404.
- This instruction is also a normal transfer instruction.
- This command format includes an instruction-linked flag that specifies whether operation requests are to be linked with the execution of a specific instruction (that is, whether the command is an instruction-linked command or an area command), the specification of an operation primitive, and an increment value indicating the interval between the addresses to be operated on within the specified address range.
- If the increment value is, for example, the line size, the desired operation can be performed sequentially on all cache data in the address range.
- If the specific instruction is, for example, a load/store instruction with post-increment, the desired operation can be performed sequentially on the targets of that instruction.
- When the above contents are written to the start address register 402, the size register 403, and the command register 404, the command entry unit 40 outputs the command to the area command control unit 41 if it is an area command, and to the instruction-linked command control unit 42 if it is an instruction-linked command.
- The auto cleaner command updates the value of the auto cleaner flag, located at a predetermined bit position in the TAC control register 405; this flag indicates whether the auto cleaner is enabled or disabled.
- FIG. 12 (a) shows an example of an instruction for updating the auto cleaner flag (along with the entire contents of the TAC control register 405). This instruction is also a normal transfer instruction.
- FIG. 12 (b) shows an example of the command format. This command format corresponds to the format of the TAC control register and contains the new value of the autocleaner flag in the bit position.
- Command entry unit 40 outputs the value of the auto cleaner flag held in TAC control register 405 to auto cleaner control unit 43.
- The address adjustment unit 411 acquires the address range of an area command from the command entry unit 40 and adjusts both ends so that they point to the first data of cache entries; the command holding unit 412 holds up to four area commands with adjusted address ranges; and the command selection unit 413 selects one of the held area commands (for example, the oldest one), generates a request according to the selected area command, and outputs the request to the operation request unit 44.
- The address adjustment unit 411 first adds the start address held in the start address register 402 and the size held in the size register 403.
- The result of the addition is an end address pointing to the end of the address range.
- The size is expressed in a predetermined unit: if the unit is the byte, it is added as a byte address, and if it is the line, it is added as a line address.
- The address adjustment unit 411 then adjusts the start address and the end address.
- FIG. 13 is a conceptual diagram for explaining the contents of the adjustment.
- the start address points at any position other than the beginning of the line N.
- the start address is adjusted to the start of the next line (N + 1) so that it is adjusted to the line start address a or adjusted to the start of the line N including the data of the start address to the line start address b.
- the line pointed to by the line start address is called the start line.
- the end address points to any position other than the beginning of line M.
- the end address is a force that is adjusted to the line end address a to point to the beginning of the previous line (M-1), or adjusted to the line end address b to point to the beginning of the line M that contains data on the end address. It is adjusted.
- the line pointed to by the simple end address is called an end line.
- start address and the end address may be lined inside or out line by line. After the outer line of the line unit, the outer line of the sub line unit and the inner line of the inner line are also possible.
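The inward (a) and outward (b) alignments can be sketched as follows. This assumes a hypothetical 128-byte line, treats the end address as exclusive (start + size), and returns the beginning of the last line to operate on as the line end address, matching the description of line end addresses a and b. Subline alignment is not modeled, and `align_range` is a hypothetical name.

```python
LINE_SIZE = 128  # hypothetical line size in bytes


def align_range(start: int, end: int, outward: bool, line_size: int = LINE_SIZE):
    """Return (line_start_address, line_end_address), or None if no line remains.

    `end` is treated as exclusive (start + size). The line end address is the
    address of the first byte of the last line to be operated on.
    """
    if outward:
        # Line start address b / line end address b: keep partial lines.
        line_start = (start // line_size) * line_size
        line_end = ((end - 1) // line_size) * line_size
    else:
        # Line start address a / line end address a: drop partial lines.
        line_start = -(-start // line_size) * line_size  # round up
        line_end = (end // line_size) * line_size - line_size
    if line_start > line_end:
        return None  # no line remains after alignment
    return line_start, line_end
```

With inward alignment, a range that lies entirely within one line yields no line at all, which is why the sketch returns `None` in that case.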
- As a result, the processor 1 can specify any start address and any size, regardless of the line size and line boundaries.
- The command holding unit 412 acquires the line start address and the line end address from the address adjustment unit 411, and acquires and holds the operation primitive and the increment value from the command register 404.
- FIG. 14 is a block diagram showing the configuration of the command holding unit 412.
- The command holding unit 412 consists of four registers 4121 to 4124 and is preferably a FIFO (First In, First Out) queue in which the contents of each register can be read.
- Each register holds the line start address and the line end address acquired from the address adjustment unit 411, and the increment value and the operation primitive acquired from the command register 404.
- Under control of the operation request unit 44, the line start address is updated by adding the increment value for each request, and is used as the current address.
- The command selection unit 413 selects one of the commands held in the command holding unit 412 (for example, the oldest, that is, the head of the FIFO queue), generates one request indicating the current address and the operation primitive, and outputs it to the operation request unit 44.
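The command holding unit 412 and the selection by the command selection unit 413 can be modeled as a small FIFO of records. In this Python sketch the class and field names are hypothetical, the depth of four follows the text, the line start address doubles as the current address, and full-queue handling is simplified to raising an error.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass
class HeldCommand:
    current: int    # starts as the line start address; advanced per request
    line_end: int   # line end address from the address adjustment unit 411
    increment: int  # increment value from the command register 404
    primitive: str  # operation primitive from the command register 404


class CommandHolder:
    """Hypothetical model of command holding unit 412 (registers 4121 to 4124)."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.registers: Deque[HeldCommand] = deque()

    def register(self, cmd: HeldCommand) -> None:
        # Simplified: a full queue raises instead of applying a policy.
        if len(self.registers) >= self.depth:
            raise OverflowError("no free command register")
        self.registers.append(cmd)

    def select_request(self) -> Optional[Tuple[int, str]]:
        """Command selection unit 413: the head of the FIFO is the oldest command."""
        if not self.registers:
            return None
        head = self.registers[0]
        return head.current, head.primitive

    def advance(self) -> None:
        """After the request is accepted, step the head command's current address."""
        head = self.registers[0]
        head.current += head.increment
        if head.current > head.line_end:
            self.registers.popleft()  # every target address has been requested
```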
- FIG. 15 is a flowchart showing an example of the area command control process in the area command control unit 41.
- The address adjustment unit 411 adjusts the addresses of the command (S302), and the address-adjusted command is registered in the command holding unit 412 (S303).
- The command is registered in a register that does not yet hold a command; if all registers already hold commands, it is registered by overwriting the oldest command. A command may also be deleted in advance once its current address exceeds the line end address (which means that operations on all target addresses have been requested). Alternatively, if all registers hold commands, the processor 1 could be made to raise an exception instead of registering the new command.
- The command selection unit 413 selects the oldest command and outputs a request indicating its current address and operation primitive to the operation request unit 44 (S305).
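The registration policy described for S303 can be sketched on its own. The sketch assumes a `deque` standing in for registers 4121 to 4124 and a hypothetical `RegistersFullError` standing in for the exception the processor 1 could be made to raise. Both the overwrite-oldest behavior and the exception alternative are shown, selected by a flag.

```python
from collections import deque


class RegistersFullError(Exception):
    """Stands in for the exception the processor 1 could be made to raise."""


def register_command(registers: deque, command, depth: int = 4,
                     raise_when_full: bool = False) -> None:
    """Register a command in a free register, or apply the full-queue policy."""
    if len(registers) < depth:
        registers.append(command)  # a register is still free
    elif raise_when_full:
        raise RegistersFullError("all command registers hold commands")
    else:
        registers.popleft()        # overwrite the oldest command
        registers.append(command)
```

Overwriting keeps the newest work at the cost of silently dropping the oldest range; raising an exception instead lets software decide how to retry.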
- The command selection unit 413 may also select a command other than the oldest. For example, given a configuration that identifies which task issued each command and which task the processor 1 is currently executing, commands issued by the current task could be selected in preference to the oldest command.
- This configuration suits, for example, a processor 1 that performs multitasking, because commands from the current task can be processed with priority right after a task switch. Moreover, because the command holding unit 412 keeps a current address for each command, a command is not deleted when another command is selected in the meantime; when it is selected again, it can still request operations on its remaining addresses.
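One way to picture the task-aware selection is the following sketch. It assumes that each held command carries a hypothetical `task_id` recording the task that issued it and that the list is ordered oldest first. Falling back to the oldest command when the current task has none is an assumption, not something the text specifies.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TaggedCommand:
    task_id: int    # hypothetical: the task that issued the command
    current: int    # current address, preserved between selections
    primitive: str


def select_command(commands: Sequence[TaggedCommand],
                   current_task: int) -> Optional[TaggedCommand]:
    """`commands` is ordered oldest first, like the FIFO of unit 412."""
    for cmd in commands:
        if cmd.task_id == current_task:
            return cmd  # oldest command from the running task wins
    return commands[0] if commands else None  # otherwise the oldest overall
```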
- In the instruction-interlocked command control unit 42, the address adjustment unit 421 acquires the address range of an instruction-interlocked command from the command entry unit 40 and adjusts both ends so that each points to the first data of a cache entry, and the command holding unit 422 holds up to four instruction-interlocked commands whose address ranges have been adjusted.
- For each held command, the execution determination unit 423 determines whether the processor 1 has executed a specific instruction on the predicted address defined for that command.
- The command selection unit 424 selects one of the commands with a positive determination (for example, the oldest) and outputs a request for the effective address generated by the effective address generation unit 425 to the operation request unit 44.
- The address adjustment unit 421 and the command holding unit 422 are the same as the address adjustment unit 411 and the command holding unit 412, respectively, so their description is omitted.
- The execution determination unit 423 treats the current address of each command held in the command holding unit 422 as a predicted address, and determines whether the processor 1 has executed a specific instruction (specifically, a load instruction with post-increment or a store instruction with post-increment) that uses the predicted address as its operand. To make this determination, the processor 1 may, for example, supply the execution determination unit 423 with a signal C indicating that a load or store instruction with post-increment is being executed, and the execution determination unit 423 may compare the address on the address bus with each current address held in the command holding unit 422 while signal C is supplied.
- The command selection unit 424 selects one of the commands for which the execution determination unit 423 has made a positive determination (for example, the oldest).
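The comparison performed by the execution determination unit 423, followed by the oldest-first choice of the command selection unit 424, might look like this. The sketch assumes that signal C and the address-bus value are sampled together and that the command list is ordered oldest first. `InterlockedCommand` and `determine_and_select` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InterlockedCommand:
    current: int    # current address, used as the predicted address
    primitive: str


def determine_and_select(commands: List[InterlockedCommand], signal_c: bool,
                         bus_address: int) -> Optional[InterlockedCommand]:
    """Return the oldest command whose predicted address matched, if any.

    `commands` is ordered oldest first; `signal_c` is asserted while a load or
    store with post-increment is executing, and `bus_address` is its operand.
    """
    if not signal_c:
        return None  # no post-increment load/store in progress
    matches = [cmd for cmd in commands if cmd.current == bus_address]
    return matches[0] if matches else None
```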
- The effective address generation unit 425 generates the effective address that is the target of the command's operation by adding a one-line offset value to the command's current address (the predicted address described above). If the selected command specifies write-back, invalidation, write-back and invalidation, or oldest, it instead generates the effective address by subtracting the one-line offset value from the command's current address. The command selection unit 424 then generates one request specifying the effective address and the operation primitive, outputs it to the operation request unit 44, and updates the current address of the selected command by adding the increment value.
- When the replacement unit is a line, the offset value for one line is used; when the replacement unit is a subline, a smaller offset value (for example, two sublines) is used.
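Effective-address generation and the subsequent update of the current address can be sketched as below. The line and subline sizes are hypothetical, the primitive names are illustrative strings, and the rule that every primitive outside the write-back, invalidate, and oldest group gets the offset added follows the wording above rather than a documented list.

```python
from dataclasses import dataclass

LINE_SIZE = 128    # hypothetical line size in bytes
SUBLINE_SIZE = 32  # hypothetical subline size in bytes

# Primitives for which the text says the offset is subtracted.
SUBTRACT_PRIMITIVES = {"writeback", "invalidate", "writeback_invalidate", "oldest"}


@dataclass
class InterlockedCommand:
    current: int    # predicted address
    increment: int
    primitive: str


def offset_for(replacement_unit: str) -> int:
    # One line for line replacement; a smaller offset (two sublines,
    # as in the text's example) for subline replacement.
    return LINE_SIZE if replacement_unit == "line" else 2 * SUBLINE_SIZE


def issue_request(cmd: InterlockedCommand, replacement_unit: str = "line"):
    """Return (effective_address, primitive) and advance the predicted address."""
    offset = offset_for(replacement_unit)
    if cmd.primitive in SUBTRACT_PRIMITIVES:
        effective = cmd.current - offset  # act on data already passed
    else:
        effective = cmd.current + offset  # act on data about to be reached
    cmd.current += cmd.increment
    return effective, cmd.primitive
```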
- FIG. 16 is a flowchart showing an example of the instruction-interlocked command control process in the instruction-interlocked command control unit 42.
- The address adjustment unit 421 adjusts the addresses of the command (S402), and the address-adjusted command is registered in the command holding unit 422.
- The execution determination unit 423 compares the operand address of the specific instruction with the current address of each command (S405). If any command's address matches (S406: YES), the command selection unit 424 selects the oldest of them (S407), and the effective address generation unit 425 generates an effective address for the selected command. The command selection unit 424 then outputs a request indicating the effective address and the operation primitive to the operation request unit 44, and updates the current address of the selected command by adding the increment value (S408).
- In the auto cleaner control unit 43, the cleaning address output unit 431 sequentially outputs addresses that specify each cache entry in the cache memory 3.
- The cleaning address output unit 431 may simply be a register that holds and outputs an address.
- While the auto cleaner control unit 43 receives from the TAC control register 405 a flag value indicating that the auto cleaner is enabled, it outputs to the operation request unit 44 requests for the auto cleaner operation on the address output by the cleaning address output unit 431.
- FIG. 17 is a flowchart showing an example of the auto cleaner control process in the auto cleaner control unit 43.
- The auto cleaner control unit 43 outputs a request for the auto cleaner operation on the address output by the cleaning address output unit 431 to the operation request unit 44 (S502).
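The cleaning-address walk can be sketched as a register that steps through the sets. The number of sets, the per-set address stride, the wrap-around after the last set, and all names are assumptions made for illustration. A request is produced only while the auto cleaner flag is enabled.

```python
from typing import Optional, Tuple


class AutoCleanerControl:
    """Hypothetical model of auto cleaner control unit 43."""

    def __init__(self, num_sets: int, set_stride: int) -> None:
        self.num_sets = num_sets
        self.set_stride = set_stride  # address distance between consecutive sets
        self.cleaning_address = 0     # cleaning address output unit 431: a plain register

    def request(self, flag_enabled: bool) -> Optional[Tuple[str, int]]:
        """Emit an auto cleaner request only while the flag says enabled (S502)."""
        if not flag_enabled:
            return None
        return ("auto_clean", self.cleaning_address)

    def advance(self) -> None:
        """Step to the next set, wrapping after the last one (assumed)."""
        span = self.num_sets * self.set_stride
        self.cleaning_address = (self.cleaning_address + self.set_stride) % span
```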
- The operation request unit 44 may be given up to four requests at the same time: a request according to a single command, a request according to an instruction-interlocked command, a request according to an area command, and a request for the auto cleaner operation.
- The operation request unit 44 selects one request based on a preset priority and transfers it to the cache memory 3. This preset priority may follow the order in which the commands are listed above.
- When the operation request unit 44 transfers a request according to an instruction-interlocked command, a request according to an area command, or a request for the auto cleaner operation, it controls the current address of the command or the cleaning address so that it indicates the operation target following that request.
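A sketch of the fixed-priority arbitration follows, assuming the priority follows the order in which the four request sources are listed above (single, instruction-interlocked, area, auto cleaner). The source keys and the `arbitrate` function are hypothetical.

```python
from typing import Optional, Tuple

# Assumed priority: the order in which the request sources are listed.
PRIORITY = ("single", "interlocked", "area", "auto_cleaner")


def arbitrate(pending: dict) -> Optional[Tuple[str, object]]:
    """Pick the highest-priority pending request.

    `pending` maps a source name to its request; a missing key or None
    means that source currently has no request.
    """
    for source in PRIORITY:
        request = pending.get(source)
        if request is not None:
            return source, request
    return None
```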
- FIG. 18 is a flowchart showing an example of the operation request process in the operation request unit 44.
- The operation request unit 44 may clear the contents of the operation primitive register.
- The request is transferred to the cache memory 3 (S604). If the current address of the command corresponding to the transferred request (which was incremented when the command was selected by the command selection unit 424) exceeds the line end address, the command may be deleted. If there is a request according to an area command (S606: YES), the request is transferred to the cache memory 3 (S607), and the area command control unit 41 is then controlled to increment the current address of the area command by the increment value (S608). If this update makes the current address exceed the line end address, the command may be deleted.
- The request is transferred to the cache memory 3 (S610), and the auto cleaner control unit 43 is then controlled to advance the address output by the cleaning address output unit 431 by the address unit of one set.
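The bookkeeping that follows each transfer in FIG. 18 can be sketched per request source. All types and names are hypothetical; the caller is assumed to pass the index of the command that produced the request, the instruction-interlocked command is assumed to have been incremented already at selection time, and wrap-around of the cleaning address is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RangeCommand:
    current: int
    line_end: int
    increment: int


@dataclass
class TacState:
    single_primitive: Optional[str] = None  # operation primitive register
    interlocked: List[RangeCommand] = field(default_factory=list)
    area: List[RangeCommand] = field(default_factory=list)
    cleaning_address: int = 0
    set_stride: int = 0x80  # hypothetical address unit of one set


def after_transfer(state: TacState, source: str, index: int = 0) -> None:
    """Update TAC state once the selected request has gone to cache memory 3."""
    if source == "single":
        state.single_primitive = None  # the primitive register may be cleared
    elif source == "interlocked":
        cmd = state.interlocked[index]  # already incremented at selection time
        if cmd.current > cmd.line_end:
            del state.interlocked[index]
    elif source == "area":
        cmd = state.area[index]
        cmd.current += cmd.increment  # S608
        if cmd.current > cmd.line_end:
            del state.area[index]
    elif source == "auto_cleaner":
        state.cleaning_address += state.set_stride  # step to the next set
    else:
        raise ValueError(f"unknown request source: {source!r}")
```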
- As described above, when the processor 1 executes predetermined instructions, the TAC 4 receives commands concerning cache data transfer and attribute manipulation and, according to those commands, requests the six kinds of operation primitives and the auto cleaner operation from the cache memory 3. The cache memory 3 executes the operations requested by the TAC 4 while performing conventional, general caching in response to memory accesses from the processor 1.
- The six operation primitives and the auto cleaner operation are effective for improving the cache hit rate, reducing unnecessary bus transactions, and leveling (temporally distributing) bus transactions, and they can be requested from software by having the processor 1 execute predetermined instructions (see, for example, FIGS. 10, 11, and 12). This configuration is therefore well suited to performing these operations under active software control to make the cache more efficient.
- Such specific instructions may be inserted into the program by a compiler.
- For example, the compiler could determine the lifetime of data, find the program position where the data is first used, and insert an instruction requesting a fill operation before it; similarly, it could find the program position after which the data is no longer written and insert an instruction requesting the oldest operation on that data after it.
- The TAC 4 performs only simple functions: acquiring commands from the processor 1, queuing and selecting commands, sequentially generating requests for multiple addresses, and transferring requests to the cache memory 3.
- The present invention is applicable to cache memories with improved controllability from software, for example on-chip cache memories, off-chip cache memories, data cache memories, and instruction cache memories.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| DE602006011292T DE602006011292D1 (de) | 2005-04-08 | 2006-02-08 | Cache-speichersystem und steuerverfahren dafür |
| EP06713284A EP1868101B1 (en) | 2005-04-08 | 2006-02-08 | Cache memory system, and control method therefor |
| JP2007521091A JP4090497B2 (ja) | 2005-04-08 | 2006-02-08 | キャッシュメモリシステム及びその制御方法 |
| CN2006800105539A CN101151600B (zh) | 2005-04-08 | 2006-02-08 | 高速缓冲存储器系统及其控制方法 |
| US11/816,858 US7953935B2 (en) | 2005-04-08 | 2006-02-08 | Cache memory system, and control method therefor |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2005-112839 | 2005-04-08 | | |
| JP2005112839 | 2005-04-08 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2006112111A1 (ja) | 2006-10-26 |
Family
ID=37114853
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2006/302141 Ceased WO2006112111A1 (ja) | 2005-04-08 | 2006-02-08 | キャッシュメモリシステム及びその制御方法 |
Country Status (8)
| Country | Link |
|---|---|
| US (1) | US7953935B2 (ja) |
| EP (1) | EP1868101B1 (ja) |
| JP (1) | JP4090497B2 (ja) |
| KR (1) | KR20070093452A (ja) |
| CN (1) | CN101151600B (ja) |
| DE (1) | DE602006011292D1 (ja) |
| TW (1) | TW200702993A (ja) |
| WO (1) | WO2006112111A1 (ja) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2012530979A (ja) * | 2009-12-22 | 2012-12-06 | インテル・コーポレーション | 所与の範囲のページのキャッシュフラッシュおよび所与の範囲のエントリのtlb無効化を行なうシステム、方法、および装置 |
| US8484423B2 (en) | 2009-06-23 | 2013-07-09 | International Business Machines Corporation | Method and apparatus for controlling cache using transaction flags |
| JP5536655B2 (ja) * | 2008-09-17 | 2014-07-02 | パナソニック株式会社 | キャッシュメモリ、メモリシステム及びデータコピー方法 |
| US20210365392A1 (en) * | 2019-02-21 | 2021-11-25 | Huawei Technologies Co., Ltd. | System on Chip, Access Command Routing Method, and Terminal |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR100818920B1 (ko) * | 2006-02-10 | 2008-04-04 | 삼성전자주식회사 | 그래픽 객체의 처리 시 발생된 키 이벤트를 제어하는 장치및 그 방법 |
| WO2007098055A2 (en) * | 2006-02-17 | 2007-08-30 | Google Inc. | Encoding and adaptive, scalable accessing of distributed models |
| KR100985517B1 (ko) * | 2008-12-04 | 2010-10-05 | 주식회사 에이디칩스 | 캐시메모리 제어방법 |
| KR101502827B1 (ko) * | 2014-03-20 | 2015-03-17 | 주식회사 에이디칩스 | 컴퓨터 시스템에서의 캐시 무효화 방법 |
| KR102128475B1 (ko) * | 2014-03-27 | 2020-07-01 | 에스케이하이닉스 주식회사 | 반도체 메모리 장치 |
| US9779025B2 (en) | 2014-06-02 | 2017-10-03 | Micron Technology, Inc. | Cache architecture for comparing data |
| CN105243685B (zh) * | 2015-11-17 | 2018-01-02 | 上海兆芯集成电路有限公司 | 数据单元的关联性检查方法以及使用该方法的装置 |
| CN105427368B (zh) * | 2015-11-17 | 2018-03-20 | 上海兆芯集成电路有限公司 | 数据单元的关联性检查方法以及使用该方法的装置 |
| US10101925B2 (en) * | 2015-12-23 | 2018-10-16 | Toshiba Memory Corporation | Data invalidation acceleration through approximation of valid data counts |
| KR102649657B1 (ko) * | 2018-07-17 | 2024-03-21 | 에스케이하이닉스 주식회사 | 데이터 저장 장치 및 동작 방법, 이를 포함하는 스토리지 시스템 |
| US11281585B2 (en) | 2018-08-30 | 2022-03-22 | Micron Technology, Inc. | Forward caching memory systems and methods |
| US11086791B2 (en) * | 2019-08-29 | 2021-08-10 | Micron Technology, Inc. | Methods for supporting mismatched transaction granularities |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS5119453A (en) * | 1974-08-08 | 1976-02-16 | Fujitsu Ltd | Patsufua memoriseigyohoshiki |
| JPH0612327A (ja) | 1992-02-28 | 1994-01-21 | Motorola Inc | キャッシュメモリを有するデータプロセッサ |
| EP0602808A2 (en) | 1992-12-18 | 1994-06-22 | Advanced Micro Devices, Inc. | Cache systems |
| JPH07295882A (ja) | 1994-04-22 | 1995-11-10 | Hitachi Ltd | 情報処理装置、及び、情報処理システム |
| JPH11167520A (ja) * | 1997-12-04 | 1999-06-22 | Nec Corp | プリフェッチ制御装置 |
| JPH11272551A (ja) * | 1998-03-19 | 1999-10-08 | Hitachi Ltd | キャッシュメモリのフラッシュ制御方式およびキャッシュメモリ |
| EP1182566A1 (en) | 2000-08-21 | 2002-02-27 | Texas Instruments France | Cache operation based on range of addresses |
| JP2003223360A (ja) * | 2002-01-29 | 2003-08-08 | Hitachi Ltd | キャッシュメモリシステムおよびマイクロプロセッサ |
| JP2004326758A (ja) * | 2003-04-24 | 2004-11-18 | Internatl Business Mach Corp <Ibm> | 局所的なキャッシュ・ブロック・フラッシュ命令 |
Family Cites Families (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6045855A (ja) | 1983-08-22 | 1985-03-12 | Fujitsu Ltd | 磁気ディスク装置の順次アクセス検出方法 |
| JPH0784879A (ja) | 1993-09-09 | 1995-03-31 | Toshiba Corp | キャッシュメモリ装置 |
| US5860110A (en) * | 1995-08-22 | 1999-01-12 | Canon Kabushiki Kaisha | Conference maintenance method for cache memories in multi-processor system triggered by a predetermined synchronization point and a predetermined condition |
| JP4067887B2 (ja) | 2002-06-28 | 2008-03-26 | 富士通株式会社 | プリフェッチを行う演算処理装置、情報処理装置及びそれらの制御方法 |
| JP2004118305A (ja) | 2002-09-24 | 2004-04-15 | Sharp Corp | キャッシュメモリ制御装置 |
| JP4009304B2 (ja) | 2003-09-19 | 2007-11-14 | 松下電器産業株式会社 | キャッシュメモリおよびキャッシュメモリ制御方法 |
| JP4044585B2 (ja) | 2003-11-12 | 2008-02-06 | 松下電器産業株式会社 | キャッシュメモリおよびその制御方法 |
| EP1686485A4 (en) | 2003-11-18 | 2008-10-29 | Matsushita Electric Industrial Co Ltd | CACHE MEMORY AND CONTROL PROCEDURE THEREFOR |
| KR100826757B1 (ko) | 2003-11-18 | 2008-04-30 | 마쯔시다덴기산교 가부시키가이샤 | 캐시 메모리 및 그 제어 방법 |
| WO2005066796A1 (ja) | 2003-12-22 | 2005-07-21 | Matsushita Electric Industrial Co., Ltd. | キャッシュメモリ及びその制御方法 |
| JP4521206B2 (ja) * | 2004-03-01 | 2010-08-11 | 株式会社日立製作所 | ネットワークストレージシステム、コマンドコントローラ、及びネットワークストレージシステムにおけるコマンド制御方法 |
| WO2005091146A1 (ja) | 2004-03-24 | 2005-09-29 | Matsushita Electric Industrial Co., Ltd. | キャッシュメモリ及びその制御方法 |
2006
- 2006-02-08 DE DE602006011292T patent/DE602006011292D1/de not_active Expired - Lifetime
- 2006-02-08 KR KR1020077018401A patent/KR20070093452A/ko not_active Ceased
- 2006-02-08 US US11/816,858 patent/US7953935B2/en not_active Expired - Fee Related
- 2006-02-08 WO PCT/JP2006/302141 patent/WO2006112111A1/ja not_active Ceased
- 2006-02-08 EP EP06713284A patent/EP1868101B1/en not_active Expired - Fee Related
- 2006-02-08 JP JP2007521091A patent/JP4090497B2/ja not_active Expired - Lifetime
- 2006-02-08 CN CN2006800105539A patent/CN101151600B/zh not_active Expired - Fee Related
- 2006-02-15 TW TW095105065A patent/TW200702993A/zh unknown
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5536655B2 (ja) * | 2008-09-17 | 2014-07-02 | パナソニック株式会社 | キャッシュメモリ、メモリシステム及びデータコピー方法 |
| US8484423B2 (en) | 2009-06-23 | 2013-07-09 | International Business Machines Corporation | Method and apparatus for controlling cache using transaction flags |
| JP2012530979A (ja) * | 2009-12-22 | 2012-12-06 | インテル・コーポレーション | 所与の範囲のページのキャッシュフラッシュおよび所与の範囲のエントリのtlb無効化を行なうシステム、方法、および装置 |
| JP2015084250A (ja) * | 2009-12-22 | 2015-04-30 | インテル・コーポレーション | 所与の範囲のページのキャッシュフラッシュおよび所与の範囲のエントリのtlb無効化を行なうシステム、方法、および装置 |
| US20210365392A1 (en) * | 2019-02-21 | 2021-11-25 | Huawei Technologies Co., Ltd. | System on Chip, Access Command Routing Method, and Terminal |
| US11748279B2 (en) * | 2019-02-21 | 2023-09-05 | Huawei Technologies Co., Ltd. | System on chip, access command routing method, and terminal |
Also Published As
| Publication number | Publication date |
|---|---|
| CN101151600B (zh) | 2012-02-22 |
| EP1868101A4 (en) | 2009-01-21 |
| TW200702993A (en) | 2007-01-16 |
| EP1868101B1 (en) | 2009-12-23 |
| CN101151600A (zh) | 2008-03-26 |
| KR20070093452A (ko) | 2007-09-18 |
| US20090100231A1 (en) | 2009-04-16 |
| JP4090497B2 (ja) | 2008-05-28 |
| JPWO2006112111A1 (ja) | 2008-11-27 |
| DE602006011292D1 (de) | 2010-02-04 |
| US7953935B2 (en) | 2011-05-31 |
| EP1868101A1 (en) | 2007-12-19 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2007521091; Country of ref document: JP; Kind code of ref document: A |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWE | Wipo information: entry into national phase | Ref document number: 1020077018401; Country of ref document: KR |
| | WWE | Wipo information: entry into national phase | Ref document number: 11816858; Country of ref document: US |
| | WWE | Wipo information: entry into national phase | Ref document number: 2006713284; Country of ref document: EP |
| | WWE | Wipo information: entry into national phase | Ref document number: 200680010553.9; Country of ref document: CN |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWW | Wipo information: withdrawn in national office | Ref document number: DE |
| | NENP | Non-entry into the national phase | Ref country code: RU |
| | WWW | Wipo information: withdrawn in national office | Ref document number: RU |
| | WWP | Wipo information: published in national office | Ref document number: 2006713284; Country of ref document: EP |