TWI859909B

TWI859909B - Computer system based on wafer-on-wafer architecture and memory test method

Info

Publication number: TWI859909B
Application number: TW112120781A
Authority: TW
Inventors: 蔡鎮年
Original assignee: 鯨鏈科技股份有限公司
Priority date: 2023-06-02
Filing date: 2023-06-02
Publication date: 2024-10-21
Also published as: TW202449801A

Abstract

A memory test method for a computer system based on a wafer-on-wafer architecture. The computer system is a three-dimensional wafer product formed by a memory wafer layer, a logic circuit layer, and a substrate. When a memory test is performed, a memory device in the memory wafer layer is divided into a plurality of memory sub-blocks of the same size. First, the same data table is created in at least two memory sub-blocks. Then, a plurality of different initial values prepared in advance are provided for a proof-of-work operation that is executed on the at least two memory sub-blocks at the same time, and the data tables of the at least two memory sub-blocks are repeatedly read and written to produce multiple operation results corresponding to each initial value. When a judgment module obtains the operation results from a computing module, the operation results are compared with the corresponding known answers, and the error rate of each of the at least two memory sub-blocks under test is thereby estimated.

Description

Computer system and memory test method based on wafer stacking architecture

The present application relates to a method for testing a memory device, and more particularly to a method for testing memory in a computer system implemented with wafer stacking technology.

Applications of artificial intelligence and blockchain have become a new business opportunity. Blockchain can be widely applied to smart contracts, digital identity, the sharing economy, and similar uses.

However, some blockchain platforms frequently change their blockchain algorithms for security reasons or to patch vulnerabilities. Besides increasing computational difficulty, these changes are often deliberately designed to reduce the efficiency of application-specific chips, for example by raising the required memory throughput or the required storage capacity.

Developers of blockchain servers must therefore change their hardware architecture to meet the high demands on memory throughput, and a new hardware architecture for blockchain servers remains to be developed. In addition, the memory control methods and memory test methods used with the new hardware architecture need corresponding improvements.

Traditional memory test methods require a dedicated test module inside the chip. In a computer system based on a wafer stacking architecture, testing every memory cell may take a very long time. The embodiments of the present application therefore propose a computer system based on a wafer stacking architecture and a memory test method for it. The method runs an algorithm native to the computer system with different initial values and known solutions, and quickly determines the availability of the memory device by comparing the operation results for the initial values against the known solutions, thereby improving test efficiency.

To achieve the above objective, the present application proposes a memory test method applicable to a computer system based on a wafer stacking architecture. In the wafer stacking architecture, the computer system is a wafer stack formed from a memory crystal layer, a logic circuit layer, and a substrate using wafer-on-wafer (WoW) technology. The memory crystal layer includes at least one memory device. The logic circuit layer is connected to the at least one memory device through a plurality of connection pads and includes firmware, a computing module connected to the firmware and the at least one memory device, and a judgment module connected to the computing module.

During a memory test, the at least one memory device is divided into a plurality of memory sub-blocks for testing. First, the same data table is created in at least two of the memory sub-blocks. Next, a plurality of different initial values prepared in advance are provided, together with a known solution corresponding to each initial value. Each initial value is a value the computing module needs in order to perform a proof-of-work operation. The proof-of-work operation reads and writes the data table of the at least two memory sub-blocks multiple times to produce a plurality of operation results corresponding to each initial value. After the judgment module obtains the operation results from the computing module and compares them with the corresponding known solutions, it can compute the error rate of each memory sub-block under test.

In a further embodiment, the judgment module compares the error rate of the memory sub-block with a threshold to determine whether the memory sub-block is available or unavailable. The judgment module then records the number of the tested memory sub-block in the firmware according to the availability result.

In a specific embodiment, the proof-of-work operation may be the Ethereum hash algorithm (Ethash), and the data table is a directed acyclic graph (DAG).

To ensure that the memory cells in each memory sub-block are fully tested, the computing module can be programmed to generate, in the memory sub-block, a directed acyclic graph whose size matches that of the memory sub-block.

In a further embodiment, the initial values prepared in advance and the corresponding known solutions may be pre-stored in the firmware at the factory or input from outside in real time. During testing, the judgment module obtains the known solutions from the firmware or from outside and compares them with the test results.

To achieve the above objective, the present application also proposes embodiments of the above computer system based on the wafer stacking architecture.

In summary, the computer system and memory test method proposed in the embodiments of the present application are suited to computer systems with a wafer stacking architecture. The algorithm native to the computing module is run with different initial values and known solutions, and the test procedure is executed on two or more memory sub-blocks at the same time, so that the availability of the memory sub-blocks is determined from the operation results of the computing module. The method requires no additional test logic in the computer system, which reduces the complexity of the computer system, and testing a plurality of memory sub-blocks at the same time substantially improves the test efficiency of the memory device.

FIG. 1 is a schematic diagram of the wafer stacking architecture of a three-dimensional wafer product 100 according to an embodiment of the present application. The three-dimensional wafer product 100 is formed by stacking at least one memory crystal layer 110, a logic circuit layer 120, and a substrate 130. A plurality of memory devices 112 are laid out in the memory crystal layer 110. Each memory device 112 may be a memory module composed of memory chips, such as dynamic random access memory (DRAM). The logic circuit layer 120 is a wafer layer in which a plurality of logic circuits 122 are laid out. The term logic circuits 122 covers various chip modules, including but not limited to application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), memory controllers, and processors. The substrate 130 at the bottom provides basic support as well as additional wiring space. A plurality of connection pads 102 or 104 are arranged between adjacent layers to provide signal channels. The three-dimensional wafer product 100 of this embodiment is a semi-finished form of the computer system 200; after dicing, it yields a plurality of independently operating computer systems 200. Although FIG. 1 shows only one substrate 130, the physical design is not limited to this, and a plurality of substrates 130 may be arranged side by side. In one embodiment, each computer system 200 includes several memory devices 112 and several logic circuits 122 and has the same three-dimensional wafer structure. In other words, the memory devices 112 and logic circuits 122 of each computer system 200 are laid out in advance in the memory crystal layer 110 and the logic circuit layer 120, respectively, and then manufactured into a three-dimensional structure by wafer stacking. In this structure, the wiring between chips does not occupy extra area; the connection pads 102 and 104 serve directly as signal paths, which effectively resolves the data transfer performance problem and realizes the computer system 200 of the present application.

In the wafer stacking architecture of FIG. 1, the number of transmission lines is no longer constrained by a planar layout, so a large number of dedicated connections can be used to resolve the data transfer performance problem. Because the spacing between the memory crystal layer 110 and the logic circuit layer 120 is smaller, more interfaces can be laid out in the same area. Bandwidth is the number of interfaces multiplied by the channel frequency, so more interfaces yield higher bandwidth. Thanks to wafer stacking technology, applications that require heavy memory access, such as machine learning, artificial intelligence, or blockchain applications, can obtain substantial performance gains.
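The bandwidth relationship described above can be illustrated with a minimal sketch. The interface counts and the 1 GHz channel frequency are hypothetical figures chosen only for illustration, and the function assumes each interface carries one bit per cycle; none of these values come from the present application.

```python
def bandwidth_bits_per_second(num_interfaces: int, channel_frequency_hz: float) -> float:
    """Bandwidth as the interface count multiplied by the channel frequency.

    Assumes one bit per interface per cycle (an illustrative simplification).
    """
    return num_interfaces * channel_frequency_hz


# Hypothetical comparison: 64 interfaces in a planar layout versus
# 1024 interfaces in a stacked layout, both at 1 GHz.
planar = bandwidth_bits_per_second(64, 1e9)
stacked = bandwidth_bits_per_second(1024, 1e9)
print(f"stacked / planar bandwidth ratio: {stacked / planar}")
```

Under this model the gain scales linearly with the number of interfaces that fit in the available area.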

FIG. 2 shows a computer system 200 according to an embodiment of the present application. The computer system 200 is a product obtained by stacking the memory crystal layer 110, the logic circuit layer 120, and the substrate 130 of FIG. 1 and then dicing the stack. The computer system 200 may include one or more memory devices 210 cut from the memory crystal layer 110 and a plurality of chips cut from the logic circuit layer 120, such as firmware 202, a computing module 204, and a judgment module 206. The computing module 204 may include memory controller functions and is connected to the memory device 210 through the connection pads 102 shown in FIG. 1. For example, the memory device 210 may include a plurality of memory banks, each containing a plurality of memory cells (not shown). Each memory cell stores one bit of data, 0 or 1. Each row or column of memory cells may be connected to the computing module 204 through a corresponding plurality of connection pads 102. The internal circuits of the memory follow known standard specifications, so their detailed implementation is not described here.

In general, a common way to test memory is to build the memory test logic into the chip. For example, when the computing module 204 receives a command to run a test, it automatically tests the memory device. This approach is called built-in self-test (BIST). However, as the number of memory devices grows, the time required for one complete BIST run (including the time to execute the test and the time to read back the results) also grows, and the impact is especially severe in a wafer stacking architecture.

In the embodiments proposed in the present application, the memory block in the memory device 210 is first divided into a plurality of memory sub-blocks 212. The computing module 204 then executes its native function on at least two memory sub-blocks 212 at the same time, which makes it possible to quickly estimate the data error rate of all memory sub-blocks 212. In this embodiment, the native function of the computing module 204 may be a proof-of-work algorithm. If the data error rate of a memory sub-block 212 is greater than a threshold, the memory sub-block 212 is marked as an unavailable block. Conversely, if the data error rate of the memory sub-block 212 is less than the threshold, the memory sub-block 212 is marked as an available block. After the memory test is complete, information about the available blocks can be stored in the firmware 202 for subsequent use by the computer system 200.

Referring to FIG. 3, in this embodiment the computing module 204 may further define a plurality of sub-computing modules 2041. Each sub-computing module 2041 can individually execute the native function of the computing module 204 on a different memory sub-block 212, so that the data error rates of a plurality of memory sub-blocks 212 can be estimated quickly. In one embodiment, the number of sub-computing modules 2041 may be the same as or different from the number of memory sub-blocks 212, and the present application is not limited in this respect.

The proof-of-work algorithm mentioned above is a block-solving algorithm based on a hash function. Any initial value passed through the hash function yields a corresponding result. Changing even one bit of the initial value triggers an avalanche effect, so the function is practically impossible to invert. Proof of work is therefore achieved by having the user perform a large number of brute-force computations in search of a result with specified characteristics. Common proof-of-work algorithms include SHA-256, Ethash, Scrypt, Equihash, CryptoNode, and algorithms based on memory-hard functions. Only Ethash is described below as an example; the operating details of the other algorithms are not repeated.

Ethereum hash (Ethash) is a proof-of-work algorithm derived from the Dagger-Hashimoto algorithm. Its main principle is to load the memory heavily through a very large number of random table lookups, in order to counteract the acceleration advantage of application-specific integrated circuits (ASICs). Ethash uses a data table with an initial size of 1 GB and a pseudo-random table (cache) with an initial size of 16 MB. The data table and the pseudo-random table are recomputed after every interval of 30,000 blocks. This 30,000-block interval is called an epoch. The content generated grows with each epoch, so 1 GB and 16 MB are only base values. The computer system of the embodiments of the present application stores the entire data table and pseudo-random table. When the computer system performs a proof-of-work operation on a block, it first fills a random number (nonce) into the block header and then repeatedly looks up the table, using SHA-3, to obtain the mixed value MIX from which the solution of the block is computed.
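The epoch rule described above can be sketched as follows. The constants are the figures stated in the text; the function names are hypothetical, and the sketch deliberately does not model how much the tables grow per epoch, since the text gives only the base sizes.

```python
EPOCH_LENGTH = 30_000            # blocks per epoch, as stated above
DATA_TABLE_BASE_BYTES = 2**30    # 1 GB base size of the data table
CACHE_BASE_BYTES = 2**24         # 16 MB base size of the pseudo-random table


def epoch_of(block_number: int) -> int:
    """Index of the epoch that contains the given block number."""
    return block_number // EPOCH_LENGTH


def needs_regeneration(previous_block: int, next_block: int) -> bool:
    """True when the two blocks fall in different epochs, in which case the
    data table and the pseudo-random table must be recomputed."""
    return epoch_of(previous_block) != epoch_of(next_block)


print(epoch_of(29_999), epoch_of(30_000))
print(needs_regeneration(29_999, 30_000))
```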

The data table used by Ethash during operation is called a directed acyclic graph (DAG). When a sub-computing module 2041 runs the proof-of-work algorithm, it reads data from the DAG at random 64 times, and each read returns 128 bytes. The sub-computing module 2041 first selects an initial value at random and then reads one entry from the DAG at random. It combines the initial value with the entry read from the DAG and converts the combination into an intermediate product through the SHA-3 hash function. Next, the sub-computing module 2041 again reads one entry from the DAG at random, mixes it with the intermediate product, and applies the SHA-3 hash function again. After 64 repetitions, the sub-computing module 2041 obtains the proof of work for the corresponding memory sub-block 212.
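The 64-read loop described above can be sketched in simplified form. This is not the real Ethash algorithm: `build_table` is a hypothetical stand-in for DAG generation, and the sketch assumes each read index is derived from the running digest so that a given initial value always produces the same result, which is what makes comparison against a known solution possible.

```python
import hashlib

ENTRY_BYTES = 128   # size of each read, as stated above
NUM_READS = 64      # reads per proof-of-work run, as stated above


def build_table(num_entries: int, seed: bytes = b"demo-seed") -> list[bytes]:
    """Hypothetical stand-in for DAG generation: deterministic 128-byte entries."""
    return [
        hashlib.shake_256(seed + i.to_bytes(8, "little")).digest(ENTRY_BYTES)
        for i in range(num_entries)
    ]


def proof_of_work(table: list[bytes], initial_value: int) -> bytes:
    """Chains 64 table reads through SHA-3 and returns the final digest."""
    digest = hashlib.sha3_256(initial_value.to_bytes(8, "little")).digest()
    for _ in range(NUM_READS):
        # Assumption: the next read position is taken from the running digest.
        index = int.from_bytes(digest[:8], "little") % len(table)
        digest = hashlib.sha3_256(digest + table[index]).digest()
    return digest


table = build_table(4096)   # 4096 entries * 128 bytes = 512 KB
print(proof_of_work(table, 1).hex())
```

Because every read feeds the next hash, a single wrong entry along the read path changes the final digest, so one comparison checks all 64 reads at once.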

The memory test method of the present application is described below with reference to FIG. 2 and FIG. 3. In this embodiment, when a memory test is performed, the memory device 210 is divided into a plurality of memory sub-blocks 212 that are tested individually. First, the same data table is created in a plurality of memory sub-blocks 212, for example in at least two memory sub-blocks 212. In one embodiment, the same data table may be created in all memory sub-blocks 212, and the present application is not limited in this respect. Next, the plurality of sub-computing modules 2041 in the computing module 204 execute the test procedure on the plurality of memory sub-blocks 212 at the same time. That is, one sub-computing module 2041 executes the test procedure on one memory sub-block 212 in which the data table has been created, while other sub-computing modules 2041 individually execute the test procedure on other memory sub-blocks 212. In the test procedure, each sub-computing module 2041 is provided with a plurality of different initial values #N prepared in advance and a known solution #A corresponding to each initial value #N. Each initial value #N is a value the sub-computing module 2041 needs in order to perform a proof-of-work operation. In one embodiment, the sub-computing module 2041 may also use an externally input hash value #H as an initial value. In each proof-of-work operation, the sub-computing module 2041 reads and writes the data table in the memory sub-block 212 multiple times to produce a plurality of operation results corresponding to each initial value #N. After the judgment module 206 obtains the operation results from the sub-computing module 2041 and compares them with the corresponding known solutions #A, it can compute the error rate of the memory sub-block 212 under test.
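The concurrent test procedure above can be sketched as follows. Software threads stand in for the hardware sub-computing modules 2041, `pow_result` is a simplified stand-in for the proof-of-work run rather than real Ethash, and all function names and table sizes are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def pow_result(block: bytes, initial_value: int, reads: int = 64, entry: int = 128) -> bytes:
    """Simplified stand-in for one proof-of-work run over one sub-block."""
    digest = hashlib.sha3_256(initial_value.to_bytes(8, "little")).digest()
    entries = len(block) // entry
    for _ in range(reads):
        i = int.from_bytes(digest[:8], "little") % entries
        digest = hashlib.sha3_256(digest + block[i * entry:(i + 1) * entry]).digest()
    return digest


def error_rate(block: bytes, known: dict[int, bytes]) -> float:
    """Fraction of initial values #N whose result differs from the known solution #A."""
    wrong = sum(pow_result(block, n) != a for n, a in known.items())
    return wrong / len(known)


def run_parallel(blocks: list[bytes], known: dict[int, bytes]) -> list[float]:
    """Runs the test on every sub-block concurrently, one worker per sub-block."""
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        return list(pool.map(lambda b: error_rate(b, known), blocks))


# Demo: the same 4 KB table is written to three sub-blocks; one is fully corrupted.
reference = hashlib.shake_256(b"table").digest(4096)
known = {n: pow_result(reference, n) for n in range(16)}   # pairs of #N and #A
faulty = bytes(4096)
print(run_parallel([reference, faulty, reference], known))
```

Because every sub-block holds the same data table, a single set of initial values and known solutions serves all workers.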

After computing the error rate of the memory sub-block 212, the judgment module 206 compares it with a threshold #T to determine whether the memory sub-block 212 is available or unavailable. The judgment module 206 then records the numbers #R of the available or unavailable memory sub-blocks 212 in the firmware 202. For example, some applications require a threshold #T of only 80%, while stricter applications may require 99% or 99.999%.

In this embodiment, because the same data table is created in every memory sub-block 212 under test, the memory test method of the present application needs only the initial values #N corresponding to that data table and the corresponding known solutions #A to determine the availability of the memory sub-blocks 212 under test, which reduces the complexity of the memory test method. In addition, testing a plurality of memory sub-blocks 212 at the same time greatly reduces the overall test time of the memory device and thereby improves test efficiency.

The proof-of-work operation mentioned in this embodiment may be Ethash, in which case the data table is the DAG described above. To ensure that the memory cells in each memory sub-block 212 are fully tested, the computing module 204 can be programmed to generate, in the memory sub-block 212, a DAG whose size matches that of the memory sub-block 212.

Furthermore, the initial values #N prepared in advance and the corresponding known solutions #A may be pre-stored in the firmware 202 at the factory or input from outside in real time.

In the embodiment of the computer system 200 of the present application, the initial size of the DAG may be 1 GB. However, the size of the DAG grows as the number of blocks on the blockchain increases. In other words, the size of the DAG is not a fixed value and may be increased or decreased as required in actual applications.

Suppose the total size of the memory device in the computer system 200 is 4 GB. The memory device can be divided into 8192 memory sub-blocks 212 of equal size, 512 kilobytes (KB) each. The DAG used by the Ethash algorithm is likewise configured to be 512 KB. Running the Ethash algorithm once to obtain one proof of work is then equivalent to randomly reading 64 × 128 bytes = 8 KB of data from a 512 KB memory sub-block 212. In a further approach, the size of each memory sub-block 212 can be adjusted flexibly. For example, if a memory sub-block 212 is found to have a high error rate after testing, it can be split into smaller memory sub-blocks that are tested individually, which narrows the unavailable region and increases the usable capacity.
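The arithmetic in the example above can be checked with a few lines. The sketch assumes binary units (1 KB = 1024 bytes, 1 GB = 1024³ bytes), which is the convention under which 4 GB divides into exactly 8192 blocks of 512 KB.

```python
KB = 1024
GB = 1024**3

total_bytes = 4 * GB
sub_block_bytes = 512 * KB
num_sub_blocks = total_bytes // sub_block_bytes

reads_per_run = 64
bytes_per_read = 128
bytes_per_run = reads_per_run * bytes_per_read

print(num_sub_blocks)                    # 8192 sub-blocks
print(bytes_per_run // KB)               # 8 KB read per run
print(bytes_per_run / sub_block_bytes)   # 0.015625, i.e. 1/64 of a sub-block per run
```

A single run therefore touches at most 1/64 of a sub-block, which is why many runs with different initial values are needed per sub-block.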

In this embodiment, multiple sets of initial values with known results can be prepared in advance as test samples. When a sub-computing module 2041 of the computing module 204 obtains an operation result from a memory sub-block 212, the result can be compared with the known result. If the two match, all 8 KB of data read at random in that run are correct. Different initial values can be used repeatedly so that the sub-computing module 2041 performs multiple operations on the same memory sub-block 212, which increases the number of memory cells accessed in the memory sub-block 212. Because the memory sub-block 212 stores a DAG, once more than a certain number of different initial values have been used to run test operations on the same memory sub-block 212, it can be confirmed that all memory cells in the memory sub-block 212 have been tested. The same procedure can be applied to all 8192 sub-blocks of the memory device. Once the available regions among all memory sub-blocks 212 have been identified, the memory device can be used for subsequent computation and applications.
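How many initial values are needed to exercise a sub-block can be estimated with a rough model. The sketch below assumes every read lands on a uniformly random, independent 128-byte entry of a 512 KB sub-block; this is a modeling assumption for illustration, not a statement from the present application, and it gives expected coverage rather than a guarantee.

```python
import math

ENTRIES = (512 * 1024) // 128   # 4096 entries of 128 bytes per sub-block
READS_PER_RUN = 64


def expected_coverage(runs: int) -> float:
    """Expected fraction of entries read at least once after `runs` runs."""
    untouched = (1 - 1 / ENTRIES) ** (READS_PER_RUN * runs)
    return 1 - untouched


def runs_for_coverage(target: float) -> int:
    """Smallest number of runs whose expected coverage reaches `target`."""
    reads = math.log(1 - target) / math.log(1 - 1 / ENTRIES)
    return math.ceil(reads / READS_PER_RUN)


for target in (0.90, 0.99, 0.999):
    print(target, runs_for_coverage(target))
```

Under this model coverage approaches 100% only asymptotically, so a practical test would pick a target coverage and derive the number of initial values from it, or choose initial values whose read paths are known to cover the table.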

FIG. 4 shows the memory test method according to an embodiment of the present application, which includes steps S100 and S200. In step S100, a data table is created in at least two memory sub-blocks. In this step, the data table is a DAG, and the data tables in the at least two memory sub-blocks 212 are identical. In one embodiment, the same data table may be created in all memory sub-blocks 212 of the memory device 210, and the present application is not limited in this respect. In step S200, the memory sub-blocks in which the data table has been created are tested at the same time. In this step, the plurality of sub-computing modules 2041 of the computing module 204 simultaneously execute the test procedure on their corresponding memory sub-blocks 212 in which the data table has been created. Because a plurality of memory sub-blocks 212 undergo the test procedure at the same time, the test time of the memory device is greatly reduced and test efficiency is improved.

Furthermore, step S200 includes steps S210 to S260, as shown in FIG. 5.

In step S210, the computer system 200 starts the memory test. A sub-computing module 2041 tests one of the memory sub-blocks 212 in the memory device 210, using a plurality of different initial values prepared in advance, one after another, to perform proof-of-work operations on the memory sub-block 212. In step S220, the result of each proof-of-work operation is compared with the corresponding known solution to determine whether the result is correct, and the correct and incorrect outcomes of the operations are recorded. In step S230, it is determined whether the proof-of-work operations for all initial values have been completed. If so, the method proceeds to step S240. If unused initial values remain, step S210 is repeated.

In step S240, the data error rate of the memory sub-block 212 is calculated. Because the correct and incorrect outcomes of the operations were recorded in steps S210 to S230, the data error rate of the memory sub-block 212 can be computed from them. In step S250, the judgment module 206 marks the memory sub-block 212 as available or unavailable according to the data error rate.

In step S260, the judgment module 206 stores the marking information of the memory sub-blocks 212 determined to be available or unavailable in the firmware. The above steps are only illustrative, and their order of execution may be reasonably rearranged. The memory test method may be performed when the computer system 200 boots or while the computer system 200 is idle. It may also be triggered only at the user's request.
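Steps S210 to S260 for a single sub-block can be sketched as one function. The proof-of-work run is passed in as a callable so that the sketch stays independent of any particular algorithm; the function names, the dictionary standing in for the firmware record, and the treatment of an error rate exactly equal to the threshold as available are all assumptions made for illustration.

```python
import hashlib
from typing import Callable


def run_steps(
    block_id: int,
    run_pow: Callable[[int], bytes],
    known_solutions: dict[int, bytes],
    threshold: float,
    firmware: dict[int, str],
) -> float:
    """Walks one sub-block through steps S210 to S260 and returns its error rate."""
    records = []
    for initial_value, known in known_solutions.items():   # S230 loops back to S210
        result = run_pow(initial_value)                     # S210: proof-of-work run
        records.append(result == known)                     # S220: compare and record
    rate = records.count(False) / len(records)              # S240: data error rate
    # S250: greater than the threshold means unavailable (equality assumed available).
    status = "unavailable" if rate > threshold else "available"
    firmware[block_id] = status                             # S260: store the marking
    return rate


def healthy_pow(n: int) -> bytes:
    """Stand-in for a run on a fault-free sub-block."""
    return hashlib.sha3_256(n.to_bytes(8, "little")).digest()


def flaky_pow(n: int) -> bytes:
    """Stand-in for a faulty sub-block: every fourth initial value gives a wrong result."""
    return b"" if n % 4 == 0 else healthy_pow(n)


known = {n: healthy_pow(n) for n in range(8)}
firmware_record: dict[int, str] = {}
print(run_steps(0, healthy_pow, known, 0.05, firmware_record))
print(run_steps(1, flaky_pow, known, 0.05, firmware_record))
print(firmware_record)
```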

In summary, the embodiments of the present application propose a memory test method and a computer system suited to a wafer stacking architecture. By running the algorithm native to the computing module 204 with different initial values and known solutions, the availability of the memory device can be determined directly from the operation results of the computing module 204, without designing additional test logic into the computer system 200. Furthermore, because the same data table is created in every memory sub-block 212 under test, the memory test method needs only the initial values #N corresponding to that data table and the corresponding known solutions #A to determine the availability of the memory sub-blocks 212 under test, which reduces the complexity of the memory test method. In addition, the computer system of the present application defines a plurality of sub-computing modules 2041, so a plurality of memory sub-blocks 212 can undergo the test procedure at the same time, which greatly reduces the overall test time of the memory device and improves test efficiency.

It should be noted that, as used herein, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

The embodiments of the present application have been described above with reference to the drawings, but the present application is not limited to the specific embodiments described, which are merely illustrative and not restrictive. In light of the present application, a person having ordinary skill in the art to which the invention pertains may devise many further forms without departing from the spirit of the present application or the scope protected by the claims, and all such forms fall within the protection of the present application.

100: Three-dimensional wafer product

102: Connection pad

104: Connection pad

110: Memory crystal layer

112: Memory device

120: Logic circuit layer

122: Logic circuit

130: Substrate

200: Computer system

202: Firmware

204: Computing module

2041: Sub-computing module

206: Judgment module

210: Memory device

212: Memory sub-block

S100, S200, S210~S260: Steps

#R: Number

#N: Initial value

#H: Hash value

#T: Critical value

#A: Known solution

The drawings described here provide a further understanding of the present application and constitute a part of it. The illustrative embodiments of the present application and their descriptions serve to explain the present application and do not unduly limit it. In the drawings: Figure 1 is a schematic diagram of a three-dimensional wafer product architecture according to an embodiment of the present application. Figure 2 is a schematic diagram of a computer system architecture according to an embodiment of the present application. Figure 3 is a schematic diagram of a computing module architecture according to an embodiment of the present application. Figure 4 is a flow chart of a memory test method according to an embodiment of the present application. Figure 5 is a flow chart of a memory sub-block test method according to an embodiment of the present application.

Claims

1. A memory testing method applied in a computer system, wherein the computer system comprises at least one memory device, a firmware, a computing module connected to the firmware and the at least one memory device, and a judgment module connected to the computing module; wherein the memory testing method comprises: dividing the at least one memory device into a plurality of memory sub-blocks; establishing the same data table in at least two of the plurality of memory sub-blocks; and making the at least two memory sub-blocks simultaneously undergo a test procedure to judge the availability of the at least two memory sub-blocks, wherein the computing module defines a plurality of sub-computing modules, each of which performs the test procedure on one memory sub-block; wherein the test procedure comprises: providing a plurality of different initial values and a known solution corresponding to each initial value; the computing module performing a plurality of proof-of-work operations on a memory sub-block, each proof-of-work operation reading and writing the data table multiple times according to one of the initial values, to generate a plurality of operation results corresponding respectively to the initial values; and the judgment module obtaining the operation results from the computing module and comparing them with the corresponding known solutions to calculate an error rate of the memory sub-block.

2. The memory testing method as described in claim 1, further comprising: the judgment module comparing the error rate of the memory sub-block with a critical value to judge the availability of the memory sub-block.

3. The memory testing method as described in claim 2, further comprising: the judgment module recording the number of the memory sub-block in the firmware according to the judgment result of the availability.

4. The memory testing method as described in claim 1, wherein: the proof-of-work operation is a block solving algorithm based on a hash function; and the data table is a directed acyclic graph.

5. The memory testing method as described in claim 4, wherein: the size of the directed acyclic graph is consistent with the size of the memory sub-block; and the computer system is based on a wafer stacking architecture and comprises a memory crystal layer, a logic circuit layer, and at least one substrate stacked into a three-dimensional structure.

6. A computer system based on a wafer stacking architecture, comprising: at least one memory device; a firmware; a computing module connected to the firmware and the at least one memory device to test the at least one memory device; and a judgment module connected to the computing module; wherein: the computing module divides the at least one memory device into a plurality of memory sub-blocks; when the computing module tests at least two of the plurality of memory sub-blocks, the same data table is established in the at least two memory sub-blocks, and a proof-of-work operation is performed on the at least two memory sub-blocks simultaneously using a plurality of initial values in sequence, so that the data table in the at least two memory sub-blocks is read and written multiple times to generate a plurality of operation results corresponding to the initial values, wherein each initial value corresponds to a known solution; and the judgment module obtains the operation results from the computing module and compares them with the corresponding known solutions to calculate the error rate of the at least two memory sub-blocks; wherein the computing module defines a plurality of sub-computing modules, each of which performs the proof-of-work operation on one memory sub-block using the plurality of initial values in sequence.

7. The computer system based on a wafer stacking architecture as described in claim 6, wherein the judgment module compares the error rate of the at least two memory sub-blocks with a critical value to judge the availability of the memory sub-blocks.

8. The computer system based on a wafer stacking architecture as described in claim 6, wherein: the proof-of-work operation is a block solving algorithm based on a hash function; and the data table is a directed acyclic graph.

9. The computer system based on a wafer stacking architecture as described in claim 6, further comprising a memory crystal layer, a logic circuit layer, and at least one substrate stacked into a three-dimensional structure; wherein: the logic circuit layer is connected to the at least one memory device and the substrate through a plurality of connection pads; the firmware, the computing module, and the judgment module are configured in the logic circuit layer; and the at least one memory device is configured in the memory crystal layer.