JPH0773162A

JPH0773162A - Performance monitor of information processor

Info

Publication number: JPH0773162A
Application number: JP5218480A
Authority: JP
Inventors: Koji Kinoshita; 耕二木下; Hiroyuki Kasai; 洋行河西
Original assignee: NEC Corp; NEC Computertechno Ltd
Current assignee: NEC Corp; NEC Computertechno Ltd
Priority date: 1993-09-02
Filing date: 1993-09-02
Publication date: 1995-03-17

Abstract

PURPOSE: To make program tuning easier by quantitatively measuring the memory access load, which on supercomputers and similar machines has largely been left to the programmer's intuition. CONSTITUTION: Memory units 3-0 to 3-3 can be accessed simultaneously from a CPU 1 through a memory access control unit 2. The memory access control unit 2 decodes each memory access request from the CPU 1 and determines, from the type of the request and the address to be accessed, which of the memory units 3-0 to 3-3 the request should be sent to. Counters 6-0 to 6-3, which count accesses to the respective memory units 3-0 to 3-3, and a counter 5, which counts the signal obtained by ORing the request signals to the memory units 3-0 to 3-3, are provided, and all of these counters can be read by the CPU 1.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a performance monitor for evaluating and measuring the performance of an information processing apparatus.

[0002]

[Prior Art] In recent years, so-called vector computers, whose main workload is array computation, have come into wide use. To use a vector computer effectively, one must be thoroughly familiar with its characteristics and, in some cases, rewrite programs so that the vector computer can deliver its maximum performance. One of the characteristics to be understood is the load that a program places on memory.

[0003] In particular, a recent trend is to adopt a shared-memory multiprocessor configuration for higher speed, but it is known that in such shared-memory multiprocessor systems memory contention can cause serious degradation of processing performance. One way to avoid this degradation is to secure sufficient memory throughput, but doing so incurs enormous development and manufacturing costs and does not necessarily yield a superior price-performance ratio.

[0004] It is therefore more advantageous in practice to avoid the performance degradation by modifying the program so that memory contention is less likely to occur. To do so, however, the programmer must be informed of which parts of the program are prone to memory contention, and conventionally there has been no means of providing such information.

[0005]

[Problems to Be Solved by the Invention] As described above, there has conventionally been no means of informing the programmer of which parts of a program are prone to memory contention, and the programmer's intuition is often relied upon. Consequently, there is the problem that degradation of processing performance cannot be avoided at a reasonable cost.

[0006]

[Means for Solving the Problems] The performance monitor of the present invention is a performance monitor for an information processing apparatus that comprises a memory device consisting of a plurality of memory units capable of operating simultaneously and independently, and one or more arithmetic processing units. It is characterized by including memory access control means that controls access requests from the arithmetic processing units to the memory device so that each request is sent to the corresponding memory unit, and counting means, provided for each arithmetic processing unit, that is incremented when a request signal is sent to any of the memory units.

[0007]

[Embodiments] Referring to FIG. 1, which shows a first embodiment of the present invention, this embodiment comprises a CPU 1, a memory access control unit 2, four memory units 3-0, 3-1, 3-2 and 3-3, an OR circuit 4, and five counters 5, 6-0, 6-1, 6-2 and 6-3.

[0008] The CPU 1 is an arithmetic processing unit that interprets and executes instructions. When it executes an instruction that references memory, or when it fetches an instruction from memory, it sends a memory access request to the memory access control unit 2 via connection 101.

[0009] On receiving a memory access request from the CPU 1, the memory access control unit 2 sends a request signal to whichever of the memory units 3-0 to 3-3 corresponds to the address requested by the CPU 1, via the respective connections 102-0 to 102-3.

[0010] Memory access requests from the CPU 1 are of two kinds: scalar data access requests and vector data access requests. For a scalar data access request, the address sent from the CPU 1 is used as is to select the memory unit to be accessed, and a request signal and the address are sent to that memory unit.

[0011] For a vector data access request, on the other hand, the memory access control unit 2 generates the address of each element of the vector data from the start address and the inter-element stride supplied by the CPU 1, and sends a request signal and the address to the memory unit corresponding to each generated address. For writes to memory, whether of scalar or vector data, the write data is sent as well. A request signal is sent for each element, and depending on the stride up to four elements are sent simultaneously.

[0012] The memory units 3-0 to 3-3 operate according to the instructions sent from the memory access control unit 2 via connections 102-0 to 102-3, respectively. Addresses as seen from the CPU 1 are assigned as shown in FIG. 3, and the memory access control unit 2 determines the memory unit to be accessed on the basis of this address assignment. As is clear from FIG. 3, when vector data placed at consecutive addresses in memory is accessed, four elements can be accessed simultaneously.
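The relationship between stride and simultaneous access can be illustrated with a small sketch. FIG. 3 is not reproduced in this text, so the mapping used here (memory unit = address modulo 4, so that consecutive addresses rotate across the four units) is an assumption that is merely consistent with the statement that four consecutive elements can be accessed at once. The function names `unit_for`, `element_addresses` and `issue_groups` are hypothetical and do not come from the publication.

```python
NUM_UNITS = 4  # memory units 3-0 to 3-3


def unit_for(address: int) -> int:
    """Assumed addressing: consecutive addresses rotate across the units."""
    return address % NUM_UNITS


def element_addresses(start: int, stride: int, count: int) -> list[int]:
    """Element addresses generated from a start address and an element stride."""
    return [start + i * stride for i in range(count)]


def issue_groups(addresses: list[int]) -> list[list[int]]:
    """Group consecutive elements that target distinct units.

    Each group models the requests that could be sent at the same time;
    a new group starts as soon as an element hits a unit already in use.
    """
    groups: list[list[int]] = []
    current: list[int] = []
    used: set[int] = set()
    for addr in addresses:
        unit = unit_for(addr)
        if unit in used:
            groups.append(current)
            current, used = [], set()
        current.append(addr)
        used.add(unit)
    if current:
        groups.append(current)
    return groups
```

Under this assumed mapping, a stride of 1 yields groups of four elements, a stride of 2 yields groups of two, and a stride of 4 sends every element to the same unit, one at a time.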

[0013] The request signals sent to the memory units 3-0 to 3-3 via connections 102-0 to 102-3 are also supplied to the OR circuit 4, whose output is input to the counter 5 via connection 103. When connection 103 becomes logic '1', the counter 5 is incremented. That is, the counter 5 is incremented whenever a request signal is sent to any of the memory units 3-0 to 3-3 via any of connections 102-0 to 102-3. The value of the counter 5 is supplied to the CPU 1 via connection 104, so that the CPU 1 can read it.

[0014] Connections 102-0 to 102-3 are also supplied to the counters 6-0 to 6-3, respectively, and each of the counters 6-0 to 6-3 is incremented when its corresponding connection 102-0 to 102-3 becomes logic '1'. The values of the counters 6-0 to 6-3 are supplied to the CPU 1 via connections 105-0 to 105-3, respectively, so that the CPU 1 can read them.
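The counting behavior of the first embodiment can be modeled in software as a rough sketch. It assumes that the request lines are sampled once per clock cycle, which the text does not state explicitly. The class name `FirstEmbodimentMonitor` and its attributes are hypothetical labels chosen to mirror the reference numerals.

```python
NUM_UNITS = 4  # memory units 3-0 to 3-3


class FirstEmbodimentMonitor:
    """Software model of counter 5 and counters 6-0 to 6-3."""

    def __init__(self) -> None:
        self.counter5 = 0                # counts cycles with any request
        self.counter6 = [0] * NUM_UNITS  # one counter per memory unit

    def clock(self, requests) -> None:
        """Apply one sample of the request lines 102-0 to 102-3.

        `requests` holds four truthy or falsy values, one per line.
        """
        requests = list(requests)
        if len(requests) != NUM_UNITS:
            raise ValueError("expected one request line per memory unit")
        # OR circuit 4: connection 103 is '1' if any request line is '1'.
        if any(requests):
            self.counter5 += 1
        # Each counter 6-n follows its own request line 102-n.
        for unit, asserted in enumerate(requests):
            if asserted:
                self.counter6[unit] += 1
```

In this model a cycle in which all four units are requested adds 1 to counter 5 but 1 to each of the four per-unit counters, so comparing the two kinds of counter indicates how many units are typically used at once.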

[0015] The CPU 1 contains a timer (not shown) that is incremented every clock cycle. By reading the values of the counters 5 and 6-0 to 6-3 over a fixed period defined by this timer, the memory load imposed by the program can be calculated. Based on the memory load thus obtained, the programmer can examine the program with a view to reducing that load.
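The publication does not give a formula for the memory load, so the following is only one plausible way to derive figures from the timer and the counters: read them before and after a program section and divide the counter differences by the elapsed cycles. The names `Snapshot` and `memory_load`, and the three ratios returned, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Values read by the CPU at one point in time."""
    cycles: int                # built-in timer, incremented every clock cycle
    counter5: int              # cycles with a request to any memory unit
    counter6: tuple[int, ...]  # requests to each memory unit


def memory_load(before: Snapshot, after: Snapshot) -> dict:
    """Derive load figures for the interval between two snapshots."""
    elapsed = after.cycles - before.cycles
    if elapsed <= 0:
        raise ValueError("the timer must advance between snapshots")
    any_requests = after.counter5 - before.counter5
    per_unit = [a - b for a, b in zip(after.counter6, before.counter6)]
    return {
        # Fraction of cycles in which at least one unit was requested.
        "request_cycle_ratio": any_requests / elapsed,
        # Fraction of cycles in which each individual unit was requested.
        "unit_request_ratio": [n / elapsed for n in per_unit],
        # Average number of units requested per requesting cycle.
        "units_per_request_cycle": (
            sum(per_unit) / any_requests if any_requests else 0.0
        ),
    }
```

A heavily skewed `unit_request_ratio` would suggest that accesses concentrate on a few units, for example because of an unfavorable stride, which is the kind of information a programmer could use when revising the program.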

[0016] FIG. 2 is a block diagram showing a second embodiment of the present invention. This embodiment comprises two CPUs 11-0 and 11-1, a memory access control unit 12, four memory units 13-0, 13-1, 13-2 and 13-3, an OR circuit 14, two AND circuits 15-0 and 15-1, and two counters 16-0 and 16-1.

[0017] The CPUs 11-0 and 11-1 are arithmetic processing units with functions equivalent to those of the CPU 1 in the first embodiment, and send memory access requests to the memory access control unit 12 via connections 201-0 and 201-1, respectively.

[0018] The memory access control unit 12 arbitrates the memory access requests sent from the CPU 11-0 and the CPU 11-1 and sends request signals to the memory units 13-0 to 13-3 via connections 202-0 to 202-3, respectively. Like the memory units 3-0 to 3-3 in the first embodiment, the memory units 13-0 to 13-3 are addressed as shown in FIG. 3, and the memory access control unit 12 determines the memory unit to be accessed on the premise that this address assignment is common to the CPUs 11-0 and 11-1.

[0019] The memory access control unit 12 processes a request from either the CPU 11-0 or the CPU 11-1 at any one time, and does not send request signals to the memory units 13-0 to 13-3 for both CPUs simultaneously. The number of the CPU that originated the request signal is output on connections 203-0 and 203-1, which are supplied to the AND circuits 15-0 and 15-1, respectively. When a request signal for an access request from the CPU 11-0 is sent, 203-0 becomes logic '1', and when a request signal for an access request from the CPU 11-1 is sent, 203-1 becomes logic '1'; each enables the AND circuit 15-0 or 15-1, respectively.

[0020] As in the first embodiment, the request signals supplied to the memory units 13-0 to 13-3 via connections 202-0 to 202-3 are also supplied to the OR circuit 14, where the logical OR of the four signals is taken and supplied to the AND circuits 15-0 and 15-1 via connection 204. The AND circuits 15-0 and 15-1 take the logical AND of 203-0 and 203-1, which indicate the CPU that originated the request signal to the memory units, with 204, which indicates that an access request was made to one of the memory units 13-0 to 13-3, and supply the results to the counters 16-0 and 16-1 via connections 205-0 and 205-1, respectively. The counters 16-0 and 16-1 are incremented when 205-0 and 205-1, respectively, become logic '1', and thus indicate the number of times request signals were sent to the memory units 13-0 to 13-3 in response to memory access requests from the CPU 11-0 and the CPU 11-1, respectively.
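The gating logic of the second embodiment can likewise be sketched in software. The model assumes one sample of the signals per clock cycle and represents the requester lines 203-0 and 203-1 by a single index, `source_cpu`, which is `None` when no request is being serviced. The class name `SecondEmbodimentMonitor` and this representation are hypothetical simplifications.

```python
class SecondEmbodimentMonitor:
    """Software model of the per-CPU counters 16-0 and 16-1."""

    def __init__(self, num_cpus: int = 2) -> None:
        self.counter16 = [0] * num_cpus

    def clock(self, source_cpu, requests) -> None:
        """Apply one sample of the control unit's outputs.

        `source_cpu` is the index of the CPU whose request is being
        serviced (or None); `requests` holds the four request lines
        202-0 to 202-3 as truthy or falsy values.
        """
        # OR circuit 14: connection 204 is '1' if any unit is requested.
        any_request = any(requests)
        for cpu in range(len(self.counter16)):
            # Connection 203-n is '1' only for the originating CPU.
            source_line = source_cpu == cpu
            # AND circuit 15-n drives 205-n, which steps counter 16-n.
            if source_line and any_request:
                self.counter16[cpu] += 1
```

Because the control unit services only one CPU at a time, at most one counter advances per sample in this model, so each counter attributes memory request activity to a single CPU.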

[0021] The values of the counters 16-0 and 16-1 are supplied to the CPUs 11-0 and 11-1 via connections 206-0 and 206-1, respectively, so that the CPUs 11-0 and 11-1 can read the values of 16-0 and 16-1. Using a built-in timer (not shown) together with the counters 16-0 and 16-1, each of the CPUs 11-0 and 11-1 can calculate the memory load of the program it is executing.

[0022] Needless to say, the two embodiments described above are preferred examples of the present invention, and the present invention is not limited to them.

[0023]

[Effects of the Invention] As described above, by providing counting means that counts each time an access request is sent to memory, the present invention makes the memory load of a program known. This information can be used to revise the program and, as a result, to bring out the performance of a vector computer.

[Brief description of drawings]

FIG. 1 is a block diagram of a first embodiment of the present invention.

FIG. 2 is a block diagram of a second embodiment of the present invention.

FIG. 3 is a diagram showing an example of memory addressing in the present invention.

[Explanation of symbols]

1, 11-0, 11-1: CPU
2, 12: Memory access control unit
3-0 to 3-3, 13-0 to 13-3: Memory unit
4, 14: OR circuit
5, 6-0 to 6-3, 16-0 to 16-1: Counter
15: AND circuit

Claims

[Claims]

1. A performance monitor for an information processing apparatus that comprises a memory device consisting of a plurality of memory units capable of operating simultaneously and independently, and one or more arithmetic processing units, the performance monitor comprising: memory access control means for controlling access requests from the arithmetic processing units to the memory device so that each request is sent to the corresponding memory unit; and counting means, provided for each arithmetic processing unit, that is incremented when a request signal is sent to any of the memory units.

2. The performance monitor for an information processing apparatus according to claim 1, further comprising counting means, provided for each memory unit, that is incremented when a request signal is sent to that memory unit.