JPH02238558A

JPH02238558A - Boot system for parallel computer

Info

Publication number: JPH02238558A
Application number: JP6006089A
Authority: JP
Inventors: Toshiyuki Shimizu; 俊幸清水; Hiroaki Ishihata; 石畑　宏明
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1989-03-13
Filing date: 1989-03-13
Publication date: 1990-09-20
Anticipated expiration: 2013-05-13
Also published as: JP2749105B2

Abstract

PURPOSE: To allow the number of processor elements to be increased by having a host computer send a boot program once it detects that all processor elements have accessed their communication ports, with the CPU of each processor element sequentially executing the boot program received from the communication port. CONSTITUTION: On reset, an address decoder 4c decodes the address output from a CPU 4a and accesses a communication port 4b. When a host computer 1 detects that all processor elements (PEs) 4 have accessed the port 4b, it sends a boot program to each PE 4. Each PE 4 then sequentially executes, on the CPU 4a, the boot program received from the port 4b. The program used during initial program load (IPL) is thus supplied to each PE 4 by the host computer 1. As a result, no PE needs a ROM storing an IPL program, and the number of PEs can be increased.

Description

[Detailed Description of the Invention] [Summary] The invention relates to a boot method for a distributed memory parallel computer in which one host computer and a plurality of processor elements are connected via a bus. Its object is to make it possible to take full advantage of the ability of a distributed memory parallel computer to increase its number of processor elements. Each processor element comprises a CPU, a communication port connected to the bus, and an address decoder that decodes the address output from the CPU when the processor element is reset and accesses the communication port. When the host computer detects that all processor elements have accessed their communication ports, it sends a boot program to each processor element, and each processor element sequentially executes, on its CPU, the boot program input from the communication port.

[Industrial Field of Application] The present invention relates to a boot method for a distributed memory parallel computer in which one host computer and a plurality of processor elements are connected via a bus.

In recent years, there has been a demand for faster computer systems. Parallel computers are used as one way of achieving higher speed. Here, a parallel computer is a single computer built by coupling together a plurality of computational elements that execute programs (processor elements, hereinafter abbreviated as PEs). Broadly speaking, there are two ways of implementing this type of parallel computer.

One is the shared memory parallel computer, in which multiple PEs share a large memory; the other is the distributed memory parallel computer, in which each PE has its own independent memory. The latter has the characteristic that the number of PEs can be made large. However, this characteristic cannot be exploited if the amount of hardware in each PE becomes large.

For this reason, the amount of hardware in each PE must be kept small. In addition, as the number of PEs increases, an efficient boot method is required.

[Prior Art] FIG. 7 is a block diagram of a conventional distributed memory parallel computer. One host computer 1 and a plurality of PEs 2 are connected via a bus 3.

FIG. 8 is a diagram showing an example of the internal configuration of each PE (conventional).

As shown in the figure, each PE consists of a communication port 2a connected to the bus 3, a RAM 2b, a CPU 2c, a ROM 2d, and an internal bus 2e connecting them. The ROM 2d stores a program for boot-up (IPL: initial program load).

Boot-up (IPL) of a distributed memory parallel computer configured in this way is performed by the program stored in the ROM 2d provided in each PE. FIG. 9 is a flowchart showing the conventional boot-up procedure.

First, the user initializes the host computer 1 (S1). The host computer 1 then starts IPL (S2). Next, as one step of IPL, the host computer 1 initializes the PEs 2 (S3).

Each PE 2 starts executing the IPL program stored in its ROM 2d (S4). Next, the host computer 1 transmits the OS and related data to the PEs 2 via the bus 3, and each PE 2, in the course of executing IPL, receives them from the host computer 1 via the bus 3 (S5). Each PE 2 then finishes IPL and starts operation, and the host computer 1 likewise finishes IPL and starts operation (S6). Here, "operation" means the actual parallel processing.

[Problems to be Solved by the Invention] In the conventional method, a ROM containing a boot program was provided for each PE, and boot-up (IPL) was performed by that program. However, this configuration requires a certain amount of hardware, including the circuitry around the ROM. Furthermore, as many ROMs and associated parts as there are PEs must be provided, which can raise the cost of building the system and lower its reliability in operation. As a result, conventional systems could not take full advantage of the ability to increase the number of PEs.

The present invention has been made in view of these problems, and its object is to provide a boot method for a parallel computer that can take full advantage of the ability to increase the number of PEs in a distributed memory parallel computer.

[Means for Solving the Problems] FIG. 1 is a block diagram showing the principle of the method of the present invention.

Components identical to those in FIG. 7 are given the same reference numerals.

In the figure, 1 is a host computer, 3 is a bus, and 4 denotes a plurality of PEs connected to the bus. Each PE 4 comprises a CPU 4a, a communication port 4b connected to the bus 3, an address decoder 4c that decodes the address output from the CPU 4a when the PE is reset and accesses the communication port 4b, and a RAM 4d. The figure shows the internal configuration of only one PE, but the other PEs are the same.

[Operation] On reset, the address decoder 4c decodes the address data output from the CPU 4a so that the communication port 4b is accessed.

When the host computer 1 detects that all PEs 4 have accessed their communication ports 4b, it sends a boot program to each PE 4, and each PE 4 sequentially executes, on the CPU 4a, the boot program input from the communication port 4b. With this configuration, the program used during IPL is supplied to each PE 4 by the host computer 1, so no ROM storing an IPL program is needed in any PE 4. The method of the present invention therefore makes it possible to take full advantage of the ability to increase the number of PEs in a distributed memory parallel computer.

[Embodiments] An embodiment of the present invention is described in detail below with reference to the drawings, in comparison with the conventional example.

The present invention eliminates the need for a ROM by devising the way the address space seen by the PE's CPU is decoded. FIG. 2 shows the address space: (a) is the conventional address space and (b) is the address space according to the present invention. The following assumptions are made here. First, when the PE's CPU is initialized (reset) by the host computer, it fetches an instruction from address 0000# (# denotes hexadecimal) and begins execution. Data from the bus is received by reading the port (address F000#). All addresses are hexadecimal, and the values shown in the figure are examples.

In the conventional address space, as shown in (a), addresses 0000# to 2000# are assigned to the ROM, where the IPL program is stored. The remaining addresses, 2000# to F000#, are assigned as appropriate to the RAM area and the PORT area. When the PE is initialized, the IPL program stored in the ROM located between addresses 0000# and 2000# causes the CPU in the PE to read the OS and other items needed for operation (these are prepared by the host computer) from the port and write them into the RAM.

In the present invention, by contrast, addresses 0000# to 2000# are made port addresses, as shown in (b). Consequently, when the PE is initialized and the CPU in the PE outputs addresses starting from 0000#, the address decoder shown in FIG. 1 decodes each address, converts it to the port address, and accesses the communication port.
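The address mapping of FIG. 2(b) can be sketched as a small decode function. This is only an illustrative model, not the patent's circuit: the names `decode`, `PORT_ADDR`, and `BOOT_WINDOW_END` are hypothetical, the boot window is assumed to be 0000# through 1FFF# (2000# exclusive), and everything from 2000# up to the port address is assumed to be RAM, although the text only says RAM and PORT areas are assigned "as appropriate".

```python
PORT_ADDR = 0xF000        # port address used in the text's example
BOOT_WINDOW_END = 0x2000  # assumption: 0000#..1FFF# is redirected to the port


def decode(addr: int) -> tuple[str, int]:
    """Map a 16-bit CPU address to (device, device_address).

    Illustrative model of the address decoder 4c: any fetch in the boot
    window lands on the communication port instead of a ROM.
    """
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address outside the 16-bit space")
    if addr < BOOT_WINDOW_END or addr == PORT_ADDR:
        return ("port", PORT_ADDR)
    if addr < PORT_ADDR:
        return ("ram", addr)  # assumption: 2000#..EFFF# is all RAM
    return ("unmapped", addr)
```

With this mapping, an instruction fetch at 0000# or 0001# and an explicit data read at F000# all reach the same port, which is what lets the host supply both instructions and data.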

Meanwhile, the host computer sends the IPL program to each PE, and each PE passes that program to its CPU via the communication port and has the CPU carry out IPL. According to the present invention, therefore, no ROM storing an IPL program is needed.

Next, the relationship between the data the host computer sends to a PE and the instructions the PE's CPU executes is described in more detail. Here, the instructions executed by the PE's CPU are defined as follows.

ST ADDR: write the register value to address (ADDR)
LD ADDR: read the value at address (ADDR) into the register

Let the OS data sequence that the host computer sends to the PE be written OS0, OS1, ..., OSZ (OSZ is the last data item; the number of data items is assumed to be 100#). Under the conventional method, the data the host computer sends to the PE and the instruction sequence executed by the PE's CPU are then as shown in FIG. 3. The interval from time t1 to t2 is the sequence in which the host computer transmits the OS to each PE.

FIG. 4 shows an example of the data the host computer sends to a PE and the instructions the PE's CPU executes under the present invention. In the conventional example, as shown in FIG. 3, the host computer supplied only the OS instruction words; in the present invention, as shown in FIG. 4, the host computer sends not only the OS instruction words but also instructions such as LD F000# and ST 2000#.

Under the conventional method, these instructions were supplied from the built-in ROM. On the PE side, when such an LD instruction is received from the communication port, it is executed as a CPU instruction. In other words, the PE converts every address from 0000# to 2000# output by the CPU into the communication port address F000#, takes the data arriving at the communication port as the instructions the CPU is to execute, and proceeds with execution.
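The interplay between host-supplied instructions and OS data can be illustrated with a toy model. FIG. 4 is not reproduced in this text, so the exact interleaving below (one LD, one data word, one ST per OS word, with the store address advancing from 2000#) is an assumption, and `host_stream` and `run_pe` are hypothetical names. Each instruction is also treated as a single port word, which a real CPU need not do.

```python
from collections import deque

PORT_ADDR = 0xF000  # port address from the text's example
RAM_BASE = 0x2000   # assumption: OS words are stored upward from 2000#


def host_stream(os_words):
    """Yield, in order, the words the host would place on the bus."""
    for i, word in enumerate(os_words):
        yield ("LD", PORT_ADDR)     # instruction: read the port into the register
        yield word                  # data word consumed by that LD
        yield ("ST", RAM_BASE + i)  # instruction: store the register into RAM


def run_pe(stream):
    """Toy PE: every instruction fetch and every LD read pops the same port."""
    port = deque(stream)
    ram, reg, fetched = {}, None, 0
    while port:
        op, addr = port.popleft()   # instruction fetch, redirected to the port
        fetched += 1
        if op == "LD" and addr == PORT_ADDR:
            reg = port.popleft()    # explicit data read from the port
        elif op == "ST":
            ram[addr] = reg
        else:
            raise ValueError(f"unsupported instruction: {op} {addr:#06x}")
    return ram, fetched
```

For example, `run_pe(host_stream([0x11, 0x22, 0x33]))` leaves the three words at 2000# to 2002# after six instruction fetches. With the 100# (256) OS words assumed in the text, this toy model would need 512 fetches, which stays inside the redirected window.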

In FIG. 4, the interval from time t1 to t2 is the sequence in which the host computer transmits the OS to each PE. As described above, the instructions executed by the PE's CPU are also sent from the host computer to the PE. In other words, the IPL that was conventionally stored in ROM is carried out by instructions sent from the host computer. This is why a ROM no longer needs to be provided in each PE. The key to making this possible is the address decoding scheme described above, which is explained further here. After initialization, the CPU in the PE begins execution from address 0000#.

It reads and executes the instruction at address 0000#, then reads and executes the instruction at address 0001#, and so on, advancing the address by one each time. Conventionally, IPL was carried out by placing a ROM containing the IPL program in the address space starting at 0000#. In the present invention, the communication port is assigned to this space, so that when the CPU, after initialization, tries to read an instruction from address 0000#, what it reads is the data at the communication port, that is, the instruction sent from the host computer over the bus.

FIG. 5 is a block diagram showing an embodiment of the present invention. Components identical to those in FIG. 1 are given the same reference numerals.

Only one PE is shown in the figure, but in practice a plurality of PEs are connected to the bus 3.

The host computer 1 consists of a CPU 1a, a memory 1b, and an interface unit 1c that controls the connection with the PEs 4 via the bus 3. In addition to the bus 3, a control line 5 is connected to the interface unit 1c. This control line 5 is also connected to each PE 4. In the PE 4, 4e is an interface unit that controls the connection with the host computer 1 via the bus 3, and 4f is the internal bus of the PE 4. The communication port 4b shown in FIG. 1 is included in the interface unit 4e. The operation of the system configured in this way is as follows.

FIG. 6 shows the boot sequence according to the present invention. The operation of the system shown in FIG. 5 is described below with reference to this sequence diagram. First, the host computer 1 outputs a reset signal to the PEs via the interface unit 1c (①). The PE 4, on receiving the reset signal via the interface unit 4e, resets and initializes its internal state ((1)). Once reset, the CPU 4a fetches and executes instructions starting at address 0000#, so it outputs address 0000# as address data. The address decoder 4c converts this address data into a signal that accesses the communication port in the interface unit 4e. As a result, the communication port is accessed ((2)). However, because the ACK (acknowledge) signal on the control line 5 is not yet asserted, the PE is held in that state ((3)).

Meanwhile, on the host computer side, the CPU 1a monitors, via the interface unit 1c, whether all PEs 4 have accessed their communication ports 4b.

Once all PEs have accessed their communication ports, the host computer outputs the first PE instruction to the communication ports 4b (②). At the same time, it asserts the ACK signal on the control line 5 (③).

The PE has been held until the ACK signal is asserted; when ACK is asserted, the CPU 4a reads and executes the first instruction ((4)). Next, when the CPU 4a outputs the address signal 0001# to fetch the second instruction, the address decoder 4c again converts it into a signal that accesses the communication port 4b, and the communication port 4b is accessed ((5)).

At this point the ACK signal is deasserted, so the PE is held until the ACK signal is asserted again ((6)).

The host computer waits until all PEs have accessed their communication ports, then outputs the next PE instruction to the communication ports 4b (④). At the same time, it asserts the ACK signal on the control line 5 (⑤).

The PE reads the second instruction via the communication port 4b and executes it ((7)). In this way, the PE repeats (5), (6), and (7) as long as the address of the instruction fetched by the CPU 4a does not exceed 1FFF# ((8)). Meanwhile, the host side repeats ④ and ⑤ until the boot sequence is complete (⑥). When the OS is sent from the host computer, the host computer need only send the OS data in step with the instructions the PE is to execute, as shown in FIG. 4.
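The lock-step handshake described above can be sketched as a sequential simulation. This is a simplified model under stated assumptions, not the embodiment itself: the class `PE`, the function `boot`, and the `waiting` flag are hypothetical names, the ACK signal is modeled as a method call, and real PEs would reach the hold state asynchronously rather than in a loop.

```python
BOOT_WINDOW_END = 0x2000  # fetch addresses must stay at or below 1FFF#


class PE:
    """Toy processor element that stalls at the port until ACK arrives."""

    def __init__(self, name):
        self.name = name
        self.pc = 0
        self.waiting = False   # True while held at the communication port
        self.executed = []     # (fetch address, instruction) pairs

    def reset(self):
        self.pc = 0
        self.waiting = False
        self.executed.clear()

    def access_port(self):
        # The fetch at self.pc is redirected to the port; ACK is not yet
        # asserted, so the PE is held.
        self.waiting = True

    def on_ack(self, instruction):
        # ACK asserted: read the instruction from the port and execute it.
        self.executed.append((self.pc, instruction))
        self.pc += 1
        self.waiting = False   # ACK is treated as deasserted again


def boot(pes, program):
    """Host side: send one instruction per round once every PE is held."""
    if len(program) > BOOT_WINDOW_END:
        raise ValueError("program would run past the redirected window")
    for pe in pes:
        pe.reset()
    for instruction in program:
        for pe in pes:
            pe.access_port()
        if not all(pe.waiting for pe in pes):
            raise RuntimeError("a PE has not accessed its port")
        for pe in pes:          # output the instruction and assert ACK
            pe.on_ack(instruction)
    return [list(pe.executed) for pe in pes]
```

Calling `boot([PE("a"), PE("b")], ["I0", "I1"])` returns the same execution log for both PEs, which reflects the point of the handshake: every PE consumes the same instruction in the same round, because the host does not advance until all of them are held at the port.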

[Effects of the Invention] As described in detail above, according to the present invention, the address that the CPU in a PE outputs for instruction fetch after the PE is reset is decoded and converted into a signal that accesses the communication port, and the instructions for IPL are received from the host computer via the communication port and executed. This makes the ROM in each PE unnecessary. The present invention therefore makes it possible to take full advantage of the ability to increase the number of PEs in a distributed memory parallel computer.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the principle of the method of the present invention; FIG. 2 shows the address space; FIG. 3 shows an example of the data the host computer sends to a PE and the instructions the PE's CPU executes (conventional); FIG. 4 shows an example of the data the host computer sends to a PE and the instructions the PE's CPU executes (present invention); FIG. 5 is a block diagram showing an embodiment of the present invention; FIG. 6 shows the boot sequence according to the present invention; FIG. 7 is a block diagram of a conventional distributed memory parallel computer; FIG. 8 shows an example of the internal configuration of each PE (conventional); and FIG. 9 is a flowchart showing the conventional boot-up procedure. In FIG. 1, 1 is a host computer, 3 is a bus, 4 is a PE, 4a is a CPU, 4b is a communication port, 4c is an address decoder, and 4d is a RAM.

Claims

[Claims] In a distributed memory parallel computer in which one host computer (1) and a plurality of processor elements (4) are connected via a bus (3), a boot method for a parallel computer characterized in that: each processor element (4) comprises a CPU (4a), a communication port (4b) connected to the bus (3), and an address decoder (4c) that decodes the address output from the CPU (4a) when the processor element is reset and accesses the communication port (4b); when the host computer (1) detects that all processor elements (4) have accessed their communication ports (4b), the host computer (1) sends a boot program to each processor element (4); and each processor element (4) sequentially executes, on the CPU (4a), the boot program input from the communication port (4b).