JPH02240765A

JPH02240765A - Data communication system for computer

Info

Publication number: JPH02240765A
Application number: JP1063091A
Authority: JP
Inventors: Hiroki Miura; 三浦　宏喜
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1989-03-14
Filing date: 1989-03-14
Publication date: 1990-09-25
Anticipated expiration: 2012-09-24
Also published as: JP2657090B2

Abstract

PURPOSE: To improve the performance of a data communication system by having a communication control part selectively transfer the processed data obtained from a data processing part to the processors located along a vertical or a horizontal line. CONSTITUTION: A network control part NC receives a packet from its own processor PE or from another PE and reads the PE# field in the first word of the packet. The PE# field holds the destination processor number (X, Y) in row-and-column form, so its value is compared with the number (x, y) of the receiving processor. As long as X = x is not satisfied, the packet is sent from west W to east E or vice versa. When X = x is satisfied but Y = y is not, the packet is sent from south S to north N or vice versa. When both X = x and Y = y are satisfied, the packet is handed over for data processing in that processor. As a result, self-routing that transfers data over the shortest distance is attained.

Description

DETAILED DESCRIPTION OF THE INVENTION

(a) Field of industrial application
The present invention relates to a data communication system for a computer, in particular a data-driven computer, to a processor used in that system, and to a data communication method.

(b) Prior art
In recent years, research toward practical parallel processing computers has advanced, and the present inventor has already completed the development and evaluation of a data-driven computer and its language processing software.

[Tanaka et al., "Prototype of the data-driven computer SPM", Proceedings of the 36th National Convention of the Information Processing Society of Japan, 7B-5; Nishiyo et al., "Compiler for the data-driven computer SPM", ibid., 7B-6; Tanaka et al., "Performance evaluation of the data-driven computer SPM (1)", Proceedings of the 37th National Convention of the Information Processing Society of Japan, 1N-4; Okamoto et al., "Performance evaluation of the data-driven computer SPM (2)", ibid., 1N-5.]

In general, a data-driven computer executes, as its program, a data flow graph in which instructions are connected by arcs that represent the flow of data. In other words, computation proceeds under simple execution rules that follow the non-von Neumann principle of "executing whatever data is ready to be processed."

Such a data-driven computer consists mainly of three components: a data pair detection mechanism, an arithmetic processing mechanism, and a program storage mechanism. Its execution is outlined below.

First, a data-driven computer works in units of packets, each a self-contained set of data. A packet consists of the data to be processed, connection information for the data flow graph (a node number), an instruction code, and so on.

The data pair detection mechanism detects and outputs a pair of operand packets on which an operation can be performed. The detected pair is processed by the arithmetic processing mechanism. The resulting packet is given a new node number by the program storage mechanism and sent back to the data pair detection mechanism. Repeating this cycle carries out the whole computation.
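The cycle described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the opcodes, the `PROGRAM` table, and the packet tuple `(node, slot, value)` are hypothetical, and the opcode is looked up from the program store for brevity, whereas the text says the packet itself carries an instruction code.

```python
from collections import deque

# Hypothetical two-input instruction set; the patent does not list opcodes.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

# Hypothetical program store: node -> (opcode, destination node, destination slot).
# A destination of None marks a final result.
PROGRAM = {
    0: ("add", 2, 0),      # result of node 0 becomes operand 0 of node 2
    1: ("add", 2, 1),      # result of node 1 becomes operand 1 of node 2
    2: ("mul", None, None),
}


def run(initial_packets):
    """Process packets (node, slot, value) until none remain; return final results."""
    queue = deque(initial_packets)
    waiting = {}           # data pair detection: node -> {slot: value}
    results = []
    while queue:
        node, slot, value = queue.popleft()
        operands = waiting.setdefault(node, {})
        operands[slot] = value
        if len(operands) < 2:
            continue       # the partner operand has not arrived yet
        del waiting[node]  # the pair is complete, so the node fires
        opcode, dest, dest_slot = PROGRAM[node]
        result = OPS[opcode](operands[0], operands[1])  # arithmetic processing
        if dest is None:
            results.append(result)
        else:
            # program storage step: tag the result with its next node number
            queue.append((dest, dest_slot, result))
    return results
```

With the packets `(0, 0, 1)`, `(0, 1, 2)`, `(1, 0, 3)`, `(1, 1, 4)`, the sketch evaluates (1 + 2) × (3 + 4).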

The present inventor is currently developing EDDEN (Enhanced Data Driven ENgine), a highly parallel data-driven computer that adds various improvements to the data flow computer described above, particularly to its processor architecture. EDDEN aims at operating a large-scale data-driven computer in which up to 1024 element processors (PEs), each implemented as a single-chip CMOS LSI, are connected. It is also meant to allow flexible configurations, such as small systems with a few PEs and medium systems with several tens of PEs, and to be applicable to a wide range of fields including signal processing, image processing, graphics, various simulations, and CAD.

(c) Problems to be solved by the invention
As described above, building a system in which a large number of element processors are connected requires that the arithmetic processing inside each element processor be made independent of the inter-processor communication processing, and that the network system for inter-processor communication be optimized. Achieving these yields a high-performance computer.

(d) Means for solving the problems
The data communication system of the present invention is a data communication system for a computer in which a large number of processors are arranged in rows and columns, and data communication between processors is performed over a plurality of vertical communication lines, each cyclically connecting one vertical column of processors, and a plurality of horizontal communication lines, each cyclically connecting one horizontal row of processors. Each processor comprises at least a data processing unit and a communication control unit. The communication control unit performs two kinds of communication control: it selectively transfers processed data obtained from the data processing unit to a processor in either the vertical-line or the horizontal-line direction, and it either supplies data received from an adjacent processor over a vertical or horizontal line to the data processing unit of its own processor or selectively transfers that data to a processor in either the vertical-line or the horizontal-line direction.

The processor of the data communication system of the present invention has four bidirectional input/output ports for connection with its four neighboring processors in the row and column directions, and each port has an input register and an output register whose storage capacity corresponds to the basic unit of communication data.

The data communication method of the present invention is a method for sending and receiving communication data among a plurality of data flow processors that are connected in rows and columns and are each associated with a row-and-column number. The row-and-column number associated with the destination processor is written into the communication data as the destination row-and-column number. Each processor compares its own row-and-column number with the destination row-and-column number of the communication data transferred to it, processes the data itself when the two numbers match, and transfers the data to an adjacent processor when they do not.

(e) Operation
The data communication system of the present invention adopts a network system in which a large number of processors are connected in a torus, and gives each processor a communication control unit, mainly for inter-processor communication, that is independent of the data processing unit. When the processor is implemented as an LSI, the torus connection therefore limits the pin count and gives a uniform structure, and building the communication control unit into the LSI while preserving its independence allows the whole system to be made smaller and cheaper.

In addition, each of the four input/output ports through which the processor of the present system communicates with its four neighbors has a paired input register and output register, each holding the basic unit of communication data, that is, one packet. Every data transfer between processors can therefore be completed as a whole packet. Because a transfer never stalls partway through a packet, the deadlock in which stalled data blocks the communication of other data can be avoided.

Furthermore, according to the data communication method of the present invention, the destination processor number (the corresponding row-and-column number) is written in the communication data. Each processor can therefore read the destination processor number of data generated internally or transferred from another processor, and decide for itself which of its four neighbors should receive the data. Through the transfer operations of the individual processors, this realizes self-routing that delivers data to the destination processor over the shortest route.

(f) Embodiment
FIG. 1 shows the system of a highly parallel data-driven computer as an embodiment of the present invention, and FIG. 2 shows the configuration of an element processor.

The element processor (PE) of FIG. 2 is basically configured as a circular pipeline (ring) that connects a program store (PS), a firing control and color management unit (FCCM), an instruction execution unit (EXE), and a queue memory (Q).

The program store (PS) updates node numbers, attaches constants, and copies results. The firing control and color management unit (FCCM) performs firing control and manages the acquisition and release of colors, using the two-stage matching-store scheme mentioned above. The instruction execution unit (EXE) executes instructions such as floating-point and integer arithmetic, condition tests, branches, and simple constant generation, as well as compound instructions built from them.

The queue (Q) is a buffer memory that absorbs every fluctuation of the data flow on the ring. Buffering becomes necessary when (1) copying, (2) forced input into the ring, (3) delayed output from the ring, or (4) a search of the waiting list in the FCCM occurs. The element processor (PE) has an added function that dynamically changes operation modes (1) to (4) according to the amount of data held in the queue (Q), and uses it to control the degree of parallelism.

If the queue (Q) nevertheless overflows, an external queue is formed in the external data memory (EDM) to absorb the overflow so that program execution can continue.

The network control unit (NC) has communication ports for four directions (east, west, south, and north) and performs routing control based on a torus interconnection network of up to 1024 processors (PEs). The vector operation control unit (VC) controls the execution of vector-operation instructions and ordinary memory access instructions. Bypass lines for structure (vector) communication are provided between the VC and the input control unit (IC) and output control unit (OC). The external data memory (EDM) is a data memory that stores structures and the like, with a capacity of about 512 KBytes (128K words × 32 bits). Clocking is synchronous, but the inside of the network control unit (NC) is assumed to operate in a self-timed manner.

As shown in FIG. 1, the basic configuration of an EDDEN built from many such element processors (PEs) connects n × n element processors in a torus interconnection network. In a torus interconnection network, a large number of processors are arranged in rows and columns, and data communication between arbitrary processors is made possible by a plurality of vertical communication lines, each cyclically connecting one processor column in the vertical, that is north-south (N-S), direction, and a plurality of horizontal communication lines, each cyclically connecting one processor row in the horizontal, that is east-west (W-E), direction.

In the system of this embodiment, data is exchanged with the network by inserting a network interface (NIF) into any communication link in the north-south (N-S) direction. The NIF and 16 to 64 element processors are mounted on one processor board, and the torus links are formed on the printed circuit board.

In a small or medium-scale system, a general-purpose EWS or personal computer serves as the host computer and is connected to the network interface (NIF) through its bus interface. For mounting, one to four processor boards and one bus interface board are inserted directly into the rack of the EWS or similar host.

For a large-scale system, the following two configurations are conceivable, depending on the field of application.

(1) Cluster connection
Each processor board described above is treated as one cluster, and the clusters are connected through cluster interfaces. A cluster interface manages the collection and distribution of data within its cluster.

(2) Large torus connection
1024 (32 × 32) element processors are connected by a torus interconnection network. For mounting, the 32 element processors of one north-south (N-S) line and an NIF are placed on a single printed circuit board, and the east-west (W-E) links are formed on the motherboard.

The data packets used in the data-driven computer configured as above fall broadly into execution packets, which are used for program execution, and non-execution packets, which are used for other purposes. Examples are shown in FIG. 4(a) to (e). Packets are of fixed length, except for packets that carry the body of a structure: 33 bits × 2 words on the pipeline ring inside a processor (PE), and 18 bits × 4 words on the network.

The contents of each field in the packet formats of FIG. 4 are as follows.

HD (1 bit): In a two-word packet, distinguishes the first word (header) from the second word (tail). It is "1" for the header.

EX (1 bit): Flag that identifies a packet to be output from the pipeline ring to the outside of the PE.

MODE (2 bits): Identification code for the kind of packet, such as execution packet or non-execution packet.

S-CODE (3 bits): Identification code that, together with MODE, specifies how the packet is to be processed.

OPCODE-M (5 bits): Main instruction code. Specifies the kind of instruction for the instruction execution unit (EXE). For nsync, it also holds the number of data items to be synchronized.

OPCODE-S (6 bits): Sub instruction code. Specifies in more detail the instruction defined by the main instruction code.

NODE# (up to 11 bits): Node number in the data flow graph.

COLOR (4 bits): Color identifier. An identification number that distinguishes environments when the same data flow graph is executed multiple times at once, as in program sharing through subroutine calls or the processing of time-series data.

PE# (10 bits): PE number. An identification number that distinguishes up to 1024 PEs.

DATA (32 bits): A 32-bit integer or floating-point number.

HT (1 bit): In a packet of four or more words, a flag that distinguishes the header and tail from the intermediate words. It is "1" for the header or the tail.

RQ (1 bit): Flag added to packets transferred over the network. Its value is inverted each time one word is transferred over the network, so the presence of a word can be recognized. The inversion of the value also serves as the transfer request signal for forwarding the packet. Together with the HT flag, it allows the header and the tail to be told apart.

ADDRESS (16 bits): Holds a memory address when data is loaded into or dumped from a memory.
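One way to read the RQ description is as transition signalling: the sender inverts RQ for every word, and the receiver accepts a word only when RQ differs from the value it last saw. The sketch below illustrates that reading only. The class and function names are hypothetical, and the actual handshake circuit of the NC is not described in this text.

```python
class WordReceiver:
    """Accepts a word only when the RQ level has changed since the last word."""

    def __init__(self, initial_rq=0):
        self.last_rq = initial_rq
        self.received = []

    def sample(self, rq, word):
        """Sample the link once; return True if a new word was accepted."""
        if rq == self.last_rq:
            return False          # same level as before: no new word present
        self.last_rq = rq
        self.received.append(word)
        return True


def send(words, receiver, rq=0):
    """Send words one by one, inverting RQ for each; return the final RQ level."""
    for word in words:
        rq ^= 1                   # the inversion announces the new word
        receiver.sample(rq, word)
        receiver.sample(rq, word) # sampling the same level again changes nothing
    return rq
```

Because only a change of level counts, a word that stays on the link for several samples is still taken exactly once.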

The characteristic feature of the computer of this embodiment, which has the basic configuration described above, is the network control unit (NC), which operates independently of the mechanisms that perform the actual data processing in the element processor (PE).

The network control unit (NC) receives a packet of the form shown in FIG. 4(c) or FIG. 4(e), either from its own processor (PE) or from another processor (PE), and reads the PE# field in the first word of the packet.

The PE# field holds the number (X, Y), in row-and-column form, of the destination processor to which the packet is to be delivered, so this value is compared with the number (x, y) of the processor itself.

As a result of this comparison, as long as X = x does not hold, the packet is transferred from west (W) to east (E) or from east (E) to west (W).

If X = x holds, then as long as Y = y does not hold, the packet is transferred from south (S) to north (N) or from north (N) to south (S).

When both X = x and Y = y hold, the packet is handed over for data processing in that processor.

A data packet is therefore transferred among the many torus-connected processors first in the east-west direction and then in the north-south direction, which realizes self-routing with shortest-distance transfer.
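The east-west-first, then north-south behaviour can be traced hop by hop with a small sketch. It assumes a torus with q PEs along W-E and p PEs along N-S, x increasing toward E and y increasing toward S, and it chooses the shorter way around each ring; the function names are hypothetical.

```python
def signed_delta(dest, own, size):
    """Offset from own to dest around a ring of `size` nodes, folded so that
    its magnitude never exceeds size / 2 (positive means E or S)."""
    d = (dest - own) % size
    return d - size if d > size // 2 else d


def next_hop(own, dest, q, p):
    """Decide where a packet at PE `own` goes next: 'E', 'W', 'S', 'N' or 'P'."""
    (x, y), (X, Y) = own, dest
    dx = signed_delta(X, x, q)
    dy = signed_delta(Y, y, p)
    if dx != 0:                   # X = x not yet satisfied: move east or west
        return "E" if dx > 0 else "W"
    if dy != 0:                   # X = x but Y != y: move south or north
        return "S" if dy > 0 else "N"
    return "P"                    # X = x and Y = y: process the packet here


def trace(src, dest, q, p):
    """Return the list of PEs a packet visits from src to dest, inclusive."""
    step = {"E": (1, 0), "W": (-1, 0), "S": (0, 1), "N": (0, -1)}
    x, y = src
    path = [(x, y)]
    while (hop := next_hop((x, y), dest, q, p)) != "P":
        dx, dy = step[hop]
        x, y = (x + dx) % q, (y + dy) % p
        path.append((x, y))
    return path
```

On a 4 × 4 torus, a packet from (0, 0) to (3, 1) takes the wraparound link west to (3, 0) and then one hop south, two hops in total instead of four.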

FIG. 3 schematically shows the gate system of the network control unit (NC) that realizes self-routing on the torus network described above, and the routing algorithm is explained with reference to that figure. In FIG. 3, (RNI) and (RNO) are the input shift register and output shift register that constitute the north input/output port, each consisting of four register stages (r). Likewise, (RSI) and (RSO) constitute the south input/output port, (RWI) and (RWO) the west input/output port, and (REI) and (REO) the east input/output port. The symbol "○" denotes a merge and "◎" denotes a branch.

The routing algorithm is as follows.

I. Let the PE's own number be (x, y), the network be a p × q torus (p: N-S direction, q: W-E direction), and the destination PE number of the packet be (X, Y). Define
   Δx ≡ (X − x) mod q, with |Δx| ≤ q/2
   Δy ≡ (Y − y) mod p, with |Δy| ≤ p/2

II. PE numbers run y = 0, 1, 2, ..., p − 1 in order from N to S, and x = 0, 1, 2, ..., q − 1 in order from W to E.

III. MODE denotes the value of the MODE field in the tag of the packet. (MODE = 00 marks a packet addressed to the host.)

(1) R1
   Δy = 0: output the packet to P
   Δy ≠ 0: output the packet to S

(2) R2
   Δx ≠ 0: output the packet to W
   Δx = 0 and Δy > 0: output the packet to S
   Δx = 0 and Δy = 0 and MODE ≠ 00: output the packet to P
   Δx = 0 and Δy = 0 and MODE = 00: output the packet to N
   Δx = 0 and Δy < 0: output the packet to N

(3) R3
   Δx ≠ 0: output the packet to E
   Δx = 0 and Δy > 0: output the packet to S
   Δx = 0 and Δy = 0 and MODE ≠ 00: output the packet to P
   Δx = 0 and Δy = 0 and MODE = 00: output the packet to N
   Δx = 0 and Δy < 0: output the packet to N

(4) R4
   Δy = 0 and MODE ≠ 00: output the packet to P
   Δy ≠ 0 or MODE = 00: output the packet to N

(5) R5
   Δx > 0: output the packet to E
   Δx = 0 and Δy > 0: output the packet to S
   Δx = 0 and Δy ≤ 0: output the packet to N
   Δx < 0: output the packet to W

IV. Routing is decided when the header of a packet arrives, and the words that follow are output on the same route until the tail of the packet arrives.

V. The PE number (x, y) and the size of the network can be set in advance. However, p and q are limited to powers of 2. A mode in which no modulo is taken when computing Δx and Δy (for a mesh network without wraparound) is also available.

VI. When the PEs are connected in a ring, routing with the above algorithm is still possible as long as the N-S lines are connected.

The above is one example of a self-routing algorithm, and the invention is not limited to it.
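Rules (1) to (5) above can be written down as a single decision function. The sketch assumes that R1 to R5 are the five branch points of FIG. 3 (the figure is not reproduced here, so which port each rule belongs to is left open), that "P" stands for the PE's own data processing side, and that Δx and Δy have already been computed as in item I. The function name and the integer encoding of MODE are hypothetical.

```python
def route(rule, dx, dy, mode):
    """Return the output ('N', 'S', 'E', 'W' or 'P') chosen by branch `rule`.

    dx, dy are the signed offsets to the destination; mode is the MODE field
    as an integer, where 0 (binary 00) marks a packet addressed to the host.
    """
    to_host = (mode == 0)
    if rule == "R1":
        return "P" if dy == 0 else "S"
    if rule == "R4":
        return "P" if (dy == 0 and not to_host) else "N"
    if rule in ("R2", "R3"):
        if dx != 0:
            return "W" if rule == "R2" else "E"
        if dy > 0:
            return "S"
        if dy < 0:
            return "N"
        return "N" if to_host else "P"   # dx = 0 and dy = 0
    if rule == "R5":
        if dx > 0:
            return "E"
        if dx < 0:
            return "W"
        return "S" if dy > 0 else "N"    # dx = 0
    raise ValueError(f"unknown rule: {rule}")
```

Per item IV, such a decision would be taken once, when the header word arrives, and reused for the remaining words of the same packet.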

As illustrated, each input/output port of the network control unit (NC) in FIG. 3 has an input port made of four stages of 18-bit shift registers (r) and an output port made of four such stages as well, so a four-word packet of the form shown in FIG. 4(c) or (e) can be stored whole in an input port or an output port. On a bidirectional channel such as that of a torus network, packet transfer in one direction may stop because a packet ahead is blocked. Even then, a packet is never left cut off partway through a port, which removes one cause of deadlock. That is, because a complete packet is held entirely on the input side or the output side of a port, the processor can still transfer other packets in other directions.
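The whole-packet buffering can be sketched as a register that either holds one complete four-word packet or is empty, so a transfer moves all four words or none. This is a behavioural illustration only; the class and function names are hypothetical, and the real hardware shifts the words through four 18-bit stages rather than copying a list.

```python
WORDS_PER_PACKET = 4   # network packets are 4 words of 18 bits each


class PortRegister:
    """One direction of an I/O port: holds at most one whole packet."""

    def __init__(self):
        self.words = []

    def is_free(self):
        return not self.words

    def load(self, packet):
        """Store a complete packet; refuse (return False) if already occupied."""
        if len(packet) != WORDS_PER_PACKET:
            raise ValueError("a network packet must have exactly 4 words")
        if not self.is_free():
            return False
        self.words = list(packet)
        return True

    def unload(self):
        """Remove and return the stored packet."""
        packet, self.words = self.words, []
        return packet


def transfer(src, dst):
    """Move a whole packet from src to dst, or do nothing if dst is occupied."""
    if src.is_free() or not dst.is_free():
        return False
    return dst.load(src.unload())
```

A blocked destination leaves the source register untouched, so no packet is ever split across two ports while it waits.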

(g) Effects of the invention
According to the present invention, the communication control mechanism can also be built into the PE chip, which makes the whole system smaller and cheaper, and the basic interconnection of the processors limits the pin count of the chip. The invention thus provides a data communication system, a processor for it, and a data communication method that offer short inter-processor distances, self-routing, a uniform structure, deadlock avoidance, and easy mounting.

[Brief explanation of drawings]

FIG. 1 is a system diagram showing the data communication system of the present invention, FIG. 2 is a block diagram showing the general configuration of the processor of the present invention, FIG. 3 is a schematic diagram of the gate configuration of the main part of the processor of the present invention, and FIG. 4(a) to (e) are packet format diagrams.

(PE): element processor, (EXE): instruction execution unit, (EDM): external data memory, (NC): network control unit.

Claims

(1) A data communication system for a computer in which a large number of processors are arranged in rows and columns and data communication between processors is performed over a plurality of vertical communication lines, each cyclically connecting one vertical column of processors, and a plurality of horizontal communication lines, each cyclically connecting one horizontal row of processors, characterized in that each processor comprises at least a data processing unit and a communication control unit, and the communication control unit performs communication control that selectively transfers processed data obtained from the data processing unit to a processor in either the vertical-line or the horizontal-line direction, and communication control that either supplies data obtained from an adjacent processor via a vertical line or a horizontal line to the data processing unit of its own processor or selectively transfers that data to a processor in either the vertical-line or the horizontal-line direction.

(2) A processor of the data communication system for a computer according to claim 1, characterized in that it comprises four bidirectional input/output ports for connection with the four adjacent processors in the row and column directions, and each port has an input register and an output register with a storage capacity corresponding to the basic unit of communication data.

(3) A data communication method for sending and receiving communication data among a plurality of data flow processors that are connected in rows and columns and associated with row-and-column numbers, characterized in that the row-and-column number associated with the destination processor is written in the communication data as the destination row-and-column number, and each processor compares its own row-and-column number with the destination row-and-column number of the communication data transferred to it, processes the communication data itself when the two numbers match, and transfers the communication data to an adjacent processor when they do not match.