JPS63219082A

JPS63219082A - parallel image processing processor

Info

Publication number: JPS63219082A
Application number: JP26640887A
Authority: JP
Inventors: Yoshiki Kobayashi; 芳樹小林; Tadashi Fukushima; 忠福島; Yoshiyuki Okuyama; 奥山　良幸
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-10-23
Filing date: 1987-10-23
Publication date: 1988-09-12
Also published as: JPH0260028B2

Abstract

PURPOSE: To form an architecture suitable for LSI implementation by providing an image data input port, a plurality of shift registers, arithmetic circuits, and so on, and by sharing a product-sum weight memory among a plurality of processor elements. CONSTITUTION: A product-sum weight memory 15 is shared by processor elements PE 12 #1 to #4. When image data f14 is input from image data input port 24 at time 1, pixels f11 to f14 are supplied to the respective PEs through shift register 11. Weight W11 is read from the memory 15, its product with each input pixel is computed, and the products are held in accumulation circuits 20. At time 2, image data f15 is input, pixels f12 to f15 are supplied to the respective PEs, their products with the next weight W12 are computed, and the circuits 20 accumulate them with the previous values. The same processing is repeated at each subsequent time, and the spatial product-sum results are output continuously through partial-sum output shift register 21 and partial-sum accumulation circuit 14. An architecture suitable for LSI implementation is thereby obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a parallel image processing processor that performs local neighborhood image processing such as spatial product-sum operations, and in particular to a parallel image processing processor whose architecture is suitable for LSI implementation.

Many image processing processors aim for higher speed by processing image data in parallel, as in those developed under the Ministry of International Trade and Industry's large-scale project "Pattern Information Processing System" (a collection of papers presenting its research and development results was published in October 1980).

Because image data extends in two dimensions, it is difficult to process all of it in parallel. However, many image operations act on neighboring pixels, such as the spatial product-sum operations that implement noise removal and contour extraction, so there are many examples that process local data of, for example, m rows × n columns of the image in parallel.

Such locally parallel image processing is described comprehensively in the above literature and in Masatsugu Kidode, "Trends in Image Processing Hardware," Information Processing, Computer Vision Study Group Material 8-6 (September 1980), but apart from the CCD analog processing type, none has been implemented as an LSI. Converting a processor with a conventional architecture directly into an LSI is difficult in terms of (1) degree of integration and (2) pin count.

An object of the present invention is to provide a parallel image processing processor having an architecture suitable for LSI implementation.

The present invention is characterized by a parallel image processing processor that takes in image data from an image data source and performs locally parallel image data processing, in which a plurality of image processing processor basic modules are arranged in parallel, each basic module comprising: an image data input port; a plurality of first shift registers that sequentially take in image data from the image data input port; a plurality of processor elements that receive the contents of the respective first shift registers and perform image processing operations; a plurality of first arithmetic circuits that cumulatively add, for each processor element, the operation results in that processor element; a second shift register that takes in the operation results of the plurality of first arithmetic circuits; an operation result data input port that receives operation result data from the preceding basic module; a second arithmetic circuit that adds that operation result data to the operation results of the first arithmetic circuits set in the second shift register; and an operation result data output port that outputs the operation result data of the second arithmetic circuit.

The present invention is described below with reference to the illustrated embodiments. FIGS. 1 to 10 are explanatory diagrams of recently considered parallel image processing techniques, and FIGS. 11 and 12 show one embodiment of the present invention.

FIG. 1 shows the configuration of a typical image processing system. It includes an industrial television camera 5 as the image input device, an image memory 3 as the image storage device, and a CRT monitor 4 that displays the contents of the memory. The image information in the image memory 3 is processed by the image processing processor 2, and the result is either stored back in the image memory 3 or passed to the management processor 1, which controls the entire system.

A representative image processing function is the spatial product-sum operation. As shown in FIG. 2, it multiplies local image data, for example the 4×4 pixels f11 to f44, by predetermined weights W11 to W44 and takes the total sum.

This enables image processing such as noise removal and contour enhancement.
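
The spatial product-sum operation described above can be sketched in a few lines of Python. This is only an illustrative software model, not part of the patent: the function names `spatial_product_sum` and `filter_image` are hypothetical, and the image is assumed to be a list of rows of numbers.

```python
def spatial_product_sum(f, w):
    """Sum of w[i][j] * f[i][j] over one local window (e.g. 4x4 pixels)."""
    return sum(
        wij * fij
        for w_row, f_row in zip(w, f)
        for wij, fij in zip(w_row, f_row)
    )


def filter_image(image, w):
    """Apply the product-sum to every m x n window that fits inside the image."""
    m, n = len(w), len(w[0])
    rows, cols = len(image), len(image[0])
    return [
        [
            spatial_product_sum([row[c:c + n] for row in image[r:r + m]], w)
            for c in range(cols - n + 1)
        ]
        for r in range(rows - m + 1)
    ]
```

With all weights equal this behaves as a smoothing (noise-reducing) filter; other weight patterns give contour enhancement. The hardware described in the following paragraphs computes the same sum, one window per clock.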

As an image processing processor that handles such local image data of, for example, 4×4 pixels, a parallel image processing processor 2-I (called Type I) is built by combining four image processing processor basic modules 10, each having four processor elements (PE #1 to #4) 12, as shown in FIG. 3. The image memory 3 supplies one column of the local image data in parallel (f14 to f44 in FIG. 3), and the operation result (g in FIG. 3) is stored in the image memory 3.

The basic module 10 has an image data input port 24, which takes in the image data of the row to be processed, and an operation result data output port 35, which outputs the internal processing result. When image data f14 is input, the neighboring pixels f13, f12, and f11, each one pixel apart, are also supplied through the shift register 11 to the corresponding PEs among #4 to #1. A pixel is also output from the image data output port 25 so that the spatial product-sum operation can be extended beyond 4×4. Each PE 12 receives the image data f to be processed from the shift register 11 and the weight data W from the weight memory 15, and performs the multiplication. The arithmetic circuit 13 adds the results of the four PEs 12 to form a partial sum. The arithmetic circuit 14 adds this to the partial sum arriving at the operation result input port 30, and the accumulated value is output from the operation result output port 35 to the basic module 10 of the next stage.
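
The data flow through a chain of Type I basic modules can be modeled as below. This is a behavioral sketch under stated assumptions, not the circuit itself: `basic_module` and `type1_output` are hypothetical names, each module is assumed to handle one row of the window, and the first module is assumed to receive 0 at its result input port.

```python
def basic_module(row_pixels, row_weights, partial_in=0):
    """One basic module 10: four multiplies, a row sum, and a chained add."""
    products = [f * w for f, w in zip(row_pixels, row_weights)]  # PEs 12
    partial = sum(products)                                      # circuit 13
    return partial_in + partial                                  # circuit 14 -> port 35


def type1_output(window, weights):
    """Cascade one module per window row; the last module emits g."""
    acc = 0  # assumed value at input port 30 of the first module
    for row_pixels, row_weights in zip(window, weights):
        acc = basic_module(row_pixels, row_weights, acc)
    return acc
```

Because each module only needs its own row of pixels plus one incoming and one outgoing partial sum, the port count per module stays small regardless of how many modules are cascaded.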

By cascading four basic modules 10 in this way, the result g is output from the final basic module 10D.

The time chart is shown in FIG. 4. The operations described above are executed within the basic clock time Δt1 and the result g is output; in the next Δt1, the result g for the 4×4-pixel input window shifted by one pixel is output. Consequently, the 4×4-pixel spatial product-sum results for the successively input image data are output one after another.

The embodiment of FIG. 5 shows a configuration in which the basic clock time Δt1 of the Type I image processing processor 2-I of the preceding embodiment is shortened by pipeline processing. It is called the pipeline version of Type I, parallel image processing processor 2-IP. In Type I, the basic clock time Δt1 had to be at least the sum of the processing times of all of the following:
(1) input of image data fi,j to the shift register 11;
(2) multiplication of product-sum weight Wi,j by pixel fi,j in the processor element 12;
(3) partial-sum processing by the arithmetic circuit 13;
(4) partial-sum accumulation by the arithmetic circuit 14.

In contrast, as in the example of FIG. 5, inserting pipeline registers 16 between (1) and (2), between (2) and (3), and between (3) and (4) makes it possible to reduce the basic clock time Δt2 to the largest of the processing times of (1) to (4), rather than their sum. The time chart is shown in FIG. 6.

Process (1) is executed at time 1, (2) at time 2, (3) at time 3, and (4) at time 4. For the next input image data, process (1) is executed at time 2, (2) at time 3, (3) at time 4, and (4) at time 5. The components thus operate one after another in pipeline fashion, which raises the processing speed.
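
The effect of the pipeline registers on the clock period and on the schedule can be illustrated with a small sketch. The function names and any delay values passed in are hypothetical; the patent gives no numeric stage delays.

```python
def clock_periods(stage_delays):
    """Return (unpipelined minimum clock, pipelined minimum clock).

    Without pipeline registers the clock must cover the sum of all stage
    delays; with a register between stages it only covers the slowest one.
    """
    return sum(stage_delays), max(stage_delays)


def pipeline_schedule(num_inputs, num_stages=4):
    """Map each time step to the (input index, stage number) pairs active then.

    Input k (0-based) enters stage 1 at time 1 + k and advances one stage
    per clock, matching the time chart described in the text.
    """
    schedule = {}
    for k in range(num_inputs):
        for s in range(num_stages):
            schedule.setdefault(1 + k + s, []).append((k, s + 1))
    return schedule
```

For example, with assumed delays of 10, 40, 25, and 30 (arbitrary units) for the four processes, `clock_periods` gives a minimum clock of 105 without pipelining and 40 with it.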

The embodiment of FIG. 7 shows a configuration that can shorten the basic clock Δt2 of the parallel image processing processor 2-IP still further. It is called the pipeline-skew version of Type I, parallel image processing processor 2-IPS. In the IP type of FIG. 5, the basic clock time Δt2 is likely to be limited by the partial-sum accumulation time, the last of the four processing steps. This is because, when the basic modules 10 are cascaded in n stages, Δt2 must cover n times the sum of the processing time of the arithmetic circuit 14 and the input/output time of the operation results at ports 30 and 35. The input/output delay is not negligible, particularly when the basic module 10 is implemented as an LSI. The configuration therefore adds a pipeline register 16 to the partial-sum accumulation path of the IP type of FIG. 5, so that the operations between basic modules 10A to 10D are also pipelined, reducing the above time constraint on Δt2 to 1/n.

In the IPS type of FIG. 7, as the time chart of FIG. 8 shows, the partial sums of basic modules 10A to 10D are all computed at the same time 3, so their timing no longer matches in the accumulation path. To align the timing, the IPS of FIG. 7 places a variable-stage skew correction shift register 17 immediately after the image data input port 24. Because the accumulation path of each basic module 10A to 10D contains one pipeline stage, the number of stages of the skew correction shift register 17 is set as follows:

Basic module 10A: 0 stages
Basic module 10B: 1 stage
Basic module 10C: 2 stages
Basic module 10D: 3 stages

This corrects the mismatch in the time chart of FIG. 8 (the portion marked with dots), and continuous pipeline operation at the basic clock time Δt3 becomes possible.
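
Why the skew delays must be 0, 1, 2, and 3 stages can be checked with a small timing model. This is a sketch under assumptions that are not spelled out in the patent: the pipeline register is modeled as sitting after the accumulation adder of each module, partial sums are given as precomputed lists, and the name `accumulate_chain` is hypothetical.

```python
def accumulate_chain(partials, skew):
    """Simulate the pipelined accumulation path across cascaded modules.

    partials[k][t] is the partial sum of module k for window t.
    skew[k] is the input delay (in clock ticks) applied to module k.
    Returns the value leaving the last module at each tick.
    """
    n_mod = len(partials)
    n_win = len(partials[0])
    ticks = n_win + max(skew) + n_mod

    def partial_at(k, tick):
        t = tick - skew[k]  # which window module k is working on at this tick
        return partials[k][t] if 0 <= t < n_win else 0

    regs = [0] * n_mod  # one pipeline register per module on the accumulation path
    outputs = []
    for tick in range(ticks):
        # Each module adds its own partial sum to the upstream register value.
        regs = [
            (regs[k - 1] if k > 0 else 0) + partial_at(k, tick)
            for k in range(n_mod)
        ]
        outputs.append(regs[-1])
    return outputs
```

With skews of 0, 1, 2, and 3 every output combines the four partial sums of the same window; with no skew, each output mixes partial sums from four different windows, which is the mismatch the skew register removes.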

As is easily seen, the timing mismatch is resolved in the same way if the skew register 17 is placed immediately after the arithmetic circuit 13 that computes the partial sum, or immediately before or after each PE 12.

FIG. 9 shows another embodiment with a different processing scheme.

In the Type I configurations described so far, the input image data was distributed through the shift register 11 so that adjacent pixels went to PEs 12 #1 to #4. In this embodiment, by contrast, the input image data is supplied in common to PEs 12 #1 to #4, and the multiplication results are accumulated through the arithmetic circuits 18 and registers 19 to output the partial sum Σ1. This operation is described with reference to the time chart of FIG. 10.

At time 1, pixel f11 is input from the image data input port 24, and PE 12 #1 forms its product with the weight W11 read from the weight memory 15; the product f11*W11 is set in register 19 #2.

At time 2, pixel f12 is input, and PE 12 #2 forms its product f12*W12 with the weight W12. The arithmetic circuit 18 adds this to the value f11*W11 in register 19 #2, and the sum f11*W11 + f12*W12 is set in register 19 #3.

At time 3, pixel f13 is input, and PE 12 #3 forms its product f13*W13 with the weight W13. The arithmetic circuit 18 adds this to the value f11*W11 + f12*W12 in register 19 #3, and the sum f11*W11 + f12*W12 + f13*W13 is set in register 19 #4.

At time 4, pixel f14 is input, and PE 12 #4 forms its product f14*W14 with the weight W14. The arithmetic circuit 18 adds this to the value f11*W11 + f12*W12 + f13*W13 in register 19 #4, giving Σ11 = f11*W11 + f12*W12 + f13*W13 + f14*W14. This partial sum Σ1 is accumulated by the arithmetic circuit 14 of each basic module 10A to 10D, and g is output from the final stage.

Thereafter, a spatial product-sum result g is output at every basic clock time Δt4.
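
The broadcast-and-propagate scheme of this embodiment, for one row, can be modeled as below. This is an illustrative sketch only: `type2_row` is a hypothetical name, all PEs are assumed to operate on every clock once the pipeline is full, and the first outputs, which correspond to windows starting before the row, are discarded.

```python
def type2_row(pixels, weights):
    """Row partial sums with a broadcast input and a propagating sum chain.

    Every PE j multiplies the same input pixel by its own weight and adds
    the value waiting in its input register; the result moves to the
    register of PE j + 1 on the next clock.
    """
    n = len(weights)
    regs = [0] * n  # regs[j] feeds PE j; regs[0] stays 0 (no upstream PE)
    out = []
    for x in pixels:
        sums = [regs[j] + x * weights[j] for j in range(n)]  # adders 18
        out.append(sums[-1])          # value leaving the last PE
        regs = [0] + sums[:-1]        # registers 19 latch for the next clock
    return out[n - 1:]                # drop outputs of incomplete windows
```

With pixels 1, 2, 3, 4, 5 and weights 1, 10, 100, 1000 the two complete windows give 4321 and 5432, that is, f11*W11 + f12*W12 + f13*W13 + f14*W14 for each window position.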

As with Type I, Type IIP and Type IIPS variants of this Type II parallel image processing processor 2-II are conceivable, making it possible to reduce the basic clock time Δt4. These can be easily inferred and are omitted here.

FIG. 11 shows one embodiment of the parallel image processing processor according to the present invention. Whereas the schemes described so far give each PE 12 its own product-sum weight (memory) 15, the configuration of FIG. 11 gives a single product-sum weight (memory) 15 to all PEs 12 in common. It is called the Type III parallel image processing processor 2-III. Its operation is described with reference to the time chart of FIG. 12.

First, assume that pixel f14 has already been input from the image data input port 24 at time 1. At this point, PEs 12 #1 to #4 are supplied with f11, f12, f13, and f14, respectively, through the shift register 11. The weight W11 is read from the weight memory 15 and multiplied by each input pixel. In the arithmetic circuits 20, the held values are cleared to "0" at the start of time 1, and the products of f11 to f14 with W11 are then held in the respective circuits.

At time 2, pixel f15 is input, PEs 12 #1 to #4 are supplied with f12 to f15, respectively, and their products with the next weight W12 are formed. The arithmetic circuits 20 then accumulate these with the previous values. For example, #1 holds f11*W11 + f12*W12 and #2 holds f12*W11 + f13*W12.

The same processing is performed at times 3 and 4, so that the arithmetic circuits 20 #1 to #4 obtain the following first partial sums:

#1: Σ11 = f11*W11 + f12*W12 + f13*W13 + f14*W14
#2: Σ12 = f12*W11 + f13*W12 + f14*W13 + f15*W14
#3: Σ13 = f13*W11 + f14*W12 + f15*W13 + f16*W14
#4: Σ14 = f14*W11 + f15*W12 + f16*W13 + f17*W14

These are set in the shift register 21 at the end of time 4.

At times 5 to 8, the arithmetic circuit 14 successively accumulates Σ41 to Σ11, Σ42 to Σ12, Σ43 to Σ13, and Σ44 to Σ14 from the shift registers 21 of the basic modules 10A to 10D, and outputs the results g11 to g14.

At the same time, the same processing as at times 1 to 4 is performed on pixels f15 to f18 in PE #1, f16 to f19 in PE #2, f17 to f20 in PE #3, and f18 to f21 in PE #4, giving the partial sums Σ15, Σ16, Σ17, and Σ18. These are accumulated at times 9 to 12 to give the results g15 to g18. In this way, spatial product-sum results are output continuously.
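
The shared-weight scheme of FIG. 11 can be modeled in software as follows. This is a behavioral sketch, not the circuit: `type3_module` and `type3_block` are hypothetical names, pixel rows are plain Python lists, and the shift-register timing is abstracted into index arithmetic.

```python
def type3_module(row, weights, start=0):
    """Partial sums of one basic module using a single shared weight per tick.

    On tick k the same weight weights[k] goes to every PE, while PE p sees
    pixel row[start + p + k] through the input shift register. Each PE keeps
    its own running sum (accumulation circuit 20).
    """
    n = len(weights)
    acc = [0] * n                      # circuits 20, cleared at the first tick
    for k, w in enumerate(weights):    # one weight read from memory 15 per tick
        for p in range(n):
            acc[p] += row[start + p + k] * w
    return acc                         # loaded into shift register 21


def type3_block(image_rows, weight_rows, start=0):
    """Combine the modules' partial sums into one block of outputs g."""
    n = len(weight_rows[0])
    g = [0] * n
    for row, weights in zip(image_rows, weight_rows):  # one module per row
        partial = type3_module(row, weights, start)
        g = [gi + pi for gi, pi in zip(g, partial)]    # accumulation by circuit 14
    return g
```

Each module reads only one weight per clock, which is why a single memory port suffices; the cost is that results emerge in blocks of four rather than one per clock from each PE.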

As with Type I, Type IIIP and Type IIIPS variants of this Type III parallel image processing processor 2-III are conceivable, making it possible to reduce the basic clock time Δt5.

In the embodiments of Types I to III described above, the operations between basic modules 10 are performed by connecting the partial-sum arithmetic circuits 14 in series, and the circuit 14 is included in the basic module. However, where the pin count is a problem for LSI implementation, it is also possible, for example, to make only the dotted-line portion of FIG. 3 the basic module and to perform the inter-module operations externally in parallel.

According to the present invention, a locally parallel image processor can be divided into regularly arranged modules with few input/output ports, giving an architecture suitable for LSI implementation. In particular, because the product-sum weights are supplied in common to the processor elements, a single shared RAM can hold the weight coefficients and only one port is needed, which makes the device easy to build as an LSI.

[Brief Description of the Drawings]

FIGS. 1 to 10 are explanatory diagrams of recently considered parallel image processing techniques: FIG. 1 shows the configuration of an image processing system; FIG. 2 illustrates an example of local parallel processing; FIGS. 3, 5, 7, and 9 are block diagrams showing configuration examples of parallel image processing processors; and FIGS. 4, 6, 8, and 10 are time charts of the respective parallel image processing processors. FIG. 11 shows one embodiment of the parallel image processing processor according to the present invention, and FIG. 12 is the time chart for FIG. 11.

2: parallel image processing processor
3: image memory
10: image processing processor basic module
11: input image shift register
12: processor element
13: partial-sum arithmetic circuit
14: partial-sum accumulation arithmetic circuit
15: weight memory
16: pipeline register
17: (variable-stage) skew correction shift register
18: propagation/accumulation arithmetic circuit
19: propagation register
20: accumulation arithmetic circuit
21: partial-sum output shift register
24: image data input port
25: image data output port
30: operation result data input port
35: operation result data output port

Claims

[Claims]

1. A parallel image processing processor that takes in image data from an image data source and performs locally parallel image data processing, characterized in that a plurality of image processing processor basic modules are arranged in parallel, each basic module comprising: an image data input port; a plurality of first shift registers that sequentially take in image data from the image data input port; a plurality of processor elements that receive the contents of the respective first shift registers and perform image processing operations; a plurality of first arithmetic circuits that cumulatively add, for each processor element, the operation results in that processor element; a second shift register that takes in the operation results of the plurality of first arithmetic circuits; an operation result data input port that receives operation result data from the preceding basic module; a second arithmetic circuit that adds that operation result data to the operation results of the first arithmetic circuits set in the second shift register; and an operation result data output port that outputs the operation result data of the second arithmetic circuit.