JPH0443461A

JPH0443461A - Matrix multiplication circuit

Info

Publication number: JPH0443461A
Application number: JP2150310A
Authority: JP
Inventors: Mikio Sasaki; 美樹男笹木
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1990-06-08
Filing date: 1990-06-08
Publication date: 1992-02-13

Abstract

PURPOSE: To reduce the hardware scale and speed up multiplication by providing a row number switching means, which switches the row number of the coefficient-matrix elements used as multiplication coefficients, and an addition means, which adds the coefficient multiplication values output from the storage means of a parallel multiplication means and outputs the sum as an element of the output matrix.

CONSTITUTION: The multiplicand column elements (k = 1 to k = 8) of one column of the multiplicand matrix are multiplied collectively by the row elements (j = 1 to j = 8) of one row of the coefficient matrix in the parallel multiplication means. The products are added by the addition means 12 and output as an element of the output matrix. The row number of the row elements in the coefficient matrix is switched by the row number switching means 11, and the column number of the multiplicand column elements in the multiplicand matrix is switched by the column number switching means 14. In this way each element of the output matrix is computed. The circuit scale is thereby reduced, and the circuit can be applied to high-speed image processing and similar uses.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a matrix multiplication circuit particularly suitable for use in image compression processing.

Prior Art

In image processing, information is compressed by applying a so-called orthogonal transform to a matrix of original image data. One such orthogonal transform is the DCT (Discrete Cosine Transform).

The DCT is explained here. Let f(i, j) (i = 1 to N, j = 1 to N) be a matrix of N × N image data (N is an integer), where each datum represents the luminance and color of the corresponding pixel. The forward DCT value F_b(u, v) for this matrix is given by equation (1) below.

$$F_b(u, v) = \frac{2}{N}\, C(u)\, C(v) \sum_{i=1}^{N} \sum_{j=1}^{N} f(i, j)\, A_{iu}\, A_{jv} \qquad (1)$$

where u = 1 to N, v = 1 to N, and

$$C(u) = \begin{cases} 1/\sqrt{2} & \text{at } u = 1 \\ 1 & \text{at } u = 2 \text{ to } N \end{cases} \qquad C(v) = \begin{cases} 1/\sqrt{2} & \text{at } v = 1 \\ 1 & \text{at } v = 2 \text{ to } N \end{cases}$$

A_iu and A_jv are each Walsh functions (their defining expressions are missing from this text).

When the forward DCT values F_b(u, v) (u = 1 to N, v = 1 to N) are given, the original image data matrix f(i, j) (i = 1 to N, j = 1 to N) can be restored by the inverse DCT shown in equation (2) below.

$$f(i, j) = \frac{2}{N} \sum_{u=1}^{N} \sum_{v=1}^{N} C(u)\, C(v)\, F_b(u, v)\, A_{iu}\, A_{jv} \qquad (2)$$

In equation (1), F_b(u, v) (u = 1 to N, v = 1 to N) equals the coefficient of each spatial frequency component making up the original image. Applying the inverse transform of equation (2) to these coefficients to obtain the image data f(i, j) reproduces the original image faithfully. In practice, however, owing to the statistical properties of images, power concentrates in the visually important DC and low-order components among these coefficients, while the visually less important high-order components carry little power. Therefore many bits are allocated to the low-order components and relatively few bits to the high-order components, or the high-order components are not transmitted at all. The receiver then reproduces the image by applying the inverse transform of equation (2). This reduces the amount of data needed to transmit image data.
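The forward and inverse transforms can be checked numerically with a short sketch. It rests on two assumptions that this text does not confirm: the basis functions A_iu are taken to be the standard DCT-II cosines, and the normalization factor is taken to be 2/N. The function names `forward` and `inverse` are illustrative only.

```python
import math

N = 8  # block size; the embodiment described later uses 8 x 8 blocks

def c(u):
    # C(u) as stated in the text: 1/sqrt(2) at u = 1, otherwise 1 (u is 1-based).
    return 1 / math.sqrt(2) if u == 1 else 1.0

def a(i, u):
    # ASSUMPTION: standard DCT-II cosine basis, written with 1-based indices.
    return math.cos((2 * i - 1) * (u - 1) * math.pi / (2 * N))

def forward(f):
    # Equation (1) under the assumed 2/N normalization.
    return [[(2 / N) * c(u) * c(v) * sum(f[i - 1][j - 1] * a(i, u) * a(j, v)
                                         for i in range(1, N + 1)
                                         for j in range(1, N + 1))
             for v in range(1, N + 1)] for u in range(1, N + 1)]

def inverse(F):
    # Equation (2) under the same assumptions.
    return [[(2 / N) * sum(c(u) * c(v) * F[u - 1][v - 1] * a(i, u) * a(j, v)
                           for u in range(1, N + 1)
                           for v in range(1, N + 1))
             for j in range(1, N + 1)] for i in range(1, N + 1)]

# A smooth test block (gentle gradient): most power lands in the DC coefficient.
block = [[100 + i + j for j in range(N)] for i in range(N)]
coeffs = forward(block)
restored = inverse(coeffs)
print(round(coeffs[0][0], 3), round(restored[3][4], 3))
```

For a smooth block such as this gradient, the DC term dominates the other coefficients, which is the power concentration that the bit allocation described above exploits.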

Image compression processing using the DCT in this way is described in, for example, Japanese Patent Laid-Open No. Sho 62-31473.

Problem to Be Solved by the Invention

The orthogonal transform described above is generally performed by arranging the original image data as a matrix and multiplying that matrix by a coefficient matrix for the orthogonal transform. Executing this matrix multiplication requires large-scale hardware and, moreover, a long computation time. A common approach divides the original image data matrix into several small blocks and compresses each block separately. Although this reduces the hardware required, it does not shorten the time needed for multiplication. In addition, in the field of image compression there is demand not only for smaller hardware and faster arithmetic but also for the ability to divide the original image into blocks of a desired size, matched to the desired image precision or degree of compression, before compressing it.

The present invention was made in view of the above circumstances. Its object is to provide a matrix multiplication circuit that requires only small-scale hardware, executes multiplication at high speed and, when the multiplicand matrix is divided into blocks and each block is multiplied by a coefficient matrix, allows the manner of block division to be switched freely.

Means for Solving the Problem

The present invention is characterized by comprising: multiplicand matrix storage means that stores a multiplicand matrix and outputs the elements belonging to a designated column number as multiplicand column elements; column number switching means that switches the designated column number of the multiplicand column elements; parallel multiplication means made up of a plurality of storage means corresponding to the columns of the multiplicand matrix, in which each storage area of each storage means holds a coefficient multiplication value obtained by multiplying one of the coefficient elements of the one column of a predetermined coefficient matrix whose column number corresponds to that storage means by each value within the range that an element of the multiplicand matrix can take, and in which each storage means receives as its address both the corresponding multiplicand column element and the row number of the coefficient element, among that one column, to be used as the multiplication coefficient, so that the coefficient multiplication value corresponding to each multiplicand column element is obtained from each storage means; row number switching means that switches the designated row number of the coefficient-matrix elements used as multiplication coefficients; and addition means that adds the coefficient multiplication values output from the storage means of the parallel multiplication means and outputs the sum as an element of the output matrix.

Operation

With the above configuration, the multiplicand column elements of one column of the multiplicand matrix are multiplied collectively by the row elements of one row of the coefficient matrix in the parallel multiplication means, and the products are added by the addition means and output as an element of the output matrix. Each element of the output matrix is computed as the row number switching means switches the row number of the row elements in the coefficient matrix and the column number switching means switches the column number of the multiplicand column elements in the multiplicand matrix.

Embodiment

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a matrix multiplication circuit for the DCT according to an embodiment of the present invention. The figure shows only the part of the computation of equation (1) above that carries out the matrix multiplication, that is, the summation over i and j starting from i = 1 and j = 1.

In FIG. 1, reference numeral 10 denotes a RAM (random access memory) that stores the matrix [X] = X_jk (j = 1 to 8, k = 1 to 8) of input image data to be DCT-processed (corresponding to f(i, j) in equation (3)). At every predetermined period, data for 8 × 8 pixels of the input image data is sampled into the RAM from a separate input source or the like.

Each pixel datum consists of N_in bits, for example 8 bits. At every predetermined period the RAM 10 receives, as address data, information designating a column number, and one column of image data [X_k] = X_jk (j = 1 to 8) is read out at once. This column of image data is then captured by latches L1 to L8. Address switching in the RAM 10 and data capture by the latches L1 to L8 are synchronized with a clock generated every period T32 = 32·T, where T is the period of the system clock of the system in which this matrix multiplication circuit is installed.

Reference numerals 1 to 8 denote ROMs (read-only memories), and 11 denotes a row number switching circuit. As shown in FIG. 2, the storage area of each of ROMs 1 to 8 is divided into eight sectors (sector 0 to sector 7) corresponding to the rows of the image data matrix [X] = X_jk (j = 1 to 8, k = 1 to 8). The DCT coefficients C_ij (i = 1 to 8, j = 1 to 8) described above (corresponding to A_iu·A_jv in equation (3)) are assigned to the sectors.

In more detail, the coefficients C_i1 (i = 1 to 8) are assigned to the eight sectors of ROM 1, ..., the coefficients C_ij (i = 1 to 8) to the eight sectors of ROM j, ..., and the coefficients C_i8 (i = 1 to 8) to the eight sectors of ROM 8.

Each of ROMs 1 to 8 receives the 3-bit row number switching data CT3 output from the row number switching circuit 11 as sector designation data, and receives the image data X_1k to X_8k, respectively, through the latches L1 to L8 as in-sector addresses.

FIG. 3 shows the contents of the storage areas of sector (i-1) in ROM j, to which the coefficient C_ij is assigned. If the image data X_jk for one pixel consists of N_in bits (N_in is an integer), its range is 0 to (2^N_in - 1), and each sector is provided with 2^N_in storage areas (address 0 to address (2^N_in - 1)) covering this range. As shown in FIG. 2, each address holds the in-sector address multiplied by the coefficient C_ij: C_ij·0 at in-sector address "0", C_ij·1 at in-sector address "1", and so on. With the data written in this way, when the image data X_jk is input to ROM j through latch Lj as an in-sector address, the data C_ij·X_jk is read from ROM j. In this embodiment the row number switching circuit 11 switches the row number switching data CT3 every period T4 = 4·T, so the output data of ROM j switches accordingly: C_ij·X_jk, C_(i+1)j·X_jk, and so on. The access time of ROMs 1 to 8 in this embodiment is within 4·T; that is, the output data corresponding to a new value of CT3 becomes available within 4·T after CT3 is switched.
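The table-lookup multiplication described above can be sketched as follows. The coefficient values in `column_j` are hypothetical integers chosen only for illustration (the text does not give the stored values or their fixed-point format), and the names `build_rom` and `rom_read` are not from the source.

```python
N_IN = 8  # bits per pixel; 8 is the example value given for the embodiment

def build_rom(coeff_column):
    """Build ROM j from column j of the coefficient matrix (C_1j .. C_8j).

    Sector (i-1) holds C_ij * x at in-sector address x, for every
    possible pixel value x in 0 .. 2**N_IN - 1.
    """
    return [[c_ij * x for x in range(2 ** N_IN)] for c_ij in coeff_column]

def rom_read(rom, ct3, pixel):
    """CT3 (3 bits) selects the sector; the latched pixel is the in-sector address."""
    return rom[ct3][pixel]

# Hypothetical scaled coefficients for one column (illustrative values only).
column_j = [64, 89, 84, 75, 64, 50, 35, 18]
rom_j = build_rom(column_j)

# Reading sector 2 at address 200 returns the precomputed product C_3j * 200.
print(rom_read(rom_j, 2, 200))
```

The point of the arrangement is that no multiplier is needed at run time: a multiplication becomes a single memory read addressed by the row number and the pixel value.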

In FIG. 1, reference numeral 12 denotes an adder circuit that adds the output data of ROMs 1 to 8 (each consisting of N_out bits), and 13 denotes a latch that holds the output data of the adder circuit 12. The adder circuit 12 and the latch 13 are described with reference to FIG. 4.

As shown in FIG. 4, the outputs of ROMs 1 and 2 are added by adder A1, those of ROMs 3 and 4 by adder A2, those of ROMs 5 and 6 by adder A3, and those of ROMs 7 and 8 by adder A4. Adders A1 to A4 each output (N_out + 1) bits. The outputs of adders A1 and A2 are added by adder A5, and the outputs of adders A3 and A4 by adder A6. Adders A5 and A6 each output (N_out + 2) bits. The outputs of adders A5 and A6 are in turn added by adder A7, which outputs (N_out + 3) bits. The adders A1 to A7 are sufficiently fast compared with the access time of ROMs 1 to 8. Consequently, immediately after the output data of ROMs 1 to 8 become available, the output data of ROMs 1 to 8 and of adders A1 to A7 all share the same data phase, and any of them can be selected and accessed independently. Even if hardware speed constraints or the like make this state unattainable, at least the 8 × 8 DCT computation shown in FIG. 5, described later, remains possible as long as the time from the arrival of the output data of ROMs 1 to 8 until adder A7 outputs their sum is within 4·T.
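The adder tree and the growth in word width at each stage can be sketched as below. The sketch assumes unsigned ROM outputs and N_out = 16; both are illustrative assumptions, and the function name `adder_tree` is not from the source.

```python
def adder_tree(rom_out):
    """Three-stage adder tree over the eight ROM outputs (A1 .. A7).

    Returns the values visible at each selector input port:
    P1 = raw ROM outputs, P2 = A1..A4, P3 = A5..A6, P4 = A7.
    """
    if len(rom_out) != 8:
        raise ValueError("expected eight ROM outputs")
    a1 = rom_out[0] + rom_out[1]
    a2 = rom_out[2] + rom_out[3]
    a3 = rom_out[4] + rom_out[5]
    a4 = rom_out[6] + rom_out[7]
    a5 = a1 + a2
    a6 = a3 + a4
    a7 = a5 + a6
    return {"P1": list(rom_out), "P2": [a1, a2, a3, a4],
            "P3": [a5, a6], "P4": [a7]}

N_OUT = 16                         # assumed ROM output width
worst_case = [2 ** N_OUT - 1] * 8  # every ROM outputs its maximum unsigned value
ports = adder_tree(worst_case)

# Each stage needs exactly one more bit than the stage before it.
widths = {port: max(v.bit_length() for v in values)
          for port, values in ports.items()}
print(widths)
```

Each addition of two equal-width operands can carry into one extra bit, which is why the three stages need N_out + 1, N_out + 2, and N_out + 3 bits.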

The output data of ROMs 1 to 8 are input to input port P1 of the selector SEL, the output data of adders A1 to A4 to input port P2, the output data of adders A5 and A6 to input port P3, and the output data of adder A7 to input port P4.

In the selector SEL, an input port is selected according to the control information CONT, and the selection, switching, and output of the input data within that port are performed in synchronization with a selector clock whose rate depends on CONT.

The data output from the output port of the selector SEL are written sequentially into the latch 13. The latch 13 has storage areas for one column of data (eight items in this embodiment), and its input data are written into these storage areas in synchronization with the selector clock. The write position of the input data is controlled according to the control information CONT.

In more detail, when CONT is "0" and input port P1 is selected, a subclock with twice the frequency of the system clock (period T) is supplied as the selector clock, so that the input source is switched every period T/2 in the order ROM 1, ROM 2, ..., ROM 8 and connected to the output port. (If such a high-speed subclock is unavailable, the selector SEL can be given two output ports, with the input sources switched every period T in pairs, ROM 1 and ROM 2, ROM 3 and ROM 4, ..., ROM 7 and ROM 8, and each pair connected to the two output ports.) When CONT is "1" and input port P2 is selected, the input source is switched every period T in the order A1, A2, A3, A4 and connected to the output port. When CONT is "2" and input port P3 is selected, the input source is switched every period 2·T between A5 and A6 and connected to the output port. When CONT is "3" and input port P4 is selected, the output of adder A7 is selected for the whole of period T4 and connected to the output port. The one column of data output from the latch 13 is transferred sequentially to a buffer memory or the like (not shown) and used in subsequent image processing.
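The four selector modes can be summarized as a small table-driven sketch. Times are expressed in units of the system clock period T within one T4 window; the dictionary layout and the name `selector_schedule` are illustrative assumptions, not part of the source.

```python
# CONT value -> (input port, switching period in units of T, sources in order)
SELECTOR_MODES = {
    0: ("P1", 0.5, ["ROM%d" % n for n in range(1, 9)]),  # T/2 subclock
    1: ("P2", 1.0, ["A1", "A2", "A3", "A4"]),
    2: ("P3", 2.0, ["A5", "A6"]),
    3: ("P4", 4.0, ["A7"]),
}

def selector_schedule(cont):
    """Return the selected port and (start time, source) pairs for one T4 window."""
    port, period, sources = SELECTOR_MODES[cont]
    return port, [(slot * period, src) for slot, src in enumerate(sources)]

for cont in range(4):
    print(cont, selector_schedule(cont))
```

In every mode the switching period times the number of sources equals 4·T, so each mode exactly fills one T4 window, the interval at which CT3 changes.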

As described above, the matrix multiplication circuit of FIG. 1 can perform a variety of matrix multiplications depending on (1) the input data column captured by latches L1 to L8, (2) the row number switching data CT3, and (3) the control information CONT of the selector SEL.

The operation of this matrix multiplication circuit is described below.

FIGS. 5(a) and 5(b) show the case where the control information CONT = "3".

This realizes, for example, an 8 × 8 DCT, that is, an eighth-order one-dimensional DCT:

$$Y_{ik} = \sum_{j=1}^{8} C_{ij}\, X_{jk}$$

As shown in FIG. 5(a), this is expressed as the matrix product [Y] of the 8 × 8 coefficient matrix [C] for the orthogonal transform and the 8 × 8 data matrix [X] to be transformed.

FIG. 5(b) is a time chart showing how the parts of FIG. 1 operate. First, one column [X_k] = X_1k to X_8k of the data matrix [X] is read from the RAM 10. These data X_1k to X_8k are held in latches L1 to L8 for the period T32 and supplied to ROMs 1 to 8, respectively, as in-sector addresses. During that time the row number switching data CT3 is switched sequentially from "0" to "7" at intervals of T4. When CT3 is "0", as is clear from the description of the ROM contents and addresses in FIGS. 2 and 3, C_11·X_1k, C_12·X_2k, ..., C_18·X_8k are read from ROMs 1 to 8, respectively, and input to the adder circuit 12. Adder A7 of FIG. 4 then outputs their sum, which is written through the selector SEL into the first memory position of the latch 13. After a period T4, CT3 is switched to "1"; C_21·X_1k, C_22·X_2k, ..., C_28·X_8k are read from ROMs 1 to 8, respectively, and their sum from adder A7 is likewise written into the second memory position of the latch 13. The same operation is repeated each time CT3 changes from "2" through "7", so that all of the first to eighth memory positions of the latch 13 are written during the period T32. These form one column [Y_k] of the output product matrix [Y].

When the period T32 ends, the output data column [Y_k] is output from the latch 13 and transferred to a buffer memory or the like (not shown). At the same time the next column of image data [X_(k+1)] = X_1(k+1) to X_8(k+1) is read out and held in latches L1 to L8 for the following period T32, and the same operation as for [X_k] yields the corresponding output data column [Y_(k+1)], which is likewise transferred to the buffer memory or the like. The same operation continues, and after eight repetitions of the period T32 the columns [Y_1] to [Y_8] of the 8 × 8 product matrix [Y] have been computed in turn in the buffer memory or the like, giving the desired output product matrix [Y]. With this matrix multiplication circuit, 8 × 8 DCT image data processing takes 8·T32, that is, 256·T.
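The 8 × 8 sequence can be traced with a behavioural sketch: one column is latched per T32, CT3 steps through the eight sectors every T4, and the A7 sum is stored in the output column. Integer coefficients and 8-bit pixels are assumptions made for illustration, and `multiply_8x8` is a hypothetical name.

```python
def multiply_8x8(C, X):
    """Behavioural model of the CONT = 3 mode; returns (Y, elapsed time in T)."""
    # ROM j, sector i, address x holds C[i][j] * x (table-lookup multiplication).
    roms = [[[C[i][j] * x for x in range(256)] for i in range(8)]
            for j in range(8)]
    Y = [[0] * 8 for _ in range(8)]
    elapsed_T = 0
    for k in range(8):                         # one input column per T32
        latches = [X[j][k] for j in range(8)]  # latches L1 .. L8
        for ct3 in range(8):                   # one coefficient row per T4
            rom_out = [roms[j][ct3][latches[j]] for j in range(8)]
            Y[ct3][k] = sum(rom_out)           # adder A7, written to latch 13
            elapsed_T += 4
    return Y, elapsed_T

# Deterministic test data (illustrative values only).
C = [[(3 * i + j) % 7 - 3 for j in range(8)] for i in range(8)]
X = [[(17 * j + 31 * k) % 256 for k in range(8)] for j in range(8)]
Y, elapsed_T = multiply_8x8(C, X)
print(elapsed_T)
```

The elapsed count confirms the figure in the text: 8 columns × 8 rows × 4·T = 256·T per 8 × 8 block.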

This computation speed is now considered in a little more detail.

For example, to process image data of 352 × 288 pixels per frame, the total time required is 256·T·(352 × 288)/(8 × 8) = 405504·T. Image processing conforming to the NTSC standard requires image compression at 30 frames per second. The system clock period T satisfying this condition is T = 82.20 ns, that is, a frequency of 12.165 MHz. Given current integrated circuit technology, this figure is easily achievable.
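The clock requirement follows from simple arithmetic, reproduced here as a check. The variable names are illustrative.

```python
FRAME_W, FRAME_H = 352, 288     # pixels per frame
BLOCK = 8                       # 8 x 8 blocks
CLOCKS_PER_BLOCK = 256          # 256 * T per 8 x 8 block
FRAMES_PER_SECOND = 30

blocks_per_frame = (FRAME_W * FRAME_H) // (BLOCK * BLOCK)
clocks_per_frame = CLOCKS_PER_BLOCK * blocks_per_frame

# The clock must deliver clocks_per_frame cycles thirty times per second.
required_hz = FRAMES_PER_SECOND * clocks_per_frame
period_ns = 1e9 / required_hz

print(blocks_per_frame, clocks_per_frame)
print(round(required_hz / 1e6, 3), "MHz")
print(round(period_ns, 2), "ns")
```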

The 8 × 8 matrix product configuration described above can also be applied to other uses by changing how the input data column [X_k] is set up. FIG. 6 shows an application to mask filtering. When coding is done block by block, as with the DCT, mask filtering is often performed block by block to smooth quantization noise. As in FIG. 6, the data surrounding the target datum are shaped into a data column [X_k] and supplied to the [X_k] input of the 8 × 8 matrix product circuit 14 described above, the coefficient matrix in the circuit 14 is reset for mask filtering, and a bit shift means 15, which performs the coefficient multiplication of the target datum, and an adder 16 are added. The required 3 × 3 mask filtering is then easily realized.
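One way to read this arrangement is sketched below: the eight neighbours form the data column fed to the matrix product circuit, one coefficient row gives their weighted sum, and the centre pixel is weighted by a bit shift before the final addition. The neighbour ordering, the kernel values, and the name `mask_filter_3x3` are assumptions made for illustration; FIG. 6 is not available in this text.

```python
# Offsets of the eight neighbours around the target pixel (assumed ordering).
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                     (0, -1),           (0, 1),
                     (1, -1),  (1, 0),  (1, 1)]

def mask_filter_3x3(img, r, c, neighbour_coeffs, centre_shift):
    """Weighted 3 x 3 sum at (r, c); the result is not normalized."""
    # Shape the surrounding data into one data column [X_k].
    column = [img[r + dr][c + dc] for dr, dc in NEIGHBOUR_OFFSETS]
    # One coefficient row of circuit 14: sum of coefficient * neighbour.
    neighbour_sum = sum(k * x for k, x in zip(neighbour_coeffs, column))
    # Bit shift means 15 weights the target pixel by a power of two;
    # adder 16 combines it with the neighbour sum.
    return neighbour_sum + (img[r][c] << centre_shift)

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# A 1-2-1 smoothing kernel: corners 1, edges 2, centre 4 (shift by 2).
kernel_neighbours = [1, 2, 1, 2, 2, 1, 2, 1]
print(mask_filter_3x3(image, 1, 1, kernel_neighbours, 2))
```

Restricting the centre weight to a power of two is what lets a shifter stand in for a ninth ROM in this reading.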

Next, the cases in which this matrix multiplication circuit performs 4 × 4, 2 × 2, and 1 × 1 matrix multiplications are described, together with application examples, with reference to FIGS. 7 to 9.

FIGS. 7(a) and 7(b) show the case where the control information CONT = "2". This realizes a 4 × 4 DCT. The four 4 × 4 coefficient matrices [C] for the orthogonal transform, the four 4 × 4 data matrices [Xa] to [Xd] to be transformed, and the four corresponding output product matrices [Ya] to [Yd] are each arranged within an 8 × 8 range as shown in FIG. 7(a). The time chart of FIG. 7(b) shows the state changes of each part. In this operation the 8 × 8 DCT operation described above is split into an upper half and a lower half so that the required 4 × 4 DCT computations are carried out in parallel.

In the first half of the period T32, one column of image data [X_k] is read from the RAM 10, consisting of one column each of the matrices [Xa] and [Xc], or of the matrices [Xb] and [Xd], connected in series; in the second half, the next data column [X_(k+1)] is read. While the row number switching data CT3 is "0" to "3", the image data column [X_k] is processed: the selector SEL alternately selects the sum of the outputs of ROMs 1 to 4, output from adder A5, and the sum of the outputs of ROMs 5 to 8, output from adder A6, and writes them into the latch 13 as illustrated to build the output data column [Y_k]. While CT3 is "4" to "7", the image data column [X_(k+1)] is processed: the selector SEL again alternately selects the sum from adder A5 and the sum from adder A6 and writes them into the latch 13 to build the output data column [Y_(k+1)].

Since two output data columns are produced in each period T32 in this way, inputting and processing the subsequent data columns in the same manner yields, after a total of 128·T, an 8 × 8 range of output results in the buffer memory or the like corresponding to the four output product matrices [Ya] to [Yd].
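The 4 × 4 mode can be modelled in the same behavioural style. The sketch assumes the 8 × 8 coefficient area holds the same 4 × 4 matrix [C] in all four quadrants and that the A5 results fill the upper half of each output column and the A6 results the lower half; the actual latch ordering is given only in FIG. 7, which is not available here. `dct4x4_mode` is a hypothetical name.

```python
def dct4x4_mode(C4, X8):
    """Behavioural model of the CONT = 2 mode; returns (Y, elapsed time in T)."""
    # ASSUMPTION: the coefficient area repeats C4 in each 4 x 4 quadrant.
    coef = [[C4[i % 4][j % 4] for j in range(8)] for i in range(8)]
    Y = [[0] * 8 for _ in range(8)]
    elapsed_T = 0
    for k in range(8):
        half = k % 2                    # first or second half of T32
        for ct3 in range(4 * half, 4 * half + 4):
            products = [coef[ct3][j] * X8[j][k] for j in range(8)]
            a5 = sum(products[:4])      # ROMs 1..4: upper data block
            a6 = sum(products[4:])      # ROMs 5..8: lower data block
            Y[ct3 % 4][k] = a5          # assumed position in latch 13
            Y[4 + ct3 % 4][k] = a6
            elapsed_T += 4              # one T4 window yields both sums
    return Y, elapsed_T

C4 = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
X8 = [[(7 * j + 3 * k) % 256 for k in range(8)] for j in range(8)]
Y, elapsed_T = dct4x4_mode(C4, X8)
print(elapsed_T)
```

Because each T4 window now yields two usable sums instead of one, the same 8 × 8 data area is covered in half the time of the 8 × 8 mode.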

As in the 8 × 8 DCT case described above, a 4 × 4 DCT can also be run with T32 as the basic period of the repeated operation and with one adder output taken into the selector SEL and passed to the latch 13 every period T4. In that case the row number switching data CT3 is switched every period T4 in the sequence "0", "0", "1", "1", "2", "2", "3", "3", "0", "0", and so on. The computation speed is then naturally halved.

FIGS. 8(a) and 8(b) show the case where the control information CONT = "1". This realizes a 2 × 2 DCT. The sixteen 2 × 2 coefficient matrices [C] for the orthogonal transform, the sixteen 2 × 2 data matrices [Xa] to [Xp] to be transformed, and the sixteen corresponding output product matrices [Ya] to [Yp] are each arranged within an 8 × 8 range as shown in FIG. 8(a). The time chart of FIG. 8(b) shows the state changes of each part.

In this mode, an 8 × 8 output corresponding to the set of sixteen output product matrices [Ya] to [Yp] is obtained in the buffer memory or the like in a total time of 64·T.

In this case the contents of ROMs 1 to 8 used while the row number switching data CT3 is "2" to "7" are exactly the same as those used when CT3 is "0" and "1". To reduce the capacity of ROMs 1 to 8, therefore, the coefficient matrix of FIG. 8(a) may be cut down to its top quarter (four 2 × 2 coefficient matrices [C] side by side), with CT3 switched every period T4 in the sequence "0", "1", "0", "1", and so on.
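The reduced-ROM variant of the 2 × 2 mode can be modelled as follows. Each ROM keeps only two sectors, CT3 toggles between 0 and 1, and the pair sums A1 to A4 are taken every T. The placement of results within the output column is an assumption, since FIG. 8 is not available here, and `dct2x2_reduced_rom` is a hypothetical name.

```python
def dct2x2_reduced_rom(C2, X8):
    """Behavioural model of CONT = 1 with two-sector ROMs; returns (Y, time in T)."""
    # ROM j keeps only sectors 0 and 1: C2[i][j % 2] * x for each pixel value x.
    roms = [[[C2[i][j % 2] * x for x in range(256)] for i in range(2)]
            for j in range(8)]
    Y = [[0] * 8 for _ in range(8)]
    elapsed_T = 0
    ct3 = 0
    for k in range(8):
        latches = [X8[j][k] for j in range(8)]
        for _ in range(2):                      # CT3 = 0, then CT3 = 1
            rom_out = [roms[j][ct3][latches[j]] for j in range(8)]
            # Adders A1..A4 each sum one pair of adjacent ROM outputs.
            pair_sums = [rom_out[2 * p] + rom_out[2 * p + 1] for p in range(4)]
            for p, s in enumerate(pair_sums):
                Y[2 * p + ct3][k] = s           # assumed position in latch 13
            elapsed_T += 4                      # four outputs, one per T
            ct3 ^= 1                            # sequence 0, 1, 0, 1, ...
    return Y, elapsed_T

C2 = [[1, 1], [1, -1]]
X8 = [[(5 * j + 11 * k) % 256 for k in range(8)] for j in range(8)]
Y, elapsed_T = dct2x2_reduced_rom(C2, X8)
print(elapsed_T)
```

With only two sectors per ROM the table storage drops to a quarter, and the total of 64·T for the 8 × 8 data area is unchanged.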

Furthermore, as an application of this, four different types of 2×2 DCT can also be calculated within a single period of 256·T. The coefficient matrix shown in FIG. 8(a) is divided into four in the vertical direction, and four different 2×2 coefficient matrices [Ca] to [Cd] are assigned so that each divided part holds four copies of the same matrix lined up horizontally. Then, by the same processing as in the case where the capacity of the coefficient ROM is reduced, the set of target output product matrices is first obtained for, for example, the coefficient matrix [Ca] in the first 64·T period, and the coefficient matrices [Cb], ..., [Cd] are then processed in turn in the same way. Since each pass yields the set of output product matrices of a different 2×2 DCT, these sets need only be placed separately when they are transferred from the latch 13 to a buffer memory or the like, so that they can be used independently of each other. (Needless to say, the same result can be obtained by processing the coefficient matrices [Ca] to [Cd] in parallel.)
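A minimal sketch of the four sequential passes follows. The four matrices in `coeffs` and the test data `x` are hypothetical placeholders, not values from the patent; the point is only that each pass reuses the same data and that the results are kept apart.

```python
def blockwise(c, x):
    """Multiply every 2x2 block of the 8x8 array x by the 2x2 matrix c."""
    return [
        [c[i % 2][0] * x[i - i % 2][k] + c[i % 2][1] * x[i - i % 2 + 1][k]
         for k in range(8)]
        for i in range(8)
    ]


# Hypothetical coefficient matrices standing in for [Ca] to [Cd].
coeffs = {
    "Ca": [[1, 1], [1, -1]],
    "Cb": [[1, 0], [0, 1]],
    "Cc": [[0, 1], [1, 0]],
    "Cd": [[2, 0], [0, 2]],
}

# Hypothetical test data.
x = [[(3 * i + 5 * j) % 11 for j in range(8)] for i in range(8)]

# One 64*T pass per coefficient matrix; each result goes to its own buffer area.
results = {name: blockwise(c, x) for name, c in coeffs.items()}
total_periods = 64 * len(coeffs)   # in units of T
```

With four passes of 64·T each, `total_periods` matches the 256·T figure given in the text.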

Note that each output matrix of the 2×2 DCT has only four elements, and the data length of each element is 16+1 bits in this embodiment. Unless particularly high speed is required, serial port output is therefore entirely feasible, provided that the T/2 subclock described earlier is used or, when only the system clock T is used, that the selector SEL has a two-port output configuration.

FIGS. 9(a), 9(b), and 9(c) show the case where the control information CONT = "0". This makes it possible to execute 1×1 matrix product operations as well. One example application is the transformation shown in FIG. 9(a), in which a vector basis is rescaled to generate a new vector basis. For the 16×16 matrix [C] expressed as in FIG. 9(b), a 1×16 data matrix [X] for rescaling is prepared, and multiplying the two generates a 16×16 matrix [C′] corresponding to the set of new, rescaled vector bases Y1 to Y16.
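The rescaling can be sketched as an element-wise operation. The sketch assumes that the basis vectors are the columns of [C] and that each column is multiplied by its own scale factor; the matrix `c` and the factors in `scale` are hypothetical test values.

```python
def rescale_basis(c, scale):
    """Multiply column j of c by scale[j]; no accumulation is needed (1x1 products)."""
    return [[c[i][j] * scale[j] for j in range(len(scale))] for i in range(len(c))]


# Hypothetical 16x16 basis matrix and 1x16 scale factors.
c = [[i + j for j in range(16)] for i in range(16)]
scale = list(range(1, 17))

c_new = rescale_basis(c, scale)

# Cross-check against an ordinary matrix product with diag(scale).
diag = [[scale[j] if m == j else 0 for j in range(16)] for m in range(16)]
product = [[sum(c[i][m] * diag[m][j] for m in range(16)) for j in range(16)]
           for i in range(16)]
assert c_new == product

# With eight multipliers working in parallel, 256 products take 256 / 8 periods.
periods = (16 * 16) // 8
```

Under the assumption of eight parallel products per period T, `periods` comes to 32.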

The time chart in FIG. 9(c) shows the state changes of each part. According to this, the matrix [C′] corresponding to the new set of vector bases is obtained in a buffer memory or the like in a total time of 32·T. In this case, as the output from the selector SEL to the latch 13, 8×8, that is, 64 data items are all output within the period T32. The selector SEL must therefore use the T/2 high-speed subclock described earlier or, if only the system clock T is available, must have a two-port output configuration. Naturally, unless some reduction is specifically devised, the buffer memory must also be eight times as large as in the 8×8 DCT case described earlier.

"Effects of the Invention" As described above, according to the present invention, a small-scale, high-speed matrix multiplication circuit applicable to high-speed image processing and the like can be realized easily. In addition, with this matrix multiplication circuit, various matrix multiplications become possible through the settings of the input data sequence, the ROM switching timing, the output connection switching, and so on, so that the circuit can flexibly handle various orthogonal transformations such as 8×8, 4×4, 2×2, and 1×1.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of a matrix multiplication circuit according to an embodiment of the present invention.
FIG. 2 is a diagram showing the assignment of coefficients to ROM1 to ROM8 in the embodiment.
FIG. 3 is a diagram showing the stored contents of the (l-1)-th sector of ROMj in the embodiment.
FIG. 4 is a block diagram showing the adder circuit 12 and the latch 13 in the embodiment.
FIGS. 5(a) and 5(b) are an explanatory diagram and a time chart showing the operation when the embodiment performs 8×8 multiplication.
FIG. 6 is a block diagram showing another example of 8×8 multiplication by the embodiment.
FIGS. 7(a) and 7(b) are an explanatory diagram and a time chart showing the operation when the embodiment performs 4×4 multiplication.
FIGS. 8(a) and 8(b) are an explanatory diagram and a time chart showing the operation when the embodiment performs 2×2 multiplication.
FIGS. 9(a), 9(b), and 9(c) are first and second explanatory diagrams and a time chart showing the operation when the embodiment performs 1×1 multiplication.
1 to 8: ROM; 10: RAM; 11: row number switching circuit; 12: adder circuit; 13: latch circuit.

Claims

[Scope of Claims] A matrix multiplication circuit comprising: multiplicand matrix storage means for storing a multiplicand matrix and outputting each element belonging to a designated column number as a multiplicand column element; column number switching means for switching the designation of the column number of the multiplicand column elements; parallel multiplication means constituted by a plurality of storage means corresponding to the respective columns of the multiplicand matrix, in which each storage area of each storage means stores the coefficient multiplication values obtained by multiplying each of the coefficient elements, for the one column of a predetermined coefficient matrix whose column number corresponds to that storage means, by each value within the range that the elements of the multiplicand matrix can take, and in which each storage means receives together, as an address, the corresponding multiplicand column element and the row number of the coefficient element to be used as the multiplication coefficient among the coefficient elements for the one column, so that the coefficient multiplication value corresponding to each multiplicand column element is obtained from each storage means; row number switching means for switching the designation of the row number of the elements to be used as multiplication coefficients in the coefficient matrix; and addition means for adding the coefficient multiplication values output from the respective storage means of the parallel multiplication means and outputting the sum as an element of an output matrix.
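The table-lookup multiplication recited in the claim can be modeled in a few lines. This is a minimal sketch under stated assumptions: the data elements are taken to be 8-bit unsigned values, and the coefficient matrix `coef` and the data `x` are hypothetical integers chosen only so that the result can be checked exactly.

```python
N = 8
VALUES = range(256)   # assumption: 8-bit unsigned data elements

# Hypothetical integer coefficient matrix (a real DCT matrix would be fractional).
coef = [[(i * j) % 5 - 2 for j in range(N)] for i in range(N)]

# One ROM per coefficient column j.
# Address = (row number i, data value v); stored word = coef[i][j] * v.
roms = [
    {(i, v): coef[i][j] * v for i in range(N) for v in VALUES}
    for j in range(N)
]


def multiply(x):
    """Compute coef * x using only table lookups and additions."""
    y = [[0] * N for _ in range(N)]
    for k in range(N):                          # column number switching
        column = [x[j][k] for j in range(N)]    # multiplicand column elements
        for i in range(N):                      # row number switching
            # Eight parallel lookups followed by the addition.
            y[i][k] = sum(roms[j][(i, column[j])] for j in range(N))
    return y


# Hypothetical test data, compared with a direct matrix product.
x = [[(7 * i + 13 * j) % 256 for j in range(N)] for i in range(N)]
direct = [[sum(coef[i][j] * x[j][k] for j in range(N)) for k in range(N)]
          for i in range(N)]
assert multiply(x) == direct
```

In this model each ROM holds 8 × 256 precomputed products, so no multiplier is needed at run time; only the lookups and the final addition remain.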