EP1018082A1 - Variable blockgrösse, zweidimensionale, inverse diskrete cosinus-transformationsvorrichtung - Google Patents

Variable blockgrösse, zweidimensionale, inverse diskrete cosinus-transformationsvorrichtung

Info

Publication number: EP1018082A1
Authority: EP; European Patent Office
Prior art keywords: idct; butterfly; stages; serial; transform
Prior art date: 1997-08-25
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.): Withdrawn

Application number

EP98942197A

Other languages

English (en)

French (fr)

Inventor

Kenneth D. Easton

Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)

Qualcomm Inc

Original Assignee

Qualcomm Inc

Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

1997-08-25

Filing date

1998-08-24

Publication date

2000-07-12

1998-08-24 Application filed by Qualcomm Inc filed Critical Qualcomm Inc

2000-07-12 Publication of EP1018082A1 publication Critical patent/EP1018082A1/de

Status Withdrawn legal-status Critical Current

Links

Classifications

- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/147—Discrete orthonormal transforms, e.g. discrete cosine transform, discrete sine transform, and variations therefrom, e.g. modified discrete cosine transform, integer transforms approximating the discrete cosine transform

Definitions

the present invention relates to digital signal processing. More particularly, the present invention relates to a novel and improved variable block size 2-dimensional (2-D) inverse discrete cosine transform (IDCT) engine.
2-D 2-dimensional inverse discrete cosine transform
the 2-dimensional discrete cosine transform (IDCT) and inverse discrete cosine transform (IDCT) are important signal processing operations in digital image compression.
One such digital image compression application is in the area of high definition television (HDTV).
HDTV high definition television
ADC analog-to- digital converter
the resultant sampled data is then digitally processed to minimize the amount of data which must be transmitted and/or stored while retaining high picture quality.
a key element of the compression process is the 2-D discrete cosine transform wherein an NxN block of sampled data, or an image, is transformed from the time domain to the frequency domain.
the transformed data can be further processed with block codes, such as Huffman code, run length codes, and /or error correcting codes, such as convolutional codes and Reed-Solomon codes.
block codes such as Huffman code, run length codes, and /or error correcting codes, such as convolutional codes and Reed-Solomon codes.
An exemplary HDTV image compression scheme is disclosed in U.S. Patent No. 5,452,104, U.S. Patent No. 5,107,345, and U.S. Patent No. 5,021,891, all three entitled “ADAPTIVE BLOCK SIZE IMAGE COMPRESSION METHOD AND SYSTEM", and U.S. Patent No. 5,576,767, entitled • TNTERFRAME VIDEO ENCODING AND DECODING SYSTEM", all four patents are assigned to the assignee of the present invention and incorporated by reference herein.
the digitally encoded video waveform is transmitted and/or stored.
the reverse of the digital signal processing is performed to reconstruct the pixels of the original image.
the recovered image is provided to a digital-to-analog converter (DAC) which converts the reconstructed image back to an analog video waveform which can be displayed on a monitor or television.
DAC digital-to-analog converter
An important element in the decoding process is the inverse discrete cosine transform which transforms the frequency domain data back to the time domain.
the IDCT engine is required to run at a high output rate to reconstruct the original image at real time. Furthermore, since the IDCT engine is typically located within a consumer product, cost is a major consideration. The IDCT engine needs to be designed to operate at high speed with minimal complexity.
the digital image compression system typically processes the video signal on a frame-by-frame basis. Each video frame is further partitioned into NxN blocks. In most compression systems, the block size is fixed by the system design to simplify the implementation of the DCT and the IDCT engines.
variable block sizes can enhance the performance of the compression system under certain conditions to allow for optimal compression of the image and/or to improve the quality of the reconstructed image.
Variable block sizes can be used to take advantages of certain characteristics of the image.
variable block size DCT and IDCT engines are designed with banks of transform processors of various sizes. Each processor computes a different block size transform on the same data block. The transforms from the various processors are then combined into the desired composite transformed block. This approach can be unwieldy because of the large amount of required hardware and the complexity in coordinating the various hardware blocks.
the present invention is a novel and improved variable block size 2- dimensional (2-D) inverse discrete cosine transform (IDCT) engine.
the NxN data block is transformed by columns by a first 1-D IDCT processor.
the intermediate results from the first IDCT processor are temporarily stored in a transposition memory. Once all the columns have been processed, the intermediate results are transformed by rows by a second 1-D IDCT processor.
the output from the 2nd IDCT processor comprises the transformed output of the IDCT engine.
each data block is either a 16x16 transform or a mix of any combinations of 8x8 transforms, 4x4 transforms, and/or 2x2 transforms.
a 21-bit control signal precisely describes the desired partition and informs the IDCT engine to compute the proper combination of transforms.
different combinations of transforms can be easily performed by correctly ordering the input data, selectively combining the data before the butterfly stages, and controlling the additions and multiplications at each stage of butterfly. The unnecessary butterflies are placed in the bypass mode.
Serial adders and bit-serial multipliers greatly simplify the design since the computation is performed on only one bit of data at a time.
Serial computations also greatly simplify the routing crossbars between successive stages of butterflies. Because of the pipelined structure of the IDCT engine of the present invention, the throughput rate is maintained at the rate of one transformed point or pixel per clock cycle. This is the same throughput rate as with parallel computations. Only the processing delay is increased because of the serial nature of the computations.
the data block is first transformed by columns by the 1-D IDCT processor and the intermediate results are temporarily stored by columns in a transposition memory.
the second 1-D transform is performed only after all the columns have been transformed.
the intermediate results are concurrently written to memory by columns and read from memory by rows.
the memory is transposed, or alternates between column major and row major, over successive NxN blocks.
the intermediate result is read from a memory location and a new result is written to the same memory location within the same clock cycle.
the transposition memory reduces the memory requirement to one bank of memory of the same size as one NxN data block.
FIG. 1A, IB, and 1C are diagrams of an exemplary NxN image, a partitioned image, and a tree diagram corresponding to the partitioned image of the present invention, respectively;
FIG. 2 is a block diagram of an exemplary variable block-size 2-D IDCT engine of the present invention
FIGS. 3A-D are exemplary diagrams of a 2-point IDCT trellis, a 4-point IDCT trellis, an 8-point IDCT trellis, and a 16-point IDCT trellis of the present invention, respectively;
FIG. 4 is a block diagram of an exemplary 1-D IDCT processor of the present invention.
FIG. 5A and 5B are graphical diagram of a serial butterfly and block diagram of an exemplary implementation of the serial butterfly of the present invention, respectively;
FIG. 6A and 6B are block diagrams of an exemplary bit-serial multiplier of the present invention in word-wide representation and bit- wide representation, respectively;
FIG. 7 is a block diagram of an exemplary serial adder of the present invention.
FIG. 8 is a block diagram of an exemplary I/O buffer of the present invention.
Discrete cosine transform DCT
IDCT inverse discrete cosine transform
DCT transform is typically performed on the sampled data as one of a series of digital signal processing operations. Other operations are performed on the transformed data, including quantization, data compression, and error correcting coding.
the IDCT transforms the data from the frequency domain back into the time domain according to the following equation:
the DCT and IDCT transforms are separable transforms. This means that a 2-D transform can be broken down into two 1-D transforms.
a 2-D IDCT transform can be performed on a data block by first performing a 1-D IDCT transform on the columns of the data block. The intermediate results from the first IDCT transform are temporarily stored in a memory element. A second IDCT transform is then performed on the rows of intermediate results. The output from the second IDCT transform comprises the reconstructed pixels of the original image.
FIG. 1A illustrates a diagram of an exemplary data block.
N is equal to 16 although the present invention can be easily extended to other values of N.
FIG. 2 illustrates an exemplary block diagram of the 2-D IDCT engine 10 of the present invention.
the input data block comprising of IDCT coefficients
IDCT processors 20a and 20b are identical 1-D IDCT processors which perform the IDCT transform on the input data according to equation (2).
the intermediate results from IDCT processor 20a are provided to memory element 22 where they are temporarily stored by columns.
the intermediate results are then provided by rows to IDCT processor 20b.
IDCT processor 20b performs the 1-D IDCT transform and provides the transformed output, or the reconstructed image, to the subsequent digital signal processing block (not shown in FIG. 2).
the input data block is provided to IDCT processor 20a by columns and the intermediate results are provided to IDCT processor 20b by rows.
the data block can be provided to IDCT processor 20a by rows and to IDCT processor 20b by columns.
IDCT processors 20a and 20b are pipelined such that both IDCT processors 20 are active at the same time.
An important property of the IDCT is that a larger transform can be created by arranging the input data points, computing the sum on selective combinations of data points, and performing serial butterfly on the output of two smaller transforms. Serial butterfly is an operation which is described in detail below.
a 16-point IDCT is a butterfly of two 8-point IDCTs
each 8-point IDCT is a butterfly of two 4-point IDCTs
each 4-point IDCT is a butterfly of two 2-point IDCTs.
This property of the IDCT is well known in the art and is best illustrated by a trellis diagram.
a trellis diagram of a 2- point IDCT is shown in FIG. 3A
a 4-point IDCT is shown in FIG. 3B
an 8-point IDCT is shown in FIG. 3C.
the 2-point IDCT comprises a single butterfly stage. As shown in FIGS.
the 4-point IDCT comprises a stage of two 2-point IDCTs, a stage of serial add before the 2-point IDCT stage, and a butterfly stage after the 2-point IDCT stage.
the 8-point IDCT comprises a stage of two 4-point IDCTs, a stage of serial add before the 4-point IDCT stage, and a butterfly stage after the 4-point IDCT stage.
FIG. 3D The diagram of the 16-point IDCT trellis 100 of the present invention is illustrated in FIG. 3D.
Trellis 100 is derived by B.G. Lee and described in detail in a book by K.R. Rao, entitled “Discrete Cosine Transform : Algorithms, Advantages, and Applications", Academic Press, 1990.
the 16- point IDCT trellis 100 comprises three stages of serial add and four stages of serial butterfly, with each stage of butterfly comprising eight serial butterflies.
the interconnects between successive stages are fixed, thus limiting the IDCT processor to performing only 16-point IDCT transforms.
the stages are interconnected using reconfigurable trellis cross-connects.
16-point IDCT 110 is a butterfly of two 8-point IDCTs 108, with the data points to the lower 8-point IDCT selectively combined.
Two 8-point IDCTs 108 is a butterfly of four 4-point IDCTs 106, with the data points to the lower 4-point IDCTs selectively combined.
Four 4-point IDCTs 106 is a butterfly of eight 2-point IDCTs 104, with the data points to the lower 2-point IDCTs selectively combined.
the reconfigurable trellis cross-connects in combination with the bypass mode of the serial butterflies allow the 2-D IDCT engine 10 of the present invention to compute any arbitrary mix of transforms within an NxN block.
IDCT processor 20 can perform two 8-point transforms, eight 2-point transforms, one 8-point transform and two 4-point transforms, or one 8-point, one 4-point and two 2-point transforms.
IDCT processor 20 can perform two 8-point transforms, eight 2-point transforms, one 8-point transform and two 4-point transforms, or one 8-point, one 4-point and two 2-point transforms.
the serial butterflies operate on two input bit streams and provides two output bit streams.
the serial butterfly comprises one greatly simplified bit-serial multiplier and two serial adders.
the serial structure of IDCT processors 20 permits the routing crossbars between successive stages of serial butterfly to be implemented using only 1- bit wide data buses.
IDCT engine 10 computes the transform of the 16x16 block in 256 clock cycles. For each clock cycle, one IDCT coefficient is supplied to IDCT engine 10 and one output pixel is extracted from IDCT engine 10.
IDCT processors 20a and 20b are pipelined such that both processors are active concurrently. Each IDCT processor 20 receives one input data point and provides one transformed data point for each clock cycle.
I/O buffers 52 can be implemented as a parallel-to-serial shift registers as described below.
Serial adders 56 receive the data bits and perform serial additions of the bits in the manner described below.
Serial adders 56 are enabled by ADD_ENABLE which, in the exemplary embodiment, comprises 7-bits and corresponds to the first three stages of add shown in trellis 100 in FIG. 3D. Each serial add is represented by encircled dots 112 (only one encircled dot is labeled for simplicity). In the first stage, there are seven serial adds 112 which require four bits to enable/disable. In the second stage, there are two sets of three serial adds 112 which require two bits to control.
serial adders 56 can be controlled to compute the serial adds 112 as required by the first three stages of trellis 100 in FIG. 3D.
the bank of 16 serial adders 56 in FIG. 4 symbolically represents the functionality required by the first three stages of trellis 100.
serial adders 56 are provided to the first stage 58 of eight serial butterflies which perform the functions shown within eight 2- point IDCTs 104 in FIG. 3D.
Each serial butterfly is shown by the block diagram in FIG. 5B.
An exemplary implementation of the serial butterfly is described in detail below.
the first stage 58 of serial butterfly is always enabled such that IDCT processor 20 performs at least 2-point transforms.
the outputs from the first stage 58 are provided to the second stage 62 of serial butterfly through routing crossbar 60. Routing crossbar 60 interconnects the first two stages as shown by IDCT trellis 100.
the second stage 62 of serial butterfly can be selectively enabled to provide 4-point transforms. Since there are four sets of butterfly (see FIG. 3D), a 4-bit control is necessary to individually enable each set. The 4-bit control is part of the control signal labeled as MAP in FIG. 4.
the outputs from the second stage 62 of serial butterfly are provided to the third stage 66 through routing crossbar 64.
the third stage 66 comprises two sets of four serial butterflies as shown within two 8-point IDCTs 108 in FIG. 3D. Each set can be individually enabled by a 2-bit control.
the outputs from the third stage 66 are provided to the fourth stage 70 through routing crossbar 68.
the fourth stage 70 comprises one set of eight serial butterflies as shown within 16-point IDCTs 110 in FIG. 3D.
the serial butterflies can be selectively enabled by a 1-bit control.
the serial transformed data from the fourth stage 70 comprises the output from IDCT processor 20.
the 1-bit serial transformed data from the fourth stage 70 is routed to a bank of serial-to-parallel output buffers.
IDCT processor 20 provides the IDCT output in word serial fashion such that one output word is provided for each clock cycle.
the output buffers can be combined with the input buffers to form I/O buffer 52 as described in detail below.
controller 26 provides the control signals to IDCT processors 20a and 20b and memory element 22. These control signals synchronize IDCT processors 20a and 20b and memory element 22 and determine the reconstructed composite image. Controller 26 receives the ADDRESS input and the PQR input. The ADDRESS input informs controller 26 of the start of the data block. The PQR input comprises the three commands P, Q, and R which inform controller 26 of the desired block partition.
R equals to "1" indicates that the 16x16 block is to be divided into smaller 8x8 transform blocks
Q indicates that the 8x8 block is to be divided into smaller 4x4 transform blocks
P indicates that the 4x4 block is to be divided into smaller 2x2 transform blocks.
each block can be individually divided without regard to other blocks in the image.
1-bit control is required for R since there is only one 16x16 transform block within the 16x16 data block
4-bit control is required for Q since there can be four 8x8 transform blocks within the 16x16 data block
16-bit control is required for P since there can be sixteen 4x4 transform blocks within the 16x16 data block.
FIG. IB An exemplary partition of a data block 4 is shown in FIG. IB and an exemplary graphical illustration, e.g. tree diagram 6, of the PQR control corresponding to the image partition is shown in FIG. IC.
the 21-bit control PQR can be provided to controller 26 serially or in parallel.
the PQR input is a 2-D representation of the desired block partition.
Controller 26 parses the PQR input into 1-D column and row control signals. These column and row control signals are then used to generate the control signals to command IDCT processors 20a and 20b to perform the proper mix of transforms.
controller 26 commands IDCT processor 20a to perform two 4-point transforms and one 8- point transform for the first four columns of data.
controller 26 commands IDCT processor 20a to perform one 4-point transform, two 2-point transforms, and one 8-point transform. The process continues until all columns are processed.
the intermediate results from IDCT processor 20a are stored by columns in memory element 22.
controller 26 commands IDCT processor 20b to perform the proper mix of transforms on the rows of intermediate results from memory element 22. After all columns have been processed by IDCT processor 20a, controller 26 commands IDCT processor 20b to perform two 4- point transforms and one 8-point transform for the first four rows of intermediate results. For the next two rows, controller 26 commands IDCT processor 20b to perform one 4-point transform, two 2-point transforms, and one 8-point transform. Again, the process continues until all rows are processed.
control signals generated by controller 26 for IDCT processors 20a and 20b include WRITE_ENABLE, READ_ENABLE, ADD_ENABLE, and MAP.
WRITE-ENABLE controls the writing of the input data points into the proper I/O buffers 52 such that the input data points are arranged in the correct order (see FIG. 3D).
READ_ENABLE controls the order in which the transformed data is read from IDCT processor 20. In the exemplary embodiment, the transformed data can be sequentially read from IDCT processor 20.
ADD_ENABLE controls the first set of serial adders 56 which perform the adds in the first three stages of trellis 100. ADD_ENABLE is dependent on the desired mix of transforms and is generated in accordance with the PQR input.
MAP controls the last three stages 62, 66 and 70 of serial butterfly to produce the desired mix of transforms. MAP is also generated in accordance with the PQR input.
Four control bits are needed for the second stage 62 to individually enable or disable each of the four sets of butterfly (see FIG. 3D).
two control bits are needed for the third stage 66 and one control bit is needed for the fourth stage 70.
no control signal is needed for the first stage 58 since IDCT processor 20 always performs at least a 2- point transform.
a control signal can be generated to provide bypass of the first stage 58 if this is desired or necessary. Since the 2-D transform of the present invention is performed serially using two 1-D transforms, controller 26 delays the control signals to IDCT processor 20b, relative to IDCT processor 20a, to synchronize the control signals with the input data.
Controller 26 can be implemented as a combination of combinatory logic and state machine. Alternatively, controller 26 can be implemented with a micro-controller or a micro-processor running a microcode. Different implementations of controller 26 to perform the functions as described herein are within the scope of the present invention.
memory element 22 can be implemented as a transposition memory.
a 2-D transform is achieved by performing a 1-D transform on the columns of the input data block, storing the intermediate results, and performing a 1-D transform on the rows of the intermediate results.
the 1-D transform on the rows cannot be performed until all columns have been transformed.
the two 1-D transforms are pipelined such that both operate concurrently.
Memory element 22 can be implemented as a block of memory as illustrated in FIG. 1A. Assume that the intermediate results from IDCT processor 20a are initially written by columns to memory element 22. IDCT processor 20b cannot operate on the rows of intermediate results until all columns are operated on by IDCT processor 20a. Once the last column of memory element 22 is filled, the intermediate results are provided to IDCT processor 20b in rows. However, because of the pipelined structure, IDCT processor 20a provides one column of data for each row of data retrieved by IDCT processor 20b. This column of data cannot be written over a previous column since some data points in the previous column are still needed by IDCT processor 20b. To resolve this problem, the new column of intermediate results is written over the row of data just retrieved by IDCT processor 20b.
memory element 22 can be implemented with read- modify-write capability such that the same memory location can be read from and written to on the same clock cycle.
one data point can be read from one location of memory element 22 by IDCT processor 20b and that same location can be written to by IDCT processor 20a.
memory element 22 is transposed, or alternates between column major and row major, over successive 16x16 blocks. The transposition reduces the memory requirement to only one bank of memory.
the control signals to implement memory element 22 as a transposition memory are provided by controller 26. Controller 26 has the necessary timing information and can synchronize IDCT processors 20a and 20b and memory element 22 with the input data block.
Memory element 22 can be implemented using storage element or one of any number of memory devices that are known in the art, such as RAM memory devices, latches, or other types of memory devices.
FIG. 5A is a graphical illustration of the serial butterfly and FIG. 5B is a block diagram of the same serial butterfly.
Serial butterfly 140 operates on two inputs XI and X2. Input XI is delayed by delay element 148 to align the top and bottom signal paths.
serial adders 160a and 160b The outputs from delay element 148 and multipliers 150 are provided to serial adders 160a and 160b.
Serial adder 160a adds the output from multiplier 150 to the output from delay element 148 and serial adder 160b subtracts the output from multiplier 150 from the output from delay element 148.
the outputs from the serial adders 160a and 160b comprise the serial butterfly outputs Zl and Z2, respectively.
serial adders 160a and 160b are designed such that the adders can be turned off to allow Yl and Y2 to pass through as Zl and Z2, respectively.
serial butterfly 140 operates on two input bit streams and provides two output bit streams.
FIGS. 6A and 6B An exemplary block diagram of bit-serial multiplier 150 is shown in FIGS. 6A and 6B.
FIG. 6A illustrates bit-serial multiplier 150 in a word-wide representation
FIG. 6B illustrates the same multiplier 150 in a bit-wide representation.
a bit-serial multiply of X with C is achieved by successively adding C to the intermediate product term and shifting the result by one binary bit position. This is shown by the block diagram in FIG. 6A.
Latch 212 is cleared by the LD signal, which is enabled for one cycle out of every 16 clock cycles, to prepare latch 212 for the next multiply.
the LD signal also loads parallel-to-serial shift register 214 with the product term of the just completed multiply from adder 210. The product term is then shifted out serially from register 214 during the next multiply.
the precision of the input data X, the constant C, and the product Y is 16-bit.
16-bit of precision results in less arithmetic error than that specified by "IEEE Standard 1180-1990 : Specifications for the Implementations of 8x8 Inverse Discrete Cosine Transform.”
the 16-bit representation can comprise 1 sign bit, 9 magnitude bits, and 6 fractional bits. Other representations of less than 16-bit or more than 16-bit can be contemplated and are within scope of the present invention.
adder 210, latch 212, and register 214 are all implemented with 16-bit. For each clock cycle, one bit of X is shifted into bit-serial multiplier 150, with the LSB bit first. Depending on the value of the input bit and the LD signal, the constant C is added to the intermediate product term which is stored in latch 212.
AND gate 204 determines whether C is to be added to the intermediate product term based on the input bit and the LD signal.
the intermediate product term from adder 210 is then shifted by one bit position and stored back into latch 212, in bit positions D[14..0].
bit-serial multiplier 150 can be implemented with the same amount of hardware as an accumulator, which is very compact for IC design.
Bit-serial multiplier 150 is shown in further detail in FIG. 6B.
Adder 210, latch 212, and register 214 are shown in bit form. Depending on the value of input bit X and the LD signal, the constant C can be added to the intermediate product term which is stored in latch 212.
Each adder 210 receives a carry-in (Ci) input from latch 212 of the next less significant bit and provides a carry-out (Co) output to adder 210 of the next more significant bit. This is a standard carry chain of an adder.
the simple truncation of the LSB bits produces a slight negative bias on the twos complement output product term.
This slight negative bias can be offset by adding an LSB on the second to last adder 210a, which produces a half LSB of positive offset in the output product term.
the overall offset can be minimized by alternating the truncation and positive offset over successive multipliers 150.
the offset is controlled by the ROUND signal which can be hardwired high or low depending on the desired result.
An exemplary block diagram of serial adder 160 is shown in FIG. 7A.
Serial adder 160 serially receives two inputs Yl and Y2, with the LSB bit first.
the bypass mode allows IDCT processor 20 of the present invention to perform different mix of transforms.
the inputs Yl and Y2 are provided serially to AND gate 240 and XOR gate 242, respectively.
ADD_EN is also provided to AND gate 240. When ADD_EN is low, the output from AND gate 240 is low and Yl is not provided to adder 244. When ADD_EN is high, Yl is provided to adder 244. INVERT signal is provided to XOR gate 242 and register 246. To perform a subtraction, the input Y2 is converted to a negative number and added to the other operand. Conversion of a twos complement number to a negative number requires inverting all the bits in the original number and adding a one to the LSB bit position.
the inversion of the bits is performed by XOR gate 242 when the INVERT signal is high.
the one is added to the LSB bit position of the input number by storing a one at the start of the serial add, when the LD signal is enabled the INVERT signal is high, and adding this value to the carry-in (Ci) of adder 244.
the carry-out (Co) from the prior 1-bit add is stored in register 246.
the carry-out is added with the next set of bits from the two inputs Yl and Y2.
the sum output S from adder 244 represents the output from serial adder 160.
the constant C can be hardwired or mask programmable. Since the first stage 58 of butterfly is always performed in the exemplary embodiment, the constant C for bit-serial multipliers 150 for this stage can be hardwired. However, for the remaining stages 62, 66, and 70 of butterfly, the constant C can be mask programmable to allow multipliers 150 to perform multiplies of the input X2 with either 1/(2C*) or 1, when serial butterfly 140 is placed in the bypass mode. Multipliers 150 can also be loaded with other values of C to perform scaling or normalization of the input X2.
serial adder 160 can perform adds, subtracts, or bypass of the two inputs.
Serial adder 160 can be modified to perform the functions as required by serial butterfly 140. For example, referring to FIG.
serial adder 160a only performs adds or bypass. Therefore, serial adder
serial adder 160b only performs subtracts or bypass. Therefore, the
INVERT signal in serial adder 160 can be tied to a high reference.
Serial adder 160 can also be used to perform serial adds and bypass as required by serial adders 56 in FIG. 4 which implement the serial adds 112 required by the first three stages of trellis 100 as shown in FIG. 3D.
delay element 148 can be implemented with a series of latches. The number of latches is selected to match the processing delay of multiplier 150. V. I/O Buffer
a bank of 16 I/O buffers 52 receives the input data and provides the transformed data.
the input and output from IDCT processor 20 are provided in a word serial manner, or one complete data point per clock cycle.
the 16 data points are loaded into the 16 I/O buffers 52 in 16 clock cycles. Once all I/O buffers 52 are loaded, the 16 data points are provided to the IDCT trellis in a bit serial manner, one bit per clock cycle.
I/O buffers 52 also receive the transformed data bits from the final stage 70 of serial butterfly. The transformed data is provided serially to I/O buffer 52,
I/O buffer 52 comprises 16-bit latch 262, 16-bit parallel-to-serial shift register 264, 16-bit latch 266, and output buffer 268.
the IDCT input is provided to all latches 262 within the 16 I/O buffers 52.
Each I/O buffer 52 latches the IDCT input when directed by the control signal WR(w).
WR(w) is decoded from the WRITE_ENABLE signal which originates from controller 26.
Latch 262 within each I/O buffer 52 is enabled for only one out of every sixteen clock cycles. After the 16 data points have been latched by latches 262, the LD signal is enabled and the values stored in latches 262 are provided to registers 264.
the LSB register 264a serially shifts the data point to routing crossbar 54, one bit per clock cycle with the LSB bit shifted out first.
one transformed data bit is serially shifted into the MSB register 264q, one bit per clock cycle with the LSB shifted in first.
the LD signal loads register 264 with the next data point and loads latch 266 with the transformed data point. The transformed data is stored in latch 266 until it is read out through output buffer 268.
Output buffer 268 is selectively enabled such that the transformed data is provided serially from the 16 I/O buffers 52, one transformed data point per clock cycle.
the order of the read is controlled by the RD(w) signal which is decoded from READ_ENABLE.
the block diagram in FIG. 8 represents one implementation of I/O buffer 52.
Other implementations which perform the same functions as described herein can be contemplated and are within the scope of the present invention.
the present invention has been described in the context of a 2-D IDCT engine, the inventive concept can be extended to other transforms such as discrete Fourier transform (DFT), inverse discrete Fourier transform (IDFT), fast Fourier transform (FFT), inverse fast Fourier transform (IFFT), discrete cosine transform (DCT), and Hadamard transform. Therefore, application of the inventive concept as described herein to other transforms are within the scope of the present invention.
DFT discrete Fourier transform
IDFT inverse discrete Fourier transform
FFT fast Fourier transform

Landscapes

Physics & Mathematics (AREA)
Engineering & Computer Science (AREA)
General Physics & Mathematics (AREA)
Mathematical Physics (AREA)
Mathematical Analysis (AREA)
Mathematical Optimization (AREA)
Computational Mathematics (AREA)
Pure & Applied Mathematics (AREA)
Data Mining & Analysis (AREA)
Theoretical Computer Science (AREA)
Algebra (AREA)
Databases & Information Systems (AREA)
Software Systems (AREA)
General Engineering & Computer Science (AREA)
Discrete Mathematics (AREA)
Complex Calculations (AREA)

EP98942197A 1997-08-25 1998-08-24 Variable blockgrösse, zweidimensionale, inverse diskrete cosinus-transformationsvorrichtung Withdrawn EP1018082A1 (de)

Applications Claiming Priority (3)

Application Number	Priority Date	Filing Date	Title
US91809097A	1997-08-25	1997-08-25
US918090		1997-08-25
PCT/US1998/017423 WO1999010818A1 (en)	1997-08-25	1998-08-24	Variable block size 2-dimensional inverse discrete cosine transform engine

Publications (1)

Publication Number	Publication Date
EP1018082A1 true EP1018082A1 (de)	2000-07-12

Family

ID=25439787

Family Applications (1)

Application Number	Title	Priority Date	Filing Date
EP98942197A Withdrawn EP1018082A1 (de)	1997-08-25	1998-08-24	Variable blockgrösse, zweidimensionale, inverse diskrete cosinus-transformationsvorrichtung

Country Status (5)

Country	Link
EP (1)	EP1018082A1 (de)
KR (1)	KR20010023031A (de)
CN (1)	CN1268231A (de)
AU (1)	AU9030298A (de)
WO (1)	WO1999010818A1 (de)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
JP2003198504A (ja) *	2001-12-27	2003-07-11	Mitsubishi Electric Corp	逆拡散処理方法、拡散符号割当方法、移動体端末および基地局
JP2003223433A (ja)	2002-01-31	2003-08-08	Matsushita Electric Ind Co Ltd	直交変換方法、直交変換装置、符号化方法、符号化装置、逆直交変換方法、逆直交変換装置、復号化方法、及び、復号化装置
US7096245B2 (en)	2002-04-01	2006-08-22	Broadcom Corporation	Inverse discrete cosine transform supporting multiple decoding processes
US20060080375A1 (en) *	2004-10-12	2006-04-13	Lee Kun-Bin	Method and apparatus for inverse discrete cosine transform implementation
US7725516B2 (en) *	2005-10-05	2010-05-25	Qualcomm Incorporated	Fast DCT algorithm for DSP with VLIW architecture
US9110849B2 (en)	2009-04-15	2015-08-18	Qualcomm Incorporated	Computing even-sized discrete cosine transforms
CN101605259B (zh) *	2009-05-31	2012-11-21	华亚微电子（上海）有限公司	对多媒体数据进行变换编、解码的装置及方法
US8762441B2 (en)	2009-06-05	2014-06-24	Qualcomm Incorporated	4X4 transform for media coding
US9069713B2 (en)	2009-06-05	2015-06-30	Qualcomm Incorporated	4X4 transform for media coding
CN101646080B (zh) *	2009-06-18	2013-09-25	杭州高特信息技术有限公司	基于avs并行流水idct快速变换的方法和装置
US8451904B2 (en)	2009-06-24	2013-05-28	Qualcomm Incorporated	8-point transform for media data coding
US9075757B2 (en)	2009-06-24	2015-07-07	Qualcomm Incorporated	16-point transform for media data coding
US9118898B2 (en)	2009-06-24	2015-08-25	Qualcomm Incorporated	8-point transform for media data coding
US9081733B2 (en)	2009-06-24	2015-07-14	Qualcomm Incorporated	16-point transform for media data coding
CN102065309B (zh) *	2010-12-07	2012-12-05	青岛海信信芯科技有限公司	一种dct实现方法及dct实现电路
US9824066B2 (en) *	2011-01-10	2017-11-21	Qualcomm Incorporated	32-point transform for media data coding
US20160044314A1 (en) *	2014-08-08	2016-02-11	Qualcomm Incorporated	System and method for reusing transform structure for multi-partition transform
CN116389740A (zh) *	2023-04-07	2023-07-04	中国电子科技集团公司第五十八研究所	一种基于risc-v指令集的hevc视频编码器重构模块变换方法

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number	Priority date	Publication date	Assignee	Title
FR2608808B1 (fr) *	1986-12-22	1989-04-28	Efcis	Circuit integre de traitement numerique de signaux

1998
- 1998-08-24 AU AU90302/98A patent/AU9030298A/en not_active Abandoned
- 1998-08-24 KR KR1020007001644A patent/KR20010023031A/ko not_active Withdrawn
- 1998-08-24 WO PCT/US1998/017423 patent/WO1999010818A1/en not_active Ceased
- 1998-08-24 CN CN98808477A patent/CN1268231A/zh active Pending
- 1998-08-24 EP EP98942197A patent/EP1018082A1/de not_active Withdrawn

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9910818A1 *

Also Published As

Publication number	Publication date
AU9030298A (en)	1999-03-16
CN1268231A (zh)	2000-09-27
KR20010023031A (ko)	2001-03-26
WO1999010818A1 (en)	1999-03-04

Legal Events

Date	Code	Title	Description
2000-05-26	PUAI	Public reference made under article 153(3) epc to a published international application that has entered the european phase	Free format text: ORIGINAL CODE: 0009012
2000-07-12	17P	Request for examination filed	Effective date: 20000223
2000-07-12	AK	Designated contracting states	Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
2002-08-23	STAA	Information on the status of an ep patent application or granted ep patent	Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN
2002-10-09	18D	Application deemed to be withdrawn	Effective date: 20020301

Publication	Publication Date	Title
EP1018082A1 (de)	2000-07-12	Variable blockgrösse, zweidimensionale, inverse diskrete cosinus-transformationsvorrichtung
US5535288A (en)	1996-07-09	System and method for cross correlation with application to video motion vector estimator
US4777614A (en)	1988-10-11	Digital data processor for matrix-vector multiplication
EP0736205B1 (de)	2007-10-03	Verfahren ind vorrichtung zur durchfuehrung einer schnellen hadamard transform
US5452466A (en)	1995-09-19	Method and apparatus for preforming DCT and IDCT transforms on data signals with a preprocessor, a post-processor, and a controllable shuffle-exchange unit connected between the pre-processor and post-processor
US5867601A (en)	1999-02-02	Inverse discrete cosine transform processor using parallel processing
KR19990044305A (ko)	1999-06-25	압축 데이터에 의한 승산-가산 연산 수행 장치
US20050004963A1 (en)	2005-01-06	Parallel adder-based DCT/IDCT design using cyclic convolution
JPH08235159A (ja)	1996-09-13	逆コサイン変換装置
US6574651B1 (en)	2003-06-03	Method and apparatus for arithmetic operation on vectored data
US20050125469A1 (en)	2005-06-09	Method and system for discrete cosine transforms/inverse discrete cosine transforms based on pipeline architecture
US6317767B2 (en)	2001-11-13	Methods and systems for performing short integer chen IDCT algorithm with fused multiply/add
JP6357345B2 (ja)	2018-07-11	ビデオデータ処理時に空間領域と周波数領域との間の変換を実行するためのデータ処理装置および方法
US5805482A (en)	1998-09-08	Inverse discrete cosine transform processor having optimum input structure
US6003058A (en)	1999-12-14	Apparatus and methods for performing arithimetic operations on vectors and/or matrices
US5625713A (en)	1997-04-29	Apparatus and method for increasing the throughput of an acoustic or image compression system
US6460061B1 (en)	2002-10-01	2-dimensional discrete cosine transform using a polynomial transform
US5825420A (en)	1998-10-20	Processor for performing two-dimensional inverse discrete cosine transform
EP0913778A1 (de)	1999-05-06	Eine Kommutatorschaltung
US5671169A (en)	1997-09-23	Apparatus for two-dimensional inverse discrete cosine transform
US5999958A (en)	1999-12-07	Device for computing discrete cosine transform and inverse discrete cosine transform
JP3575991B2 (ja)	2004-10-13	直交変換回路
KR100306745B1 (ko)	2002-05-18	알에이씨를사용하는하프밴드서브밴드디씨티/아이디씨티회로및그방법
US5801979A (en)	1998-09-01	Carry logic that produces a carry value from NLSBs for a ROM accumulator in an inverse discrete cosine transform processor
KR100575285B1 (ko)	2006-04-28	고속의 저전력 이산 코사인 변환 장치 및 방법