WO2024082674A1 - Floating-point data precision conversion method and device - Google Patents
- Publication number: WO2024082674A1 (application PCT/CN2023/102089)
- Authority: WO, WIPO (PCT)
- Prior art keywords: field, bit width, code value, value, code
- Legal status: Ceased (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F7/483: Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/49947: Rounding (under G06F7/499 Denomination or exception handling, e.g. rounding or overflow; G06F7/49942 Significance control)
- H03M7/24: Conversion to or from floating-point codes
- H03M7/3059: Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression
- H03M7/3082: Vector coding
- G06F2207/3812: Devices capable of handling different types of numbers
Definitions
- The present application relates to the field of chip technology, and in particular to a floating-point data precision conversion method and device.
- Low-precision computing power such as FP16/BF16 can be combined with an iterative algorithm that uses the high-precision FP32/FP64 data format to obtain high-precision calculation results.
- Embodiments of the present application provide a floating-point data precision conversion method and device that convert high-precision data into low-precision data.
- The second floating-point data uses a prefix code field to indicate the bit width of the second exponent field, which balances the bit width, range, and precision of the second floating-point data.
- The retained code value is rounded according to the discarded code value in the first mantissa field without support from other devices, which improves the efficiency of converting high-precision data to low-precision data and reduces hardware overhead.
- An embodiment of the present application provides a floating-point data precision conversion method. The first floating-point data includes a sign field, a first exponent field, and a first mantissa field; the second floating-point data includes a sign field, a prefix code field, a second exponent field, and a second mantissa field; the prefix code field indicates the bit width of the second exponent field; and the precision of the first floating-point data is higher than that of the second floating-point data. The method includes: determining, according to the first code value of the first exponent field, the first bit width and first code value of the prefix code field, the first bit width and first code value of the second exponent field, and the first bit width of the second mantissa field; and determining the retained code value and the discarded code value in the first mantissa field, the retained code value including the code values starting from the highest
- The floating-point data precision conversion method converts high-precision data into low-precision data.
- The sign field of the second floating-point data can be obtained from the sign field of the first floating-point data.
- The prefix code field and the second exponent field of the second floating-point data can be obtained from the first exponent field of the first floating-point data.
- The second mantissa field of the second floating-point data can be obtained from the first mantissa field of the first floating-point data.
- Because a short prefix code field indicates the bit width of the second exponent field, the mantissa of the second floating-point data can have more precision (more bits), while a larger numerical range can still be represented when the second floating-point data provides only 1 bit of mantissa precision. This balances the bit width, range, and precision of the second floating-point data.
- The prefix code field can use prefix-code encoding, which occupies fewer bits and makes the second exponent field and the second mantissa field easy to parse.
- The retained code value is rounded according to the discarded code value in the first mantissa field without support from other devices, which improves the efficiency of converting high-precision data to low-precision data and reduces hardware overhead.
- The rounding operation includes a carry operation and a discard operation. Rounding the retained code value according to the discarded code value to obtain the first code value of the second mantissa field includes: when the code value formed by the bits of the discarded code value that start from its highest bit and span the preset bit width is greater than or equal to the second preset threshold, a carry operation is performed on the lowest bit of the retained code value and the discarded code value is discarded, and the retained code value after the carry is the first code value of the second mantissa field; when that code value is less than the second preset threshold, the discarded code value is discarded and the retained code value is the first code value of the second mantissa field.
- The second preset threshold used for the comparison is the code value formed by the bits of the discarded code value that start from its lowest bit and span the preset bit width.
- Generating the second preset threshold therefore requires no additional random number generator and suffers no random-number-generation bottleneck, which improves the efficiency of converting high-precision data to low-precision data while reducing hardware overhead.
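The self-thresholded rounding described above can be sketched in Python. This is a minimal sketch, assuming the retained and discarded code values are unsigned integers, the discarded value is `d_width` bits wide, and the preset bit width `k` satisfies 1 <= k <= d_width; the function name and parameters are hypothetical. Taken literally, the "greater than or equal" test also fires when the discarded bits are all zero, so the sketch adds an exact-value guard that the text does not state.

```python
def round_with_self_threshold(retained: int, discarded: int, d_width: int, k: int) -> int:
    """Round a truncated mantissa using a threshold taken from its own discarded bits.

    retained  -- high-order mantissa bits that are kept (unsigned integer)
    discarded -- low-order mantissa bits that are dropped, d_width bits wide
    k         -- the "preset bit width" (hypothetical parameter, 1 <= k <= d_width)
    """
    if not 1 <= k <= d_width:
        raise ValueError("k must satisfy 1 <= k <= d_width")
    if discarded == 0:
        # Added guard (an assumption, not stated in the text): exact values are not rounded up.
        return retained
    mask = (1 << k) - 1
    high_k = (discarded >> (d_width - k)) & mask  # k bits starting from the highest discarded bit
    threshold = discarded & mask                   # second preset threshold: k lowest discarded bits
    if high_k >= threshold:
        return retained + 1  # carry into the lowest retained bit (may overflow; caller checks)
    return retained          # discard only


# Example: keep 3 bits, drop 4 bits, compare 2-bit groups.
print(round_with_self_threshold(0b101, 0b1100, d_width=4, k=2))  # high 0b11 >= low 0b00: carry -> 6
print(round_with_self_threshold(0b101, 0b0111, d_width=4, k=2))  # high 0b01 <  low 0b11: keep  -> 5
```

Because the threshold comes from the input's own low-order bits, the carry decision varies from value to value in a data-dependent way without a separate random number generator.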
- The rounding operation includes a carry operation or a discard operation.
- Rounding the retained code value according to the discarded code value to obtain the first code value of the second mantissa field includes: when the highest bit of the discarded code value is greater than or equal to a first preset threshold, a carry operation is performed on the lowest bit of the retained code value and the discarded code value is discarded, and the retained code value after the carry is the first code value of the second mantissa field; when the highest bit of the discarded code value is less than the first preset threshold, the discarded code value is discarded and the retained code value is the first code value of the second mantissa field.
- The first preset threshold can be 0 or 1. Comparing the highest bit of the discarded code value with the first preset threshold is a form of rounding away from zero.
- Other rounding modes, such as rounding to odd and rounding to even, can also be used.
- Compared with other rounding methods, rounding away from zero has a smaller hardware implementation area, lower power consumption, and higher data resolution.
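A minimal sketch of this rule, assuming the first preset threshold is 1 (so a carry happens exactly when the highest discarded bit is set) and that the mantissa magnitude is stored separately from the sign field, so increasing the magnitude moves the value away from zero. The function name is hypothetical.

```python
def round_away_from_zero(retained: int, discarded: int, d_width: int, threshold: int = 1) -> int:
    """Round a sign-magnitude mantissa by looking only at the highest discarded bit.

    threshold plays the role of the first preset threshold (assumed to be 1 here).
    """
    if d_width == 0:
        return retained                       # nothing was discarded
    msb = (discarded >> (d_width - 1)) & 1    # highest bit of the discarded code value
    if msb >= threshold:
        return retained + 1                   # carry: magnitude grows, value moves away from zero
    return retained                           # discard only


print(round_away_from_zero(0b101, 0b1000, d_width=4))  # exactly half an ulp -> rounds up to 6
print(round_away_from_zero(0b101, 0b0111, d_width=4))  # below half an ulp -> stays 5
```

Only one bit of the discarded value is inspected, which is consistent with the text's point that this mode needs little hardware; a tie (discarded bits exactly 100...0) is resolved away from zero.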
- The floating-point data precision conversion method provided by the embodiment of the present application further includes: determining whether the retained code value overflows after the carry operation; if it overflows, adding 1 to the lowest bit of the first code value of the first exponent field to obtain the second code value of the first exponent field; if the second bit width of the prefix code field differs from the first bit width of the prefix code field, determining the second code value of the prefix code field, the second code value of the second exponent field, the second bit width of the second mantissa field, and the second code value of the second mantissa field according to the second code value of the first exponent field; if the second bit width of the prefix code field is the same as the first bit width of the prefix code field, determining whether the first bit width and the second bit width of the second exponent field are the same; if the second bit width of the second exponent field is less than the first bit width of the second exponent field
- Adding 1 to the lowest bit of the first code value of the first exponent field yields the second bit width of the prefix code field and the second bit width of the second exponent field. If the second bit width of the prefix code field equals its first bit width but the bit width of the second exponent field changes, the second bit width of the second mantissa field is obtained accordingly. This resolves the overflow that a carry on the retained code value can cause.
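Why an overflowing carry forces an exponent update can be shown with a short sketch (hypothetical function name; the retained value is assumed to exclude the hidden leading 1). An m-bit mantissa of all ones plus one unit in the last place gives a binary significand of 10.00...0, which equals 1.00...0 x 2^1: the stored mantissa bits become zero and the exponent increases by 1, after which the field widths have to be re-derived from the new exponent.

```python
from typing import Tuple


def carry_with_overflow(retained: int, m_width: int, exponent: int) -> Tuple[int, int, bool]:
    """Add one unit in the last place to an m_width-bit retained mantissa.

    Returns (new mantissa bits, new unbiased exponent, overflowed).
    The hidden leading 1 is not part of `retained`.
    """
    carried = retained + 1
    if carried >> m_width:
        # 1.11...1 + ulp = 10.00...0 = 1.00...0 x 2^1: mantissa bits reset, exponent + 1.
        # The prefix/exponent/mantissa widths must then be recomputed from the new exponent;
        # because the mantissa bits are all zero, any new mantissa width simply holds zeros.
        return 0, exponent + 1, True
    return carried, exponent, False


print(carry_with_overflow(0b111, 3, -3))  # overflow: (0, -2, True)
print(carry_with_overflow(0b101, 3, -3))  # no overflow: (6, -3, False)
```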
- Determining the first bit width and first code value of the prefix code field and the first bit width and first code value of the second exponent field according to the first code value of the first exponent field includes: determining an indication value according to the first code value of the first exponent field; looking up a table to determine the first bit width and first code value of the prefix code field that correspond to the indication value, where the indication value also indicates the first bit width of the second exponent field; and determining, according to the first code value of the first exponent field, the first code value of the second exponent field with that first bit width.
- Determining the first bit width of the second mantissa field according to the first code value of the first exponent field includes: determining it from the total bit width of the second floating-point data, the first bit width of the prefix code field, and the first bit width of the second exponent field.
- The bit width of the sign field is 1, the bit width of the prefix code field is 2 or 3, the first bit width of the second exponent field is an integer from 0 to 4, and the first bit width of the second mantissa field is an integer from 1 to 4.
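A table lookup plus the subtraction described above is enough to derive every field width. The sketch below is hypothetical: it assumes an 8-bit total width, takes the prefix widths from the HiFloat8 description (2-bit codes for indication values 2, 3, 4 and 3-bit codes for 0 and 1), and assumes that the second exponent field is exactly D bits wide, which reproduces the ranges stated above.

```python
SIGN_WIDTH = 1
# Prefix code width per indication value D (2 bits for D = 2, 3, 4; 3 bits for D = 0, 1).
PREFIX_WIDTH = {0: 3, 1: 3, 2: 2, 3: 2, 4: 2}


def field_widths(d: int, total_width: int = 8) -> dict:
    """Return the width of each HiFloat8 field for indication value d (hypothetical helper)."""
    prefix_w = PREFIX_WIDTH[d]
    exp_w = d  # assumption: the second exponent field is D bits wide
    mant_w = total_width - SIGN_WIDTH - prefix_w - exp_w  # the mantissa takes what is left
    return {"sign": SIGN_WIDTH, "prefix": prefix_w, "exponent": exp_w, "mantissa": mant_w}


for d in range(5):
    print(d, field_widths(d))
```

Under these assumptions the mantissa widths come out as 4, 3, 3, 2, 1 for D = 0 to 4, matching the stated 1 to 4 range: short exponent fields leave room for mantissa precision, and long ones buy range at the cost of a 1-bit mantissa.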
- Using a short prefix code field in the second floating-point data to indicate the first bit width of the second exponent field lets the second floating-point data provide up to 4 bits of mantissa precision, while a larger numerical range can be represented when it provides only 1 bit of mantissa precision. This balances the bit width, range, and precision of the second floating-point data.
- The highest bit of the second exponent field is hidden when it is stored. This reduces the first bit width that must be stored for the second exponent field and prevents the first code values of the second exponent field for different prefix-code indication values from overlapping numerically, so the HiFloat8 data format has no redundant encodings.
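One way to realize the hidden highest bit is sketched below. The exponent encoding itself is an assumption, since the text does not spell it out: the unbiased exponent e (for FP32 input, the biased exponent minus 127) is stored in sign-magnitude form, the indication value D is the bit length of |e|, and the leading 1 of |e| is not stored, so each D covers a disjoint range of exponents. Function names are hypothetical.

```python
def encode_exponent(e: int):
    """Return (D, stored exponent bits) for unbiased exponent e under the assumed encoding."""
    mag = abs(e)
    d = mag.bit_length()  # indication value: 0 for e == 0, up to 4 for |e| <= 15
    if d > 4:
        raise OverflowError("exponent outside the assumed HiFloat8 range")
    if d == 0:
        return 0, 0  # D = 0: no exponent bits are stored
    sign = 1 if e < 0 else 0
    low_bits = mag & ((1 << (d - 1)) - 1)  # drop the hidden leading 1 of the magnitude
    return d, (sign << (d - 1)) | low_bits  # D bits: sign, then the D-1 remaining magnitude bits


def decode_exponent(d: int, bits: int) -> int:
    if d == 0:
        return 0
    sign = bits >> (d - 1)
    mag = (1 << (d - 1)) | (bits & ((1 << (d - 1)) - 1))  # restore the hidden leading 1
    return -mag if sign else mag


print(encode_exponent(-3))  # FP32 biased exponent 124 -> e = -3 -> (2, 3)
```

With this encoding, D = 1 covers +/-1, D = 2 covers +/-2 to +/-3, D = 3 covers +/-4 to +/-7, and D = 4 covers +/-8 to +/-15, so no exponent value has two encodings.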
- When the first floating-point data exceeds the upper limit of the data range of the second floating-point data, the second floating-point data is determined by a saturation method or an infinity method; when the first floating-point data falls below the lower limit of that data range, the second floating-point data is zero; when the first floating-point data is not a number (NaN), the second floating-point data is also NaN.
- When the first floating-point data falls outside the upper or lower limit of the data range of the second floating-point data, the second floating-point data can represent it with a special value, such as a saturated value, an infinite value, or zero.
- When the first floating-point data is not a number (NaN), the second floating-point data is also represented as NaN.
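The range checks above can be sketched as a pre-pass that runs before the field-by-field conversion. This is a hypothetical helper: `max_finite` and `min_positive` stand in for the largest finite and smallest nonzero magnitudes of the second format, which this section does not give; the limits are applied to the magnitude; and keeping the sign of a zero result is an assumption.

```python
import math
from typing import Optional


def handle_out_of_range(x: float, max_finite: float, min_positive: float,
                        saturate: bool = True) -> Optional[float]:
    """Return the special result for x, or None if x needs the normal conversion path."""
    if math.isnan(x):
        return math.nan                               # NaN stays NaN
    if abs(x) > max_finite:
        limit = max_finite if saturate else math.inf  # saturation method vs infinity method
        return math.copysign(limit, x)
    if abs(x) < min_positive:
        return math.copysign(0.0, x)                  # underflow to (signed) zero
    return None                                       # in range: convert normally


# Illustrative limits only; they are not HiFloat8's real range.
print(handle_out_of_range(1e6, max_finite=100.0, min_positive=0.01))
print(handle_out_of_range(-1e6, max_finite=100.0, min_positive=0.01, saturate=False))
```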
- An embodiment of the present application provides a floating-point data precision conversion device. The first floating-point data includes a sign field, a first exponent field, and a first mantissa field; the second floating-point data includes a sign field, a prefix code field, a second exponent field, and a second mantissa field, where the prefix code field indicates the bit width of the second exponent field.
- The precision of the first floating-point data is higher than the precision of the second floating-point data.
- The device includes: a bit width calculation unit, configured to determine, according to the first code value of the first exponent field, the first bit width and first code value of the prefix code field, the first bit width and first code value of the second exponent field, and the first bit width of the second mantissa field; a mantissa field calculation unit, configured to determine the retained code value and the discarded code value in the first mantissa field, where the retained code value consists of the bits starting from the highest bit of the first mantissa field, with a bit width equal to the first bit width of the second mantissa field; and a rounding operation unit, configured to round the retained code value according to the discarded code value to obtain the first code value of the second mantissa field.
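The mantissa field calculation unit's split can be sketched as a shift and a mask. This is a minimal sketch with a hypothetical function name; it assumes the first mantissa field is an unsigned integer `src_width` bits wide (23 for FP32) and that `keep_width` is the first bit width of the second mantissa field.

```python
def split_mantissa(mantissa: int, src_width: int, keep_width: int):
    """Split a mantissa into (retained, discarded, discarded_width)."""
    if not 0 < keep_width <= src_width:
        raise ValueError("keep_width must satisfy 0 < keep_width <= src_width")
    drop_width = src_width - keep_width
    retained = mantissa >> drop_width               # highest keep_width bits
    discarded = mantissa & ((1 << drop_width) - 1)  # remaining low-order bits
    return retained, discarded, drop_width


# FP32 mantissa of 1.25 (0b01 followed by 21 zeros) reduced to a 3-bit mantissa.
print(split_mantissa(0b01 << 21, 23, 3))  # (2, 0, 20)
```

The discarded value and its width are the inputs a rounding operation unit compares against its threshold.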
- For the beneficial effects of the second aspect, refer to the description of the first aspect.
- In one implementation, the rounding operation includes a carry operation or a discard operation, and the rounding operation unit is further configured to: when the code value formed by the bits of the discarded code value that start from its highest bit and span the preset bit width is greater than or equal to a second preset threshold, perform a carry operation on the lowest bit of the retained code value and discard the discarded code value, the retained code value after the carry being the first code value of the second mantissa field; when that code value is less than the second preset threshold, perform a discard operation on the discarded code value, the retained code value being the first code value of the second mantissa field. The second preset threshold is the code value formed by the bits of the discarded code value that start from its lowest bit and span the preset bit width.
- In another implementation, the rounding operation includes a carry operation or a discard operation, and the rounding operation unit is further configured to: when the highest bit of the discarded code value is greater than or equal to a first preset threshold, perform a carry operation on the lowest bit of the retained code value and a discard operation on the discarded code value, the retained code value after the carry being the first code value of the second mantissa field; when the highest bit of the discarded code value is less than the first preset threshold, perform a discard operation on the discarded code value, the retained code value being the first code value of the second mantissa field.
- The device further includes an overflow unit, configured to determine whether the retained code value overflows after the carry operation. The bit width calculation unit is further configured to: if the retained code value overflows after the carry operation, add 1 to the first code value of the first exponent field to obtain the second code value of the first exponent field; determine the second bit width of the second exponent field and the second bit width of the prefix code field according to the second code value of the first exponent field; and if the second bit width of the prefix code field differs from the first bit width of the prefix code field, determine the second code value of the prefix code field, the second code value of the second exponent field, and the second bit width of the second mantissa field according to the second code value of the first exponent field.
- If the second bit width of the prefix code field is the same as the first bit width of the prefix code field, the bit width calculation unit determines whether the first and second bit widths of the second exponent field are the same. If the second bit width of the second exponent field is less than its first bit width, the unit adds 1 to the bit width of the retained code value to obtain the second bit width and second code value of the second mantissa field; if the second bit width of the second exponent field is greater than or equal to its first bit width, the unit discards the lowest bit of the retained code value to obtain the second bit width and second code value of the second mantissa field.
- The bit width calculation unit is further configured to: determine an indication value according to the first code value of the first exponent field; look up a table to determine the first bit width and first code value of the prefix code field that correspond to the indication value, where the indication value also indicates the first bit width of the second exponent field; and determine, according to the first code value of the first exponent field, the first code value of the second exponent field with that first bit width.
- The bit width calculation unit is further configured to determine the first bit width of the second mantissa field according to the total bit width of the second floating-point data, the first bit width of the prefix code field, and the first bit width of the second exponent field.
- The bit width calculation unit is further configured to: when the first floating-point data exceeds the upper limit of the conversion range of the second floating-point data, determine the second floating-point data by a saturation method or an infinity method; when the first floating-point data falls below the lower limit of the conversion range, set the second floating-point data to zero; and when the first floating-point data is NaN, set the second floating-point data to NaN.
- A communication device is provided, including at least one processor connected to a memory, where the at least one processor reads and executes a program stored in the memory so that the device performs the method according to the first aspect or any possible implementation of the first aspect.
- A chip is provided, where the chip is coupled to a memory and reads and executes program instructions stored in the memory to implement the method according to the first aspect or any possible implementation of the first aspect.
- The present application provides a chip system applied to a cloud center.
- The chip system includes one or more interface circuits and one or more processors.
- The interface circuits and the processors are interconnected by lines; an interface circuit receives a signal from a memory of the cloud center and sends the signal to a processor, the signal including computer instructions stored in the memory.
- When the processor executes the computer instructions, the cloud center performs the floating-point data precision conversion method according to the first aspect or any corresponding possible design.
- An embodiment of the present application provides a computer-readable storage medium including computer instructions.
- When the computer instructions are executed on an electronic device, the electronic device performs the floating-point data precision conversion method in any of the above aspects and any possible implementation.
- An embodiment of the present application provides a computer program product that, when executed on a computer or a processor, causes the computer or processor to perform the floating-point data precision conversion method in any of the above aspects and any possible implementation.
- Any of the floating-point data precision conversion devices, chip systems, computer-readable storage media, or computer program products provided above can be applied to the corresponding method provided above; for the beneficial effects they achieve, refer to those of the corresponding method, which are not repeated here.
- FIG. 1 is a diagram of the IEEE 754 floating-point data format provided in an embodiment of the present application.
- FIG. 2 is a schematic diagram of a system or device to which the floating-point data precision conversion method or device provided in an embodiment of the present application is applied.
- FIG. 3 is a schematic structural diagram of an SLC provided in an embodiment of the present application.
- FIG. 4 is a flowchart of a floating-point data precision conversion method provided in an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of a random rounding method provided in an embodiment of the present application.
- FIG. 6 is a flowchart of a rounding-away-from-zero method provided in an embodiment of the present application.
- FIG. 7 is a flowchart of another floating-point data precision conversion method provided in an embodiment of the present application.
- FIG. 8 is a flowchart of another floating-point data precision conversion method provided in an embodiment of the present application.
- FIG. 9 is a flowchart of converting FP32 data into HiFloat8 data provided in an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
- Scalar computing unit: a circuit that performs scalar computation. A scalar has magnitude but no direction. Scalar computation is mostly used for general-purpose computing.
- Execution unit (EXU): part of the multi-stage pipeline of a central processing unit (CPU).
- ALU: arithmetic logic unit.
- Vector computing unit: a computing unit designed for vector computation with a certain degree of parallelism, such as a single instruction multiple data (SIMD) processor; a vector here usually refers to a one-dimensional array with a length greater than 1.
- Vector computing units are mostly used in fields such as HPC and AI/machine learning, for mathematical problems such as linear programming, Fourier transforms, filtering, linear algebra, partial differential equations, and integration.
- In a vector computing acceleration unit, an arithmetic execution unit (Vector Unit) based on the HiFloat data format can be embedded.
- Matrix computing unit: a computing unit designed for matrix computation with corresponding parallelism, such as a systolic array processor; a matrix is a two-dimensional array arranged in rows and columns.
- Matrix computing units are mostly used for matrix computation in fields such as HPC and AI/machine learning, including matrix multiplication, matrix inversion, and matrix decomposition.
- In a matrix computing acceleration unit, a matrix unit based on the HiFloat data format can be embedded.
- Tensor computing unit: a computing unit designed for tensor computation with corresponding parallelism, such as a cube computing unit; a tensor is a multidimensional array with more than two dimensions, most commonly three. Tensor computing units are mostly used in AI/machine learning, for example for convolution. In a tensor computing acceleration unit, a tensor unit based on the HiFloat data format can be embedded.
- The terms “first” and “second” are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features.
- A feature defined by “first” or “second” may explicitly or implicitly include one or more such features.
- “Plural” means two or more.
- IEEE 754: the Institute of Electrical and Electronics Engineers (IEEE) established IEEE 754 as the standard for binary floating-point arithmetic; it defines floating-point data formats such as double-precision FP64, single-precision FP32, and half-precision FP16.
- Double-precision FP64 data and single-precision FP32 data are suited to CPU and floating-point arithmetic environments.
- Half-precision FP16 data is suited to computer graphics environments.
- Figure 1 is a diagram of the IEEE 754 floating-point data format provided in an embodiment of the present application.
- IEEE 754 floating-point data includes a sign field (S), an exponent field (E), and a mantissa field (M).
- FP64 data: the sign field is 1 bit, the exponent field is 11 bits, and the mantissa field is 52 bits.
- FP32 data: the sign field is 1 bit, the exponent field is 8 bits, and the mantissa field is 23 bits.
- FP16 data: the sign field is 1 bit, the exponent field is 5 bits, and the mantissa field is 10 bits.
- The first floating-point data provided in the embodiment of the present application is FP32 data.
- The FP32 data format is introduced first; Table 1 shows the data format of FP32 data.
- FP32 data includes a sign field S, an exponent field E, and a mantissa field M.
- The sign field of FP32 data determines whether the number is positive or negative: 0 represents a positive number and 1 represents a negative number.
- The exponent field of FP32 data gives the power of 2 by which the value is weighted; the stored exponent carries a bias of 127.
- The mantissa field of FP32 data is a binary fraction. FP32 data is converted to a decimal value as follows: (1) The sign field is 0, so the FP32 data is positive. (2) The code value of the exponent field is 01111100, which is 124 in decimal.
- The exponent is therefore 124 − 127 = −3, so the value is weighted by 2^−3.
- The mantissa field of the FP32 data is 01000000000000000000000. Because the exponent field is neither all 0s nor all 1s, the number is normal and has a hidden leading 1, so the significand is 1 + 0.25 = 1.25. Applying the conversion formula value = (−1)^S × 1.M × 2^(E − 127) gives 1.25 × 2^−3 = 0.15625.
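The worked example can be checked with Python's standard `struct` module, which reinterprets a float as its 32-bit IEEE 754 encoding. The helper names are hypothetical, and the decoder handles normal numbers only (exponent field neither all 0s nor all 1s).

```python
import struct


def fp32_fields(x: float):
    """Return (sign, biased exponent, mantissa) of x rounded to FP32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fp32_value(sign: int, exponent: int, mantissa: int) -> float:
    """Decode a normal FP32 number: (-1)^S * 1.M * 2^(E - 127)."""
    if not 0 < exponent < 0xFF:
        raise ValueError("only normal numbers are handled in this sketch")
    significand = 1 + mantissa / (1 << 23)  # hidden leading 1 plus the binary fraction
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


s, e, m = fp32_fields(0.15625)
print(s, format(e, "08b"), format(m, "023b"))  # 0 01111100 01000000000000000000000
print(fp32_value(s, e, m))                     # 0.15625
```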
- The prior art provides a simplified random rounding method for converting FP32 data to the FP16 or BF16 data format, in which the random rounding threshold is computed from specific bits of the data itself.
- Specifically, bits 11 to 18 and bits 16 to 23 of the FP32 mantissa field (23 bits in total, numbered 1 to 23 from the most significant bit (MSB) to the least significant bit (LSB)) are added, and the overflowed bits are used as the random rounding threshold.
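For comparison with the single-comparison scheme of this application, the prior-art threshold can be sketched as follows (hypothetical function name). The text does not make clear whether "the overflowed bits" means the 8 bits that wrap around or the carry out of the 8-bit sum, so the sketch returns both; it only illustrates that two 8-bit fields and an 8-bit adder are involved.

```python
def prior_art_threshold(mantissa: int):
    """Add FP32 mantissa bits 11-18 and 16-23 (bit 1 = MSB, bit 23 = LSB).

    Returns (low 8 bits of the sum, carry-out bit); which one serves as the
    threshold is left open here because the description is ambiguous.
    """
    def field(first: int, last: int) -> int:
        width = last - first + 1
        return (mantissa >> (23 - last)) & ((1 << width) - 1)

    total = field(11, 18) + field(16, 23)  # two 8-bit fields, 9-bit result
    return total & 0xFF, total >> 8


print(prior_art_threshold(0x7FFFFF))  # 255 + 255 = 510 -> (254, 1)
```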
- This method describes only a single random rounding conversion, covers only the conversion of FP32 to the FP16 or BF16 data format, and cannot handle the conversion of high-precision data in other formats to low-precision data.
- Its threshold generation also involves more bits of the mantissa field, so the computation is more complex and the hardware overhead is large.
- the second floating-point data provided in the embodiment of the present application is HiFloat8 data; Table 2 shows the encoding of HiFloat8 data.
- 8 is the total bit width of HiFloat8 data; other total bit widths are also possible.
- the sign field occupies one bit: 0 denotes positive and 1 denotes negative, or, alternatively, 1 denotes positive and 0 denotes negative.
- the prefix code field occupies 2 or 3 bits and can express 5 different values.
- the value of D can be 0, 1, 2, 3 or 4.
- the bit width of the exponent field varies with the value of D, and the mantissa field occupies the remaining bit width.
- the prefix code field can be encoded as an integer, in which case D is a fixed value.
- the prefix code field can also be encoded with prefix codes, in which case D takes values from a finite set.
- with prefix code encoding, 2-bit codes represent the values D = 2, 3 and 4, and 3-bit codes represent the values D = 0 and 1.
- Table 3 shows the prefix code encoding of the prefix code field.
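With 2-bit codewords for three values and 3-bit codewords for two values, the Kraft sum is 3 × 2^-2 + 2 × 2^-3 = 1, so a complete prefix-free code with these lengths exists and a decoder can identify D after reading at most 3 bits. The sketch below uses a hypothetical codeword assignment, because the actual bit patterns of Table 3 are not reproduced in this text; only the codeword lengths come from the description:

```python
# Hypothetical codewords: lengths follow the text, bit patterns are assumptions.
PREFIX_CODES = {4: "11", 3: "10", 2: "01", 1: "001", 0: "000"}


def is_prefix_free(codes: dict[int, str]) -> bool:
    """True if no codeword is a proper prefix of another codeword."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


def kraft_sum(codes: dict[int, str]) -> float:
    """Sum of 2^-length over all codewords (1.0 for a complete prefix code)."""
    return sum(2.0 ** -len(w) for w in codes.values())


def decode_prefix(bits: str) -> tuple[int, int]:
    """Read the codeword at the start of `bits`; return (D, codeword length)."""
    for d, word in PREFIX_CODES.items():
        if bits.startswith(word):
            return d, len(word)
    raise ValueError("no codeword matches")
```

For example, `decode_prefix("0111010")` returns `(2, 2)` under this assumed assignment: the first two bits already identify D, and the remaining bits belong to the exponent and mantissa fields.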
- Ec is the symmetry center of the exponent, which corresponds to the bias in FP32 data.
- HiFloat(N,5,Ec) can be configured as HiFloat(8,5,0), abbreviated as HiF8, or other configurations.
- the distribution of HiFloat8 encoding values is shown in Table 4.
- the floating-point data precision conversion method and device of the present application can be applied to different systems or devices, such as the execution device 20 shown in Figure 2; Figure 2 is a schematic diagram of a system or device to which the floating-point data precision conversion method and device provided in an embodiment of the present application are applied.
- the execution device can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an AR device (not shown in Figure 2), a VR device (not shown in Figure 2), a vehicle terminal (not shown in Figure 2), etc., and can also be a server, etc.
- the floating-point data precision conversion method provided in the present application can be applied to mixed-precision computing scenarios in the execution device 20, such as CPU, HPC and AI workloads, for example in scalar, vector, matrix and tensor computing units.
- the device for floating-point data precision conversion proposed in the present application can be a chip, for example, a system on a chip (SoC).
- Figure 3 is a structural diagram of a SoC provided in an embodiment of the present application.
- the SoC includes a processor, which can be a single-core processor or a multi-core processor, a memory, and an I/O interface, etc. After the processor loads the data and application in the memory, it processes the data, such as performing the calculation processing in the present application.
- the sign domain, prefix code domain, second exponent domain, and second mantissa domain of the second floating-point data can be determined by reading the sign domain, first exponent domain, and first mantissa domain in the FP32 data.
- the embodiment of the present application provides a floating-point data precision conversion method, which is applied to converting first floating-point data into second floating-point data, wherein the first floating-point data includes a sign field, a first exponent field, and a first mantissa field, and the second floating-point data includes a sign field, a prefix code field, a second exponent field, and a second mantissa field, wherein the prefix code field is used to indicate the bit width of the second exponent field, and the precision of the first floating-point data is higher than the precision of the second floating-point data.
- FIG. 4 is a flow chart of a floating-point data precision conversion method provided by an embodiment of the present application, and the method includes:
- Step 401 The execution device determines the first bit width of the prefix code field, the first code value of the prefix code field, the first bit width of the second exponent field, the first code value of the second exponent field, and the first bit width of the second mantissa field according to the first code value of the first exponent field.
- the precision of the first floating-point data is higher than the precision of the second floating-point data, wherein the first floating-point data may be FP32 data and the second floating-point data may be HiFloat8 data.
- the first floating-point data or the second floating-point data includes a binary integer part and a binary fractional part, wherein the first exponent field and the second exponent field respectively determine the binary integer part of the first floating-point data and the second floating-point data, the first mantissa field determines the binary fractional part of the first floating-point data, and the prefix code field and the second mantissa field determine the binary fractional part of the second floating-point data.
- a calculation on the first code value of the first exponent field yields the first bit width of the prefix code field and the first code value of the prefix code field.
- the first bit width and the first code value of the second exponent field can also be obtained. After the first bit width of the prefix code field and the first bit width of the second exponent field are obtained, the first bit width of the second mantissa field can be determined.
- Step 402 The execution device determines the retained code values and discarded code values in the first mantissa field, where the retained code values include code values in the first mantissa field starting from the highest bit and having the same bit width as the first bit width of the second mantissa field.
- the bit width of the first mantissa field of the first floating-point data is greater than the bit width of the second mantissa field of the second floating-point data.
- when the first floating-point data is converted into the second floating-point data, the bit width of the second mantissa field of the second floating-point data is limited, so the code values in the first mantissa field must be rounded.
- the code values in the first mantissa field that start from the highest bit and whose bit width equals the first bit width of the second mantissa field are determined as the retained code values, and the remaining code values in the first mantissa field are determined as the discarded code values.
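Splitting the first mantissa field into the retained and discarded code values amounts to a shift and a mask. A minimal sketch, with a hypothetical function name, operating on the 23-bit mantissa as an integer:

```python
def split_mantissa(mantissa: int, keep_bits: int,
                   total_bits: int = 23) -> tuple[int, int, int]:
    """Split a mantissa into (retained, discarded, discarded bit width).

    retained:  the top `keep_bits` bits (width = first bit width of the
               second mantissa field)
    discarded: all remaining lower bits
    """
    discard_bits = total_bits - keep_bits
    retained = mantissa >> discard_bits                  # highest bits
    discarded = mantissa & ((1 << discard_bits) - 1)     # remaining low bits
    return retained, discarded, discard_bits
```

For the Table 1 mantissa (`1 << 21`), `split_mantissa(1 << 21, 2)` returns `(1, 0, 21)`, that is, retained 2’b01 and 21 discarded zero bits.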
- Step 403 The execution device rounds the retained code value according to the discarded code value to obtain the first code value of the second mantissa field.
- the rounding operation may be a carry operation or a discard operation, and whether to perform a carry operation on the retained code value or a discard operation on the discarded code value is determined according to the discarded code value.
- One determination method may be to compare the discarded code value with a threshold value, and if the discarded code value is greater than the threshold value, perform a carry operation on the retained code value, and if the discarded code value is less than the threshold value, perform a discard operation on the discarded code value.
- indicating the bit width of the second exponent field with a short prefix code field allows the second floating-point data to provide up to 4 bits of mantissa precision, while second floating-point data with only 1 bit of mantissa precision can represent a larger numerical range, which effectively balances the bit width, range and precision of the second floating-point data.
- the embodiment of the present application further provides a stochastic rounding (SR) method.
- FIG5 is a schematic diagram of the structure of a stochastic rounding method provided by the embodiment of the present application.
- Step 403 may also include:
- Step 4033 When the code value whose bit width is the preset bit width and starts from the highest bit in the discarded code value is greater than or equal to the second preset threshold, the execution device performs a carry operation on the lowest bit of the retained code value and discards the discarded code value.
- the code value after the carry operation of the retained code value is the first code value of the second mantissa field.
- Step 4034 When the code value with a bit width of the preset bit width and starting from the highest bit in the discarded code value is smaller than the second preset threshold, the execution device discards the discarded code value and retains the code value as the first code value of the second mantissa field.
- the second preset threshold is the code value, with a bit width equal to the preset bit width, that starts from the lowest bit of the discarded code value.
- the preset bit width may be an integer between 10 and 14.
- for example, the second preset threshold may be the 14-bit code value starting from the lowest bit of the discarded code value, and the part of the discarded code value compared with the second preset threshold is the 14-bit code value starting from the highest bit.
- for the first mantissa field 23’b01000000000000000000000 in Table 1, if the first bit width of the second mantissa field is 2, the retained code value in the first mantissa field is 2’b01, the discarded code value is 21’b000000000000000000000, the partially discarded code value is 14’b00000000000000, and the second preset threshold is 14’b00000000000000. Since the partially discarded code value is equal to the second preset threshold, the discarded code value is discarded and the retained code value is the first code value of the second mantissa field, that is, the first code value of the second mantissa field is 2’b01.
- for the same first mantissa field, if the first bit width of the second mantissa field is 1, the retained code value in the first mantissa field is 1’b0, the discarded code value is 22’b1000000000000000000000, the partially discarded code value is 14’b10000000000000, and the second preset threshold is 14’b00000000000000. Since the partially discarded code value is greater than the second preset threshold, a carry operation is performed on the lowest bit of the retained code value, and a discard operation is performed on the discarded code value.
- the code value obtained after the carry operation on the retained code value is the first code value of the second mantissa field, that is, the first code value of the second mantissa field is 1’b1.
- the second preset threshold used for the comparison is the code value, with a bit width equal to the preset bit width, that starts from the lowest bit of the discarded code value.
- the generation of the second preset threshold does not require an additional random number generator, and there is no performance bottleneck for random number generation, which improves the conversion efficiency of high-precision data to low-precision data, and at the same time reduces hardware overhead.
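Steps 4033 and 4034 can be sketched in Python as follows. Assumptions: the mantissa pieces are plain integers, and the function name and the `carry_on_equal` switch are hypothetical. The default follows the steps as written (carry when the top part is greater than or equal to the threshold), whereas the first worked example above keeps 2’b01 when both parts are zero, which corresponds to `carry_on_equal=False`. The threshold is drawn from the discarded bits themselves, so no random number generator is involved:

```python
def stochastic_round(retained: int, discarded: int, discard_bits: int,
                     preset_bits: int = 14, carry_on_equal: bool = True) -> int:
    """Round `retained` using a threshold taken from the discarded bits.

    compare_part: top `preset_bits` bits of the discarded code value
    threshold:    bottom `preset_bits` bits of the discarded code value
    Returns retained or retained + 1 (the +1 may overflow; see Step 404).
    """
    if not 10 <= preset_bits <= discard_bits:
        raise ValueError("preset bit width must be 10..14 and fit the discarded bits")
    compare_part = discarded >> (discard_bits - preset_bits)
    threshold = discarded & ((1 << preset_bits) - 1)
    carry = compare_part > threshold or (carry_on_equal and compare_part == threshold)
    return retained + 1 if carry else retained
```

For the second Table 1 example, `stochastic_round(0, 1 << 21, 22)` compares 14’b10000000000000 with 14’b00000000000000 and returns 1, that is, 1’b1.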
- the rounding operation includes a carry operation or a discard operation.
- the embodiment of the present application provides a round-half-away-from-zero carry operation.
- FIG. 6 is a flowchart of a round-half-away-from-zero method provided in an embodiment of the present application.
- Step 403 may include:
- Step 4031 When the highest bit of the discarded code value is greater than or equal to the first preset threshold, the execution device performs a carry operation on the lowest bit of the retained code value and discards the discarded code value.
- the code value obtained after the carry operation on the retained code value is the first code value of the second mantissa field.
- Step 4032 When the highest bit of the discarded code value is less than the first preset threshold, the execution device discards the discarded code value, and the retained code value is the first code value of the second mantissa field.
- the first preset threshold value may be 1.
- when the highest bit of the discarded code value is greater than or equal to the first preset threshold, that is, when it is 1, a carry operation is performed on the lowest bit of the retained code value and a discard operation is performed on the discarded code value.
- when the highest bit of the discarded code value is less than the first preset threshold, that is, when it is 0, a discard operation is performed on the discarded code value.
- for the first mantissa field 23’b01000000000000000000000 in Table 1, if the first bit width of the second mantissa field is 2, the retained code value in the first mantissa field is 2’b01 and the discarded code value is 21’b000000000000000000000, whose highest bit is 0. Since the highest bit of the discarded code value is less than the first preset threshold, the discarded code value is discarded, and the retained code value is the first code value of the second mantissa field, that is, the first code value of the second mantissa field is 2’b01.
- for the same first mantissa field, if the first bit width of the second mantissa field is 1, the retained code value in the first mantissa field is 1’b0 and the discarded code value is 22’b1000000000000000000000, whose highest bit is 1. Since the highest bit of the discarded code value is equal to the first preset threshold, a carry operation is performed on the lowest bit of the retained code value and a discard operation is performed on the discarded code value; the code value obtained after the carry operation is the first code value of the second mantissa field, that is, the first code value of the second mantissa field is 1’b1.
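A minimal sketch of Steps 4031 and 4032 with the first preset threshold set to 1 (function name hypothetical). Because the sign is held in a separate field, carrying whenever the discarded part is at least half a unit in the last place rounds ties away from zero:

```python
def round_half_away(retained: int, discarded: int, discard_bits: int,
                    first_threshold: int = 1) -> int:
    """Carry into the retained bits when the discarded MSB >= the threshold.

    On a sign-magnitude mantissa this rounds ties away from zero (TA):
    the discarded MSB is 1 exactly when the discarded part is >= half an ULP.
    """
    top_bit = discarded >> (discard_bits - 1)   # highest bit of the discarded value
    return retained + 1 if top_bit >= first_threshold else retained
```

For the Table 1 mantissa, `round_half_away(0b01, 0, 21)` keeps 0b01 and `round_half_away(0, 1 << 21, 22)` returns 1, matching the two examples above.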
- the first preset threshold may also be 0.
- a carry operation is performed on the lowest bit of the retained code value, the discarded code value is discarded, and the code value obtained after the carry operation on the retained code value is the first code value of the second mantissa field.
- a discard operation is performed on the discarded code value, and the retained code value is the first code value of the second mantissa field.
- other rounding methods, such as round half to even and round half to odd, may also be used.
- the TA (ties away from zero) rounding method provided in the embodiment of the present application requires a smaller hardware area and lower power consumption than other rounding methods, and provides higher data resolution.
- FIG7 is a flow chart of another floating point data precision conversion method provided in an embodiment of the present application.
- the floating point data precision conversion method provided in an embodiment of the present application may also include:
- Step 404 The execution device determines whether the retained code value after the carry operation overflows.
- because a carry operation is performed on the lowest bit of the retained code value, the retained code value may overflow.
- Step 405 If the retained code value after the carry operation overflows, the execution device adds 1 to the least significant bit of the first code value in the first exponent field to obtain the second code value in the first exponent field.
- for example, the first code value of the first exponent field in Table 1 is 8’b01111100. If the retained code value overflows after the carry operation, 1 is added to the lowest bit of the first code value of the first exponent field to obtain the second code value of the first exponent field, that is, the second code value of the first exponent field is 8’b01111101.
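Steps 404 and 405 can be sketched as follows (function name hypothetical). An overflow happens only when every retained bit is 1, for example 1.11b + 0.01b = 10.00b = 1.00b × 2^1, so the mantissa bits become all zeros and the FP32 exponent code grows by one:

```python
def carry_with_overflow(retained: int, keep_bits: int,
                        exponent_code: int) -> tuple[int, int, bool]:
    """Add 1 to the retained mantissa bits; on overflow bump the exponent code.

    Returns (mantissa bits, exponent code, overflowed).
    """
    value = retained + 1
    if value >> keep_bits:                  # carry out of the top bit, e.g. 2'b11 + 1
        return 0, exponent_code + 1, True   # mantissa all zeros, exponent + 1
    return value, exponent_code, False
```

With the Table 1 exponent, `carry_with_overflow(0b11, 2, 0b01111100)` returns `(0, 0b01111101, True)`, which gives the second code value 8’b01111101 described above.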
- Step 406 The execution device determines a second bit width of the second exponent field and a second bit width of the prefix code field according to the second encoding value of the first exponent field.
- the second bit width of the second exponent field is 1 and the second bit width of the prefix code field is 3.
- Step 407 If the second bit width of the prefix code field is different from the first bit width of the prefix code field, the execution device determines the second code value of the prefix code field, the second code value of the second exponent field, the second bit width of the second mantissa field, and the second code value of the second mantissa field according to the second code value of the first exponent field.
- if the second bit width of the prefix code field is different from the first bit width of the prefix code field, then, because the prefix code field indicates the bit width of the second exponent field, the second bit width of the second exponent field also differs from the first bit width of the second exponent field. If the second bit width of the prefix code field is greater than the first bit width of the prefix code field, the second bit width of the second exponent field is less than the first bit width of the second exponent field; the prefix code field grows by as many bits as the second exponent field shrinks, so the bit width of the second mantissa field remains unchanged.
- if the second bit width of the prefix code field is less than the first bit width of the prefix code field, the second bit width of the second exponent field is greater than the first bit width of the second exponent field; the prefix code field shrinks by as many bits as the second exponent field grows, so the bit width of the second mantissa field remains unchanged.
- the execution device determines the second code value of the prefix code field and the second code value of the second exponent field according to the second code value of the first exponent field.
- the second code value of the second mantissa field is all 0, for example, if the second bit width of the second mantissa field is 3, the second code value of the second mantissa field is 3'b000.
- the first bit width of the prefix code field determined from the first code value 8’b01111100 of the first exponent field is 2, and the second bit width of the second exponent field is 2.
- the number of bit widths reduced by the prefix code field is the same as the number of bit widths increased by the second exponent field, so the first bit width of the second mantissa field remains unchanged.
- Step 408 If the second bit width of the prefix code field is the same as the first bit width of the prefix code field, the execution device determines whether the first bit width of the second exponent field is the same as the second bit width of the second exponent field.
- Step 409 If the second bit width of the second exponent field is smaller than the first bit width of the second exponent field, the execution device adds 1 to the bit width of the retained code value to obtain the second bit width of the second mantissa field and the second code value of the second mantissa field.
- the second bit width of the prefix code field is the same as the first bit width of the prefix code field. If the second bit width of the second exponent field is less than the first bit width of the second exponent field, the bit width of the second mantissa field increases.
- the bit width of the retained code value is increased by 1 to obtain the second bit width and the second code value of the second mantissa field. In one example, if the retained code value is 2'b01, its bit width is 2; after the bit width is increased by 1, the bit width of the retained code value is 3 and the retained code value is 3'b010, that is, a 0 is appended as the new lowest bit.
- Step 4010 If the second bit width of the second exponent field is greater than the first bit width of the second exponent field, the execution device discards the lowest bit of the retained encoding value to obtain the second bit width of the second mantissa field and the second encoding value of the second mantissa field.
- the second bit width of the prefix code field is the same as the first bit width of the prefix code field. If the second bit width of the second exponent field is greater than the first bit width of the second exponent field, the first bit width of the second mantissa field will be reduced. The lowest bit of the retained code value is discarded to obtain the second bit width of the second mantissa field and the second code value of the second mantissa field. In one example, if the retained code value is 2'b01, the bit width of the retained code value is 2, and after the lowest bit of the retained code value is discarded, the retained code value is 1'b0, and the bit width of the retained code value is 1.
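Steps 406 to 4010 can be sketched as a width adjustment on the retained code value. Assumptions: 8-bit data with a 1-bit sign field, prefix widths of 2 bits for D = 2, 3, 4 and 3 bits for D = 0, 1, and a second exponent field width equal to D, as in Step 4011. The function names are hypothetical. After a carry overflow the retained bits are zero, so in that case only the width actually changes:

```python
# (prefix code width Db, second exponent width Eb) per indication value D.
WIDTHS = {0: (3, 0), 1: (3, 1), 2: (2, 2), 3: (2, 3), 4: (2, 4)}


def mantissa_width(d: int) -> int:
    """Mb = 8 - 1 - Db - Eb for 8-bit data."""
    db, eb = WIDTHS[d]
    return 8 - 1 - db - eb


def adjust_mantissa(retained: int, old_d: int, new_d: int) -> tuple[int, int]:
    """Return (second mantissa code value, its bit width) after D changes."""
    old_db, old_eb = WIDTHS[old_d]
    new_db, new_eb = WIDTHS[new_d]
    old_mb = mantissa_width(old_d)
    if new_db != old_db:                   # Step 407: Db and Eb change oppositely,
        return 0, mantissa_width(new_d)    # width unchanged, code value all zeros
    if new_eb < old_eb:                    # Step 409: one more bit, append a 0
        return retained << 1, old_mb + 1
    if new_eb > old_eb:                    # Step 4010: one fewer bit, drop the LSB
        return retained >> 1, old_mb - 1
    return retained, old_mb                # widths unchanged
```

`adjust_mantissa(0b01, 3, 2)` returns `(0b010, 3)` and `adjust_mantissa(0b01, 3, 4)` returns `(0, 1)`, reproducing the 2'b01 examples of Steps 409 and 4010.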
- FIG8 is a flowchart of another floating-point data precision conversion method provided by an embodiment of the present application.
- Step 401 includes:
- Step 4011 the execution device determines an indication value according to the first coding value of the first exponent field, determines the first bit width of the prefix code field and the first coding value of the prefix code field corresponding to the indication value by looking up a table, and the indication value is also used to indicate the first bit width of the second exponent field.
- the exponent value N of the first exponent field can be determined based on the first encoding value of the first exponent field, and the indication value can be determined based on the exponent value of the first exponent field.
- the table lookup here is to look up Table 3, the indication value is the value of D, and the indication value can be 0, 1, 2, 3, 4.
- for example, the first code value of the first exponent field in Table 1 is 8'b01111100, which is 124 in decimal. Removing the FP32 bias of 127 gives -3, which is the exponent value N of the first exponent field.
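- the indication value is $D = \mathrm{INT}[\log_2 |N|] + 1$; for $N = -3$, $D = \mathrm{INT}[\log_2 3] + 1 = 2$.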
- the indication value is also used to indicate the first bit width of the second exponent field, that is, the first bit width of the second exponent field is 2.
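In Python, `int.bit_length()` of |N| equals INT[log2 |N|] + 1 for N ≠ 0, which avoids floating-point logarithms. A sketch with a hypothetical function name; it additionally assumes D = 0 when N = 0 and does not handle |N| > 15, for which D would exceed 4:

```python
def indication_value(fp32_exponent_code: int) -> tuple[int, int]:
    """Return (N, D) for a normal FP32 exponent code."""
    n = fp32_exponent_code - 127      # remove the FP32 bias
    d = abs(n).bit_length()           # INT[log2|N|] + 1 for N != 0; 0 when N == 0
    return n, d
```

`indication_value(0b01111100)` returns `(-3, 2)`, matching the example: the second exponent field is 2 bits wide.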
- Step 4012 The execution device determines a first encoding value corresponding to the first bit width of the second exponent field according to the first encoding value of the first exponent field.
- for the first exponent field in Table 1 (see Table 4), D is 2 and the exponent value of the first exponent field is -3, so the exponent sign bit Se is 1; the first bit width of the second exponent field indicated by the indication value is 2, and the determined first code value of the second exponent field is 11.
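The exponent encoding in this example can be sketched as the sign bit Se followed by the binary magnitude of N with its always-1 leading bit hidden, which gives a field exactly D bits wide. Placing Se before the magnitude bits is an assumption (for N = -3 both orders give 11), and the function name is hypothetical:

```python
def encode_exponent(n: int) -> str:
    """Second exponent field for exponent value N: Se, then |N| without its MSB.

    The field width equals D = |N|.bit_length(); N == 0 gives an empty field.
    """
    if n == 0:
        return ""
    se = "1" if n < 0 else "0"          # exponent sign bit Se
    magnitude = format(abs(n), "b")     # always starts with 1
    return se + magnitude[1:]           # hide the leading 1
```

For example, `encode_exponent(-3)` returns `"11"` and `encode_exponent(15)` returns `"0111"`.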
- Step 4013 The execution device determines the first bit width of the second mantissa field according to the total bit width of the second floating-point data, the first bit width of the prefix code field, and the first bit width of the second exponent field.
- let the total bit width of the second floating-point data be Nb, the first bit width of the prefix code field be Db, the first bit width of the second exponent field be Eb, and the first bit width of the second mantissa field be Mb; the bit width of the sign field is 1, so $M_b = N_b - 1 - D_b - E_b$.
- the bit width of the sign field is 1, the bit width of the prefix code field is 2 or 3, the first bit width of the second exponent field is an integer from 0 to 4, and the first bit width of the second mantissa field is an integer from 1 to 4.
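Step 4013 can be tabulated for every indication value with a small sketch. The prefix widths come from the prefix-code description, Eb = D follows the example in Step 4011, and the function name is hypothetical. It also accepts other total widths Nb, although only Nb = 8 is worked through here:

```python
def field_widths(d: int, nb: int = 8) -> tuple[int, int, int, int]:
    """Widths (sign, Db, Eb, Mb) for indication value D and total width Nb."""
    db = 2 if d >= 2 else 3       # 2-bit codes for D = 2, 3, 4; 3-bit codes for D = 0, 1
    eb = d                        # D also indicates the second exponent field width
    mb = nb - 1 - db - eb         # Step 4013: Mb = Nb - 1 - Db - Eb
    return 1, db, eb, mb


for d in range(5):
    print(d, field_widths(d))
```

For Nb = 8 it gives Mb = 4, 3, 3, 2, 1 for D = 0 to 4, consistent with the 1 to 4 bit mantissa range stated above.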
- a shorter prefix code field is used in the second floating-point data to indicate the first bit width of the second exponent field, so that the second floating-point data can provide up to 4 bits of mantissa precision, and at the same time, the second floating-point data that only provides 1 bit of mantissa precision can represent a larger range of values, effectively balancing the relationship between the bit width, range and precision of the second floating-point data.
- the highest bit of the second exponent field is hidden when it is stored, which reduces the first bit width that must be stored for the second exponent field and prevents the first code values of the second exponent field for different indication values of the prefix code field from overlapping numerically, so there is no redundant encoding in the HiFloat8 data format.
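The no-overlap property can be checked by enumerating every D-bit exponent field for D = 0 to 4 and decoding it under the hidden-MSB layout assumed here (sign bit Se first, D = 0 meaning N = 0). The sketch is illustrative, not the encoder of the application:

```python
def decode_exponent(d: int, field: str) -> int:
    """Inverse of the hidden-MSB layout: Se bit, then |N| with its leading 1 restored."""
    if d == 0:
        return 0
    magnitude = int("1" + field[1:], 2)   # restore the hidden leading 1
    return -magnitude if field[0] == "1" else magnitude


decoded = []
for d in range(5):
    for code in range(2 ** d):            # every D-bit exponent field
        field = format(code, f"0{d}b") if d else ""
        decoded.append(decode_exponent(d, field))

print(len(decoded), len(set(decoded)), min(decoded), max(decoded))  # 31 31 -15 15
```

All 31 fields decode to distinct exponents covering -15 to 15, so no exponent value has two encodings. Without the hidden bit, a value such as N = 1 could be written at several field widths.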
- the second floating-point data may be determined in a saturation mode or an infinity mode.
- in the saturation mode, the largest value that the low-precision floating-point format can represent may be used as the second floating-point data.
- in the infinity mode, the infinity value of the low-precision floating-point format may be used as the second floating-point data.
- the second floating point data after the precision conversion of the first floating point data can be represented as 8'b01101111.
- the second floating-point data is zero.
- the second floating-point data is zero, and the second floating-point data can be represented as 8’b01111110.
- the second floating-point data is a non-numeric value.
- the second floating-point data can be represented as 8’b11111110.
- Figure 9 is a flowchart of converting FP32 data to HiFloat8 data provided by an embodiment of the present application. Taking the conversion of FP32 data to HiFloat8 data as an example, applied to the conversion module, the conversion process includes the following process.
- the conversion module receives FP32 data, which includes a sign field S, an exponent field E[0:7], and a mantissa field M[0:22];
- the bit width of the new prefix code field and the bit width of the exponent field of the HiFloat8 data are calculated from the FP32 exponent field after the addition operation, and are denoted db1 and eb1 respectively;
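Putting the steps together, a compact end-to-end sketch of the FP32-to-HiFloat8 flow might look as follows. Assumptions, all labeled: round half away from zero (Steps 4031/4032) rather than stochastic rounding; a hypothetical prefix codeword assignment, since Table 3's bit patterns are not given in this text; an exponent field laid out as Se followed by |N| with its leading 1 hidden; and zero, subnormals, Inf/NaN and out-of-range exponents are rejected instead of going through the saturation or infinity mode. A matching decoder is included for round-trip checks:

```python
import struct

# Hypothetical codewords: only the lengths (2 bits for D = 2..4, 3 bits for
# D = 0, 1) come from the text.
PREFIX = {4: "11", 3: "10", 2: "01", 1: "001", 0: "000"}


def mantissa_width(d: int) -> int:
    """Mb = Nb - 1 - Db - Eb with Nb = 8 and Eb = D."""
    return 8 - 1 - len(PREFIX[d]) - d


def fp32_to_hif8_bits(x: float) -> str:
    """Convert a normal FP32 value to an 8-character HiFloat8 bit string (TA rounding)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign, exp_code, mant = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if not 0 < exp_code < 0xFF:
        raise ValueError("zero, subnormal, Inf and NaN are not handled in this sketch")
    n = exp_code - 127                                # remove the FP32 bias
    if abs(n) > 15:
        raise OverflowError("exponent outside the range handled by this sketch")
    d = abs(n).bit_length()                           # indication value D
    mb = mantissa_width(d)
    shift = 23 - mb                                   # width of the discarded code value
    retained, discarded = mant >> shift, mant & ((1 << shift) - 1)
    if discarded >> (shift - 1):                      # TA: top discarded bit is 1
        retained += 1
        if retained >> mb:                            # carry overflowed the mantissa
            n, retained = n + 1, 0                    # exponent + 1, mantissa all zeros
            if abs(n) > 15:
                raise OverflowError("rounded exponent outside the range of this sketch")
            d = abs(n).bit_length()
            mb = mantissa_width(d)
    exp_field = "" if n == 0 else ("1" if n < 0 else "0") + format(abs(n), "b")[1:]
    return str(sign) + PREFIX[d] + exp_field + format(retained, f"0{mb}b")


def hif8_bits_to_float(s: str) -> float:
    """Decode a bit string produced by fp32_to_hif8_bits (same assumed codewords)."""
    d, word = next((d, w) for d, w in PREFIX.items() if s[1:].startswith(w))
    rest = s[1 + len(word):]
    exp_field, mant_field = rest[:d], rest[d:]
    n = 0 if d == 0 else int("1" + exp_field[1:], 2) * (-1 if exp_field[0] == "1" else 1)
    value = (1 + int(mant_field, 2) / 2 ** len(mant_field)) * 2.0 ** n
    return -value if s[0] == "1" else value
```

With the assumed codewords, 0.15625 (the Table 1 value) encodes as 00111010, and 1.96875, whose rounding carries out of a 4-bit mantissa, becomes 00010000, which decodes to 2.0.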
- the electronic device includes hardware and/or software modules corresponding to the execution of each function.
- the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed in the form of hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementation should not be considered to be beyond the scope of the present application.
- This embodiment can divide the electronic device into functional modules according to the above method example.
- each functional module can be divided according to each function, or two or more functions can be integrated into one processing module.
- the above integrated module can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division. There may be other division methods in actual implementation.
- FIG. 10 shows a possible composition of the electronic device 100 involved in the above embodiment; FIG. 10 is a structural diagram of an electronic device provided in an embodiment of the present application.
- the electronic device 100 may include: a bit width calculation unit 101, a mantissa domain calculation unit 102, and a rounding operation unit 103.
- the bit width calculation unit 101 may be used to support the electronic device 100 in executing the above-mentioned steps 401, 4011, 4012, 4013, etc., and/or other processes for the technology described herein.
- the mantissa domain calculation unit 102 may be used to support the electronic device 100 in executing the above step 402 and/or other processes of the technology described herein.
- the rounding operation unit 103 may be used to support the electronic device 100 in executing the above-mentioned steps 403, 4031, 4032, etc., and/or other processes for the technology described herein.
- the electronic device 100 provided in this embodiment is used to execute the above floating-point data precision conversion method, and thus can achieve the same effect as the above implementation method.
- the electronic device 100 may include a processing module, a storage module and a communication module.
- the processing module can be used to control and manage the actions of the electronic device 100, for example, it can be used to support the electronic device 100 to perform the steps performed by the above-mentioned bit width calculation unit 101, the mantissa domain calculation unit 102 and the rounding operation unit 103.
- the storage module can be used to support the electronic device 100 to store program codes and data, etc.
- the communication module can be used to support the communication between the electronic device 100 and other devices, such as communication with a wireless access device.
- the processing module can be a processor or a controller. It can implement or execute the various exemplary logical blocks, modules and circuits described in conjunction with the disclosure of this application.
- the processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor.
- the storage module can be a memory.
- the communication module can specifically be a radio frequency circuit, a Bluetooth chip, a Wi-Fi chip, or other devices that interact with other electronic devices.
- the processing module is a processor and the storage module is a memory
- the electronic device involved in this embodiment may be a server, a computer, and the like.
- the embodiment of the present application also provides an electronic device, including one or more processors and one or more memories.
- the one or more memories are coupled to the one or more processors, and the one or more memories are used to store computer program codes, and the computer program codes include computer instructions.
- the electronic device executes the above-mentioned related method steps to implement the floating-point data precision conversion method in the above-mentioned embodiment.
- An embodiment of the present application also provides a computer storage medium, in which computer instructions are stored.
- when the computer instructions are executed on an electronic device, the electronic device executes the above-mentioned related method steps to implement the floating-point data precision conversion method in the above-mentioned embodiment.
- the embodiments of the present application also provide a computer program product.
- the computer program product When the computer program product is run on a computer, the computer is caused to execute the above-mentioned related steps to implement the floating-point data precision conversion method executed by the electronic device in the above-mentioned embodiment.
- an embodiment of the present application also provides a device, which can specifically be a chip, component or module, and the device may include a connected processor and memory; wherein the memory is used to store computer execution instructions, and when the device is running, the processor can execute the computer execution instructions stored in the memory so that the chip executes the floating-point data precision conversion method executed by the electronic device in the above-mentioned method embodiments.
- the electronic device, computer storage medium, computer program product or chip provided in this embodiment is used to execute the corresponding method provided above. Therefore, the beneficial effects that can be achieved can refer to the beneficial effects in the corresponding method provided above and will not be repeated here.
- The disclosed apparatuses and methods can be implemented in other ways.
- The apparatus embodiments described above are merely illustrative.
- The division into modules or units is merely a division by logical function; other divisions are possible in actual implementation.
- Multiple units or components may be combined or integrated into another apparatus, or some features may be ignored or not performed.
- In addition, the mutual couplings, direct couplings, or communication connections shown or discussed are not necessarily direct connections between two apparatuses.
- They may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
- Units described as separate components may or may not be physically separate, and components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple places. Some or all of the units may be selected as actually needed to achieve the objectives of the solutions of this embodiment.
- The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
- The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
- If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
- The technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for enabling a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application.
- The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Analysis (AREA)
- Computing Systems (AREA)
- Computational Mathematics (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Nonlinear Science (AREA)
- Complex Calculations (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Claims (16)
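- A floating-point data precision conversion method, wherein first floating-point data comprises a sign field, a first exponent field, and a first mantissa field; second floating-point data comprises the sign field, a prefix code field, a second exponent field, and a second mantissa field; the prefix code field is used to indicate a bit width of the second exponent field; and a precision of the first floating-point data is higher than a precision of the second floating-point data; and the method comprises: determining, based on a first code value of the first exponent field, a first bit width of the prefix code field, a first code value of the prefix code field, a first bit width of the second exponent field, a first code value of the second exponent field, and a first bit width of the second mantissa field; determining a retained code value and a discarded code value in the first mantissa field, wherein the retained code value comprises a code value in the first mantissa field that starts from the most significant bit and whose bit width is the same as the first bit width of the second mantissa field; and performing a rounding operation on the retained code value based on the discarded code value to obtain a first code value of the second mantissa field.
- The method according to claim 1, wherein the rounding operation comprises a carry operation and a discard operation, and the performing a rounding operation on the retained code value based on the discarded code value to obtain a first code value of the second mantissa field comprises: when a code value of the discarded code value that starts from the most significant bit and has a preset bit width is greater than or equal to a second preset threshold, performing the carry operation on the least significant bit of the retained code value and performing the discard operation on the discarded code value, wherein the code value obtained after the carry on the retained code value is the first code value of the second mantissa field; and when the code value of the discarded code value that starts from the most significant bit and has the preset bit width is less than the second preset threshold, performing the discard operation on the discarded code value, wherein the retained code value is the first code value of the second mantissa field; wherein the second preset threshold is a code value of the discarded code value that starts from the least significant bit and has the preset bit width.
- The method according to claim 1, wherein the rounding operation comprises a carry operation or a discard operation, and the performing a rounding operation on the retained code value based on the discarded code value to obtain a first code value of the second mantissa field comprises: when the most significant bit of the discarded code value is greater than or equal to a first preset threshold, performing the carry operation on the least significant bit of the retained code value and performing the discard operation on the discarded code value, wherein the code value obtained after the carry operation on the retained code value is the first code value of the second mantissa field; and when the most significant bit of the discarded code value is less than the first preset threshold, performing the discard operation on the discarded code value, wherein the retained code value is the first code value of the second mantissa field.
- The method according to claim 2 or 3, further comprising: determining whether the retained code value overflows after the carry operation; if the retained code value overflows after the carry operation, adding 1 to the least significant bit of the first code value of the first exponent field to obtain a second code value of the first exponent field; determining, based on the second code value of the first exponent field, a second bit width of the second exponent field and a second bit width of the prefix code field; if the second bit width of the prefix code field is different from the first bit width of the prefix code field, determining, based on the second code value of the first exponent field, a second code value of the prefix code field, a second code value of the second exponent field, a second bit width of the second mantissa field, and a second code value of the second mantissa field; if the second bit width of the prefix code field is the same as the first bit width of the prefix code field, determining whether the first bit width of the second exponent field is the same as the second bit width of the second exponent field; if the second bit width of the second exponent field is less than the first bit width of the second exponent field, adding 1 to the bit width of the retained code value to obtain the second bit width of the second mantissa field and the second code value of the second mantissa field; and if the second bit width of the second exponent field is greater than or equal to the first bit width of the second exponent field, performing the discard operation on the least significant bit of the retained code value to obtain the second bit width of the second mantissa field and the second code value of the second mantissa field.
- The method according to claim 1, wherein the determining, based on a first code value of the first exponent field, a first bit width of the prefix code field, a first code value of the prefix code field, a first bit width of the second exponent field, and a first code value of the second exponent field comprises: determining an indication value based on the first code value of the first exponent field, and determining, by table lookup, the first bit width of the prefix code field and the first code value of the prefix code field that correspond to the indication value, wherein the indication value is further used to indicate the first bit width of the second exponent field; and determining, based on the first code value of the first exponent field, the first code value corresponding to the first bit width of the second exponent field.
- The method according to claim 1 or 5, wherein the determining, based on a first code value of the first exponent field, a first bit width of the second mantissa field comprises: determining the first bit width of the second mantissa field based on a total bit width of the second floating-point data, the first bit width of the prefix code field, and the first bit width of the second exponent field.
- The method according to claim 1, further comprising: when the first floating-point data exceeds an upper limit of a data range of the second floating-point data, determining the second floating-point data in a saturation mode or an infinity mode; when the first floating-point data exceeds a lower limit of the data range of the second floating-point data, the second floating-point data is zero; and when the first floating-point data is a not-a-number (NaN) value, the second floating-point data is a NaN value.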
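- A floating-point data precision conversion apparatus, wherein first floating-point data comprises a sign field, a first exponent field, and a first mantissa field; second floating-point data comprises the sign field, a prefix code field, a second exponent field, and a second mantissa field; the prefix code field is used to indicate a bit width of the second exponent field; and a precision of the first floating-point data is higher than a precision of the second floating-point data; and the apparatus comprises: a bit width calculation unit, configured to determine, based on a first code value of the first exponent field, a first bit width of the prefix code field, a first code value of the prefix code field, a first bit width of the second exponent field, a first code value of the second exponent field, and a first bit width of the second mantissa field; a mantissa field calculation unit, configured to determine a retained code value and a discarded code value in the first mantissa field, wherein the retained code value comprises a code value in the first mantissa field that starts from the most significant bit and whose bit width is the same as the first bit width of the second mantissa field; and a rounding operation unit, configured to perform a rounding operation on the retained code value based on the discarded code value to obtain a first code value of the second mantissa field.
- The apparatus according to claim 8, wherein the rounding operation comprises a carry operation or a discard operation, and the rounding operation unit is further configured to: when a code value of the discarded code value that starts from the most significant bit and has a preset bit width is greater than or equal to a second preset threshold, perform the carry operation on the least significant bit of the retained code value and perform the discard operation on the discarded code value, wherein the code value obtained after the carry on the retained code value is the first code value of the second mantissa field; and when the code value of the discarded code value that starts from the most significant bit and has the preset bit width is less than the second preset threshold, perform the discard operation on the discarded code value, wherein the retained code value is the first code value of the second mantissa field; wherein the second preset threshold is a code value of the discarded code value that starts from the least significant bit and has the preset bit width.
- The apparatus according to claim 8, wherein the rounding operation comprises a carry operation or a discard operation, and the rounding operation unit is further configured to: when the most significant bit of the discarded code value is greater than or equal to a first preset threshold, perform the carry operation on the least significant bit of the retained code value and perform the discard operation on the discarded code value, wherein the code value obtained after the carry operation on the retained code value is the first code value of the second mantissa field; and when the most significant bit of the discarded code value is less than the first preset threshold, perform the discard operation on the discarded code value, wherein the retained code value is the first code value of the second mantissa field.
- The apparatus according to claim 9 or 10, further comprising: an overflow unit, configured to determine whether the retained code value overflows after the carry operation; wherein the bit width calculation unit is further configured to: if the retained code value overflows after the carry operation, add 1 to the first code value of the first exponent field to obtain a second code value of the first exponent field; determine, based on the second code value of the first exponent field, a second bit width of the second exponent field and a second bit width of the prefix code field; if the second bit width of the prefix code field is different from the first bit width of the prefix code field, determine, based on the second code value of the first exponent field, a second code value of the prefix code field, a second code value of the second exponent field, a second bit width of the second mantissa field, and a second code value of the second mantissa field; if the second bit width of the prefix code field is the same as the first bit width of the prefix code field, determine whether the first bit width of the second exponent field is the same as the second bit width of the second exponent field; if the second bit width of the second exponent field is less than the first bit width of the second exponent field, add 1 to the bit width of the retained code value to obtain the second bit width of the second mantissa field and the second code value of the second mantissa field; and if the second bit width of the second exponent field is greater than or equal to the first bit width of the second exponent field, perform the discard operation on the least significant bit of the retained code value to obtain the second bit width of the second mantissa field and the second code value of the second mantissa field.
- The apparatus according to claim 8, wherein the bit width calculation unit is further configured to: determine an indication value based on the first code value of the first exponent field, and determine, by table lookup, the first bit width of the prefix code field and the first code value of the prefix code field that correspond to the indication value, wherein the indication value is further used to indicate the first bit width of the second exponent field; and determine, based on the first code value of the first exponent field, the first code value corresponding to the first bit width of the second exponent field.
- The apparatus according to any one of claims 8 to 12, wherein the bit width calculation unit is further configured to determine the first bit width of the second mantissa field based on a total bit width of the second floating-point data, the first bit width of the prefix code field, and the first bit width of the second exponent field.
- The apparatus according to claim 8, wherein the bit width calculation unit is further configured to: when the first floating-point data exceeds an upper limit of a conversion range of the second floating-point data, determine the second floating-point data in a saturation mode or an infinity mode; when the first floating-point data exceeds a lower limit of the conversion range of the second floating-point data, determine that the second floating-point data is zero; and when the first floating-point data is a NaN value, determine that the second floating-point data is a NaN value.
- A computer-readable storage medium, comprising computer instructions, wherein when the computer instructions are run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 7.
- A computer program product, wherein when the computer program product is run on a computer or a processor, the computer or the processor is caused to perform the method according to any one of claims 1 to 7.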
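The method of claims 1 to 6 can be sketched in Python as a minimal, non-authoritative illustration. The claims in this section do not fix the target format, so everything concrete below is hypothetical: the 16-bit total width, the prefix-code table, the rule that maps the unbiased FP32 exponent to an indication value, the offset-binary exponent code, the first preset threshold of 1 (claim 3), and the preset bit width `k=4` (claim 2). The names `layout`, `round_msb`, `round_compare`, and `convert` are illustrative, not the patent's. Read literally, claim 2 would also carry when every discarded bit is zero (0 >= 0), so the sketch adds a `discarded != 0` guard; that guard is our assumption. Because the low discarded bits act as a data-dependent threshold, the claim 2 rule appears to behave like a form of stochastic rounding. For claim 4, the sketch simply recomputes the layout from the incremented exponent instead of following the claimed branch-by-branch bit-width bookkeeping.

```python
import struct

# Hypothetical prefix-code table (illustrative only, NOT taken from the patent).
# List index = indication value; entry = (prefix code, prefix width, exponent width).
PREFIX_TABLE = [
    (0b0, 1, 1),
    (0b10, 2, 2),
    (0b110, 3, 3),
    (0b111, 3, 4),
]
FP32_BIAS = 127
FP32_MANT_BITS = 23


def indication_value(e):
    """Hypothetical rule: index of the narrowest exponent field that can hold e."""
    for idx, (_, _, exp_w) in enumerate(PREFIX_TABLE):
        half = 1 << (exp_w - 1)
        if -half <= e < half:
            return idx
    # Zero, subnormals, Inf/NaN and large magnitudes land here (claim 7 territory).
    raise OverflowError(f"unbiased exponent {e} does not fit this toy format")


def layout(e1_code, total_width):
    """Claims 5 and 6: derive every field of the target word from the FP32 exponent code."""
    e = e1_code - FP32_BIAS  # unbiased exponent
    prefix_code, prefix_w, exp_w = PREFIX_TABLE[indication_value(e)]  # table lookup
    exp_code = e + (1 << (exp_w - 1))  # hypothetical offset-binary exponent code
    mant_w = total_width - 1 - prefix_w - exp_w  # claim 6: total minus sign, prefix, exponent
    return prefix_code, prefix_w, exp_code, exp_w, mant_w


def round_msb(retained, discarded, drop):
    """Claim 3 with the first preset threshold set to 1: carry if the top discarded bit is 1."""
    return retained + ((discarded >> (drop - 1)) & 1)


def round_compare(retained, discarded, drop, k=4):
    """Claim 2: carry if the top k discarded bits >= the bottom k discarded bits.

    k=4 is a hypothetical preset bit width; the `discarded != 0` guard is our addition.
    """
    high = discarded >> (drop - k)
    low = discarded & ((1 << k) - 1)
    return retained + (1 if discarded != 0 and high >= low else 0)


def convert(x, total_width=16, rule=round_msb):
    """Pack a normal FP32 value into the toy format (assumes mant_w stays below 23)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign, e1, m1 = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    prefix, prefix_w, exp_code, exp_w, mant_w = layout(e1, total_width)
    # Claim 1: retained = top mant_w mantissa bits, discarded = the low bits dropped.
    drop = FP32_MANT_BITS - mant_w
    retained, discarded = m1 >> drop, m1 & ((1 << drop) - 1)
    mant = rule(retained, discarded, drop)
    if mant >> mant_w:
        # Claim 4: the carry overflowed, i.e. the significand rounded up to 2.0.
        # Bump the FP32 exponent code and re-derive the layout; the mantissa is all zeros.
        prefix, prefix_w, exp_code, exp_w, mant_w = layout(e1 + 1, total_width)
        mant = 0
    word = sign
    for value, width in ((prefix, prefix_w), (exp_code, exp_w), (mant, mant_w)):
        word = (word << width) | value
    return word
```

Under these assumptions, `convert(1.0)` returns `0x2000` (sign `0`, prefix `0`, exponent code `1`, then 13 zero mantissa bits). `convert(2 - 2**-20)` carries out of its 13 retained bits, moves to the 2-bit exponent layout, and returns `0x5800`, the same word as `convert(2.0)`. A hardware unit would more likely implement the claim 4 branches directly; recomputing the layout is only the shortest way to express the overflow case in a sketch.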
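Claim 7 only fixes what happens at the edges of the target range: saturation or infinity above the upper limit, zero below the lower limit, and NaN for NaN. A minimal sketch of that pre-check follows, assuming the caller supplies the target format's largest finite magnitude and smallest normal magnitude. The function name `handle_special` and the parameters `max_finite` and `min_normal` are hypothetical, and reading "below the lower limit" as "magnitude below `min_normal`" and preserving the sign of a flushed zero are our interpretations, not statements from the claim.

```python
import math


def handle_special(x, max_finite, min_normal, saturate=True):
    """Claim 7 (sketch): result for NaN or out-of-range inputs, or None if x is in range.

    max_finite / min_normal are hypothetical parameters describing the target
    format's range; the claims in this section do not give concrete values.
    """
    if math.isnan(x):
        return math.nan  # NaN in, NaN out
    if abs(x) > max_finite:
        # Above the upper limit: clamp (saturation mode) or return infinity (infinity mode).
        return math.copysign(max_finite if saturate else math.inf, x)
    if abs(x) < min_normal:
        # Below the lower limit: flush to zero; keeping the sign is our assumption.
        return math.copysign(0.0, x)
    return None  # in range: continue with the normal field-by-field conversion
```

For instance, with an arbitrary `max_finite=240.0` and `min_normal=2**-8`, `handle_special(1e6, 240.0, 2**-8)` returns `240.0` in saturation mode and `inf` with `saturate=False`, while an in-range value such as `1.5` returns `None` and proceeds to the normal conversion path.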
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23878671.9A EP4597299A4 (en) | 2022-10-19 | 2023-06-25 | METHOD AND APPARATUS FOR CONVERTING PRECISION DATA INTO FLOATING POINTS |
| US19/181,941 US20250278241A1 (en) | 2022-10-19 | 2025-04-17 | Floating-point data precision conversion method and apparatus |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211281416.6 | 2022-10-19 | | |
| CN202211281416.6A CN117908827A (zh) | 2022-10-19 | 2022-10-19 | Floating-point data precision conversion method and apparatus |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US19/181,941 Continuation US20250278241A1 (en) | Floating-point data precision conversion method and apparatus | 2022-10-19 | 2025-04-17 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024082674A1 (zh) | 2024-04-25 |
Family
ID=90695281
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/102089 Ceased WO2024082674A1 (zh) | Floating-point data precision conversion method and apparatus | 2022-10-19 | 2023-06-25 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20250278241A1 (zh) |
| EP (1) | EP4597299A4 (zh) |
| CN (1) | CN117908827A (zh) |
| WO (1) | WO2024082674A1 (zh) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118170347B (zh) * | 2024-05-11 | 2024-11-26 | 北京壁仞科技开发有限公司 | Precision conversion apparatus, data processing method, processor, and electronic device |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20130282780A1 (en) * | 2012-04-23 | 2013-10-24 | Lsi Corporation | Method and Apparatus to Perform Floating Point Operations |
| CN104778026A (zh) * | 2015-04-28 | 2015-07-15 | 浪潮电子信息产业股份有限公司 | High-speed data format conversion component with SIMD and conversion method |
| CN111340207A (zh) * | 2020-03-03 | 2020-06-26 | 南京大学 | Floating-point number conversion method and apparatus |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US3742198A (en) * | 1971-03-19 | 1973-06-26 | Bell Telephone Labor Inc | Apparatus for utilizing a three-field word to represent a floating point number |
| US8106914B2 (en) * | 2007-12-07 | 2012-01-31 | Nvidia Corporation | Fused multiply-add functional unit |
| US9582248B2 (en) * | 2014-09-26 | 2017-02-28 | Arm Limited | Standalone floating-point conversion unit |
- 2022-10-19 CN CN202211281416.6A patent/CN117908827A/zh active Pending
- 2023-06-25 EP EP23878671.9A patent/EP4597299A4/en active Pending
- 2023-06-25 WO PCT/CN2023/102089 patent/WO2024082674A1/zh not_active Ceased
- 2025-04-17 US US19/181,941 patent/US20250278241A1/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4597299A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117908827A (zh) | 2024-04-19 |
| US20250278241A1 (en) | 2025-09-04 |
| EP4597299A4 (en) | 2025-12-17 |
| EP4597299A1 (en) | 2025-08-06 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
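| CN115934030B (zh) | | Arithmetic logic unit, and method and device for floating-point multiplication |
| CN112230881B (zh) | | Floating-point processor |
| WO2022143432A1 (zh) | | Matrix computing apparatus, method, system, circuit, chip, and device |
| CN106951211B (zh) | | Reconfigurable general-purpose fixed-point/floating-point multiplier |
| CN110515589B (zh) | | Multiplier, data processing method, chip, and electronic device |
| US20230305803A1 (en) | | Method for Processing Floating Point Number and Related Device |
| WO2023029464A1 (zh) | | Data processing apparatus and method, chip, computer device, and storage medium |
| CN111381808B (zh) | | Multiplier, data processing method, chip, and electronic device |
| CN118915995A (zh) | | Arithmetic unit, and floating-point arithmetic method and apparatus |
| WO2023124235A1 (zh) | | Multi-input floating-point number processing method and apparatus, processor, and computer device |
| CN115586922A (zh) | | SpMV mixed-precision optimization method with decoupled storage and computation formats |
| CN116974517A (zh) | | Floating-point number processing method and apparatus, computer device, and processor |
| US20250278241A1 (en) | | Floating-point data precision conversion method and apparatus |
| US20260003571A1 (en) | | Floating-Point Data Precision Conversion Method and Apparatus |
| CN113791756B (zh) | | Number conversion method, storage medium, apparatus, and board card |
| CN117910537A (zh) | | Neural network training method and apparatus |
| CN116882475A (zh) | | Training method and apparatus applied to neural networks, and related products |
| CN111310909A (zh) | | Floating-point number conversion circuit |
| CN209895329U (zh) | | Multiplier |
| WO2019205064A1 (zh) | | Neural network acceleration apparatus and method |
| CN116502028B (zh) | | Large-scale FFT implementation method and apparatus based on floating-point compression |
| CN111313906A (zh) | | Floating-point number conversion circuit |
| CN121411824A (zh) | | ALU, processor, chip product, and device |
| WO2025107602A1 (zh) | | Data processing method and data processing apparatus |
| CN121235129A (zh) | | Large-model optimization method and apparatus based on data quantization |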
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23878671; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023878671; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2023878671; Country of ref document: EP; Effective date: 20250428 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | WIPO information: entry into national phase | Ref document number: 11202502549V; Country of ref document: SG |
| | WWP | WIPO information: published in national office | Ref document number: 11202502549V; Country of ref document: SG |
| | WWP | WIPO information: published in national office | Ref document number: 2023878671; Country of ref document: EP |