The FFT is an efficient algorithm for computing the DFT. It was first proposed by J. W. Cooley and J. W. Tukey in 1965, and new algorithms have emerged since. In general there are two lines of development: one covers algorithms for N equal to an integer power of 2, such as the radix-2, radix-4, and split-radix algorithms; the other covers algorithms for N not equal to an integer power of 2, such as the prime-factor algorithm, the Winograd algorithm, and so on. The radix-2 algorithm is the most commonly used FFT algorithm. Its core idea is to successively decompose the N-point sequence into N/2-point sequences, finally reducing the computation to 2-point DFTs and thereby eliminating a large number of repeated operations in the direct DFT.
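As an illustration of this successive halving, the following Python sketch (a software reference only, not the hardware implementation discussed below) computes a recursive radix-2 decimation-in-time FFT for N a power of 2:

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 DIT FFT: an N-point DFT (N a power of 2) is split
    into two N/2-point DFTs until only trivial short DFTs remain."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])   # DFT of the even-indexed subsequence
    odd = fft_radix2(x[1::2])    # DFT of the odd-indexed subsequence
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)    # twiddle (phase) factor W_N^k
        out[k] = even[k] + w * odd[k]            # butterfly, upper output
        out[k + n // 2] = even[k] - w * odd[k]   # butterfly, lower output
    return out
```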
The FFT can decompose the sequence in either the time domain or the frequency domain:
1 Decimation in time (DIT): the sequence x(n) is split by index parity into even-indexed and odd-indexed subsequences, and the DFT of the whole sequence is then built from the DFTs of these subsequences;
2 Decimation in frequency (DIF): the frequency-domain index k of X(k) is successively split by parity into even-indexed and odd-indexed subsequences, and the DFT over the whole frequency domain is obtained from the DFTs of these subsequences.
The decimation-in-time and decimation-in-frequency methods require the same amount of computation; the difference follows from their decomposition forms: the decimation-in-time method requires the input data sequence x(n) to be reordered, while the decimation-in-frequency method requires the output data sequence X(k) to be reordered.
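The reordering in question is the bit-reversal permutation. A minimal sketch, assuming an in-place radix-2 implementation, shows the index order a DIT core expects at its input (and a DIF core produces at its output):

```python
def bit_reversed_order(n):
    """Bit-reversed index order for an n-point radix-2 FFT (n a power of 2)."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]

# For an 8-point transform: a DIT core consumes x(n) in this order and
# delivers X(k) in natural order; a DIF core does the reverse.
print(bit_reversed_order(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
```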
At present, the FFT is widely used in many fields such as digital signal processing, image processing, oil exploration, and earthquake prediction. To make the FFT easier to apply in engineering practice, the major FPGA manufacturers have released IP module libraries with the corresponding functions. The Fast Fourier Transform v7.1 IP core developed by Xilinx provides a variety of selectable computation parameters, architectures, and data input/output streaming options, so an FFT can be implemented conveniently according to the user's requirements.
2 Xilinx FFT IP core function implementation
Xilinx IP cores are hardware description language (HDL) design files for complex system functions, optimized for the architecture of Xilinx FPGA devices. They provide functional simulation models in hardware description languages (VHDL, Verilog) and can be designed and debugged in standard EDA simulation tools.
The Xilinx FFT IP core v7.1 is provided with Xilinx's FPGA development tool ISE 14.1. Its maximum system clock frequency is 550 MHz, its maximum data throughput is 550 MSPS, its maximum transform length is 65536 points, its data and phase-factor widths can be up to 34 bits, and it supports all major Xilinx FPGA families. The core can compute the forward FFT or inverse FFT (IFFT) of real or complex data with a transform length of N points, where N ranges from 8 to 65536 (a power of 2). The real and imaginary parts of the input data are expressed in two's-complement form with a bit width of M bits, where M ranges from 8 to 34; the phase-factor width likewise ranges from 8 to 34 bits. During the FFT computation, the data, the phase factors, and the buffer used to reorder the output data can be stored in block RAM or distributed RAM. For the Burst I/O architectures, block RAM can store the data and phase factors for any transform length, whereas distributed RAM can only be used for transform lengths of no more than 1024 points. For the Streaming I/O architecture, a hybrid storage scheme can be used in which a selectable number of the earlier stages use block RAM and the remaining stages use distributed RAM.
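For illustration, the two's-complement input format can be sketched as follows; the helper below is hypothetical and only shows how a signed sample would be packed into an M-bit word within the 8- to 34-bit range the core accepts:

```python
def to_twos_complement(sample, width):
    """Hypothetical helper: pack a signed integer sample into an M-bit
    two's-complement word (the core accepts widths of 8 to 34 bits)."""
    assert 8 <= width <= 34, "data width must be 8 to 34 bits"
    assert -(1 << (width - 1)) <= sample < (1 << (width - 1)), "sample out of range"
    return sample & ((1 << width) - 1)

# Example: 16-bit real and imaginary components of one complex input sample
print(hex(to_twos_complement(-1, 16)))     # 0xffff
print(hex(to_twos_complement(1234, 16)))   # 0x4d2
```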
The Xilinx FFT IP core offers four architectures, allowing the user to trade off logic resource usage against transform time, as follows:
1 Pipelined, Streaming I/O: allows continuous data processing and uses the most logic resources.
2 Radix-4, Burst I/O: the data load/unload phase and the processing phase are performed separately. This architecture is smaller than the pipelined one, but the transform time is longer.
3 Radix-2, Burst I/O: uses fewer logic resources and, like the Radix-4 Burst I/O architecture, performs data loading/unloading and processing in separate phases.
4 Radix-2 Lite, Burst I/O: a variant of the Radix-2 architecture that time-multiplexes the butterfly; it uses the least logic resources but has the longest transform time.
The Burst I/O architectures use the decimation-in-time (DIT) method, while the Pipelined, Streaming I/O architecture uses the decimation-in-frequency (DIF) method.
In actual hardware operation the execution speed of a module is a very important parameter, so this paper carries out its simulation and verification on the Pipelined, Streaming I/O architecture, which supports continuous data processing. This architecture pipelines a chain of radix-2 butterfly processing engines, and each butterfly engine has its own independent memory for input and intermediate data. With this architecture, the FFT IP core can simultaneously process the current frame of N data points, load the next frame of N points, and output the previous frame of N points.
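The frame overlap can be pictured with a small software sketch (purely illustrative; the timing of the real core is governed by its handshaking signals):

```python
# Toy timeline of the Pipelined, Streaming I/O overlap: while frame i is being
# transformed, frame i+1 is being loaded and frame i-1 is being unloaded.
frames = ["frame0", "frame1", "frame2", "frame3"]
for t in range(len(frames) + 2):
    load    = frames[t]     if t < len(frames)          else "-"
    process = frames[t - 1] if 0 <= t - 1 < len(frames) else "-"
    unload  = frames[t - 2] if 0 <= t - 2 < len(frames) else "-"
    print(f"slot {t}: load={load}  process={process}  unload={unload}")
```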
The Xilinx FFT IP core v7.1 supports three arithmetic types: full-precision unscaled, scaled fixed-point (with a user-defined scaling schedule), and block floating point. In the full-precision unscaled mode, every significant integer bit in the datapath is preserved, and the fractional bits produced during computation are truncated or rounded; after the multi-stage multiplications the data width therefore grows, and the output width is (input width + log2(transform length) + 1) bits. In the block floating-point mode, the same scaling is applied to every data point within a frame; this scaling is reported as a block exponent alongside the output, and scaling is applied only when the FFT IP core detects that data overflow would otherwise occur.
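The bit-growth rule quoted above for the full-precision mode can be checked with a short calculation (the function name below is only for illustration):

```python
import math

def unscaled_output_width(input_width, n_points):
    """Output word width in full-precision (unscaled) mode:
    input width + log2(transform length) + 1, as stated above."""
    return input_width + int(math.log2(n_points)) + 1

print(unscaled_output_width(16, 1024))   # 16 + 10 + 1 = 27 bits
print(unscaled_output_width(16, 65536))  # 16 + 16 + 1 = 33 bits
```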