The abundant hardware resources, flexibility, scalability, and improved performance of modern Field Programmable Gate Arrays (FPGAs) make them a promising solution for onboard, high-performance, low-power computational platforms for real-time processing of remote sensing hyperspectral images. This work focuses on developing a target detection architecture based on the Reed-Xiaoli (RX) algorithm on FPGAs for real-time target detection using hyperspectral data.
The Reed-Xiaoli (RX) floating-point architecture is modeled in Very High Speed Integrated Circuit Hardware Description Language (VHDL). The VHDL model is simulated and synthesized for FPGA or Application Specific Integrated Circuit (ASIC) devices. The hardware implementation yields a massively parallel, pipelined realization of the algorithm, and the FPGA allows fast IC prototyping and low-cost modifications. Compared with software running on multiple onboard microprocessors, the FPGA hardware implementation is significantly simpler, processes data much faster, and uses less power.
ACKNOWLEDGEMENTS
I am grateful to my graduate advisor for his illuminating advice, support, and constant encouragement in completing this thesis. I would also like to thank my friends for their valuable suggestions during my thesis work.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER I INTRODUCTION
1.1 Hyperspectral Image
1.2 Related Work
1.3 Application Domains
CHAPTER II SYSTEM OVERVIEW
2.1 Real-Time Implementation
2.2 Detection Algorithm
2.3 Hardware Architecture of BIP
2.4 Hardware Realization of Matrix Multiplication
CHAPTER III FPGA IMPLEMENTATION
3.1 FPGA: Field Programmable Gate Array
3.2 FPGA Design Flow
3.3 Floating Point Representation
CHAPTER IV RESULTS
REFERENCES
APPENDIX A Matlab Code and Output
APPENDIX B VHDL
APPENDIX C Synthesis Report for Virtex-6 FPGA
APPENDIX D Synthesis Report for Stratix III FPGA
VITA
LIST OF FIGURES
Figure 1: Real-Time System Detection
Figure 2: Architecture for BIP
Figure 3: Pipelined operation
Figure 4: Parallel Architecture
Figure 5: Block Diagram Rx Algorithm
Figure 6: Simulation Environment
Figure 7: Compilation and synthesized tool
Figure 8: Floor plan
Figure 9: Timing Analysis
Figure 10: Configure the LEs
Figure 11: Simulation output
Figure 12: Schematic LUTs
LIST OF TABLES
Table 1: Virtex-6 FPGA Device Summary
Table 2: FPGAs Device Utilization Summary
CHAPTER I
INTRODUCTION
The "hyper" in the term hyperspectral conveys "over", as in "too many", and pertains to the huge number of measured wavelength bands. Compared to the information extracted from other types of remotely sensed data, hyperspectral imagery offers the potential for more accurate and detailed information. Hyperspectral images offer ample spectral information to recognize and identify spectrally unique materials. Hence they are spectrally overdetermined [7].
Traditionally, hyperspectral images are downlinked to ground stations, where the data are processed and examined. The resulting long turnaround time is a major bottleneck in remote sensing applications. To yield immediate products that assist fast decision-making, particularly in critical situations, real-time onboard processing and analysis are essential. Hence, in this research, real-time data analysis starts immediately after data are received onboard, so that intermediate results are available before data collection is finished [1].
The demand for real-time processing has been growing in signal and image processing fields such as internet communications, web browsing, and telemedicine. In many real-time applications, practical implementation has the benefit of processing data online and providing timely analysis to identify and resolve critical situations. Various algorithms have been developed for hyperspectral and multispectral image processing. Owing to recent advances in remote sensing instrument technology, images acquired by high-resolution airborne sensors, such as the 224-band Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) operated by NASA's Jet Propulsion Laboratory, Pasadena, CA, and the 210-band Hyperspectral Digital Imagery Collection Experiment (HYDICE), can now extend applications to law enforcement, battlefield, reconnaissance and surveillance, environmental monitoring, and disaster and damage control, where real-time processing becomes essential and provides quick assessment [16].
Numerous designers are currently carrying out research projects whose goal is the development of specific circuits for static and dynamic image processing. These developments are realized in devices such as Field Programmable Gate Arrays (FPGAs) or ASICs, usually by means of standard hardware description languages such as VHDL [4].
VHDL allows the use of a single language throughout the entire design process. However, this alone is not enough to complete a project successfully. The complex algorithms used to process images and image sequences make it necessary to simulate their operation to verify that the design fulfills the specifications under which it was created [5-6].
Modern FPGAs can accommodate multiple millions of gates on a single chip. During the last decade, the logic density, functionality, and speed of FPGAs have improved considerably. Modern FPGAs are now capable of running at speeds beyond 500 MHz [8]. FPGAs also have a potential for dynamic reconfiguration [9]; that is, reprogramming part of the device at run time so that resources can be reused through time multiplexing. This is an important feature of FPGAs.
The contribution of this thesis is a proposed design that provides better speed and requires less area, reducing computation time and substantially simplifying the hardware implementation, as favored for onboard processing.
1.1 Hyperspectral Image
To understand the benefits of hyperspectral imagery, it helps to first review some fundamental spectral remote sensing concepts. Recall that each photon of light has a wavelength determined by its energy level. Light and other forms of electromagnetic radiation are normally described in terms of their wavelengths. For instance, visible light has wavelengths between 0.4 and 0.7 microns, whereas radio waves have wavelengths larger than about 30 cm [7].
A reflectance spectrum depicts the reflectance of a material measured across a range of wavelengths. Reflectance is the percentage of the light hitting a material that is then reflected by that material.
Certain materials will absorb some wavelengths of light, while other materials will reflect the same wavelengths. These patterns of absorption and reflectance over wavelengths can uniquely distinguish some materials. The resulting spectra appear to be uninterrupted and form continuous curves. If a spectrometer is utilized in an imaging sensor, the resulting images capture a reflectance spectrum for every pixel in the image [7].
Hyperspectral imaging sensors, also called imaging spectrometers, are essentially advanced digital color cameras with fine spectral resolution at given wavelengths of illumination. Instead of measuring three primary colors (red, green, and blue), these sensors measure the radiation reflected by each pixel in a large number of visible or invisible frequency (or wavelength) bands [15].
Hyperspectral imaging sensors measure the spectral radiance information in a scene to detect target objects, vehicles, and camouflage in open areas, shadows, and tree lines. Imaging sensors on satellites or aircraft gather this spectral information, which is a combination of sunlight (the most common form of illumination in the visible and near infrared), atmospheric attenuation, and object spectral signature. The sensor detects the energy reflected by surface materials and measures its intensity in different parts of the spectrum. A hyperspectral data set is obtained by processing this information. Each pixel in this data set contains a high-resolution spectrum, which is used to identify the materials present in the pixel by an analysis of reflectance or emissivity [15]. The simultaneous spatial and spectral character of the data can be visualized as a data cube: either as a set of spectra, each for a single pixel, or as a stack of images, each for a single spectral channel.
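The two views of the data cube can be illustrated with a small sketch. The cube dimensions, the sample values, and the helper names `pixel_spectrum` and `band_image` are assumptions made purely for illustration; they are not taken from the thesis data.

```python
# Toy hyperspectral cube: rows x cols pixels, each with `bands` samples.
# Sizes and values are invented for illustration only.
rows, cols, bands = 2, 3, 4

# cube[r][c][b] = measurement at pixel (r, c) in spectral band b
cube = [[[100 * r + 10 * c + b for b in range(bands)]
         for c in range(cols)]
        for r in range(rows)]


def pixel_spectrum(cube, r, c):
    """View 1: the full spectrum recorded for a single pixel."""
    return list(cube[r][c])


def band_image(cube, b):
    """View 2: a 2-D image for a single spectral channel."""
    return [[pixel[b] for pixel in row] for row in cube]


print(pixel_spectrum(cube, 1, 2))  # [120, 121, 122, 123]
print(band_image(cube, 0))         # [[0, 10, 20], [100, 110, 120]]
```

Both functions read the same underlying samples; only the slicing direction differs.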
1.2 Related Work
Three performance metrics are used to evaluate FPGA-based designs: speed, area, and power (energy). Extensive previous work has addressed the design and realization of FPGA-based systems.
Belkacemi et al. [10] presented the design and implementation of a high-performance, fully parallel matrix multiplier core. Jianwen et al. [12] were the first to exploit partial reconfigurability for the computation of matrix multiplication. Partially reconfigurable devices offer the possibility of changing the design implementation without stopping the whole execution process. The matrix multiplier was implemented in a Xilinx Virtex-II device, which supports partial reconfiguration. The design was evaluated in terms of latency and area; the area is reduced compared to [10], and the performance improves further for larger matrices.
In FPGA devices, the major share of power is consumed by the programmable interconnect, while the remaining power is consumed by the clocking, logic, and I/O blocks. Resource utilization and switching activity are the other sources of power dissipation in FPGAs. Conventionally, speed and area have been the performance metrics for FPGA-based designs. With the development of portable devices, it has become increasingly important that systems are also energy efficient and consume less power [11].
1.3 Application Domains
In many existing applications, the algorithm used is expressed as a series of operations. Some of these applications, including image and signal processing, are summarized as follows.
Hyperspectral imagery has been applied to detect and map a wide variety of materials having characteristic reflectance spectra. For instance, vegetation scientists have successfully used hyperspectral imagery to identify vegetation species. Geologists also use hyperspectral images for mineral mapping and to detect soil properties including moisture, organic content, and salinity [7].
Military personnel have used hyperspectral imagery to detect military vehicles under partial vegetation canopy, to study plant canopy chemistry and detect vegetation stress, and for many other military target detection objectives [7].
New sensors provide more hyperspectral imagery, and new image processing algorithms continue to be developed. Hence, hyperspectral imagery is expected to become one of the widely known research, exploration, and monitoring technologies, used in a broader range of areas. Compared to earlier data sources, hyperspectral sensors and analyses provide more information from remotely sensed imagery.
CHAPTER II
SYSTEM OVERVIEW
This chapter provides an overview of the real-time implementation of hyperspectral image processing, the RX target detection algorithm used for real-time implementation, and the hardware architecture and its internal operation.
2.1 Real-time implementation
In remote sensing of hyperspectral images, real-time implementation is required to overcome the long turnaround time of the traditional method, in which hyperspectral images are sent to a ground station for processing. This research aims to reduce the data processing time in real-time implementation.
The research is focused on detection and classification, and the algorithms implemented in real time include the computation of the data correlation matrix $R$ or covariance matrix $\Sigma$ and their inverses, $R^{-1}$ or $\Sigma^{-1}$. Therefore, their real-time implementation becomes a question of how to update $R^{-1}$ or $\Sigma^{-1}$ as pixels are received. The system for fast real-time detection or classification is summarized as follows [1]. The block diagram is shown in Figure 1.
Figure 1: Real-time onboard detection or classification pipeline stages [1]
In our research, a hyperspectral image is acquired from left to right and from top to bottom. An image has N pixels and L spectral bands. Let a pixel vector be denoted as x, and the entire data matrix X = [x1, x2,…, xN]. The target to be detected is denoted as d; if the p targets to be classified are known, then D = [d1, d2,…, dp]. Input image Xt is received at time t.
The sample correlation matrix R can be defined as
$$R = \frac{1}{N} X X^T = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T$$
where
R: correlation matrix
N: number of pixels
X: data matrix
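As a small worked illustration, the sample correlation matrix can be accumulated one pixel vector at a time, assuming the usual definition $R = \frac{1}{N}\sum_i x_i x_i^T$. The function names and the two-band toy pixels below are hypothetical.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]


def sample_correlation(pixels):
    """R = (1/N) * sum_i x_i x_i^T for a list of L-band pixel vectors."""
    n = len(pixels)
    num_bands = len(pixels[0])
    acc = [[0.0] * num_bands for _ in range(num_bands)]
    for x in pixels:                      # one pixel vector at a time
        xxT = outer(x, x)
        for i in range(num_bands):
            for j in range(num_bands):
                acc[i][j] += xxT[i][j]
    return [[acc[i][j] / n for j in range(num_bands)]
            for i in range(num_bands)]


# Two toy pixels with L = 2 bands (values invented for illustration)
pixels = [[1.0, 2.0], [3.0, 4.0]]
R = sample_correlation(pixels)
print(R)  # [[5.0, 7.0], [7.0, 10.0]]
```

Unlike the covariance matrix, this accumulation needs no mean vector, which is one reason it suits streaming data.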
The inverse correlation matrix $R^{-1}$ is initialized using a small random matrix of size L x L for the BIP format. The detector or classifier is then constructed using the algorithm, and the detection or classification result for the pixels received up to time t is provided. Three real-time processing modes exist to fit the three remote sensing data formats: pixel-by-pixel processing for the BIP format, line-by-line processing for the BIL format, and band-by-band processing for the BSQ format. In the pixel-by-pixel mode, a pixel vector is processed right after it is received and the analysis result is generated within an acceptable delay; in the line-by-line mode, a line of pixel vectors is processed after the entire line is received. The data covariance matrix is replaced with the data correlation matrix R to speed up the process and simplify the hardware implementation [1].
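The three data formats differ only in the order in which samples are stored or transmitted. The sketch below gives the conventional linear offsets for band-interleaved-by-pixel (BIP), band-interleaved-by-line (BIL), and band-sequential (BSQ) layouts; the function names and the toy dimensions are assumptions for illustration.

```python
def bip_index(row, col, band, rows, cols, bands):
    """BIP: all bands of a pixel are stored contiguously."""
    return (row * cols + col) * bands + band


def bil_index(row, col, band, rows, cols, bands):
    """BIL: each image line is stored band by band."""
    return (row * bands + band) * cols + col


def bsq_index(row, col, band, rows, cols, bands):
    """BSQ: each complete band image is stored in turn."""
    return (band * rows + row) * cols + col


rows, cols, bands = 2, 3, 4  # toy sizes, not from the thesis data

# In BIP, the spectrum of pixel (0, 1) occupies consecutive offsets,
# so the pixel vector is complete as soon as those samples arrive.
print([bip_index(0, 1, b, rows, cols, bands) for b in range(bands)])
# [4, 5, 6, 7]

# In BSQ, the same spectrum is spread one full band image apart.
print([bsq_index(0, 1, b, rows, cols, bands) for b in range(bands)])
# [1, 7, 13, 19]
```

This is why pixel-by-pixel processing pairs naturally with BIP, while BSQ data must be handled band by band.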
2.2 Detection Algorithm
If the desired target is unknown, the pixel itself ($x^T$) acts as the term to be matched, and it is then ideal to use the well-known Reed-Xiaoli (RX) algorithm for anomaly detection [1].
The focus is on this algorithm for the following reasons:
It can outperform other existing algorithms due to its excellent performance in background suppression.
It is suitable for remote sensing images acquired under unknown circumstances, since it requires minimal prior information.
It can be easily implemented in real time with a simple and regular hardware architecture.
Reed-Xiaoli (RX) algorithm
The well-known RX algorithm is a detector for anomalies, defined as pixels whose spectral signatures differ from their surroundings. With the correlation matrix used in place of the covariance matrix, it is given by
$$\delta_{RX}(x) = x^T R^{-1} x$$
where
$x$: pixel vector under test
$R$: correlation matrix
$R^{-1}$: inverse of the correlation matrix
$\tilde{R}$: correlation matrix without being normalized by the number of pixels
$X_t$: data matrix
Thus, implementing the detection or classification algorithms in real time depends on the adaptation of $R^{-1}$. In other words, $R^{-1}$ at time t should be quickly calculated by updating the previous $R^{-1}$ at time t - 1 using the data received at time t, without completely recalculating R and $R^{-1}$. As a result, the intermediate data analysis result (e.g., target detection or classification) is available in support of decision-making even before the entire data set is received; when the entire data set is received, the data analysis result is finalized. Remote sensing images usually have high spatial correlation, so a small subset of the pixels can still capture the major data statistics [1].
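Once $R^{-1}$ is available, scoring a pixel is a single quadratic form. The sketch below assumes the common correlation-matrix form of the RX detector, $x^T R^{-1} x$; the function names, the example matrix, and the threshold are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def rx_score(x, R_inv):
    """Quadratic form x^T R_inv x for one pixel vector x."""
    tmp = [sum(r * v for r, v in zip(row, x)) for row in R_inv]  # R_inv x
    return sum(a * b for a, b in zip(x, tmp))                    # x^T (R_inv x)


def is_anomaly(x, R_inv, threshold):
    """Flag the pixel when its score exceeds a user-chosen threshold."""
    return rx_score(x, R_inv) > threshold


# Toy inverse correlation matrix and pixel (invented values)
R_inv = [[2.0, 0.0],
         [0.0, 0.5]]
x = [1.0, 2.0]

print(rx_score(x, R_inv))         # 4.0
print(is_anomaly(x, R_inv, 3.0))  # True
```

Because the score only needs the current $R^{-1}$, intermediate detection results can be produced before the whole image has arrived.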
2.3 Hardware Architecture for BIP Format
A parallel pipelined hardware realization of the $\tilde{R}^{-1}$ update, where data arrive one pixel vector at a time, is illustrated in Figure 2. Here $\tilde{R}$ is the correlation matrix without normalization by the number of pixels and $x_t$ is the pixel vector received at time t. In this module, $\tilde{R}^{-1}$ is updated each time a new selected pixel vector is processed. Parallel multiply-accumulators calculate the result of the matrix-vector multiplication, which is pipelined to calculate the required scalar inner product and outer vector product matrix [1]:
$$\tilde{R}_t^{-1} = \tilde{R}_{t-1}^{-1} - \frac{\tilde{R}_{t-1}^{-1} x_t x_t^T \tilde{R}_{t-1}^{-1}}{1 + x_t^T \tilde{R}_{t-1}^{-1} x_t} \qquad (1)$$
Figure 2: Architecture for BIP [1]
Here $x_t^T \tilde{R}_{t-1}^{-1} x_t$ in (1) is a scalar. Hence no matrix inversion is involved in each adaptation. Various design techniques and architectures were considered to optimize the performance. Real-time image processing applications usually work in a pipelined form [6]. As shown in Figure 2, a parallel array architecture is used for the hardware realization. The proposed parallel architecture consists of identical processing elements (PEs), and the number of PEs depends on the size of the matrices. Each PE performs the necessary multiply-accumulate (MAC) operation. Each PE operates independently, with connections only to the input and output ports. This greatly reduces the interconnection between the PEs, and as a result the hardware resource utilization is minimized [13].
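The inversion-free adaptation can be sketched in software, assuming it takes the standard rank-one (Sherman-Morrison) form: one matrix-vector product, one scalar inner product, and one outer product. The function names and the 2 x 2 example are illustrative assumptions, and the matrix is assumed symmetric, as a correlation matrix is.

```python
def mat_vec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Scalar inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def update_inverse(R_inv, x):
    """Return the inverse of (R + x x^T) given R_inv, without inverting.

    Assumes R_inv is symmetric, so x^T R_inv equals (R_inv x)^T.
    """
    u = mat_vec(R_inv, x)        # matrix-vector product R_inv x
    denom = 1.0 + dot(x, u)      # scalar 1 + x^T R_inv x
    n = len(x)
    # Subtract the scaled outer product u u^T
    return [[R_inv[i][j] - u[i] * u[j] / denom for j in range(n)]
            for i in range(n)]


# Start from the identity and absorb one toy pixel x = [1, 2].
R_inv = [[1.0, 0.0], [0.0, 1.0]]
new_inv = update_inverse(R_inv, [1.0, 2.0])

# Direct check: (I + x x^T) = [[2, 2], [2, 5]], whose inverse is
# (1/6) * [[5, -2], [-2, 2]].
print(new_inv)
```

The only division is by a scalar, which is what makes the update attractive for a pipelined hardware datapath.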
2.4 Hardware Realization of Matrix Multiplication
Various design techniques and architectures were considered to optimize the performance of the matrix multiplier. Matrix multiplication is well suited to parallel exploitation. The product C = A × B of two matrices A and B is conformable if the number of columns of A is equal to the number of rows of B [15]. Matrix multiplication is based on the equation below, in which i, j, and k are the loop indices. The loop body consists of a single recurrence:
C[i,j] = C[i,j] + A[i,k] × B[k,j]
where A[i,k] and B[k,j] are input variables whose values are needed to execute this loop, and C[i,j] is an output variable.
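The recurrence can be written directly as a triple loop. This is a plain software sketch of the loop nest, not the hardware description; the function name `matmul` and the sample matrices are illustrative.

```python
def matmul(A, B):
    """Multiply A (n x m) by B (m x p) using the MAC recurrence."""
    n, m, p = len(A), len(B), len(B[0])
    if len(A[0]) != m:
        raise ValueError("A and B are not conformable")
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                # One multiply-accumulate step per (i, j, k)
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Each output element depends only on one row of A and one column of B, so the (i, j) iterations are independent and can be assigned to separate hardware units.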
Pipelined Operation
To improve performance, both pipelining and parallel processing techniques are used. High-speed operation requires massive throughput, which is achieved by using a pipelined architecture that computes the product of two matrices. Pipelining is achieved by inserting registers at appropriate places, as shown in Figure 3.
Figure 3: Pipelined operation.
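The effect of inserting a register between the multiplier and the adder can be modeled cycle by cycle. The two-stage structure below is an assumed, simplified model of a pipelined multiply-accumulate unit, not the exact pipeline of Figure 3; the function name is hypothetical.

```python
def pipelined_mac(a_values, b_values):
    """Cycle-level model of a 2-stage multiply-accumulate pipeline.

    Stage 1 multiplies the current operands into a pipeline register.
    Stage 2 adds the previously registered product to the accumulator.
    Returns (accumulated result, number of clock cycles).
    """
    operands = list(zip(a_values, b_values))
    product_reg = None   # register between multiplier and adder
    acc = 0              # accumulator register
    cycles = 0
    # One extra cycle is needed to drain the last product.
    for step in range(len(operands) + 1):
        # Stage 2 uses the value captured in the previous cycle.
        if product_reg is not None:
            acc += product_reg
        # Stage 1 works on new operands in the same cycle.
        if step < len(operands):
            a, b = operands[step]
            product_reg = a * b
        else:
            product_reg = None
        cycles += 1
    return acc, cycles


print(pipelined_mac([1, 2, 3], [4, 5, 6]))  # (32, 4)
```

The register adds one cycle of latency, but both stages work in every cycle, so a new operand pair is accepted each clock; shortening the logic between registers is also what allows a higher clock frequency.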
Parallel Processing
Parallel processing is exploited in the design, where the matrix multiplication operation is divided into several smaller operations that are executed in parallel and finally the output is obtained by accumulating all the partial results generated from these parallel processes. This results in higher frequency of operation [1] [13].
Figure 4: Parallel Architecture.
A parallel array architecture, as shown in Figure 4, is used. The parallel array is similar to a traditional systolic array, with slight modifications. The parallel architecture consists of identical processing elements (PEs), and the number of PEs depends on the size of the matrices. Each PE performs the necessary multiply-accumulate (MAC) operation. Each PE operates independently, with connections only to the input and output ports. This greatly reduces the interconnection between the PEs, and as a result the hardware resource utilization is minimized [23].
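A behavioral sketch of such an array is given below for a matrix-vector product. It assumes one PE per matrix row and one input element broadcast per cycle; this mapping, the class name, and the function name are illustrative assumptions rather than the exact arrangement of Figure 4.

```python
class ProcessingElement:
    """One PE: holds a row of coefficients and a private accumulator."""

    def __init__(self, coefficients):
        self.coefficients = coefficients
        self.acc = 0.0

    def mac(self, k, x_k):
        """Multiply-accumulate for input element number k."""
        self.acc += self.coefficients[k] * x_k


def pe_array_mat_vec(M, x):
    """Compute M x with one PE per row of M."""
    pes = [ProcessingElement(row) for row in M]
    for k, x_k in enumerate(x):   # one input element per clock cycle
        for pe in pes:            # in hardware, all PEs fire concurrently
            pe.mac(k, x_k)
    return [pe.acc for pe in pes]


print(pe_array_mat_vec([[1, 2], [3, 4]], [5, 6]))  # [17.0, 39.0]
```

No PE reads another PE's accumulator, which mirrors the statement that each PE connects only to the input and output ports.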
CHAPTER III
FPGA IMPLEMENTATION
This chapter introduces Field Programmable Gate Array (FPGA) technology and the hardware description language VHDL. The chapter then reviews the construction and implementation of floating-point arithmetic.
The FPGA is a suitable platform for applications that permit parallelization, because it can take advantage of computational specialization and inherent parallelism. FPGAs can be used as dedicated computers to perform certain computations at very high frequencies and as an alternative to custom ASIC technology.
3.1 FPGA: Field Programmable Gate Array
Field Programmable Gate Arrays (FPGAs) are integrated circuits whose functionality can be modified in the field after fabrication. Therefore, an FPGA can be customized for different applications as long as the device itself is large enough to hold the logic.
FPGAs contain many built-in system-level blocks. These features allow logic designers to build the highest levels of performance and functionality into their FPGA-based systems. FPGAs are a programmable alternative to custom ASIC technology. FPGAs offer the best solution for addressing the needs of high-performance logic designers, high-performance DSP designers, and high-performance embedded systems designers with unprecedented logic, DSP, connectivity, and soft microprocessor capabilities.
Advantages of Using FPGAs
Fully programmable alternative to a customized chip
Used to implement functions in hardware
Hardwired logic is very fast
Can interface to outside world
Custom hardware/peripherals
Custom co-processors
Can perform bit-level operations not suited for traditional CPU/MPU
In this research, the Xilinx Virtex-6 and Altera Stratix III FPGAs are used.
The Virtex-6 family provides the newest, most advanced features in the FPGA market and is built on a 40 nm state-of-the-art copper process technology. A single Virtex-6 FPGA CLB comprises two slices, each containing four 6-input LUTs and eight flip-flops (twice the number found in a Virtex-4 FPGA slice), for a total of eight 6-LUTs and 16 flip-flops per CLB. Each DSP48E1 slice contains a 25 x 18 multiplier, an adder, and an accumulator. Block RAMs are fundamentally 36 Kb in size, and each block can also be used as two independent 18 Kb blocks.
Table 1: Virtex-6 FPGA Device Summary
3.2 FPGA Design Flow
Several steps are necessary for implementing customized functions on FPGA chips:
• Design Entry - the desired circuit is specified either by using a hardware description language, such as VHDL, or by means of a schematic diagram
Figure 5: Block Diagram Rx Algorithm
VHDL: The language
To construct the datapath, a schematic approach can be used for simple designs, but it is not practical for real-life applications, which often involve thousands of logic gates. Therefore, it is necessary to use a hardware description language such as VHDL [IEE02] when implementing complex logic on an FPGA or other hardware. Although programming languages and hardware description languages share some concepts, such as variables and signals, the nature of a hardware description language is fundamentally different from that of a programming language.
Hardware description language, as the name suggests, is used to describe the hardware functionality. Unlike normal programming languages, hardware description languages may run several operations in parallel and explicit specification of the timing is required to make the design work.
• Functional Simulation - the synthesized circuit is tested to verify its functional correctness; the simulation does not take into account any timing issues. A VHDL design is simulated after it has been compiled. The simulator may then be invoked with the name of the configuration or entity/architecture pair
Figure 6: Simulation Environment
• Compilation and Synthesis - the CAD Synthesis tool synthesizes the circuit into a netlist that gives the logic elements (LEs) needed to realize the circuit and the connections between the LEs
The VHDL code is processed by several synthesis tools that analyze the code and generate an implementation of it for the FPGA or other target chip. These tools are controlled by an application program called the Compiler.
Figure 7: Compilation and synthesized tool
The synthesis report provides information of interest to the designer, such as the speed and area of the implemented circuit. The Synthesis tool performs analysis to determine the expected performance of the circuit.
• Fitting - the CAD Fitter tool determines the placement of the LEs defined in the netlist into the LEs in an actual FPGA chip; it also chooses routing wires in the chip to make the required connections between specific LEs
Figure 8: Floor plan
• Timing Simulation - the fitted circuit is tested to verify both its functional correctness and timing. A good measure of the speed is the maximum frequency at which the circuit can be clocked, referred to as fmax. This measure depends on the longest delay along any path between two registers clocked by the same clock.
Figure 9: Timing Analysis
• Programming and Configuration - the designed circuit is implemented in a physical FPGA chip by programming the configuration switches that configure the LEs and establish the required wiring connections
Figure 10: Configure the LEs
In order to program the FPGA, a bitstream is generated by the design tool. The VHDL code is synthesized into a netlist, which contains a representation of the hardware, such as the function of each basic block and the connections between the blocks. The design tool extracts the information in the netlist, maps the logical blocks to specific lookup tables, and connects them. The bitstream customizes the functionality of the FPGA by writing this information onto the chip.
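The fmax measure mentioned under Timing Simulation follows directly from the longest register-to-register delay: the clock period cannot be shorter than that delay. The sketch below uses invented path delays, and the function name is hypothetical; real values come from the timing analysis report.

```python
def fmax_mhz(path_delays_ns):
    """Maximum clock frequency in MHz given register-to-register
    path delays in nanoseconds (the longest path sets the limit)."""
    critical_path_ns = max(path_delays_ns)
    return 1000.0 / critical_path_ns   # 1 / ns = GHz, so scale to MHz


# Hypothetical delays of three register-to-register paths
delays_ns = [2.0, 4.0, 2.5]
print(fmax_mhz(delays_ns))  # 250.0
```

Pipelining raises fmax by splitting the critical path: if the 4.0 ns path in this toy example were divided into two 2.0 ns stages, the limit would move to the 2.5 ns path.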
Floating Point Arithmetic
3.3 Floating Point Number Representation
Floating-point arithmetic is used frequently in modern applications such as speech recognition, image processing, and engineering because of its ability to represent an approximation of the real numbers. The advantage of floating point is that precision is maintained over a wide dynamic range, where fixed-point numbers lose precision. In hardware, floating point takes up almost three times the resources of fixed-point math.
The IEEE-754 (32- and 64-bit) and IEEE-854 (variable width) specifications of the floating-point standard have been widely accepted for representing floating-point numbers. Floating-point arithmetic, including addition, subtraction, and multiplication, remains the same even if the computation platform is changed.
Floating-point numbers are well defined by IEEE-754 and IEEE-854. Floating point has been used in processors and IP for years and is a well-understood format. There are many concepts in floating point that make it different from the common signed and unsigned number notations [19]. It is a sign-magnitude system, where the sign is processed differently from the magnitude.
An IEEE-754 32-bit floating-point number comprises a sign bit (+ or -), an 8-bit exponent, and a 23-bit fraction, as represented below:
S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
31 30 23 22 0
+/- Exp Fraction
For example, consider the 32-bit floating-point number
0 10000001 10100000000000000000000
To convert a floating-point number into its real value, the following equation can be used:
S * 2 ** (exponent - exponent_base) * (1.0 + Fraction/fraction_base)
where "exponent_base" is 2**(exponent width - 1) - 1 (127 for the 8-bit exponent), and "fraction_base" is the maximum possible fraction (unsigned) plus one (2**23 = 8388608). For the example above:
+1 * 2** (129 - 127) * (1.0 + 5242880/8388608) = +1 * 4.0 * 1.625 = 6.5.
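The conversion can be checked in software. The sketch below assumes the standard IEEE-754 single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, and 23 fraction bits) and handles normalized numbers only; the function name `decode_float32` is illustrative. Python's `struct` module is used as an independent cross-check.

```python
import struct


def decode_float32(bits):
    """Decode a 32-character bit string as a normalized IEEE-754 single."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:9], 2)     # 8-bit biased exponent
    fraction = int(bits[9:32], 2)    # 23-bit fraction field
    return sign * 2.0 ** (exponent - 127) * (1.0 + fraction / 2 ** 23)


bits = "0" + "10000001" + "10100000000000000000000"
print(decode_float32(bits))  # 6.5

# Cross-check against the machine's own interpretation of the same bits
raw = int(bits, 2).to_bytes(4, "big")
print(struct.unpack(">f", raw)[0])  # 6.5
```

Here the exponent field is 129, so the scale factor is 2**2 = 4, and the fraction field contributes 0.625 to the implied leading 1.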
In order to use the floating-point representation, the floating-point VHDL package files should be compiled first, followed by the rest of the code. These packages have been designed for use in VHDL-2008, where they will be part of the IEEE library [19]. A version of the packages is provided that works with VHDL-93 and is synthesizable.
The floating-point VHDL packages needed are [19]: "fixed_float_types.vhdl", "float_generic_pkg.vhdl", "float_generic_pkg-body.vhdl", "float_pkg.vhdl", "fixed_pkg.vhdl", "fixed_generic_pkg.vhdl", and "fixed_generic_pkg-body.vhdl".
CHAPTER IV
RESULTS
The proposed architecture is modeled in Very High Speed Integrated Circuit Hardware Description Language (VHDL). The VHDL model is simulated and synthesized. Synthesis targets the Xilinx Virtex-6 and Altera Stratix III FPGA devices. The performance of the proposed architecture in terms of area and speed is summarized below.
Table 2: FPGAs Device Utilization Summary

Logic Utilization                    Used      Available   Utilization

Xilinx Virtex-6 FPGA
Number of Slice LUTs                 11,455    46,560      24%
Number of fully used LUT-FF pairs    0         11,455      0%
Number of bonded IOBs                336       240         140%
Number of DSP Elements               24        288         8%

Altera Stratix III FPGA
Number of Slice LUTs                 17,592    38,000      46%
Number of fully used LUT-FF pairs    0         19,000      0%
Number of bonded IOBs                336       488         69%
Number of DSP Elements               32        216         15%
Figure 11: Simulation output
Figure 12: Schematic LUTs
Conclusion
In the work presented herein:
Simulation and synthesis of the FPGA implementation have been realized.
The VHDL code is reusable and adjustable to the size of the hyperspectral image data.
The design is a high-performance, real-time, high-throughput, floating-point implementation with high accuracy.
The general architecture is extendable to a generic FPGA core.
The design is re-targetable to ASIC technology.