Hardware Architecture Of Real Time Remote Information Technology Essay

Published: November 30, 2015 Words: 4258

Modern field-programmable gate arrays (FPGAs) offer abundant hardware resources, flexibility, scalability, and improved performance, making them a promising solution for onboard high-performance, low-power computational platforms for real-time remote sensing of hyperspectral images. This work focuses on developing a target detection architecture based on the Reed-Xiaoli (RX) algorithm on FPGAs for real-time target detection using hyperspectral data.

The Reed-Xiaoli (RX) floating-point architecture is modeled in Very High Speed Integrated Circuit Hardware Description Language (VHDL). The VHDL model is simulated and synthesized for FPGA or application-specific integrated circuit (ASIC) devices. The hardware implementation gives a massively parallel, pipelined realization of the algorithm, and the FPGA allows fast IC prototyping and low-cost modifications. The FPGA hardware implementation is significantly simplified, and its processing speed is much higher than that of software running on multiple onboard microprocessors, while power usage is reduced.

ACKNOWLEDGEMENT

I am grateful to my graduate advisor for his illuminating advice, support, and constant encouragement in completing this thesis. I would also like to thank my friends for their valuable suggestions during my thesis work.

TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGEMENTS

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

CHAPTER I INTRODUCTION

1.1 Hyperspectral Image

1.2 Related Work

1.3 Application

CHAPTER II SYSTEM OVERVIEW

2.1 Real Time

2.2 Detection Algorithm

2.3 Hardware Architecture of BIP

2.4 Hardware Realization of Matrix Multiplication

CHAPTER III FPGA IMPLEMENTATION

3.1 FPGA: Field Programmable Gate Array

3.2 FPGA Design Flow

3.3 Floating Point Representation

CHAPTER IV RESULTS

REFERENCES

APPENDIX A Matlab Code and Output

APPENDIX B VHDL

APPENDIX C Synthesis report for Virtex-6 FPGA

APPENDIX D Synthesis report for Stratix III FPGA

VITA

LIST OF FIGURES

Figure 1: Real-Time System Detection

Figure 2: Architecture for BIP

Figure 3: Pipelined operation

Figure 4: Parallel Architecture

Figure 5: Block Diagram RX Algorithm

Figure 6: Simulation Environment

Figure 7: Compilation and synthesis tool

Figure 8: Floor plan

Figure 9: Timing Analysis

Figure 10: Configure the LEs

Figure 11: Simulation output

Figure 12: Schematic LUTs

LIST OF TABLES

Table 1: Virtex-6 FPGA Device Summary

Table 2: FPGAs Device Utilization Summary

CHAPTER I

INTRODUCTION

The "hyper" in the term hyperspectral means "over," as in "too many," and refers to the large number of measured wavelength bands. Compared with other types of remotely sensed data, hyperspectral imagery offers the potential for more accurate and detailed information. Hyperspectral images provide ample spectral information to recognize and identify spectrally unique materials; hence, they are spectrally overdetermined [7].

Traditionally, hyperspectral images are downlinked to ground stations, where the data are processed and analyzed. The result is a long turnaround time, which is a major bottleneck in remote sensing applications. To yield immediate products that assist fast decision-making, real-time processing and analysis are essential, and it is highly desirable that the processing be accomplished onboard. Real-time onboard implementation is needed to support immediate decision-making in critical circumstances. Hence, in this research, real-time data analysis starts immediately after data are received onboard, so that intermediate results are available before data collection is finished [1].

The demand for real-time processing has been growing in signal and image processing fields such as internet communications, web browsing, and telemedicine. In various real-time applications, practical implementation has the benefit of processing data online and providing timely analysis to identify and resolve critical situations. Various algorithms have previously been developed for hyperspectral or multispectral image processing. Owing to recent advances in remote sensing instrument technology, images acquired by high-resolution airborne sensors, such as the 224-band Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) operated by NASA's Jet Propulsion Laboratory, Pasadena, CA, and the 210-band Hyperspectral Digital Imagery Collection Experiment (HYDICE), can now extend applications to law enforcement, battlefield, reconnaissance and surveillance, environmental monitoring, disaster and damage control, etc., where real-time processing becomes essential and provides quick assessment [16].

Numerous designers are currently carrying out research projects whose goal is the development of dedicated circuits for static and dynamic image processing. These developments are realized in devices such as field-programmable gate arrays (FPGAs) or ASICs, usually by means of standard hardware description languages such as VHDL [4].

VHDL allows the use of a single language throughout the entire design process. However, this alone is not enough for a successful project. The complex algorithms used to process images and image sequences make it necessary to simulate their operation in order to verify that the design fulfills the specifications under which it was created [5-6].

Modern FPGAs can accommodate multimillion gates on a single chip. During the last decade, the logic density, functionality, and speed of FPGAs have improved considerably. Modern FPGAs are now capable of running at speeds beyond 500 MHz [8]. FPGAs also have the potential for dynamic reconfiguration [9]; that is, reprogramming part of the device at run time so that resources can be reused through time multiplexing. This is an important feature of FPGAs.

The contribution of this thesis is a proposed design that provides better speed and requires less area, reducing computation time so that the hardware implementation is substantially simplified, as is favored for onboard processing.

1.1 Hyperspectral Image

To understand the benefits of hyperspectral imagery, it is useful first to review some fundamental concepts of spectral remote sensing. Each photon of light has a wavelength determined by its energy level. Light and other forms of electromagnetic radiation are normally described in terms of their wavelengths. For instance, visible light has wavelengths between 0.4 and 0.7 microns, whereas radio waves have wavelengths longer than about 30 cm [7].

A reflectance spectrum depicts the reflectance of a material measured across a range of wavelengths. Reflectance is the percentage of the light hitting a material that is then reflected by that material.

Certain materials will absorb some wavelengths of light, while other materials will reflect the same wavelengths. These patterns of absorption and reflectance over wavelengths can uniquely distinguish some materials. The resulting spectra appear to be uninterrupted and form continuous curves. If a spectrometer is utilized in an imaging sensor, the resulting images capture a reflectance spectrum for every pixel in the image [7].

Hyperspectral imaging sensors, also called imaging spectrometers, are essentially advanced digital color cameras with fine spectral resolution at given wavelengths of illumination. Instead of measuring three primary colors (red, green, and blue), these sensors measure the radiation reflected by each pixel at a large number of visible or invisible frequency (or wavelength) bands [15].

Hyperspectral imaging sensors measure the spectral radiance information in a scene in order to detect target objects, vehicles, and camouflage in open areas, shadows, and tree lines. Imaging sensors on satellites or aircraft gather this spectral information, which is a combination of sunlight (the most common form of illumination in the visible and near infrared), atmospheric attenuation, and the object's spectral signature. The sensor detects the energy reflected by surface materials and measures its intensity in different parts of the spectrum. A hyperspectral data set is obtained by processing this information. Each pixel in this data set contains a high-resolution spectrum, which is used to identify the materials present in the pixel through an analysis of reflectance or emissivity [15]. The simultaneous spatial and spectral character of the data can be visualized as a data cube: either as a set of spectra, each for a single pixel, or as a stack of images, each for a single spectral channel.
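The two views of a data cube can be illustrated with a small Python sketch. The cube dimensions, the sample values, and the function names `pixel_spectrum` and `band_image` are made up for illustration and do not come from the thesis.

```python
# A tiny hyperspectral cube: 2 rows x 3 columns x 4 bands (values are arbitrary).
ROWS, COLS, BANDS = 2, 3, 4
cube = [[[100 * r + 10 * c + b for b in range(BANDS)]
         for c in range(COLS)]
        for r in range(ROWS)]


def pixel_spectrum(cube, row, col):
    """Return the spectrum (one value per band) of a single pixel."""
    return list(cube[row][col])


def band_image(cube, band):
    """Return the 2-D image (rows x columns) of a single spectral band."""
    return [[pixel[band] for pixel in line] for line in cube]


spectrum = pixel_spectrum(cube, 1, 2)   # all bands of the pixel at row 1, column 2
image = band_image(cube, 3)             # band 3 over the whole scene
```

The first function corresponds to the "set of spectra" view and the second to the "stack of images" view of the same data.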

1.2 Related Work

The three performance metrics used to evaluate FPGA-based designs are speed, area, and power (energy). Extensive previous work has been done on the design and realization of FPGA-based systems.

The design and implementation of a high-performance, completely parallel matrix multiplier core was presented by Belkacemi et al. [10]. The partial reconfigurability feature was first exploited for the computation of matrix multiplication by Jianwen et al. [12]. Partially reconfigurable devices offer the possibility of changing the design implementation without stopping the whole execution process. The matrix multiplier was implemented in a Xilinx Virtex-II device, which supports partial reconfiguration. The design was evaluated in terms of latency and area; the area was found to be reduced compared with [10], and the performance improves further for larger matrices.

The majority of power in FPGA devices is consumed by the programmable interconnects, while the remaining power is consumed by the clocking, logic, and I/O blocks. Resource utilization and switching activity are further sources of power dissipation in FPGAs. Conventionally, speed and area have been the performance metrics for FPGA-based designs. With the development of portable devices, it has become increasingly important that systems are also energy efficient and consume less power [11].

1.3 Application Domains

In many existing applications, the algorithm used is expressed as a series of operations. Some of these applications, which include image and signal processing, are summarized as follows.

Hyperspectral imagery has been applied to detect and map a wide variety of materials that have characteristic reflectance spectra. For instance, vegetation scientists have successfully used hyperspectral imagery to identify vegetation species. Geologists also use hyperspectral images for mineral mapping and to detect soil properties including moisture, organic content, and salinity [7].

Military personnel have used hyperspectral imagery to detect military vehicles under partial vegetation canopy, to study plant canopy chemistry and detect vegetation stress, and for various other military target detection objectives [7].

New sensors provide more hyperspectral imagery, and new image processing algorithms continue to be developed. Hence, hyperspectral imagery is likely to become a widely used technology for research, exploration, and monitoring across a broader range of areas. Hyperspectral sensors and analyses provide more information from remotely sensed imagery than was previously obtainable.

CHAPTER II

SYSTEM OVERVIEW

This chapter provides an overview of the real-time implementation of hyperspectral image processing, the RX target detection algorithm used for the real-time implementation, and the hardware architecture and its internal operation.

2.1 Real-time implementation

In remote sensing of hyperspectral images, real-time implementation is required to overcome the long turnaround time of the traditional method, in which hyperspectral images are sent to a ground station for processing.

To reduce data processing time in the real-time implementation, our research focuses on detection and classification. The algorithms we implement in real time involve computing the data correlation matrix $R$ or the covariance matrix $\Sigma$ and their inverses, $R^{-1}$ or $\Sigma^{-1}$. Their real-time implementation therefore becomes a question of how to update $R^{-1}$ or $\Sigma^{-1}$ as pixels are received. The system for fast real-time detection or classification is summarized as follows [1]. The block diagram is shown in Figure 1.

Figure 1: Real-time onboard detection or classification pipeline stages [1]

In our research, a hyperspectral image is acquired from left to right and from top to bottom. An image has N pixels and L spectral bands. Let a pixel vector be denoted as x, and the entire data matrix X = [x1, x2,…, xN]. The target to be detected is denoted as d; if the p targets to be classified are known, then D = [d1, d2,…, dp]. Input image Xt is received at time t.

A sample correlation matrix R can be defined as

$$R = \frac{1}{N} X X^T = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^T$$

where

R: correlation matrix

N: number of pixels

X: data matrix
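As a minimal sketch of how a sample correlation matrix could be computed, the following Python function assumes the usual definition, the average of the outer products $x_i x_i^T$ over the N pixel vectors. The function name `sample_correlation` and the toy data are illustrative only.

```python
def sample_correlation(pixels):
    """Average of the outer products x x^T over a list of N pixel vectors.

    Each pixel vector has L entries (one per band); the result is L x L.
    """
    n = len(pixels)
    num_bands = len(pixels[0])
    total = [[0.0] * num_bands for _ in range(num_bands)]
    for x in pixels:
        for i in range(num_bands):
            for j in range(num_bands):
                total[i][j] += x[i] * x[j]   # accumulate the outer product
    return [[value / n for value in row] for row in total]


# Two pixels with two bands each (toy values).
pixels = [[1.0, 2.0], [3.0, 4.0]]
R = sample_correlation(pixels)   # [[5.0, 7.0], [7.0, 10.0]]
```

Unlike a covariance matrix, this computation does not subtract the sample mean first, which is one reason it is simpler to accumulate as pixels arrive.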

The matrix to be updated is initialized using a small random matrix of size L × L for the BIP format. Then the detector or classifier is constructed using the algorithm, and the detection or classification result for the pixels that have been received up to time t is provided. Three real-time processing fashions exist to fit the three remote sensing data formats: pixel-by-pixel processing for the band-interleaved-by-pixel (BIP) format, line-by-line processing for the band-interleaved-by-line (BIL) format, and band-by-band processing for the band-sequential (BSQ) format. In the pixel-by-pixel fashion, a pixel vector is processed right after it is received, and the analysis result is generated within an acceptable delay; in the line-by-line fashion, a line of pixel vectors is processed after the entire line is received. The data covariance matrix is replaced with the data correlation matrix R to speed up the process and simplify hardware implementation [1].
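The three data formats differ only in the order in which samples are stored or received. The sketch below shows one way to compute the position of a sample in a flat stream for each format. The function name `sample_offset` and its parameters are hypothetical, and the sketch assumes zero-based indices with no file header.

```python
def sample_offset(fmt, row, col, band, rows, cols, bands):
    """Position of sample (row, col, band) in a flat stream of rows*cols*bands values."""
    if fmt == "BIP":   # all bands of one pixel are contiguous
        return (row * cols + col) * bands + band
    if fmt == "BIL":   # for each line, one band after another
        return (row * bands + band) * cols + col
    if fmt == "BSQ":   # one complete band image after another
        return (band * rows + row) * cols + col
    raise ValueError("unknown interleave format: " + fmt)


# A 2 x 3 scene with 4 bands: where does band 1 of the first pixel sit?
bip_pos = sample_offset("BIP", 0, 0, 1, 2, 3, 4)   # 1
bil_pos = sample_offset("BIL", 0, 0, 1, 2, 3, 4)   # 3
bsq_pos = sample_offset("BSQ", 0, 0, 1, 2, 3, 4)   # 6
```

In BIP a full pixel vector is available after only L samples, which is why this format fits pixel-by-pixel processing.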

2.2 Detection Algorithm

If the desired target is unknown, the pixel itself, $x_t$, acts as the term to be matched; it is then ideal to use the well-known Reed-Xiaoli (RX) algorithm for anomaly detection [1].

The focus is on this algorithm for the following reasons:

It can outperform other existing algorithms owing to its excellent performance in background suppression.

It is suitable for remote sensing images in unknown circumstances, since it requires the least prior information.

It can be easily implemented in real time with simple and similar hardware architecture.

Reed-Xiaoli (RX) algorithm

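The well-known RX algorithm is a detector for anomalies, defined as pixels whose spectral signatures are different from their surroundings. With the covariance matrix replaced by the correlation matrix, it is given by

$$\delta_{RX}(x_t) = x_t^T R^{-1} x_t$$

where

$R$: correlation matrix

$R^{-1}$: inverse of the correlation matrix

$\tilde{R}$: correlation matrix without being normalized by the number of pixels

$X_t$: data matrix received up to time $t$, whose most recent pixel vector is $x_t$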

Thus, implementing the detection or classification algorithms in real time depends on the adaptation of $R^{-1}$. In other words, $R^{-1}$ at time t should be quickly calculated by updating the previous $R^{-1}$ at time t - 1 using the data received at time t, without completely recalculating $R$ and $R^{-1}$. As a result, intermediate data analysis results (e.g., target detection or classification) are available in support of decision-making even before the entire data set is received; when the entire data set is received, the data analysis result is finalized. Remote sensing images usually have high spatial correlation, so a small subset of pixels can still capture the major data statistics [1].
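One common way to realize such an update, assumed here for illustration rather than taken from the thesis, is the rank-one Sherman-Morrison (Woodbury) identity applied to the unnormalized matrix $\tilde{R}_t = \tilde{R}_{t-1} + x_t x_t^T$:

$$\tilde{R}_t^{-1} = \tilde{R}_{t-1}^{-1} - \frac{(\tilde{R}_{t-1}^{-1} x_t)(\tilde{R}_{t-1}^{-1} x_t)^T}{1 + x_t^T \tilde{R}_{t-1}^{-1} x_t}$$

The Python sketch below follows this identity. The function names are hypothetical, the inverse is assumed to be symmetric, and the identity matrix is used as a simple starting point.

```python
def mat_vec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Scalar inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def update_inverse(R_inv, x):
    """Return the inverse of (R + x x^T), given a symmetric R_inv = R^-1."""
    u = mat_vec(R_inv, x)        # matrix-vector product R^-1 x
    denom = 1.0 + dot(x, u)      # scalar 1 + x^T R^-1 x, so no matrix inversion
    size = len(x)
    # Subtract the scaled outer product u u^T from the previous inverse.
    return [[R_inv[i][j] - u[i] * u[j] / denom for j in range(size)]
            for i in range(size)]


def rx_score(R_inv, x):
    """Anomaly score x^T R^-1 x for one pixel vector."""
    return dot(x, mat_vec(R_inv, x))


R_inv = [[1.0, 0.0], [0.0, 1.0]]             # assumed starting point
R_inv = update_inverse(R_inv, [1.0, 0.0])    # inverse of diag(2, 1)
R_inv = update_inverse(R_inv, [0.0, 2.0])    # inverse of diag(2, 5)
score = rx_score(R_inv, [1.0, 1.0])
```

Each update costs one matrix-vector product, one inner product, and one outer product, which matches the operations a multiply-accumulate array can pipeline.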

2.3 Hardware architecture BIP format

A parallel, pipelined hardware realization of the $R^{-1}$ update, where data arrive one pixel vector at a time, is illustrated in Figure 2. In this module, $R^{-1}$ is updated each time a new selected pixel vector is processed. Parallel multiply-accumulators calculate the result of the matrix-vector multiplication, which is pipelined to calculate the required scalar inner product and the outer vector product matrix [1].

Figure 2: Architecture for BIP [1]

Here the inner-product term in (1) is a scalar; hence, no matrix inversion is involved in each adaptation. Various design techniques and architectures were considered to optimize the performance. Real-time image processing applications usually work in a pipelined form [6]. As shown in Figure 2, a parallel array architecture is used for the hardware realization. The proposed parallel architecture consists of identical processing elements (PEs), and the number of PEs depends on the size of the matrices. Each PE performs the necessary multiply-accumulate (MAC) operation. Each PE operates independently, with connections only to the input and output ports. This greatly reduces the interconnection between the PEs, and as a result the hardware resource utilization is minimized [13].

2.4 Hardware Realization

Various design techniques and architectures were considered to optimize the performance of the matrix multiplier. Matrix multiplication is by nature well suited to parallel implementation. The product C = A × B of two matrices A and B is conformable if the number of columns of A is equal to the number of rows of B [15]. Matrix multiplication is based on the equation below, in which i, j, and k are the loop indices. The loop body consists of a single recurrence:

C[i,j] = C[i,j] + A[i,k] × B[k,j]

where A[i,k] and B[k,j] are input variables whose values are needed to execute the loop, and C[i,j] is an output variable.
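The recurrence can be written out as a plain triple loop. The following Python sketch is a software reference only, not the hardware design; the function name `mat_mul` is illustrative.

```python
def mat_mul(A, B):
    """Multiply matrices A (n x m) and B (m x p) using the MAC recurrence."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # One multiply-accumulate step per (i, j, k).
                C[i][j] = C[i][j] + A[i][k] * B[k][j]
    return C


product = mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]])   # [[19, 22], [43, 50]]
```

The iterations over i and j are independent of each other, which is what allows them to be distributed over parallel hardware; only the k loop carries a dependency through C[i,j].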

Pipelined Operation

To improve performance, both pipelining and parallel processing techniques are used. High-speed operation requires massive throughput, which is achieved by using a pipelined architecture that computes the product of two matrices. Pipelining is achieved by inserting registers at appropriate places, as shown in Figure 3.

Figure 3: Pipelined operation.
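The effect of inserting a register between stages can be mimicked in software. The sketch below simulates, cycle by cycle, a hypothetical two-stage multiply-accumulate pipeline in which a register separates the multiplier from the accumulator. The stage split and the names are assumptions made for illustration and are not taken from Figure 3.

```python
def pipelined_mac(a_stream, b_stream):
    """Simulate a two-stage pipelined MAC; return (result, clock cycles used)."""
    inputs = list(zip(a_stream, b_stream))
    prod_reg = 0.0        # pipeline register between multiplier and adder
    valid_reg = False     # marks whether prod_reg holds a fresh product
    acc = 0.0
    cycles = 0
    i = 0
    while i < len(inputs) or valid_reg:
        # Stage 2: accumulate the product registered in the previous cycle.
        if valid_reg:
            acc += prod_reg
        # Stage 1: multiply the next operand pair and register the product.
        if i < len(inputs):
            a, b = inputs[i]
            prod_reg = a * b
            valid_reg = True
            i += 1
        else:
            valid_reg = False
        cycles += 1
    return acc, cycles


result, cycles = pipelined_mac([1, 2, 3], [4, 5, 6])   # (32.0, 4)
```

With n operand pairs the simulated pipeline finishes in n + 1 cycles: one extra cycle of latency, but a new pair is accepted every cycle, and in hardware each stage has a shorter combinational path.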

Parallel Processing

Parallel processing is exploited in the design, where the matrix multiplication operation is divided into several smaller operations that are executed in parallel and finally the output is obtained by accumulating all the partial results generated from these parallel processes. This results in higher frequency of operation [1] [13].

Figure 4: Parallel Architecture.

A parallel array architecture, as shown in Figure 4, is used. The parallel array is similar to a traditional systolic array, with slight modifications. The parallel architecture consists of identical processing elements (PEs), and the number of PEs depends on the size of the matrices. Each PE performs the necessary multiply-accumulate (MAC) operation. Each PE operates independently, with connections only to the input and output ports. This greatly reduces the interconnection between the PEs, and as a result the hardware resource utilization is minimized [23].
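The idea of independent processing elements can be sketched in software as follows. In this Python model, which is an assumption-laden illustration rather than the thesis design, each PE owns one accumulator and computes one output element of a matrix-vector product; the class and function names are hypothetical.

```python
class ProcessingElement:
    """One PE: a multiply-accumulate unit with its own accumulator."""

    def __init__(self):
        self.acc = 0.0

    def mac(self, a, b):
        self.acc += a * b


def parallel_mat_vec(M, v):
    """Matrix-vector product with one PE per matrix row."""
    pes = [ProcessingElement() for _ in M]
    for k, v_k in enumerate(v):
        # In hardware, all PEs would execute this step in the same clock cycle;
        # the inner loop only models that concurrency sequentially.
        for i, pe in enumerate(pes):
            pe.mac(M[i][k], v_k)
    return [pe.acc for pe in pes]


outputs = parallel_mat_vec([[1, 2], [3, 4]], [5, 6])   # [17.0, 39.0]
```

No PE reads another PE's accumulator, which reflects the statement that each PE connects only to the input and output ports.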

CHAPTER III

FPGA IMPLEMENTATION

This chapter introduces field-programmable gate array (FPGA) technology and the hardware description language used for development, VHDL. It then reviews the construction of floating-point arithmetic and its implementation.

To take advantage of computational specialization and inherent parallelism, the FPGA is a proper implementation platform for applications that permit parallelization. FPGAs can be used as dedicated computers to perform certain computations at very high frequencies and as an alternative to custom ASIC technology.

3.1 FPGA: Field Programmable Gate Array

Field-programmable gate arrays (FPGAs) are integrated circuits whose functionality can be modified in the field after fabrication. An FPGA can therefore be customized for different applications, as long as the device itself is complex enough to hold the logic.

FPGAs contain many built-in system-level blocks. These features allow logic designers to build the highest levels of performance and functionality into their FPGA-based systems. FPGAs are a programmable alternative to custom ASIC technology. FPGAs offer the best solution for addressing the needs of high-performance logic designers, high-performance DSP designers, and high-performance embedded systems designers with unprecedented logic, DSP, connectivity, and soft microprocessor capabilities.

Advantages of Using FPGAs

Fully programmable alternative to a customized chip

Used to implement functions in hardware

Hardwired logic is very fast

Can interface to outside world

Custom hardware/peripherals

Custom co/processors

Can perform bit-level operations not suited for traditional CPU/MPU

In this research, the Xilinx Virtex-6 and the Altera Stratix III are used.

The Virtex®-6 family provides the newest, most advanced features in the FPGA market and is built on a 40 nm state-of-the-art copper process technology. A single Virtex-6 FPGA CLB comprises two slices, each containing four 6-input LUTs and eight flip-flops (twice the number found in a Virtex-4 FPGA slice), for a total of eight 6-LUTs and 16 flip-flops per CLB. Each DSP48E1 slice contains a 25 x 18 multiplier, an adder, and an accumulator. Block RAMs are fundamentally 36 Kbits in size, and each block can also be used as two independent 18 Kbit blocks.

Table 1: Virtex-6 FPGA Summary

3.2 FPGA Design Flow

Several steps are necessary for implementing customized functions on FPGA chips. They are:

• Design Entry - the desired circuit is specified either by using a hardware description language, such as VHDL, or by means of a schematic diagram.

Figure 5: Block Diagram Rx Algorithm

VHDL: The language

To construct the datapath, a schematic approach can be used for simple designs, but it may not be practical for real-life applications, which often involve thousands of logic gates. It is therefore necessary to use a hardware description language such as VHDL [IEE02] when implementing complex logic in hardware or on an FPGA. Even though programming languages and hardware description languages share some concepts, such as variables and signals, the nature of a hardware description language is totally different from that of a programming language.

Hardware description language, as the name suggests, is used to describe the hardware functionality. Unlike normal programming languages, hardware description languages may run several operations in parallel and explicit specification of the timing is required to make the design work.

• Functional Simulation - the synthesized circuit is tested to verify its functional correctness; the simulation does not take into account any timing issues. A VHDL design is simulated after it has been compiled. The simulator may then be invoked with the name of the configuration or entity/architecture pair.

Figure 6: Simulation Environment

• Compilation and Synthesis - the CAD synthesis tool synthesizes the circuit into a netlist that gives the logic elements (LEs) needed to realize the circuit and the connections between the LEs.

The VHDL code is processed by several synthesis tools that analyze the code and generate an implementation of it for the FPGA or the target chip. These tools are controlled by the application program called the Compiler.

Figure 7: Compilation and synthesized tool

The synthesis report provides a lot of information that may be of interest to the designer, such as the speed and area of the implemented circuit. The synthesis tool performs analysis to determine the expected performance of the circuit.

• Fitting - the CAD Fitter tool determines the placement of the LEs defined in the netlist into the LEs in an actual FPGA chip; it also chooses routing wires in the chip to make the required connections between specific LEs.

Figure 8: Floor plan

• Timing Simulation - the fitted circuit is tested to verify both its functional correctness and timing. A good measure of the speed is the maximum frequency at which the circuit can be clocked, referred to as fmax. This measure depends on the longest delay along any path between two registers clocked by the same clock.

Figure 9: Timing Analysis

• Programming and Configuration - the designed circuit is implemented in a physical FPGA chip by programming the configuration switches that configure the LEs and establish the required wiring connections.

Figure 10: Configure the LEs

To program the FPGA, a bitstream is generated by the design tool. The VHDL code is synthesized into a netlist, which contains a representation of the hardware, such as the function of each basic block and the connections between the blocks. The design tool extracts the information in the netlist, maps the logical blocks, and connects them to specific lookup tables. The bitstream customizes the functionality of the FPGA by writing this information onto the chip.
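The fmax measure mentioned in the timing simulation step follows directly from the longest register-to-register delay. The Python sketch below assumes that each listed delay already includes clock-to-output, logic, routing, and setup time; the function name and the delay values are made up for illustration.

```python
def fmax_mhz(path_delays_ns):
    """Maximum clock frequency in MHz, limited by the slowest path (delays in ns)."""
    critical_path_ns = max(path_delays_ns)
    return 1000.0 / critical_path_ns   # a period of 1 ns corresponds to 1000 MHz


# Three hypothetical register-to-register paths; the 4.0 ns path is critical.
example_fmax = fmax_mhz([2.0, 4.0, 3.5])   # 250.0
```

Inserting a pipeline register in the middle of the critical path shortens the longest delay and therefore raises fmax, at the cost of one extra cycle of latency.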

Floating Point Arithmetic

3.3 Floating Point Number Representation

Floating-point algorithms are used frequently in modern applications such as speech recognition, image processing, and engineering because of their ability to represent an approximation to the real numbers. The advantage of floating point is that precision is maintained over a wide dynamic range, where fixed-point numbers lose precision. In hardware, floating point takes up almost 3X the resources of fixed-point math.

The IEEE 754 (32- and 64-bit) and IEEE 854 (variable-width) specifications of the floating-point standard have been widely accepted for representing floating-point numbers. Floating-point arithmetic, including addition, subtraction, and multiplication, remains the same even if the computation platform is changed.

Floating-point numbers are well defined by IEEE 754 and IEEE 854. Floating point has been used in processors and IP for years and is a well-understood format. There are many concepts in floating point that make it different from our common signed and unsigned number notations [19]. It is a sign-magnitude system, where the sign is processed differently from the magnitude.

An IEEE 754 32-bit floating-point number comprises a sign bit (+ or -), an exponent, and a fraction, as represented below:

S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF

31 30 23 22 0

+/- Exp Fraction

For example, consider the 32-bit floating-point number

0 10000001 10100000000000000000000

To convert a floating-point number into its real value, the following equation can be used:

S * 2 ** (exponent - exponent_base) * (1.0 + Fraction/fraction_base)

where "exponent_base" is 2**(exponent width - 1) - 1, which is 127 for an 8-bit exponent, and "fraction_base" is the maximum possible (unsigned) fraction plus one, which is 2**23 = 8388608 for a 23-bit fraction. For the example above:

+1 * 2** (129 - 127) * (1.0 + 5242880/8388608) = +1 * 4.0 * 1.625 = 6.5.
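The conversion can be checked with a short Python sketch. It assumes the standard single-precision layout of 1 sign bit, 8 exponent bits, and 23 fraction bits, and it handles normalized numbers only (no zeros, subnormals, infinities, or NaNs). The function names are illustrative; `struct` from the Python standard library is used as an independent cross-check.

```python
import struct


def decode_float32(bits):
    """Decode a normalized IEEE 754 single from a bit string such as 'S EEEEEEEE F...F'."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponent = int(bits[1:9], 2)       # 8 exponent bits
    fraction = int(bits[9:], 2)        # 23 fraction bits
    exponent_base = 2 ** (8 - 1) - 1   # 127
    fraction_base = 2 ** 23            # 8388608
    return sign * 2.0 ** (exponent - exponent_base) * (1.0 + fraction / fraction_base)


def decode_with_struct(bits):
    """Cross-check: let the standard library interpret the same 32 bits."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


example = "0 10000001 " + "101" + "0" * 20   # 1 + 8 + 23 bits
value = decode_float32(example)              # 6.5
```

Both functions return 6.5 for the example bit pattern.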

To use the floating-point representation, the floating-point VHDL package files should be compiled first, followed by the rest of the code. These packages have been designed for use in VHDL-2008, where they will be part of the IEEE library [19]. A version of the packages that works with VHDL-93 and is synthesizable is also provided.

The floating-point VHDL packages needed are [19] "fixed_float_types.vhdl", "float_generic_pkg.vhdl", "float_generic_pkg-body.vhdl", "float_pkg.vhdl", "fixed_pkg.vhdl", "fixed_generic_pkg.vhdl", and "fixed_generic_pkg-body.vhdl".

CHAPTER IV

RESULTS

The proposed architecture is modeled in Very High Speed Integrated Circuit Hardware Description Language (VHDL). The VHDL model is simulated and synthesized. Synthesis is targeted at the Virtex-6 and Stratix III FPGA devices from Xilinx and Altera, respectively. The performance in terms of area and speed is summarized in order to show the performance of the proposed architecture.

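Table 2: FPGAs Device Utilization Summary

| Device | Logic Utilization | Used | Available | Utilization |
|---|---|---|---|---|
| Xilinx Virtex-6 FPGA | Number of Slice LUTs | 11,455 | 46,560 | 24% |
| Xilinx Virtex-6 FPGA | Number of fully used LUT-FF pairs | 0 | 11,455 | 0% |
| Xilinx Virtex-6 FPGA | Number of bonded IOBs | 336 | 240 | 140% |
| Xilinx Virtex-6 FPGA | Number of DSP Elements | 24 | 288 | 8% |
| Altera Stratix III FPGA | Number of Slice LUTs | 17,592 | 38,000 | 46% |
| Altera Stratix III FPGA | Number of fully used LUT-FF pairs | 0 | 19,000 | 0% |
| Altera Stratix III FPGA | Number of bonded IOBs | 336 | 488 | 69% |
| Altera Stratix III FPGA | Number of DSP Elements | 32 | 216 | 15% |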

Figure 11: Simulation output

Figure 12: Schematic LUTs

Conclusion

In the work presented herein:

Simulation and synthesis of the FPGA implementation have been realized.

The VHDL code written is reusable and adjustable to the size of the HSI data.

A high-performance, real-time, high-throughput, floating-point, high-accuracy design was created.

The general architecture is extendable to a generic FPGA core.

The design is re-targetable to ASIC technology.