Software Testing Techniques By Using Genetic Algorithms Biology Essay

Published: November 2, 2015 Words: 4971

This chapter presents the study of optimization of software testing techniques by using Genetic Algorithms (GAs) and specification based testing. Some new categories of genetic codes are applied in some problem optimizations for the generation of reliable software test cases based on the specification of the software. These GAs have found their application in detecting errors in the software packages. For example, based on new genetic strategy and GAs symmetric code is developed. In the current chapter, some key definitions of genetic transformation have been used viz. crossover, mutation, and selection. Some of our research shows that genetic techniques have very important influence on the performance of software test cases. This chapter is organized into three parts: part I describes the functionality of GAs, part II presents the usage of GAs in specification based software testing to the alternatives of existing software testing techniques, part III discusses the implementation of GAs using MATLAB for the generation of optimized test cases.

If someone is solving a problem, he usually looks for some solution, which will be the best among others. The all-feasible solutions are called search space. Every point in the search space provides one possible solution. Each solution is "marked" by its fitness. With GA, one seems for the best solution among a number of possible solutions. Finding a solution is then equal to looking for some extreme value i.e. minimum or maximum. The search space at that time may be well defined, but usually one knows only a few points in the search space. In the genetic algorithm, finding solutions generates other possible solutions as evolution proceeds.

The problem is that the search can be very complicated. One must not know where to look for a solution or where to start. There exists many methods for finding a suitable solution, but these methods do not necessarily provide the best solution. Few of these methods are hill climbing, tabu search, simulated annealing, and the genetic algorithm. The solutions these methods are often considered as good solutions, because it is not often possible to prove what the optimum is.

One example of a class of problems, which cannot be solved in the "traditional" way, is NP (Nondeterministic Polynomial) problems. There exist numbers of applications for which it may apply fast (polynomial) algorithms. However, some problems cannot be solved algorithmically. This led to NP-complete problems. NP stands for nondeterministic polynomial and means to "guess" the solution and then check it.

If one had a guessing machine, he might be able to find a solution in some reasonable time. Studying of NP-complete problems is, for simplicity, restricted to the problems where the answer can be Yes/No. Tasks with complicated outputs, a class of problems introduced NP-hard problems. This class not limited as class of NP-complete problems. A characteristic of NP-problems is that a simple algorithm, perhaps obvious at a first sight, can be used to find usable solutions. But this approach provides many possible solutions just trying all possible solutions is very slow process (e.g. O (2^n)). For even slightly bigger instances of these type of problems this approach is not usable at all.

Today nobody knows if some faster algorithm exists to provide exact answers to NP-problems. The discovery of such algorithms remains a big task for researchers. Today many people think that such algorithms do not exist and so they are looking for an alternative method. An example of an alternative method is the genetic algorithm. Examples of the NP problems are satisfiability problem, travelling salesperson problem or knapsack problem.

4.2. Biological Background

4.2.1. Chromosome

All living organisms made up of cells. In each cell, there is set of chromosomes. Chromosomes are strings of DNA (deoxyribonucleic acid) and serve as a model for the whole organism. A chromosome is a complicated and long thread of DNA. Each feature is oblique by combination of DNA. There are four bases of DNA: A (Adenine), C (Cytosine), T (Thymine) and G (Guanine). Significant combinations of these bases create specific instructions to the cell. Each gene instructs the protein i.e. each gene encodes an attribute (e.g., color of eyes). Every gene in the chromosome has its own position and known as locus. Set of all chromosomes known as genome and a particular set of genes in genome is represents genotype.

4.2.2. Reproduction

For reproduction, first crossover takes place. Parents Genes combine to form an entirely new chromosome. The fitness of an organism is measured by survival of the organism in its life. Changes may occur during reproduction. The chromosomes (by a process called crossover) from the parents are randomly exchanged. Therefore, the offspring show some qualities of the father and some of the mother. The newly generated offspring then may be mutated. Mutation means change a bit of the elements of DNA. Mutation also changes some behavior. Sometimes during copying of chromosomes, an error may occur. Suppose if the parent cell may have -A-C-G-C-T- but a mishap may occur and changes the new cell to -A-C-T-C-T-. Usually this result in an absurd word and these cells does not survive. However, more than millions of years, sometimes this change produces a more beautiful traits, thus producing a better kind.

4.2.3. Evolution

A number of inventions learned and inspired by nature (e.g. artificial neural networks). Genetic Algorithms (GAs) also another example of this. GAs explore by simulating evolution, starting with an initial set of solutions (population) or hypotheses, and generates consecutive "generations" of solutions. The main idea behind this is survival of the fittest.

4.2.4. Natural Selection

In nature, the being that has better survival qualities will survive for a longer period. Again, provides it a better chance to produce new offspring with its genetic material. Therefore, the entire population will have lots of genes from the superior individuals and few from the inferior individuals after a specified period. As a result, the fittest being survived and the unfit traits died out. This power of nature is known as natural selection.

For simulating the process of natural selection in a computer, it requires representation of an individual, encoding and fitness function.

4.3. Representation of an individual

During the searching process, a "generation" of "individuals" must be maintained with each individual representing the "genetic structure" of a all promising solution or hypothesis. Similar to chromosome, the genetic structure of an individual depicted with a fixed and finite alphabet. In GAs, {0, 1} is frequently used. The string of {0, 1} is understood as a solution to the problem that one trying to solve.

4.3.1. Encoding

Encoding heavily depends on the problem. There exist different types of encoding schemes:

Binary encoding

Permutation encoding

Value encoding

Tree encoding

Binary Encoding

Most commonly used encoding is binary, because of, the first research of Genetic algorithm use this encoding scheme and of its relative simplicity. Every chromosome represented with a string of bits - 0 or 1 in binary encoding. For example

Chromosome A

101101111100011011100101

Chromosome B

111111100000110000011111

Figure 4.1.Example binary encoding

Binary encoding offers a numbers of possible chromosomes even with a small number of alleles. However, binary encoding is not habitually natural for many problems and sometimes requires corrections after crossover and/or mutation.

Permutation Encoding

Permutation encoding mostly used in ordering problems, like traveling salesman problem etc. Every chromosome in permutation encoding, is a string of numbers, which represents a position in a sequence. For example

Chromosomes

Representation

A

1 5 3 2 6 4 7 9 8

B

8 5 1 3 6 7 2 4 9

Figure 4.2 Chromosomes with permutation encoding

Permutation encoding is highly useful for ordering problems. Sometimes crossover and mutation corrections may be made to leave the chromosome consistent.

Value Encoding

In the problems where real numbers are used, direct value encoding can be used. In the value encoding, each chromosome is a sequence of some values such as (real) numbers, chars or any objects. For example

Chromosome

Representation

A

3.6789,1.2324 5.3243 0.4556 2.3293 2.4545

B

BCHDTRUIMBJHIDIERJFDLDFLFEGT

C

(back), (back), (right), (forward), (left),(up)

Figure 4.3 Representation of Chromosomes with value encoding

Tree Encoding

For genetic programming, tree encoding is used. In the representation of tree encoding, every chromosome is a tree of some objects, likes commands or functions in programming language.

Chromosome A

Chromosome B

( * A ( + X Y ) )

( Repeat Update stop )

Figure 4.4 Chromosomes with tree encoding

Tree encoding is useful for evolving programs, functions or any other structures. Mostly LISP Programming language is used for this. As programs in LISP are directly represented in the form of tree, so the crossover and mutation can be done comparatively easily.

4.4. Fitness function

For a given individual, tester must evaluate how good a solution it is so that he can rank individuals. Fitness function mostly a real number. For example, if an individuals is represented as a length-30 binary number. Then individual as an integer i will be in the range 0 to 230 - 1. As a result, possible fitness function is given as Fitness (i) = (i/230 - 1)10. This value lies between 0 and 1 and is monotonically increasing. Fitness functions may not be monotonic but frequently have multiple local maxima.

4.5. Reproduction methods in Genetic Algorithm

Mutation and crossover are the two basic methods of reproduction.

4.5.1. Mutation

Mutation is the randomly change in one or more digits that representing an individual. For example,

Chromosome A =1 3 5 6 2 then after mutation, chromosome may be 1 3 8 6 2 or 1 4 5 6 2 giving two new offspring. How frequently to do mutation, how many digits changes and how much change to make are adjustable parameters.

Mutation in binary encoding

Selected bits are inverted.

Figure 4.5 Mutation in binary encoding

Mutation in Permutation encoding

For mutation in permutation encoding order of bits are changed i.e. two bits are selected and exchanged.

[1 2 3 4 5 6 7 8 9] => [1 2 3 4 9 6 7 8 5]

Mutation in value encoding

For real value encoding mutation is done through adding/subtracting a small number to selected values.

[1.29 5.68 2.86 4.11 5.55] => [1.29 5.68 2.73 4.22 5.55]

Mutation in Tree encoding

Mutation in the tree-encoding scheme changes either operator or number at selected nodes.

Figure 4.6 Mutation in Tree encoding

4.5.2. Crossover

Choose one or more pairs of parents and swap segments of the parents. Crossover may be 1-point if one crossover point is selected. Similarly, 2-point crossover, which selects two points in each parents defining three intervals; the middle intervals are exchanged to produce the two offspring. The rate of crossover, the number of crossover points, and the positions of the crossover points and the number of parent pairs are changeable parameters.

Crossover for Binary and Value Encoding

Single point crossover - only one crossover point is selected, string from the beginning of the chromosome to the crossover point is copied from the first parent, and the rest is copied from the second parent.

00111001+10110010 = 10111001

Figure 4.7 Crossover for Binary and Value Encoding

Two-point crossover - In two point crossover, two points are selected, binary string from the beginning to the first crossover point is copied from the first parent, first to the second crossover point string is copied from the second parent and the left is copied from the first parent again.

00111001+10110010 = 00110001

Figure 4.8 Two-point crossovers

Uniform crossover - In this type, bits are arbitrarily copied from the first or from the second parent to generate a new offspring.

00111001+10110010 = 10110010

Figure 4.9 Uniform crossover

Arithmetic crossover - some arithmetic operation is applied to the parents to make a new offspring.

00111001+10110010 = 10111011

Figure 4.10 Arithmetic crossover

(OR) Arithmetic crossover works enormous for vectors, but fails to permutations.

Crossover in Permutation Encoding

For Permutation Encoding only single point crossover is applicable. After selecting one crossover point, the permutation is copied from the first parent upto the crossover point, then the other parent is selected and if the number is not yet in the offspring, it is added. For example

(1 2 3 4 5 6 7 8 9) + (4 5 3 6 8 9 7 2 1) = (1 2 3 4 5 6 8 9 7)

Crossover for Tree Encoding

In tree encoding again one crossover point is selected in both parents, parents are divided in that point and the parts below crossover points are exchanged to produce new offspring.

Figure 4.11 Crossovers for Tree Encoding

4.6. Selection criteria

Fitter individuals are selected from a population of individuals, to give the better chance to survive to the next generation. It is not easy to use the simple criteria "keep the best n individuals". It is fount out that nature never kills all the unfit genes. So they typically become recessive for a long period. As a result, they may mutate or crossover to something useful. Therefore, there is always a tradeoff for better individuals and diversity.

One major problem that associated with the selection method is called crowding. Crowding happens when the highly fit individuals quickly reproduce so that a maximum percentage of the whole population looks very similar. This can reduce diversity in the population and may delay the long-run progress of the algorithm.

4.6.1. Deterministic

In deterministic selection, only the best survive individuals are selected. Two types of deterministic selection techniques are most commonly used first embraces parents in determining the best solutions, and second one replaces all parents with offspring. Deterministic selection depends heavily on converges and evaluation function, which is the fastest of all methods.

4.6.2. Proportional Fitness

In the proportional fitness, parents are selected according to their fitness. According to the chromosomes fitness they get more chances to be selected. Consider a roulette wheel where all the chromosomes in the population are placed. The area in the roulette wheel is proportional to the value of the fitness function of each chromosome - the bigger the value of fitness functions, the larger the area is. See the following picture of five chromosomes for an example.

Chromosome is selected where thrown marble in the roulette wheel is stops. Obviously, the chromosomes with highest fitness value will be selected more times. This process Proportional Fitness can be described by the following algorithm.

Compute the sum (S) of all chromosome fitness's in population.

Generate random number(r) from the interval 0 to sum (S).

Repeat this by go through the population and sum the fitness's from zero to sum(S). If the sum(S) becomes greater then random number (r), stop and return the chromosome where you are.

Naturally, step 1 is performed only once for every population.

4.6.3. Rank Selection

In place of taking the best, individuals can be selected proportionally to their evaluation score. Suppose tester has the following population:

Individual

Score

A

24

B

10

C

9

D

7

The sum of their scores is 50. Individual A gets 48% chance of being selected, individual B 20%, individual C 18%, and individual D 14%. First, implement this with "roulette wheel selection". Pick a random number between 0 and 1. Then gradually add on the probabilities of each individual in order, until this sum is greater than the random number. The roulette wheel selection will have problems when there are big differences between the fitness values i.e. the best chromosome fitness is 80% of the sum of all fitness's, therefore, the other chromosomes will gets very few chances to be selected.

Rank selection provides ranks to the population first and then each chromosome receives fitness value calculated by this ranking. The worst chromosome gets the fitness 1, the second worst chromosome gets 2 etc., and the best one will have fitness N.

This can be seen in below picture i.e. how the state changes after change in fitness to the numbers determined by the ranking.

Site before ranking (graph of fitness's)

Situation after ranking (graph of order numbers)

After ranking, all the chromosomes have a chance to be selected. However, this method can cause slower convergence due to the best chromosomes do not differ so much from other chromosomes.

4.6.4. Tournament Selection

In Tournament Selection scheme, two individuals are selected at random with from the population, and the one individual with the best score is selected to reproduce. Using the same example, tournament selection could choose B and D for contest. B would then be selected, as its score of 10 larger than D's score of seven. Repeat this procedure to get the next population. Tournament selection does not care about the distribution of the scores but only consider the ranking. The nth ranked individual in a population of size N will have a (2N - 2n + 1) / N2 chance of reproducing. As a result puts an upper and lower bound on the selection of any individual to reproduce for the next generation. Tournament selection can be considering more than two individuals being chosen for competition and selecting the best from population.

4.6.5. Steady-State Selection

Steady-State Selection is not a method of selecting parents. The basic idea in selecting to the new population is that a big part of chromosomes can survive to next generation i.e. in every generation; a few fit chromosomes are selected for creating new offspring. Afterward, few lower fitness chromosomes are removed and the new offspring is placed in their place. The left of population survives to next generation.

4.6.6. Elitism

While creating a next population by crossover and mutation, there is a big chance to loose the best chromosome. Elitism selection method first copies the best chromosome to the next population. The left of the population is created in ways by considering above selection techniques.

4.6.7. Observations

There are several general observations about the generation of test cases via genetic algorithm:

In problems with adequate complexity, GAs may have an affinity to converge towards local optima rather than the global optimum. The possibility of this occurrence depends on the fitness landscape i.e. few problems may provide an easy way towards a global optimum while others may make it easier for the function to find the local optima.

With GA, handling dynamic data sets is difficult, as genomes starts converge early on towards solutions, which may cause no longer be valid for later data. Prevention of early convergence can be done either by increasing the probability of mutation when the solution quality drops, or by sporadically introducing entirely new, randomly generated elements into the pool of gene.

Selection is very important genetic operator; however, attitude is toward the importance of crossover versus mutation. There is dispute over that crossover is the most important, whereas mutation is only necessary to guarantee that potential solutions are not lost. While, some argue that crossover in a largely uniform population only serves to propagate improvements found by mutation, and in a non-uniform population crossover is always equivalent to a very large mutation.

GAs can quickly found good solutions, even in case of difficult search spaces.

For particular optimization problems and problem instantiations, simpler optimization algorithms may provides better solutions than genetic algorithms.

For speed and efficiency of the algorithm, the implementation and evaluation of the fitness function is most important.

Genetic Algorithm in software testing

Genetic Algorithms have been introduced in the sixties by Professor John Holland at university of Michigan as models of an Artificial Evolution [[GOL 1994], [GRE 1997]]. In the thirty past years, they have been successfully applied to a wide range of problems such as Natural Systems Modeling (e.g. Artificial Life environments, immune system modeling [[GRE 1997], Haeseleer et al. [HAE 1996]] Machine Learning systems, and optimization. GAs handle a population of individual (chromosomes) often modeled by vector of binary genes that encodes a potential solution to the considered problem and is named by a so-called fitness value, which is directly correlated to how good it is to solve the problem. In general, the basic approaches are to test software consists of using formal specifications to design an application. This approach is very strict but unfortunately not often used because the breadth of formal specification methods does not encompass all the functionality needed in today's complex applications. The second approach consists of doing test as part of the traditional engineering models (e.g. waterfall, spiral, prototyping) that have a specific phase for testing generally occurring after the application has been implemented. The modifications to these traditional models have being incorporating testing in every phase of the software development with methodologies such as extreme programming [Williams [WIL 1999]] used in the implementation of Windows XP. Despite all the claims, the truth here is that current approaches are insufficient to test software appropriately, thus causing the status of the field, which clearly seems to be loosing the battle of providing users with reliable software. It has been noticed that complete reliability is hard to achieve in empirical approaches to complete testing is impossible. This does not mean that a good set representing the full space of possible tests cannot be automatically generated thus reducing the cost of software development [Watkins [WAT 1995], Wegener et al. [WEG 2002]].

Therefore, in the present chapter, an attempt has been made to describe the basic nature of existing genetic transformations used for software testing and their related scenarios and applications for generating the efficient software test cases. Keeping in mind the above-mentioned requirement, researcher has been engaged in a number of activities involving study of software testing, genetic algorithms by using practical and theoretical analysis. Efforts has been made to understand the problem, and develop the corresponding high-level modules in C++ by using MATLAB version 7.0 tools and libraries provided by the language using the basic parameters of genetic algorithms for the generation of reliable and cost effective test cases.

This can be simulated by following algorithm: -

Step 1: Calculate the sum(S) of all chromosome finesses in population.

Step 2: Generate random number (r) from interval (0, S).

Step 3: Go through the population and sum finesses from 0 to S. if the sum S is greater then r, stop and return the chromosome where you are. Step 1 is performed once for each population.

The new chromosome generated by either crossover or mutation can then be introduced to the population and evaluated. Mutation helps to ensure diversity in the population. This can be done immediately (incremental GAs) or en masse when a certain number of new chromosomes have been generated (generational GAs). Whichever approach is used, it is normally the case that an equivalent number of chromosomes are deleted from the population. In terms of the search space, it helps the GA to jump to other parts of the space.

Algorithmic approach used for Genetic Algorithms

The basis of all GAs is the algorithm illustrated as:

a) Start with a set of genes

b) Evaluate genes in population until the maximum numbers of generations have been created

d) Crossover genes in population

e) Mutate genes in new population

f) Evaluate genes in extended population

g) Select genes to populate next generation

Figure 4.12: Information Flow for the GA Steps

The algorithm uses point crossover as depicted in Figure 1. Point crossover works by selecting a break point in the chromosome of the two selected individuals and recombining them using each-others' half [Vavak and Fogarty [VAV 1996], Grefenstette [GRE 1985]].

Test case generation using Genetic Algorithms

In this, implementation of genetic algorithm using MATLAB software has been dealt. Given the versatility of MATLAB's high-level language, problems can be coded in m-files in a fraction of the time that it would take to create C or FORTRAN programs for the same purpose. Couple this with MATLAB's advanced data analysis, visualization tools and special purpose applications domain toolbox and the user is presented with a uniform environment with which to explore the potential of genetic algorithms. The Genetic Algorithm GUI Toolbox plays a major role for obtaining an optimized solution and the best fitness value. GA's also require that a number of parameters be set. The first parameter is the population size. It has an impact on the speed of the GA convergence towards a near optimal solution and its capability to avoid local optima. Larger population sizes increase the amount of variation present in the initial population at the expense of requiring more cost function evaluations and longer execution times. Typical population sizes in the literature range between 25 and 100. However, for longer chromosomes and challenging optimization problems, larger population sizes needed to ensure diversity among the chromosomes and hence allow a more thorough exploration of the solution space.

In the context of integration orders, the population size consequently driven by the number of classes in the class diagram. As a heuristic, having a population size of two or three times the number of classes should be sufficient. Another parameter is the crossover rate that is the probability that a chromosome will undergo a crossover. Mutation prevents the GA search to fall into local optima, but they should not happen too often or the search will converge towards a random search. The mutation rate defined as the probability for a chromosome to undergo a mutation.

The algorithms presented in this chapter have been implemented on MATLAB version 7.0 for the generation of optimized test cases using GAs. These algorithms have been tested extensively with different test inputs containing many special cases, including random testing and testing based on specification of software as shown in fig 4.13.

Figure4.13. Shows the generation of test cases

The executional steps of genetic programming (that is, the flowchart of genetic Programming) are as follows:

Randomly create an initial population (generation 0) of individual computer programs composed of the available functions and terminals.

Iteratively perform the following sub-steps (called a generation) on the population Until the termination criterion i.e. specification of software is satisfied:

Execute each program in the population and ascertain its fitness using the problem's fitness measure.

Select one or two individual program(s) from the population with a probability based on fitness operations.

Create new individual program(s) for the population by applying the Genetic operations with specified probabilities

Results

For the test case generation, the following figure 4.14 has been designed, which consists of program P1 (as black box) having multiple input and one output variables.

Figure 4.14 depicts the program P1 to find largest number.

Test data for inputs can be defined in terms of preconditions that describe valid and invalid data values for each input. These preconditions may be determined from several sources, including the program's specification and the constraints of the computing environment. To create a test set, apply black-box test data selection criteria (such as equivalence-class partitioning, boundary value analysis, etc.) to each input variable with respect to the preconditions. After applying test selection criteria to each variable, researcher will have a set of test data values for each of the input variables. Since program p1 has multiple input variables, one must now consider how to make test combinations of program inputs. The most thorough approach is to test every possible combination of the selected test data values using GAs. The fitness functions based on other factors used to generate a sequence. In fact, in a more elaborate use of GAs, the data itself updated in a feedback loop based on the result of the execution of the test plan.

The algorithms presented in this chapter have been implemented on MATLAB version 7.0 for the generation of optimized test cases using GAs. These algorithms have been tested extensively with different test inputs containing many special cases, including random testing and testing based on specification of software. The graph 4.1 shows the fitness value for the different number of input variables.

Graph 4.1 shows the fitness value vs. variables.

Graph 4.2 shows the fitness value vs. variable numbers

Graph 4.3 shows the distance between individual, as the numbers of generations increases, the distance between the individual are decreases to zero.

Graph 4.3 shows the Average distance between individuals vs. generation.

The figure 4.14 shows the results generated by the MATLAB for the software under test after inserting mutation in the software. The value differs from the specification (expected value) of the software is depict as error.

Figure 4.15 shows the output generated by MATLAB

Graph 4.4 describes the errors detected in the software under test

Graph 4.4 shows the error detected vs. generation

These results are the focus for the software i.e. the software is mainly designed to process the optimization of test cases using Gas.

Perspective of work

Genetic encoding techniques have significant influence on GAS' performance in solving some problems with big algorithm complexity. The simulation shows that the proposed GAs with the specification can find solutions with better quality in shorter time. This chapter has presented the optimised behaviour for the generation of software test cases using Genetic Algorithms as a new efficient class. The strategy presented in this chapter relies on a technique that not only helps the tester to separate failure clusters, but also provides the developer with more information concerning the faults in the software and the input values that triggered them. The developer uses this information to search, locate, and segregate the faults that caused the failures. While each of these areas for future consideration could be further investigated with respect to applicability for software testing, as demonstrated by the examples of this chapter, the simple genetic algorithm approach presented in this chapter provides in itself a useful contribution to the selection of test cases and a focused examination of test results. . Further work will consist in studying in more details how specification of software be evolved and what properties it must feature thus enlightening us on GAs dynamics.