In conventional agricultural systems, one of the major concerns is reducing the abundance of unwanted plants known as weeds. In most cases, removal of the weed population in agricultural fields involves the application of chemical herbicides, which has been successful in increasing crop productivity and quality. However, concerns about the environmental and economic impacts of herbicide application have prompted interest in alternative weed control approaches. An automated machine vision system that can classify crops and weeds from digital images can be a cost-effective alternative that reduces the excessive use of herbicides: instead of uniform application across the field, such a real-time system can reduce herbicide use by identifying and spraying only the weeds. This paper deals with the use of support vector machines (SVM) for the classification of weeds and crops from digital images. The objective is to determine whether good classification accuracy can be achieved when SVM is used as the classification model in an automated weeding system. A total of fourteen features were investigated to find the feature combination providing the highest accuracy. Analysis of the classification results shows over 97% accuracy on 224 sample images.
Keywords: weed control; herbicides; machine vision system; support vector machine; RBF kernel; stepwise feature selection.
I. INTRODUCTION
Increasing productivity and upgrading plantation systems are major concerns for accelerating agricultural development. Weeds are unwanted plants which can survive and reproduce in agricultural fields, hampering agricultural development by reducing production and quality as they compete with crops for water, light, soil nutrients and space. Weed control strategies are therefore required to sustain crop productivity. Several strategies exist, such as removing weeds manually, mechanical cultivation and using agricultural chemicals known as herbicides. Using herbicides is the most common method, but it has adverse impacts on the environment and human health and also raises economic concerns: in the United States, the total cost of herbicides was about $16 billion in 2005 [1]. One of the main cost-related and strategic problems with herbicide use is that, in most cases, herbicides are applied uniformly across the crop field. Many portions of a field may contain no or few weeds, yet herbicides are applied to those portions as well. Moreover, human involvement in applying herbicides is time consuming and costly. If the same types of herbicides are applied to a field repeatedly to remove the weed population, weeds tolerant to those herbicides may emerge; over 290 biotypes of herbicide-tolerant weeds have been found worldwide [2].
In most developing countries, the economy is primarily supported by agriculture. The performance of this sector has an overwhelming impact on food security, poverty alleviation and economic development. To reduce the population pressure on the agricultural sector, crop production and quality must be increased with minimal cost for weed control. Spraying herbicides with a knapsack sprayer is the most commonly used technique in agricultural fields. This technique is considered inefficient and time consuming, and recommended safety measures are rarely maintained. A machine vision system which can classify crops and weeds and apply herbicides only where there are weeds can therefore be a novel approach that enhances profitability and lessens environmental degradation. In this approach, images are taken from the crop field, and weeds are identified and treated accordingly by an automated real-time system. The main objective of this work is to use support vector machines as a classification model to classify crops and weeds from digital images and to determine whether this model can be used in real time. SVM was chosen because of its significant advantages, such as good generalization performance, the absence of local minima and the sparse representation of its solution [3].
Many researchers have investigated methodologies for automating the weed control process. Shearer and Jones developed a photo-sensor-based plant detection system in 1991 [4]. A system developed by Hanks (1996) could detect and spray only the green plants [4]. Islam et al. (2005) classified narrow and broad leaves by measuring the Weed Coverage Rate (WCR), using a PDA as the processing device [1]. Ahmad et al. (2007) developed an algorithm to separate images into narrow and broad classes based on histogram maxima with thresholding for selective herbicide application, achieving an accuracy of 95% [4]. Ghazali et al. (2008) achieved an accuracy above 80% using the statistical GLCM approach and the structural FFT and SIFT approaches for an intelligent real-time weed control system in oil palm plantations [5].
II. MATERIALS AND METHODS
2.1 Image Acquisition
The images used in this study were taken from a chilli field. Five weed species that are common in chilli fields of Bangladesh were also chosen. TABLE I lists the English and scientific names of chilli and the selected weed species.
TABLE I
SELECTED SPECIES

Class Label   English Name       Scientific Name
1             Chilli             Capsicum frutescens
2             Pigweed            Amaranthus viridis
3             Marsh herb         Enhydra fluctuans
4             Lamb's quarters    Chenopodium album
5             Cogongrass         Imperata cylindrica
6             Burcucumber        Sicyos angulatus
The images were taken with a digital camera equipped with a 4.65 to 18.6 mm lens. The camera was pointed directly at the ground while taking the images, with the lens 40 cm above ground level. With these settings, an image covered a 30 cm by 30 cm ground area. No flash was used, and the image scenes were shielded from direct sunlight. The image resolution of the camera was set to 1200 × 768 pixels. All images were colour images. Fig. 1A-F shows sample images of chilli and the five weed species.
Figure 1: Sample images of different plants; (A) chilli (B) pigweed (C) marsh herb (D) lamb's quarters (E) cogongrass (F) burcucumber.
2.2 Pre-processing
A segmentation method was used to separate the plants from the soil in the images. A thresholding technique was used for this purpose, exploiting the fact that plants are greener than soil. Let 'G' denote the green colour component of an RGB image. A gray-scale image was obtained from the original image by considering only the 'G' value, and a threshold value 'T' of 'G' was then calculated. Pixels with a 'G' value greater than 'T' were considered plant pixels, and pixels with a 'G' value lower than 'T' were considered soil pixels. For each image, segmentation thus produced a binary image in which pixels with value '0' represent soil and pixels with value '1' represent plants.
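The thresholding step can be sketched as follows (an illustrative Python sketch; the paper does not state how 'T' was computed, so the mean of the green channel is used here as a stand-in):

```python
import numpy as np

def segment_plants(rgb, threshold=None):
    """Separate plant pixels from soil by thresholding the green channel.

    `rgb` is an HxWx3 uint8 array. If no threshold is given, the mean of
    the green channel is used — an illustrative choice, since the paper
    does not specify how 'T' was calculated.
    """
    g = rgb[:, :, 1].astype(float)
    t = g.mean() if threshold is None else threshold
    return (g > t).astype(np.uint8)  # 1 = plant, 0 = soil
```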
To remove noise from the images, an opening operation was first applied to the binary images. In opening, an erosion operation is applied first, followed by a dilation. It has the effect of smoothing the contour of an object, breaking narrow isthmuses and eliminating thin protrusions from an image [6]. Then a closing operation was applied. In closing, a dilation operation is applied first, followed by an erosion. It has the effect of eliminating small holes and filling gaps in the contour of an image [6]. Fig. 2A-C shows the pre-processing steps for a sample image of pigweed.
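The two clean-up steps above can be sketched with `scipy.ndimage` (a minimal illustration; the paper does not specify a structuring element, so the 3×3 square here is an assumption):

```python
import numpy as np
from scipy import ndimage

def denoise(mask):
    """Remove speckle noise from a binary plant mask.

    Opening (erosion then dilation) removes isolated foreground pixels;
    closing (dilation then erosion) fills small holes, as described above.
    """
    kernel = np.ones((3, 3))  # assumed 3x3 structuring element
    opened = ndimage.binary_opening(mask, structure=kernel)
    closed = ndimage.binary_closing(opened, structure=kernel)
    return closed.astype(np.uint8)
```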
Figure 2: Images of a pigweed; (A) RGB image (B) gray-scale image (C) segmented binary image
2.3 Feature Extraction
A total of fourteen features were extracted from each image. These features fall into three categories: colour features, size-independent shape features and moment invariants.
2.3.1 Colour Features
Let 'R', 'G' and 'B' denote the red, green and blue colour components respectively. Each colour component was divided by the sum of all three components, which makes the colour features robust to varying lighting conditions.
r = R / (R + G + B) (1)
g = G / (R + G + B) (2)
b = B / (R + G + B) (3)
Only plant pixels were used when calculating the colour features, so the features are based only on plant colour, not soil colour. The colour features used were: mean value of 'r', mean value of 'g', mean value of 'b', standard deviation of 'r', standard deviation of 'g' and standard deviation of 'b'.
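The six colour features can be computed as follows (an illustrative Python sketch, assuming an RGB array and the binary mask produced by segmentation):

```python
import numpy as np

def colour_features(rgb, mask):
    """Six colour features from plant pixels only (mask == 1).

    Chromaticity coordinates r, g, b divide each channel by the channel
    sum (Eqs. 1-3), normalizing away overall brightness; the features are
    the per-channel means and standard deviations over plant pixels.
    """
    pix = rgb[mask == 1].astype(float)       # N x 3 array of plant pixels
    total = pix.sum(axis=1, keepdims=True)
    total[total == 0] = 1.0                  # guard against all-black pixels
    chrom = pix / total                      # columns: r, g, b
    return np.concatenate([chrom.mean(axis=0), chrom.std(axis=0)])
```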
2.3.2 Size Independent Shape Features
The size independent features used for this study were:
Formfactor = 4π × area / perimeter² (4)
Elongatedness = area / thickness² (5)
Convexity = convex perimeter / perimeter (6)
Solidity = area / convex area (7)
Here, area is defined as the number of pixels with value '1' in a binary image. Perimeter is defined as the number of pixels with value '1' for which at least one of the eight neighbouring pixels has the value '0'; that is, the perimeter is the number of border pixels. Thickness is twice the number of shrinking steps needed to make an object in an image disappear [7], where the shrinking process eliminates one layer of border pixels per step [7]. Convex area is defined as the area of the smallest convex hull that covers all the plant pixels in an image, and convex perimeter is the perimeter of that convex hull.
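The area, perimeter and thickness based features can be sketched as follows (illustrative Python; convexity and solidity are omitted here because they additionally require a convex-hull routine such as `scipy.spatial.ConvexHull`, and the object is assumed not to touch the image border):

```python
import numpy as np
from scipy import ndimage

def shape_features(mask):
    """Shape measures from a binary mask, per the definitions above.

    A border pixel is a 1 with at least one 0 among its 8 neighbours, so
    the perimeter is the area minus the fully-interior pixel count.
    Thickness is twice the number of one-layer erosions needed to make
    the object vanish.
    """
    kernel = np.ones((3, 3))                 # 8-neighbour connectivity
    area = int(mask.sum())
    interior = ndimage.binary_erosion(mask, structure=kernel)
    perimeter = area - int(interior.sum())
    steps, m = 0, mask.astype(bool)
    while m.any():                           # shrink one layer per step
        m = ndimage.binary_erosion(m, structure=kernel)
        steps += 1
    thickness = 2 * steps
    return {
        "area": area,
        "perimeter": perimeter,
        "formfactor": 4 * np.pi * area / perimeter ** 2,
        "elongatedness": area / thickness ** 2,
    }
```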
2.3.3 Moment Invariant Features
Moment invariants refer to certain functions of moments which are invariant to geometric transformations such as translation, scaling and rotation [8]. Only central moments are considered in this study.
Let f(x,y) be a binary image of a plant, so that f(x,y) is '1' for those pairs (x,y) that correspond to plant pixels and '0' for those pairs that correspond to soil pixels. Under a translation of co-ordinates, xʹ = x + α, yʹ = y + β, the (p+q)th order central moments, which are invariant to this translation, are:
µp,q = ∑x ∑y (x − x̄)p (y − ȳ)q f(x,y),  p, q = 0, 1, 2, … (8)
Here, 'x̄' and 'ȳ' are the mean values of 'x' and 'y' respectively. Normalized moments [8], which are invariant under a scale change xʹ = αx and yʹ = αy, can be defined as:
ηp,q = µp,q / (µ0,0)^γ (9)
where
γ = (p + q)/2 + 1,  p + q = 2, 3, … (10)
These normalized moments are invariant to size change. The moment invariants used for this study are listed below:
Φ1 = η2,0 + η0,2 (11)
Φ2 = (η2,0 − η0,2)² + 4η1,1² (12)
Φ3 = (η3,0 − 3η1,2)² + (η0,3 − 3η2,1)² (13)
Φ4 = (η3,0 + η1,2)² + (η0,3 + η2,1)² (14)
These moment features are invariant to rotation and reflection. The moment invariants were calculated on the object area. The natural logarithm was used to make the moment invariants more linear.
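The central moments, normalized moments and the four invariants (Eqs. 8-14) can be computed directly from a binary mask. The sketch below returns the raw Φ values, to which the natural logarithm would then be applied (note that for perfectly symmetric shapes some Φ values can be zero, making the logarithm undefined):

```python
import numpy as np

def moment_invariants(mask):
    """Raw moment invariants Φ1-Φ4 (Eqs. 11-14) from a binary mask."""
    ys, xs = np.nonzero(mask)                # plant-pixel coordinates
    xbar, ybar = xs.mean(), ys.mean()

    def mu(p, q):                            # central moment, Eq. (8)
        return ((xs - xbar) ** p * (ys - ybar) ** q).sum()

    m00 = mu(0, 0)

    def eta(p, q):                           # normalized moment, Eqs. (9)-(10)
        return mu(p, q) / m00 ** ((p + q) / 2 + 1)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    phi3 = (eta(3, 0) - 3 * eta(1, 2)) ** 2 + (eta(0, 3) - 3 * eta(2, 1)) ** 2
    phi4 = (eta(3, 0) + eta(1, 2)) ** 2 + (eta(0, 3) + eta(2, 1)) ** 2
    return (phi1, phi2, phi3, phi4)
```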
2.4 Classification Using Support Vector Machines
A classification task in SVM requires separating the dataset into two parts: one used for training and the other for testing. Each instance in the training set contains a class label and a number of features. From the given training data, SVM generates a model which is used to predict the class labels of the test data when only the feature values are provided. A training set of tuples and their associated class labels was used, where each tuple is represented by an n-dimensional feature vector,
X = (x1, x2, …, xn), where n = 14
Here, 'X' represents n measurements made on the tuple for the n features. There are six classes, labelled 1 to 6 as listed in TABLE I.
SVM requires all data instances to be represented as vectors of real numbers. As feature values can span different dynamic ranges, the dataset needs to be normalized so that features with greater numeric ranges cannot dominate features with smaller numeric ranges. LIBSVM 2.91 was used for support vector classification [9]. Each feature value of the dataset was scaled to the range [0, 1]. An RBF (Radial-Basis Function) kernel was used for SVM training and testing. The RBF kernel maps samples nonlinearly into a higher dimensional space, so it can handle cases where the relation between class labels and features is nonlinear. A commonly used radial basis function is:
K(xi, xj) = exp(−γ ||xi − xj||²), γ > 0 (15)
where
||xi − xj||² = (xi − xj)ᵀ(xi − xj) (16)
The RBF kernel in LIBSVM 2.91 requires two parameters: 'γ' and a penalization parameter 'C' [9]. Appropriate values of 'C' and 'γ' must be chosen to achieve high classification accuracy. For this study, the selected values were C = 1.00 and γ = 1 / (total number of features).
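The paper used LIBSVM 2.91 directly; as an illustration, the same configuration can be mirrored with scikit-learn, whose `SVC` wraps the LIBSVM solver (the toy data below is a hypothetical stand-in for the 14-feature samples):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Reported settings: features scaled to [0, 1], RBF kernel,
# C = 1.0 and gamma = 1 / n_features.
model = make_pipeline(
    MinMaxScaler(feature_range=(0, 1)),
    SVC(kernel="rbf", C=1.0, gamma="auto"),  # "auto" means 1 / n_features
)

# Hypothetical stand-in data: 60 samples, 14 features, class labels 1-6.
rng = np.random.default_rng(0)
X = rng.random((60, 14))
y = np.repeat(np.arange(1, 7), 10)
model.fit(X, y)
```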
III. RESULT AND DISCUSSION
For testing the system, the full dataset is divided into two parts: a training set used to train the system and a testing set used to measure its accuracy. Cross validation is an improved testing procedure which guards against the overfitting problem. Ten-fold cross validation was selected for testing. In ten-fold cross validation, the whole training set is split into ten subsets with equal numbers of instances; each subset in turn is tested using the classifier trained on the remaining nine subsets. The cross validation accuracy is the percentage of correctly classified test data, since each instance of the whole set is used for testing exactly once. The cross validation accuracy of the developed system using all features was 95.9% over 224 samples.
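The ten-fold procedure described above can be sketched with scikit-learn's cross-validation utilities (again with hypothetical stand-in data; the real study used the 224 labelled samples):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Hypothetical stand-in data: 60 samples, 14 features, 6 classes.
rng = np.random.default_rng(0)
X = rng.random((60, 14))
y = np.repeat(np.arange(1, 7), 10)

model = make_pipeline(MinMaxScaler(), SVC(kernel="rbf", C=1.0, gamma="auto"))

# Each of the ten folds is held out once and scored against a classifier
# trained on the other nine; the mean is the cross validation accuracy.
scores = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=10))
accuracy = scores.mean()
```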
All crop images were identified correctly by SVM, but there were some misclassifications among the weed images: five images of pigweed were misclassified as burcucumber, one image of burcucumber was misclassified as pigweed, and two images of marsh herb were misclassified as lamb's quarters. No weed image was misclassified as chilli. The overall classification result is shown in TABLE II.
TABLE II
CLASSIFICATION RESULT USING ALL FEATURES

English Name       Number of Samples   Misclassified Samples   Success Rate
Chilli             40                  0                       100%
Pigweed            40                  5                       87.5%
Marsh herb         31                  2                       93.5%
Lamb's quarters    33                  0                       100%
Cogongrass         45                  0                       100%
Burcucumber        35                  2                       94.3%
Average Success Rate                                           95.9%
To select the set of features giving the best classification result, both forward-selection and backward-elimination methods were used. In forward-selection, the process starts with a set containing a single feature, and other features are added one at a time: at each step, every feature not yet in the set is tested for inclusion. The procedure stops when no further improvement is achieved; otherwise it continues in search of a better success rate. In backward-elimination, the procedure starts with a set initially containing all the features; the feature with the least discriminating ability is then removed from the set, and this process continues until the best classification result is obtained. The stepwise feature selection procedure combines forward-selection and backward-elimination to find the best feature combination: features are added to the set one at a time as in forward-selection, and after each addition, backward-elimination is applied to the set. Using this method, a set of nine features was selected which produced the best classification rate. Those nine features were:
Solidity
Elongatedness
Mean value of 'r'
Mean value of 'b'
Standard deviation of 'r'
Standard deviation of 'b'
ln(Φ1) of area
ln(Φ2) of area
ln(Φ4) of area
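The stepwise procedure described above can be sketched as a greedy search (an illustrative implementation using cross-validation accuracy as the selection criterion; the paper's exact stopping rules and scoring details are not specified, so these are assumptions):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def stepwise_select(X, y, cv=10):
    """Greedy stepwise feature selection (a sketch).

    Forward step: add whichever unused feature most improves CV accuracy.
    Backward step: after each addition, drop any selected feature whose
    removal improves the score. Stops when no addition helps.
    """
    def score(feats):
        if not feats:
            return 0.0
        return cross_val_score(SVC(kernel="rbf"),
                               X[:, sorted(feats)], y, cv=cv).mean()

    selected, best = set(), 0.0
    while True:
        gains = {f: score(selected | {f})
                 for f in range(X.shape[1]) if f not in selected}
        if not gains:
            break
        f, s = max(gains.items(), key=lambda kv: kv[1])
        if s <= best:
            break                      # no further improvement: stop
        selected.add(f)
        best = s
        improved = True
        while improved:                # backward-elimination pass
            improved = False
            for g in list(selected):
                s2 = score(selected - {g})
                if s2 > best:
                    selected.remove(g)
                    best = s2
                    improved = True
    return sorted(selected), best
```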
The result of ten-fold cross validation using these nine features was 97.3%. Four images of pigweed were misclassified as burcucumber and two images of burcucumber were misclassified as pigweed; all other images were classified accurately. The overall classification result using these nine features is given in TABLE III.
TABLE III
CLASSIFICATION RESULT USING BEST FEATURES

English Name       Number of Samples   Misclassified Samples   Success Rate
Chilli             40                  0                       100%
Pigweed            40                  4                       90%
Marsh herb         31                  0                       100%
Lamb's quarters    33                  0                       100%
Cogongrass         45                  0                       100%
Burcucumber        35                  2                       94.3%
Average Success Rate                                           97.3%
IV. CONCLUSION
An automated weeding system must be able to identify crops and weeds automatically and treat them accordingly. A machine vision system based on digital image processing is recognized as the most efficient sensing technique for this purpose. For real-time implementation of such a machine vision system, an efficient classification model is required which can classify crops and weeds with high accuracy. The goal of this paper was to test the feasibility of support vector machines for crop and weed classification. The results show that SVM provides very high accuracy and is also quite robust. To achieve a higher success rate in a real-time implementation of SVM in a weed classification system, good image segmentation and noise reduction techniques are required.