Extracting knowledge from the universe and identifying associations between attribute values is a challenging task. Granular computing, as proposed by Pawlak [1], Louie [2] and Zadeh [3], is a tool to identify associations between attribute values that are indiscernible. However, in many information systems the attribute values are almost indiscernible rather than discernible. Therefore, it is essential to identify associations between attribute values that are almost indiscernible. In this paper, we use rough sets on fuzzy approximation spaces together with ordering of values to deal with almost indiscernibility. Finally, we use granular computing to find the associations between the attribute values.
There is a huge repository of data available across various domains. In the present age of the internet, data can be easily collected and accumulated, but it is hard to extract useful information from such voluminous data. Discovering knowledge about the universe has therefore become one of the most popular areas of recent research. In order to transform processed data intelligently and automatically into useful information and knowledge, new techniques and tools are needed. The development of these techniques and tools is studied under different domains such as knowledge discovery in databases, computational intelligence and knowledge engineering. Researchers have developed various traditional tools to mine knowledge from accumulated voluminous data, but most of them are crisp and deterministic in nature, whereas real-life datasets are inconsistent and ambiguous. So, there is a need to classify the objects of the universe into similarity classes. The basic building block of knowledge about the universe is called a granule. Creating granules from the objects of the universe by classification is called knowledge granulation, and processing these granules in order to find knowledge about the universe is termed granular computing. Classification of the universe can be done on the basis of an indiscernibility relation among the objects. Granular computing, as proposed by Pawlak [1], Louie [2] and Zadeh [3], is a tool to identify associations between attribute values that are indiscernible. However, in real-life situations the attribute values are almost indiscernible. In order to model such situations, the fundamental concept of classical sets has been extended in various directions. One approach in this direction was introduced by Fayyad et al.
(1996), who developed and illustrated knowledge discovery in databases (KDD), which identifies useful and understandable patterns in data; however, once the factors affecting KDD are taken into account, its complexity increases. Singhal (2001) and Donovan (2003) also provided classification techniques for datasets that are crisp in nature. Zadeh (1965) introduced fuzzy set theory, whose concepts were applied to knowledge discovery in databases; the concept was further extended to L-fuzzy sets by Goguen (1967), intuitionistic fuzzy sets by Atanassov (1986), and twofold fuzzy sets by Dubois et al. (1987), to name a few. But all of these methods lack uniqueness in choosing the membership and non-membership functions. The rough set introduced by Pawlak (1991) is a tool that depends upon the notion of an equivalence relation defined over a universe. This concept was further extended to rough sets on fuzzy approximation spaces, which depend upon a fuzzy proximity relation, as discussed by Acharjya and Tripathy (2008). Rough set on fuzzy approximation space is an intelligent tool that finds the significance of attributes in a given dataset using the membership function.
In this paper we use rough sets on fuzzy approximation spaces to deal with data that are almost indiscernible, and we use a granular computing approach to find associations between attribute values that are almost indiscernible rather than indiscernible.
2. ROUGH SET
The classical (crisp) set has been studied and extended in many directions to model real-life situations. The notion of fuzzy sets studied by Zadeh (1965) and its generalizations, and the notion of rough sets studied by Pawlak and Skowron (2007), are the major lines of research in this field. The rough set philosophy is based on the idea that some information is associated with each object of the universe. Since various objects of the universe are similar to one another, there is a need to classify them based on the indiscernibility relation among them. Rough set theory is a mathematical tool that classifies the objects of the universe based on this indiscernibility relation. The basic idea of rough sets is the approximation of a set by a pair of sets, known as the lower approximation and the upper approximation, with respect to some imprecise information. In this section, we present the basic concepts, definitions and notations that will be used in the rest of the paper.
Let U ï¦ï€ be a set of objects called the universe, and R be an equivalence relation over U. Then U/R we denote the family of equivalence classes of R (or classification of U) referred to as categories or concepts of R and [x]R denotes a category in R containing an element xU . By a knowledge base, we understand a relational system K ï€½ï€ (U,R) , where U is as above and R is a family of equivalence relations over U. For any subset P ïƒï€ R and P ï¦ï€ , IND(P) is the intersection of all equivalence relations in P. IND(P) is called the indiscernibility relation over P. The equivalence classes of IND(P) are called P-basic knowledge about U in K. For any QR , every equivalence class of Q is called Q-elementary concepts of knowledge R.
The family of all P-basic categories for ∅ ≠ P ⊆ R will be called the family of basic categories in the knowledge base K = (U, R). By IND(K) we denote the family of all equivalence relations defined in K; equivalently,
IND(K) = {IND(P) : ∅ ≠ P ⊆ R}
For any X ⊆ U and an equivalence relation R ∈ IND(K), we associate two subsets R̄X and R̲X, called the R-upper and R-lower approximations of X respectively, which are given by:

R̲X = ∪{Y ∈ U/R : Y ⊆ X} and
R̄X = ∪{Y ∈ U/R : Y ∩ X ≠ ∅}

The R-boundary of X is denoted by BNR(X) and is given as BNR(X) = R̄X − R̲X. We say X is rough with respect to R if and only if R̄X ≠ R̲X, or equivalently BNR(X) ≠ ∅. If BNR(X) = ∅, i.e. R̄X = R̲X, then the target set X is a crisp set. This indicates that a rough set is more general than a crisp set.
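The two approximations can be computed directly once the equivalence classes U/R are available. The following is a minimal sketch, not part of the original method; the function names and example data are illustrative:

```python
# Sketch: Pawlak's lower and upper approximations, given U/R as a list of sets.

def lower_approximation(classes, X):
    """Union of the equivalence classes wholly contained in the target set X."""
    return set().union(*(Y for Y in classes if Y <= X))

def upper_approximation(classes, X):
    """Union of the equivalence classes that intersect the target set X."""
    return set().union(*(Y for Y in classes if Y & X))

# Example: U/R = {{1,2},{3,4},{5,6}}, X = {1,2,3}.
classes = [{1, 2}, {3, 4}, {5, 6}]
X = {1, 2, 3}
print(lower_approximation(classes, X))  # {1, 2}
print(upper_approximation(classes, X))  # {1, 2, 3, 4}
# The boundary {3, 4} is non-empty, so X is rough with respect to R.
```

Since the boundary is non-empty here, X cannot be described exactly by the categories of R, which is precisely the rough case discussed above.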
3. ROUGH SETS ON FUZZY APPROXIMATION SPACE
Let U be a universe. The elements of the universe may have crisp or fuzzy relations among them, depending on the nature of the dataset. In this section, we present the definitions, notations and results on fuzzy approximation spaces and rough sets on fuzzy approximation spaces. We will refer to these concepts in later sections of the paper; they form the basis of our further discussion.
Definition 3.1: Let U be a universe. A fuzzy relation on U is a fuzzy subset of (U × U).
Definition 3.2: A fuzzy relation R on U is said to be a fuzzy proximity relation if
µR(x, x) = 1 for all x ∈ U and (3.1)
µR(x, y) = µR(y, x) for all x, y ∈ U (3.2)
Definition 3.3: Let R be a fuzzy proximity relation on U. Then for a given α ∈ [0, 1], we say that two elements x and y are α-similar with respect to R if (x, y) ∈ Rα, and we write xRαy.
Definition 3.4: Let R be a fuzzy proximity relation on U. Then for a given α ∈ [0, 1], we say that two elements x and y are α-identical with respect to R if either x is α-similar to y or x is transitively α-similar to y with respect to R, i.e., there exists a sequence of elements u1, u2, u3, ..., un in U such that xRαu1, u1Rαu2, u2Rαu3, ..., unRαy.
If x and y are α-identical with respect to the fuzzy proximity relation R, then we write xR(α)y, where the relation R(α) for each fixed α ∈ [0, 1] is an equivalence relation on U.
Definition 3.5: The pair (U, R) is called a fuzzy approximation space. For any α ∈ [0, 1], we denote by R*α the set of all equivalence classes of R(α). Also, we call (U, R(α)) the generated approximation space associated with R and α.
Definition 3.6: Let (U, R) be a fuzzy approximation space and let X ⊆ U. Then the rough set of X in (U, R(α)) is denoted by (X̲α, X̄α), where X̲α is the α-lower approximation and X̄α is the α-upper approximation of X. We define X̲α and X̄α as

X̲α = ∪{Y : Y ∈ R*α and Y ⊆ X} and (3.3)
X̄α = ∪{Y : Y ∈ R*α and Y ∩ X ≠ ∅} (3.4)

Definition 3.7: Let X ⊆ U. Then X is said to be α-discernible if and only if X̲α = X̄α, and X is said to be α-rough if and only if X̲α ≠ X̄α.
Many properties of α-lower and α-upper approximations have been studied by De [4].
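Definitions 3.3–3.5 translate directly into a small computation: threshold the proximity relation at α and take the connected components of the resulting α-similarity graph, which realises transitive α-similarity. A sketch with hypothetical membership values (the element names and numbers are illustrative):

```python
# Sketch: equivalence classes of R(α) from a fuzzy proximity relation mu,
# given as a symmetric dict over all ordered pairs of elements.

def alpha_classes(mu, alpha):
    elems = sorted({x for x, _ in mu})
    # adjacency under α-similarity
    adj = {x: {y for y in elems if mu[(x, y)] >= alpha} for x in elems}
    seen, classes = set(), []
    for x in elems:
        if x in seen:
            continue
        comp, stack = set(), [x]          # depth-first search gives the
        while stack:                      # transitive closure of α-similarity
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        classes.append(comp)
    return classes

# Hypothetical relation on {a, b, c}: a and c are only transitively α-similar.
half = {('a', 'a'): 1.0, ('b', 'b'): 1.0, ('c', 'c'): 1.0,
        ('a', 'b'): 0.90, ('a', 'c'): 0.30, ('b', 'c'): 0.86}
mu = {**half, **{(y, x): v for (x, y), v in half.items()}}
print([sorted(c) for c in alpha_classes(mu, 0.85)])  # [['a', 'b', 'c']]
print([sorted(c) for c in alpha_classes(mu, 0.95)])  # [['a'], ['b'], ['c']]
```

At α = 0.85, a and c fall into one class even though µ(a, c) = 0.30, because they are transitively α-similar through b; raising α to 0.95 breaks all similarities.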
4. ORDERED INFORMATION SYSTEM
Let I = (U, A, {Va : a ∈ A}, {fa : a ∈ A}) be an information system, where U is a finite non-empty set of objects called the universe and A is a non-empty finite set of attributes. For every a ∈ A, Va is the set of values that attribute a may take, and fa : U → Va is an information function. In practical applications, objects can be cases, companies, institutions, processes or observations; attributes can be interpreted as features, variables or characteristics. A special case of an information system, called an information table or attribute-value table, has its columns labelled by attributes and its rows by objects; it assigns a value a(x) from Va to each attribute a and object x in the universe U. With any P ⊆ A there is an associated equivalence relation such that
IND(P) = {(x, y) ∈ U² | ∀a ∈ P, a(x) = a(y)} (4.1)
This relation is called the P-indiscernibility relation. The family of all equivalence classes of IND(P) forms a partition of U and is denoted by U/IND(P) or U/P. If (x, y) ∈ IND(P), then x and y are indiscernible by the attributes from P. For example, consider the information table 1. In table 1, we have U = {p1, p2, p3, p4, p5, p6, p7, p8}, A = {Company, Model, Price}, and VCompany = {Nokia, Samsung, Blackberry, Micromax, LG, Motorola}. Similarly, VModel = {E7, Wave, N8, Torch, W900, Renoir, Quench, Metro}, and VPrice = {29000, 7000, 23174, 12575, 5990}. Knowledge representation in rough set data analysis is done via information systems, which are a form of data table providing the available information about the objects under consideration. In an information system, objects are perceived and studied using their properties. At the same time, it does not consider any semantic relationships between distinct values of a particular attribute [1]. Different values of the same attribute are treated as distinct symbols without any connections, and therefore the analysis rests to a large extent on simple pattern matching. Hence, in general, one uses the trivial equality relation on the values of an attribute, as discussed in standard rough set theory [2].
Table 1. Information Table
Cell Phone Company Model Price
P1 Nokia E7 29000
P2 Samsung Wave 7000
P3 Nokia N8 23174
P4 Blackberry Torch 29000
P5 Micromax W900 7000
P6 LG Renoir 29000
P7 Motorola Quench 12575
P8 Samsung Metro 5990
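The partition U/IND(P) for Table 1 can be computed by grouping objects on their P-values. A sketch, with the values transcribed from the table (the Model column is omitted for brevity):

```python
# Sketch: U/IND(P) for information table 1 (Model attribute omitted).
from collections import defaultdict

table = {
    'p1': {'Company': 'Nokia',      'Price': 29000},
    'p2': {'Company': 'Samsung',    'Price': 7000},
    'p3': {'Company': 'Nokia',      'Price': 23174},
    'p4': {'Company': 'Blackberry', 'Price': 29000},
    'p5': {'Company': 'Micromax',   'Price': 7000},
    'p6': {'Company': 'LG',         'Price': 29000},
    'p7': {'Company': 'Motorola',   'Price': 12575},
    'p8': {'Company': 'Samsung',    'Price': 5990},
}

def partition(table, P):
    """Objects fall in the same block iff they agree on every attribute in P."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in P)].add(obj)
    return sorted(sorted(b) for b in blocks.values())

print(partition(table, ['Price']))
# [['p1', 'p4', 'p6'], ['p2', 'p5'], ['p3'], ['p7'], ['p8']]
```

So, for example, p1, p4 and p6 are indiscernible by Price alone, but adding Company to P separates all eight phones.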
However, in real-life situations the attribute values are almost indiscernible rather than discernible, and finding associations between such attribute values is a tricky job. For example, if the objects are patients suffering from a certain disease, the symptoms of the disease form the information about the patients, and these symptoms are almost identical rather than fully identical. Some semantics can be added to the information table to make it more general. For the problem of knowledge mining, we introduce order relations on attribute values [9].
In this paper we use rough sets on fuzzy approximation spaces to find the attribute values that are α-identical before introducing the order relation, because exact ordering is not possible when the attribute values are only almost identical. For α = 1, the almost indiscernibility relation reduces to the indiscernibility relation; it therefore generalizes Pawlak's indiscernibility relation. An ordered information table (OIT) is defined as:
OIT = {IT, {≻a : a ∈ A}} (4.2)

where IT is a standard information table and ≻a is an order relation on attribute a. An ordering of the values of a particular attribute a naturally induces an ordering of the objects:

x ≻{a} y ⇔ fa(x) ≻a fa(y) (4.3)

where ≻{a} denotes an order relation on U induced by the attribute a. An object x is ranked ahead of an object y if and only if the value of x on the attribute a is ranked ahead of the value of y on the attribute a. For example, a sample ordered information table of eight institutions with three attributes {IC, IF, PP} is shown in table 2, where the attribute IC represents Intellectual Capital, IF represents Infrastructure Facility and PP represents Placement Performance.
Table 2. Ordered Information table
Institutions IC IF PP
I1 High Very High Good
I2 High Very High Good
I3 Average Very High Good
I4 Low High Average
I5 Low Very High Good
I6 Average Very High Good
I7 Low High Average
I8 Low High Average
IC: High ≻ Average ≻ Low
IF: Very High ≻ High
PP: Good ≻ Average
For a subset of attributes P ⊆ A, we define:

x ≻P y ⇔ fa(x) ≻a fa(y) for all a ∈ P, i.e., ≻P = ∩a∈P ≻{a} (4.4)
This indicates that x is ranked ahead of y if and only if x is ranked ahead of y according to all attributes in P. The above definition is a straightforward generalization of the standard definition of equivalence relations in rough set theory [10, 11], where the equality relation is used. Knowledge mining based on order relations is a concrete example of an application of generalized rough set models with non-equivalence relations [6, 8].
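Checking x ≻P y then amounts to comparing per-attribute ranks. A sketch over Table 2; the numeric rank encodings are our own, and for illustration a weak ordering (ranked ahead of, or tied with) is used:

```python
# Sketch: induced ordering of objects for a subset of attributes P.

ranks = {  # rank encodings of the attribute-value orderings of Table 2
    'IC': {'High': 3, 'Average': 2, 'Low': 1},
    'IF': {'Very High': 2, 'High': 1},
    'PP': {'Good': 2, 'Average': 1},
}
rows = {
    'I1': {'IC': 'High',    'IF': 'Very High', 'PP': 'Good'},
    'I3': {'IC': 'Average', 'IF': 'Very High', 'PP': 'Good'},
    'I4': {'IC': 'Low',     'IF': 'High',      'PP': 'Average'},
}

def ahead(x, y, P):
    """x is ranked at least as high as y on every attribute in P."""
    return all(ranks[a][rows[x][a]] >= ranks[a][rows[y][a]] for a in P)

print(ahead('I1', 'I4', ['IC', 'IF', 'PP']))  # True
print(ahead('I4', 'I1', ['IC', 'IF', 'PP']))  # False
print(ahead('I3', 'I1', ['IF', 'PP']))        # True: they tie on IF and PP
```

Note that I1 and I3 are mutually ahead on {IF, PP} but not on all of A, showing how the intersection over P in (4.4) refines the ordering.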
5. PROPOSED KNOWLEDGE GRANULATION MODEL
In this section, we propose our knowledge mining model that consists of problem definition, target data, pre-processed data, processed data, and granular computing view, ordered information table, knowledge association rules and knowledge granulation as shown in Figure 1.
Figure 1. Proposed knowledge Granulation model
The fundamental step of identifying the right problem is about clearly knowing the problem definition and incorporating the prior knowledge associated with it. Extracting the desired knowledge out of voluminous data is highly problematic. A granular computing approach can be applied directly to identify associations between attribute values that are indiscernible, but when the attribute values are almost indiscernible, identifying the associations is very difficult. So, in the proposed model we apply a data-cleaning process to the target data to obtain pre-processed data, to which a fuzzy proximity relation is applied to get processed data. Rough set on fuzzy approximation space is then used as a tool on the processed data to obtain classified data, which is the granular computing view of the system. The resultant data is further processed by ordering rules to obtain the ordered information system. Thus we obtain ordered information from data that was almost indiscernible. Now we can apply the granular computing tool to the ordered information system and identify knowledge association rules. Finally, we extract the knowledge and classify the patterns in the data that was almost indiscernible.
6. CASE STUDY
In this section, we demonstrate on a real-life problem how the above concepts can be applied to extract information. We consider an example in which we study the business strategies of different cosmetic companies in a country. In table 3 below, we list the parameters of a business strategy aimed at maximum sales, their possible ranges of values, and the fuzzy proximity relation that characterizes the relationship between parameter values. A company with higher expenditure on marketing, advertisement, distribution, miscellaneous items, and research and development would be the ideal case, but such cases are rare in practice: a company need not excel in all parameters in order to reach the top position. However, some parameters may have greater influence on the score than others, and the indispensable parameters may differ for different values of α; in fact, as we decrease the value of α, more and more parameters become indispensable.
The membership function has been chosen so that its value lies in [0, 1] and the function is symmetric. The companies can be judged by the sales outputs they produce, and the amount of sales can be judged by the different parameters of the companies; these parameters form the attribute set for our analysis. Here, marketing expenditure means all expenditure incurred for corporate promotion, which includes event marketing, sales promotion, direct marketing etc., and comes to around 6%. The advertising expenditure includes promotional activities using various media such as television, newspapers and the internet, and comes to around 36%. The miscellaneous expenditure is mainly incurred through activities like corporate social responsibility and amounts to a maximum of 28%. The distribution cost includes expenses on logistics, supply chain etc. and comes to around 24%. The investment made in new product development and other research activities is taken as expenditure on research and development and amounts to around 6%. To keep the analysis simple, we have not considered other parameters that do not influence the sales of a company. The average of the data collected is taken as the representative figure and tabulated below. The notations and abbreviations used in the following analysis are presented in table 3.
Table 3. Notations and abbreviations
Parameter                                 Attribute  Possible Range  Membership Function
Expenditure on marketing                  Mkt.       1-150           1 - |x-y| / (2(x+y))
Expenditure on advertisement              Advt.      1-900           1 - |x-y| / (2(x+y))
Expenditure on distribution               Dist.      1-600           1 - |x-y| / (2(x+y))
Expenditure on miscellaneous              Misc.      1-700           1 - |x-y| / (2(x+y))
Expenditure on research and development   R&D        1-150           1 - |x-y| / (2(x+y))
In the following table 4 we present the data obtained from ten different companies. However, we keep their identities confidential for various official reasons and use the notation Ci, i = 1, 2, ..., 10, since the purpose of the study is to demonstrate the method, not to probe the performance of individual companies. It is to be noted that all non-ratio figures shown in the information table are in ten million INR.
Table 4. Information table
Company  Mkt.    Advt.    Dist.    Misc.    R&D
C1       18.276  162.236  30.236   72.146   9.156
C2       2.076   5.393    6.793    8.290    0.383
C3       0.496   1.330    0.433    2.733    0.393
C4       0.940   0.060    0.666    5.890    1.243
C5       27.333  38.660   16.496   24.343   1.523
C6       7.033   866.916  508.676  637.530  38.963
C7       4.323   4.173    1.753    3.176    0.003
C8       38.516  40.046   3.126    8.026    0.056
C9       0.466   0.460    0.993    3.803    0.053
C10      0.603   0.036    0.393    0.613    0.016
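The entries of the proximity tables that follow are obtained by applying the membership function of Table 3 to these figures; for instance, for 'Mkt.', µ(C1, C2) = 1 − |18.276 − 2.076| / (2(18.276 + 2.076)) ≈ 0.602. A sketch reproducing a few entries:

```python
# Sketch: fuzzy proximity values for the attribute 'Mkt.' from Table 4.

mkt = {'C1': 18.276, 'C2': 2.076, 'C3': 0.496, 'C4': 0.940, 'C5': 27.333,
       'C6': 7.033, 'C7': 4.323, 'C8': 38.516, 'C9': 0.466, 'C10': 0.603}

def mu(x, y):
    """Membership function of Table 3: 1 - |x - y| / (2(x + y))."""
    return 1 - abs(x - y) / (2 * (x + y))

print(round(mu(mkt['C1'], mkt['C2']), 3))  # 0.602, the (C1, C2) entry of Table 5
print(round(mu(mkt['C1'], mkt['C5']), 3))  # 0.901, the (C1, C5) entry of Table 5
print(round(mu(mkt['C3'], mkt['C9']), 3))  # 0.984, the (C3, C9) entry of Table 5
```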
Now, in order to minimize the computation, we compute a separate fuzzy proximity relation for each of the five attributes, giving the similarity between companies under each attribute. The fuzzy proximity relation R1 corresponding to the attribute 'Mkt.' is given below in table 5.
Table 5. Fuzzy proximity relation for attribute Mkt.
R1    C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C1    1.000  0.602  0.526  0.549  0.901  0.778  0.691  0.822  0.525  0.532
C2    0.602  1.000  0.693  0.812  0.571  0.728  0.824  0.551  0.683  0.725
C3    0.526  0.693  1.000  0.845  0.518  0.566  0.603  0.513  0.984  0.951
C4    0.549  0.812  0.845  1.000  0.533  0.618  0.679  0.524  0.831  0.891
C5    0.901  0.571  0.518  0.533  1.000  0.705  0.637  0.915  0.517  0.522
C6    0.778  0.728  0.566  0.618  0.705  1.000  0.881  0.654  0.562  0.579
C7    0.691  0.824  0.603  0.679  0.637  0.881  1.000  0.601  0.597  0.622
C8    0.822  0.551  0.513  0.524  0.915  0.654  0.601  1.000  0.512  0.515
C9    0.525  0.683  0.984  0.831  0.517  0.562  0.597  0.512  1.000  0.936
C10   0.532  0.725  0.951  0.891  0.522  0.579  0.622  0.515  0.936  1.000
The fuzzy proximity relation R2 corresponding to attribute 'Advt.' is given below in table 6.
Table 6. Fuzzy proximity relation for attribute Advt.
R2    C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C1    1.000  0.532  0.508  0.500  0.692  0.658  0.525  0.698  0.503  0.500
C2    0.532  1.000  0.698  0.511  0.622  0.506  0.936  0.619  0.579  0.507
C3    0.508  0.698  1.000  0.543  0.533  0.502  0.742  0.532  0.757  0.526
C4    0.500  0.511  0.543  1.000  0.502  0.500  0.514  0.501  0.615  0.875
C5    0.692  0.622  0.533  0.502  1.000  0.543  0.597  0.991  0.512  0.501
C6    0.658  0.506  0.502  0.500  0.543  1.000  0.505  0.544  0.501  0.500
C7    0.525  0.936  0.742  0.514  0.597  0.505  1.000  0.594  0.599  0.509
C8    0.698  0.619  0.532  0.501  0.991  0.544  0.594  1.000  0.511  0.501
C9    0.503  0.579  0.757  0.615  0.512  0.501  0.599  0.511  1.000  0.573
C10   0.500  0.507  0.526  0.875  0.501  0.500  0.509  0.501  0.573  1.000
The fuzzy proximity relation R3 corresponding to attribute 'Dist.' is given below in table 7.
Table 7. Fuzzy proximity relation for attribute Dist.
R3    C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C1    1.000  0.683  0.514  0.522  0.853  0.556  0.555  0.594  0.532  0.513
C2    0.683  1.000  0.560  0.589  0.792  0.513  0.705  0.815  0.628  0.555
C3    0.514  0.560  1.000  0.894  0.526  0.501  0.698  0.622  0.804  0.976
C4    0.522  0.589  0.894  1.000  0.539  0.501  0.775  0.676  0.901  0.871
C5    0.853  0.792  0.526  0.539  1.000  0.531  0.596  0.660  0.557  0.523
C6    0.556  0.513  0.501  0.501  0.531  1.000  0.503  0.506  0.502  0.501
C7    0.555  0.705  0.698  0.775  0.596  0.503  1.000  0.859  0.862  0.683
C8    0.594  0.815  0.622  0.676  0.660  0.506  0.859  1.000  0.741  0.612
C9    0.532  0.628  0.804  0.901  0.557  0.502  0.862  0.741  1.000  0.784
C10   0.513  0.555  0.976  0.871  0.523  0.501  0.683  0.612  0.784  1.000
The fuzzy proximity relation R4 corresponding to attribute 'Misc.' is given below in table 8.
Table 8. Fuzzy proximity relation for attribute Misc.
R4    C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C1    1.000  0.603  0.536  0.575  0.752  0.602  0.542  0.600  0.550  0.508
C2    0.603  1.000  0.748  0.915  0.754  0.513  0.777  0.992  0.814  0.569
C3    0.536  0.748  1.000  0.817  0.601  0.504  0.963  0.754  0.918  0.683
C4    0.575  0.915  0.817  1.000  0.695  0.509  0.850  0.923  0.892  0.594
C5    0.752  0.754  0.601  0.695  1.000  0.537  0.615  0.748  0.635  0.525
C6    0.602  0.513  0.504  0.509  0.537  1.000  0.505  0.512  0.506  0.501
C7    0.542  0.777  0.963  0.850  0.615  0.505  1.000  0.784  0.955  0.662
C8    0.600  0.992  0.754  0.923  0.748  0.512  0.784  1.000  0.821  0.571
C9    0.550  0.814  0.918  0.892  0.635  0.506  0.955  0.821  1.000  0.639
C10   0.508  0.569  0.683  0.594  0.525  0.501  0.662  0.571  0.639  1.000
The fuzzy proximity relation R5 corresponding to attribute 'R&D.' is given below in table 9.
Table 9. Fuzzy proximity relation for attribute R&D
R5    C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C1    1.000  0.540  0.541  0.620  0.643  0.690  0.500  0.506  0.506  0.502
C2    0.540  1.000  0.994  0.736  0.701  0.510  0.508  0.628  0.622  0.540
C3    0.541  0.994  1.000  0.740  0.705  0.510  0.508  0.625  0.619  0.539
C4    0.620  0.736  0.740  1.000  0.949  0.531  0.502  0.543  0.541  0.513
C5    0.643  0.701  0.705  0.949  1.000  0.538  0.502  0.535  0.534  0.510
C6    0.690  0.510  0.510  0.531  0.538  1.000  0.500  0.501  0.501  0.500
C7    0.500  0.508  0.508  0.502  0.502  0.500  1.000  0.551  0.554  0.658
C8    0.506  0.628  0.625  0.543  0.535  0.501  0.551  1.000  0.986  0.722
C9    0.506  0.622  0.619  0.541  0.534  0.501  0.554  0.986  1.000  0.732
C10   0.502  0.540  0.539  0.513  0.510  0.500  0.658  0.722  0.732  1.000
Analysis
Now, we derive the equivalence classes for the membership value α ≥ 0.85. The different equivalence classes corresponding to the attributes A = {Mkt., Advt., Dist., Misc., R&D} are given below.
U/R1α={{C1,C5,C8},{C6,C7},{C2},{C3,C4,C9,C10}}
U/R2α={{C1},{C2,C7},{C3},{C4,C10},{C5,C8},{C6},{C9}}
U/R3α={{C1,C5},{C2},{C3,C4,C7,C8,C9,C10},{C6}}
U/R4α={{C1},{C2,C3,C4,C7,C8,C9},{C5},{C6},{C10}}
U/R5α={{C1},{C2,C3},{C4,C5},{C6},{C7},{C8,C9},{C10}}
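These classes can be reproduced mechanically: threshold each attribute's proximity values at α = 0.85 and take connected components, i.e. transitive α-similarity. A sketch for R1, recomputing the proximities from the 'Mkt.' column of Table 4:

```python
# Sketch: U/R1α for α >= 0.85, recomputed from the 'Mkt.' values of Table 4.

mkt = {'C1': 18.276, 'C2': 2.076, 'C3': 0.496, 'C4': 0.940, 'C5': 27.333,
       'C6': 7.033, 'C7': 4.323, 'C8': 38.516, 'C9': 0.466, 'C10': 0.603}

def mu(x, y):
    return 1 - abs(x - y) / (2 * (x + y))

def classes(values, alpha):
    """Connected components of the α-similarity graph on the objects."""
    elems, seen, out = list(values), set(), []
    for e in elems:
        if e in seen:
            continue
        comp, stack = set(), [e]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(v for v in elems
                         if v not in comp and mu(values[u], values[v]) >= alpha)
        seen |= comp
        out.append(sorted(comp))
    return out

print(classes(mkt, 0.85))
# The four classes of U/R1α: {C1,C5,C8}, {C2}, {C3,C4,C9,C10}, {C6,C7}
```

Note the role of transitivity: µ(C1, C8) ≈ 0.822 < 0.85, yet C1 and C8 are α-identical because both are α-similar to C5.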
Therefore, according to the attribute Expenditure on marketing, C1, C5 and C8 are α-identical; C6 and C7 are α-identical; C3, C4, C9 and C10 are α-identical; and C2 is not identical to any other company for α ≥ 0.85. The values of the attribute Expenditure on marketing are thus classified into four categories, namely very low, low, average and high, and hence can be ordered. The values of the attribute Expenditure on advertisement are classified into seven categories, namely poor, very low, low, average, high, very high and outstanding. The values of the attribute Expenditure on distribution are classified into four categories, namely low, average, high and very high. The values of the attribute Expenditure on miscellaneous are classified into five categories, namely very low, low, average, high and very high. Similarly, the values of the attribute Expenditure on research and development are classified into seven categories, namely poor, very low, low, average, high, very high and outstanding. The ordered information table 10 of the business strategies of the different cosmetic companies of table 4 is given below.
Table 10. Ordered information table of business strategies of different cosmetic companies
Company  Mkt.      Advt.        Dist.      Misc.      R&D
C1       High      Very High    High       High       Very High
C2       Low       Average      Average    Low        Average
C3       Very Low  Low          Low        Low        Average
C4       Very Low  Poor         Low        Low        High
C5       High      High         High       Average    High
C6       Average   Outstanding  Very High  Very High  Outstanding
C7       Average   Average      Low        Low        Poor
C8       High      High         Low        Low        Low
C9       Very Low  Very Low     Low        Low        Low
C10      Very Low  Poor         Low        Very Low   Very Low
Mkt. : High ≻ Average ≻ Low ≻ Very Low
Advt. : Outstanding ≻ Very High ≻ High ≻ Average ≻ Low ≻ Very Low ≻ Poor
Dist. : Very High ≻ High ≻ Average ≻ Low
Misc. : Very High ≻ High ≻ Average ≻ Low ≻ Very Low
R&D : Outstanding ≻ Very High ≻ High ≻ Average ≻ Low ≻ Very Low ≻ Poor
7. MINING ASSOCIATION RULES USING GRANULAR COMPUTING
In this section, we show how granular computing is used in finding the association rules.
In the previous section, an ordered information table was prepared for the dataset with almost indiscernible attribute values. Now the granular computing approach can be applied to this ordered dataset. In granular computing, a set of attribute values is called an association rule if it satisfies certain criteria; if a set of attribute values is an association rule, all the attribute values in the set are associated with one another. To find association rules, we first find the granules based on each attribute, as shown in Table 11.
To check whether a set of attribute values is an association rule, we perform the AND operation on the bit representations of the attribute values in the set. If the number of 1's in the result of the AND operation is greater than or equal to the minimum support, then the set is an association rule; otherwise it is not.
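This membership test is a plain bitwise AND. A sketch over a few of the granules listed in Table 11 (the key labels such as 'Mkt=High' are our own naming):

```python
# Sketch: the AND test for association rules over bit-string granules.

granules = {  # a few granules from Table 11, one bit per company C1..C10
    'Mkt=High':         '1000100100',
    'Mkt=Average':      '0000011000',
    'Advt=High':        '0000100100',
    'Advt=Outstanding': '0000010000',
    'R&D=Low':          '0000000110',
}

def is_rule(values, min_support=2):
    """AND the granules; a rule needs at least min_support surviving 1-bits."""
    acc = int(granules[values[0]], 2)
    for v in values[1:]:
        acc &= int(granules[v], 2)
    return bin(acc).count('1') >= min_support

print(is_rule(['Mkt=High', 'Advt=High']))                       # True (C5, C8)
print(is_rule(['Mkt=Average', 'Advt=Outstanding', 'R&D=Low']))  # False
```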
Table 11: Granules based on individual attributes
Attribute Value  Granules as List  Granules as Bits
Granules based on Expenditure on marketing:
High         {C1,C5,C8}            1000100100
Average      {C6,C7}               0000011000
Low          {C2}                  0100000000
Very Low     {C3,C4,C9,C10}        0011000011
Granules based on Expenditure on advertisement:
Outstanding  {C6}                  0000010000
Very High    {C1}                  1000000000
High         {C5,C8}               0000100100
Average      {C2,C7}               0100001000
Low          {C3}                  0010000000
Very Low     {C9}                  0000000010
Poor         {C4,C10}              0001000001
Granules based on Expenditure on distribution:
Very High    {C6}                  0000010000
High         {C1,C5}               1000100000
Average      {C2}                  0100000000
Low          {C3,C4,C7,C8,C9,C10}  0011001111
Granules based on Expenditure on miscellaneous:
Very High    {C6}                  0000010000
High         {C1}                  1000000000
Average      {C5}                  0000100000
Low          {C2,C3,C4,C7,C8,C9}   0111001110
Very Low     {C10}                 0000000001
Granules based on Expenditure on research and development:
Outstanding  {C6}                  0000010000
Very High    {C1}                  1000000000
High         {C4,C5}               0001100000
Average      {C2,C3}               0110000000
Low          {C8,C9}               0000000110
Very Low     {C10}                 0000000001
Poor         {C7}                  0000001000
For example, let us assume that the minimum support is 2. To check whether {MktHigh, AdvtHigh} is an association rule, we perform the AND operation between the bit representations of the attribute values MktHigh and AdvtHigh, as shown in Table 12. Since the number of 1's in the result of the AND operation is greater than or equal to the minimum support 2, {MktHigh, AdvtHigh} is an association rule. Next, let us consider the attribute values {MktAvg, AdvtOutstanding, R&DLow}. The AND operation among the bit representations of MktAvg, AdvtOutstanding and R&DLow is shown in Table 13. Since the number of 1's in the result of the AND operation is 0, which is less than the minimum support 2, {MktAvg, AdvtOutstanding, R&DLow} is not an association rule.
Table 12: AND Operation between two attribute values
Attribute values  Granules
MktHigh           1000100100
AdvtHigh          0000100100
AND               0000100100
Table 13: AND Operation between two attribute values
Attribute values  Granules
MktAvg            0000011000
AdvtOutstanding   0000010000
R&DLow            0000000110
AND               0000000000
From the analysis, {MktHigh, AdvtHigh} is an association rule, with the bit representations of MktHigh and AdvtHigh being 1000100100 and 0000100100 respectively. The bit representation of the AND operation between them is 0000100100, which means that the attribute values MktHigh and AdvtHigh are present together in two objects, namely {C5, C8}. Also, {MktHigh, AdvtHigh} is an association rule of length two, since it consists of two attribute values of two different attributes. First we find all possible association rules of length two, then all possible association rules of length three, and so on. For better understanding, the AND operation among different sets of attribute values is presented in Table 14.
Table 14: AND Operation among different attribute values
Set of Attribute Values                  AND Operation among Attribute Values
MktVery Low, DistLow, MiscLow            0011000010
AdvtAvg, MiscVery High, R&DOutstanding   0000000000
DistLow, MiscLow, R&DLow                 0000000110
MiscHigh, AdvtVery Low, DistAvg          0000000000
MktVery Low, AdvtPoor, DistLow           0001000001
MktHigh, DistAvg, R&DVery Low            0000000000
If we consider the attribute values {MktVery Low, DistLow, MiscLow}, the AND operation among them yields 0011000010, which, as shown in Table 14, provides an association rule with minimum support 2. In the same manner we can identify all the association rules of any length existing between the different attribute values.
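The length-two search described above can be sketched as a nested enumeration over attribute pairs, one value per attribute; the granules below are transcribed from Table 11 for two of the five attributes:

```python
# Sketch: all association rules of length two over the 'Mkt' and 'Dist'
# granules of Table 11, with minimum support 2.
from itertools import combinations

granules = {
    'Mkt':  {'High': '1000100100', 'Average': '0000011000',
             'Low': '0100000000', 'Very Low': '0011000011'},
    'Dist': {'Very High': '0000010000', 'High': '1000100000',
             'Average': '0100000000', 'Low': '0011001111'},
}

def support(*bits):
    """Number of objects containing every attribute value in the set."""
    acc = int(bits[0], 2)
    for b in bits[1:]:
        acc &= int(b, 2)
    return bin(acc).count('1')

rules = [((a1, v1), (a2, v2))
         for (a1, g1), (a2, g2) in combinations(granules.items(), 2)
         for v1 in g1 for v2 in g2
         if support(g1[v1], g2[v2]) >= 2]
print(rules)
# [(('Mkt', 'High'), ('Dist', 'High')), (('Mkt', 'Very Low'), ('Dist', 'Low'))]
```

Extending the same loop over value triples (one value from each of three attributes) yields the length-three candidates of Table 14, and so on Apriori-style.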
8. CONCLUSION
The discovery of knowledge about the universe requires classification of the objects of the universe based on an indiscernibility relation. Mining association rules using a granular computing approach cannot be applied directly to find associations between attribute values that are almost indiscernible in nature. So, in this paper we study and use rough sets on fuzzy approximation spaces and ordering rules to deal with almost indiscernibility, and finally we use the granular computing tool to find associations between almost indiscernible attribute values. We have taken a real-life example of the expenditure of 10 cosmetic companies on different attributes, and have shown how the analysis can be performed by taking rough set on fuzzy approximation space, ordering of objects and granular computing as a model for mining knowledge.