Study Of Knowledge Granulation Rules Computer Science Essay


Extracting knowledge from the universe and identifying the associations between attribute values is a challenging task today. Granular computing, proposed by Pawlak [1], Louie [2] and Zadeh [3], is a tool to identify the associations between attribute values that are indiscernible. However, in many information systems the attribute values are almost indiscernible rather than discernible. Therefore, it is essential to identify the associations between attribute values when they are almost indiscernible. In this paper, we use rough set on fuzzy approximation spaces and ordering of values to deal with almost indiscernibility. Finally, we use granular computing to find the associations between the attribute values.

There is a huge repository of data available across various domains. In the present age of the internet, data can be easily collected and accumulated, but it is very hard to extract useful information from the voluminous data available in the universe. Finding knowledge about the universe has therefore become one of the most popular areas of recent research. In order to transform the processed data intelligently and automatically into useful information and knowledge, new techniques and tools are needed. The development of such techniques and tools is studied under different domains such as knowledge discovery in databases, computational intelligence and knowledge engineering. Researchers have developed various traditional tools to mine knowledge from the accumulated voluminous data, but most of them are crisp and deterministic in nature, whereas real life datasets are inconsistent and ambiguous. So, there is a need to classify the objects of the universe into similarity classes. The basic building block of knowledge about the universe is called a granule. Creating granules from the objects of the universe by classification is called knowledge granulation, and processing these granules in order to find knowledge about the universe is termed granular computing. It is observed that classification of the universe can be done on the basis of an indiscernibility relation among the objects. Granular computing, as proposed by Pawlak [1], Louie [2] and Zadeh [3], is a tool to identify the associations between attribute values that are indiscernible. However, in real life situations, the attribute values are almost indiscernible.

In order to model real life situations, the fundamental concepts of classical sets have been extended in various directions. One of the approaches in this direction was introduced by Fayyad et al. (1996), who developed and illustrated knowledge discovery in databases (KDD) to identify useful and understandable patterns in data; however, when the factors affecting KDD are taken into account, its complexity increases. Amit Singhal (2001) and Donaovan (2003) also provided classifications for datasets that are crisp in nature. Zadeh (1965) introduced fuzzy set theory, whose concepts were applied to knowledge discovery in databases; the concept was further extended to L-fuzzy sets by Goguen (1967), intuitionistic fuzzy sets by Atanassov (1986) and twofold fuzzy sets by Dubois et al. (1987), to name a few. But all of these methods lack uniqueness in choosing the membership and non-membership functions. Rough set theory, introduced by Pawlak (1991), is a tool that depends upon the notion of an equivalence relation defined over a universe. This concept has been further extended to rough sets on fuzzy approximation spaces, which depend upon a fuzzy proximity relation, as discussed by D.P. Acharjya & Tripathy (2008). Rough set on fuzzy approximation spaces is an intelligent tool that finds the significance of attributes in a given dataset using the membership function.

In this paper, we use rough set on fuzzy approximation spaces to deal with data that are almost indiscernible, and we use a granular computing approach to find associations between attribute values that are almost indiscernible rather than indiscernible.

2. ROUGH SET

The classical, i.e. crisp, set has been studied and extended in many directions to model real life situations. The notion of fuzzy sets studied by Zadeh (1965) and its generalizations, and the notion of rough sets studied by Pawlak and Skowron (2007), are among the major lines of research in this field. The rough set philosophy is based on the concept that there is some information associated with each object of the universe, and since many objects of the universe are similar to one another, the objects need to be classified based on the indiscernibility relation among them. Rough set theory is a mathematical tool used to classify the objects of the universe based on this indiscernibility relation. The basic idea of a rough set is the approximation of a set by a pair of sets, known as the lower approximation and the upper approximation, with respect to some imprecise information. In this section, we present the basic concepts, definitions and notations that will be used in the rest of the paper.

Let U ≠ ∅ be a set of objects called the universe, and let R be an equivalence relation over U. By U/R we denote the family of equivalence classes of R (or classification of U), referred to as categories or concepts of R, and [x]R denotes a category in R containing an element x ∈ U. By a knowledge base, we understand a relational system K = (U, R), where U is as above and R is a family of equivalence relations over U. For any subset P ⊆ R with P ≠ ∅, IND(P) denotes the intersection of all equivalence relations in P and is called the indiscernibility relation over P. The equivalence classes of IND(P) are called P-basic knowledge about U in K. For any Q ∈ R, every equivalence class of Q is called a Q-elementary concept of knowledge R.

The family of all P-basic categories, where ∅ ≠ P ⊆ R, will be called the family of basic categories in the knowledge base K = (U, R). By IND(K) we denote the family of all equivalence relations defined in K; equivalently,

IND(K) = {IND(P) : ∅ ≠ P ⊆ R}

For any X ⊆ U and an equivalence relation R ∈ IND(K), we associate two subsets R̄X and R̲X, called the R-upper and R-lower approximations of X respectively, which are given by:

R̲X = ∪{Y ∈ U/R : Y ⊆ X} and

R̄X = ∪{Y ∈ U/R : Y ∩ X ≠ ∅}

The R-boundary of X is denoted by BNR(X) and is given as BNR(X) = R̄X − R̲X. We say X is rough with respect to R if and only if R̲X ≠ R̄X, or equivalently BNR(X) ≠ ∅. If BNR(X) = ∅, i.e., R̲X = R̄X, then the target set X is a crisp set. This indicates that a rough set is more general than a crisp set.
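As an illustration, the following minimal Python sketch (our own, not part of the original formulation) computes the lower and upper approximations of a hypothetical target set X from a given family of equivalence classes:

    def lower_approximation(classes, X):
        # Union of all equivalence classes Y in U/R with Y a subset of X
        return {x for Y in classes if Y <= X for x in Y}

    def upper_approximation(classes, X):
        # Union of all equivalence classes Y in U/R that intersect X
        return {x for Y in classes if Y & X for x in Y}

    # Hypothetical example: U/R partitions six objects; X is a target set.
    classes = [{1, 2}, {3}, {4, 5, 6}]
    X = {1, 2, 3, 4}
    print(lower_approximation(classes, X))  # {1, 2, 3}
    print(upper_approximation(classes, X))  # {1, 2, 3, 4, 5, 6}
    # Here X is rough, since the approximations differ: BNR(X) = {4, 5, 6}.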

3. ROUGH SETS ON FUZZY APPROXIMATION SPACE

Let U be a universe. The elements of the universe may have crisp or fuzzy relations among them, depending on the nature of the dataset. In this section, we present the definitions, notations and results on fuzzy approximation spaces and rough sets on fuzzy approximation spaces. We will refer to these concepts in later sections of the paper, and they form the basis of our further discussion.

Definition 3.1: Let U be a universe. We define a fuzzy relation on U as a fuzzy subset of (U × U).

Definition 3.2: A fuzzy relation R on U is said to be a fuzzy proximity relation if

µR(x, x) = 1 for all x ∈ U, and (3.1)

µR(x, y) = µR(y, x) for all x, y ∈ U (3.2)

Definition 3.3: Let R be a fuzzy proximity relation on U. Then for a given α ∈ [0, 1], we say that two elements x and y are α-similar with respect to R if (x, y) ∈ Rα, i.e., µR(x, y) ≥ α, and we write xRαy.

Definition 3.4: Let R be a fuzzy proximity relation on U. Then for a given α ∈ [0, 1], we say that two elements x and y are α-identical with respect to R if either x is α-similar to y or x is transitively α-similar to y with respect to R, i.e., there exists a sequence of elements u1, u2, u3, ..., un in U such that xRαu1, u1Rαu2, u2Rαu3, ..., unRαy.

If x and y are α-identical with respect to the fuzzy proximity relation R, then we write xR(α)y, where the relation R(α), for each fixed α ∈ [0, 1], is an equivalence relation on U.

Definition 3.5: The pair (U, R) is called a fuzzy approximation space. For any α ∈ [0, 1], we denote by R*α the set of all equivalence classes of R(α). Also, we call (U, R(α)) the generated approximation space associated with R and α.

Definition 3.6: Let (U, R) be a fuzzy approximation space and let X ⊆ U. Then the rough set of X in (U, R(α)) is denoted by (Xα, X̄α), where Xα is the α-lower approximation of X and X̄α is the α-upper approximation of X. We define Xα and X̄α as

Xα = ∪{Y : Y ∈ R*α and Y ⊆ X} and (3.3)

X̄α = ∪{Y : Y ∈ R*α and Y ∩ X ≠ ∅} (3.4)

Definition 3.7: Let X ⊆ U. Then X is said to be α-discernible if and only if Xα = X̄α, and X is said to be α-rough if and only if Xα ≠ X̄α.

Many properties of α-lower and α-upper approximations have been studied by De [4].
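The following minimal Python sketch (our own illustration) implements Definitions 3.3-3.5: the α-cut of a fuzzy proximity relation, closed under transitivity, partitions the universe into the α-identical classes R*α. The membership matrix below is hypothetical; any symmetric matrix with unit diagonal works.

    def alpha_classes(mu, alpha):
        # Partition {0, ..., n-1} into alpha-identical classes: the connected
        # components of the graph whose edges are pairs with mu[x][y] >= alpha.
        n = len(mu)
        seen, classes = set(), []
        for start in range(n):
            if start in seen:
                continue
            comp, stack = set(), [start]
            while stack:
                x = stack.pop()
                if x in comp:
                    continue
                comp.add(x)
                stack.extend(y for y in range(n) if mu[x][y] >= alpha)
            seen |= comp
            classes.append(comp)
        return classes

    mu = [[1.00, 0.90, 0.40],
          [0.90, 1.00, 0.30],
          [0.40, 0.30, 1.00]]
    print(alpha_classes(mu, 0.85))  # [{0, 1}, {2}]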

4. ORDERED INFORMATION SYSTEM

Let I = (U, A, {Va : a ∈ A}, {fa : a ∈ A}) be an information system, where U is a finite non-empty set of objects called the universe and A is a non-empty finite set of attributes. For every a ∈ A, Va is the set of values that the attribute a may take, and fa : U → Va is an information function. In practical applications, objects can be cases, companies, institutions, processes or observations, and attributes can be interpreted as features, variables or characteristics. A special case of an information system is the information table, or attribute-value table, where the columns are labelled by attributes and the rows by objects. The information table assigns a value a(x) from Va to each attribute a and object x in the universe U. With any P ⊆ A there is an associated equivalence relation IND(P) such that

IND(P) = {(x, y) ∈ U² | ∀a ∈ P, a(x) = a(y)} (4.1)

This relation is called the P-indiscernibility relation. The partition of U into the family of all equivalence classes of IND(P) is denoted by U/IND(P) or U/P. If (x, y) ∈ IND(P), then x and y are indiscernible by the attributes from P. For example, consider the information table 1. In table 1 we have U = {P1, P2, P3, P4, P5, P6, P7, P8}, A = {Company, Model, Price}, and VCompany = {Nokia, Samsung, Blackberry, Micromax, LG, Motorola}. Similarly, VModel = {E7, Wave, N8, Torch, W900, Renoir, Quench, Metro} and VPrice = {29000, 7000, 23174, 12575, 5990}. Knowledge representation in rough set data analysis is done via information systems, which are a form of data table providing the available information about the objects under consideration. In an information system, objects are perceived and studied through their properties. At the same time, it does not consider any semantic relationships between distinct values of a particular attribute [1]. Different values of the same attribute are treated as distinct symbols without any connections, and the analysis is therefore, to a large extent, horizontal, based on simple pattern matching. Hence, in general one uses the trivial equality relation on the values of an attribute, as discussed in standard rough set theory [2].

Table 1. Information Table

Cell Phone    Company       Model     Price
P1            Nokia         E7        29000
P2            Samsung       Wave       7000
P3            Nokia         N8        23174
P4            Blackberry    Torch     29000
P5            Micromax      W900       7000
P6            LG            Renoir    29000
P7            Motorola      Quench    12575
P8            Samsung       Metro      5990
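As a sketch of how the P-indiscernibility relation of equation (4.1) acts on table 1, the following Python fragment (our own illustration) groups the objects by their value tuples on a chosen attribute subset P:

    from collections import defaultdict

    table = {
        "P1": {"Company": "Nokia",      "Model": "E7",     "Price": 29000},
        "P2": {"Company": "Samsung",    "Model": "Wave",   "Price": 7000},
        "P3": {"Company": "Nokia",      "Model": "N8",     "Price": 23174},
        "P4": {"Company": "Blackberry", "Model": "Torch",  "Price": 29000},
        "P5": {"Company": "Micromax",   "Model": "W900",   "Price": 7000},
        "P6": {"Company": "LG",         "Model": "Renoir", "Price": 29000},
        "P7": {"Company": "Motorola",   "Model": "Quench", "Price": 12575},
        "P8": {"Company": "Samsung",    "Model": "Metro",  "Price": 5990},
    }

    def partition(table, P):
        # U/IND(P): objects are indiscernible when they agree on every attribute in P
        classes = defaultdict(set)
        for obj, row in table.items():
            classes[tuple(row[a] for a in P)].add(obj)
        return list(classes.values())

    print(partition(table, ["Price"]))
    # [{'P1', 'P4', 'P6'}, {'P2', 'P5'}, {'P3'}, {'P7'}, {'P8'}] (set order may vary)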

However, in real life situations the attribute values are almost indiscernible rather than discernible, and finding associations between such attribute values is a tricky job. For example, if the objects are patients suffering from a certain disease, the symptoms of the disease form the information about the patients, and these symptoms are almost identical rather than fully identical. Some semantics can be added to the information table to make it more general. For the problem of knowledge mining, we introduce order relations on attribute values [9].

In this paper we use rough sets on fuzzy approximation spaces to find the attribute values that are α-identical before introducing the order relation. This is because exact ordering is not possible when the attribute values are almost identical. For α = 1, the almost indiscernibility relation reduces to the indiscernibility relation; therefore, it generalizes Pawlak's indiscernibility relation. An ordered information table (OIT) is defined as:

OIT = {IT, {≻a : a ∈ A}} (4.2)

where IT is a standard information table and ≻a is an order relation on the attribute a. An ordering of the values of a particular attribute a naturally induces an ordering of objects:

x ≻(a) y ⇔ fa(x) ≻a fa(y) (4.3)

where ≻(a) denotes an order relation on U induced by the attribute a. An object x is ranked ahead of an object y if and only if the value of x on the attribute a is ranked ahead of the value of y on the attribute a. For example, a sample ordered information table of eight institutions with three attributes {IC, IF, PP} is shown in table 2, where the attribute IC represents Intellectual Capital, IF represents Infrastructure Facility and PP represents Placement Performance.

Table 2. Ordered information table

Institutions    IC         IF           PP
I1              High       Very High    Good
I2              High       Very High    Good
I3              Average    Very High    Good
I4              Low        High         Average
I5              Low        Very High    Good
I6              Average    Very High    Good
I7              Low        High         Average
I8              Low        High         Average

IC: High ≻ Average ≻ Low
IF: Very High ≻ High
PP: Good ≻ Average

For a subset of attributes P ⊆ A, we define:

x ≻P y ⇔ fa(x) ≻a fa(y) for all a ∈ P; that is, ≻P = ∩{≻(a) : a ∈ P} (4.4)

It indicates that x is ranked ahead of y if and only if x is ranked ahead of y according to all attributes in P. The above definition is a straightforward generalization of the standard definition of equivalence relations in rough set theory [10, 11], where the equality relation is used. Knowledge mining based on order relations is a concrete example of an application of the generalized rough set model with non-equivalence relations [6, 8].
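A minimal sketch of the induced ordering of equations (4.3) and (4.4), applied to the data of table 2, is given below; encoding each attribute's ranking as a list from best to worst is our own assumption for illustration.

    # Rankings from table 2, best to worst
    rank = {
        "IC": ["High", "Average", "Low"],
        "IF": ["Very High", "High"],
        "PP": ["Good", "Average"],
    }

    def ahead(a, u, v):
        # fa(x) ranked ahead of fa(y): u strictly precedes v in attribute a's ranking
        return rank[a].index(u) < rank[a].index(v)

    def ranked_ahead(x, y, P):
        # x ahead of y with respect to P: x is ranked ahead of y on every attribute in P
        return all(ahead(a, x[a], y[a]) for a in P)

    I1 = {"IC": "High", "IF": "Very High", "PP": "Good"}
    I4 = {"IC": "Low",  "IF": "High",      "PP": "Average"}
    print(ranked_ahead(I1, I4, ["IC", "IF", "PP"]))  # True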

5. PROPOSED KNOWLEDGE GRANULATION MODEL

In this section, we propose our knowledge mining model, which consists of problem definition, target data, pre-processed data, processed data, granular computing view, ordered information table, knowledge association rules and knowledge granulation, as shown in Figure 1.

Figure 1. Proposed knowledge granulation model

The fundamental step of identifying the right problem is about clearly knowing the problem definition and incorporating the prior knowledge associated with it. Extracting knowledge from the universe and identifying the desired knowledge in voluminous data is highly problematic. The granular computing approach can be applied to identify associations between attribute values that are indiscernible, but when the attribute values are almost indiscernible it is very difficult to identify the associations. So, in the proposed model we apply a data cleaning process to the target data to obtain pre-processed data, to which a fuzzy proximity relation is applied to get the processed data. Rough set on fuzzy approximation spaces is used as a tool on the processed data, and we get classified data, which is the granular computing view of the system. The resultant data is further processed by ordering rules to obtain the ordered information system. Thus we get ordered information for data that was almost indiscernible. Now we can apply the granular computing tool to the ordered information system and identify knowledge association rules. Finally, we extract the knowledge and classify the patterns in the data that was almost indiscernible.

6. CASE STUDY

In this section, we demonstrate through a real life problem how the above concepts can be applied for extracting knowledge. We consider an example in which we study different cosmetic companies' business strategies in a country. In table 3 given below, we consider a few parameters of business strategies for achieving maximum sales, their possible ranges of values, and the membership function of the fuzzy proximity relation that characterizes the relationship between parameter values. A company with more expenditure on marketing, more expenditure on advertisement, more expenditure on distribution, more miscellaneous expenditure and more expenditure on research and development would be the ideal case, but such cases are rare in practice. So, a company need not excel in all the parameters in order to reach the top position. However, some parameters may have greater influence on the scoring than others. The indispensable parameters may differ for different values of α; in fact, as we decrease the value of α, more and more parameters become indispensable.

The membership function has been adjusted so that its value lies in [0, 1] and the function is symmetric. The companies can be judged by the sales outputs they produce, and the amount of sales can be judged from the different parameters of the companies. These parameters form the attribute set for our analysis. Here, marketing expenditure means all expenditure incurred for corporate promotion, including event marketing, sales promotion and direct marketing, which comes to around 6%. The advertising expenditure includes promotional activities using various media such as television, newspapers and the internet, and comes to around 36%. The miscellaneous expenditure is mainly incurred through activities like corporate social responsibility and amounts to a maximum of 28%. The distribution cost includes expenses on logistics, supply chain etc. and comes to around 24%. The investment made in new product development and other research activities is taken as research and development expenditure and comes to around 6%. However, to keep our analysis simple, we have not considered other parameters that do not influence the sales of a company. The average of the data collected is considered to be the representative figure and is tabulated below. The notations and abbreviations used in the following analysis are presented in table 3.

Table 3. Notations and abbreviations

Parameter                                   Attribute    Possible Range    Membership Function
Expenditure on marketing                    Mkt.         1-150             1 − |x−y| / 2(x+y)
Expenditure on advertisement                Advt.        1-900             1 − |x−y| / 2(x+y)
Expenditure on distribution                 Dist.        1-600             1 − |x−y| / 2(x+y)
Expenditure on miscellaneous                Misc.        1-700             1 − |x−y| / 2(x+y)
Expenditure on research and development     R&D          1-150             1 − |x−y| / 2(x+y)

In the following table 4 we present the data obtained from ten different companies. However, we keep their identities confidential for various official reasons. Here, we use the notation Ci, i = 1, 2, 3, ..., 10 for the different companies, since the purpose of the study is to demonstrate the method and not to probe the performance of an individual company. It is to be noted that all non-ratio figures shown in the information table are in ten million INR.

Table 4. Information table

Company    Mkt.      Advt.      Dist.      Misc.      R&D
C1         18.276    162.236     30.236     72.146     9.156
C2          2.076      5.393      6.793      8.290     0.383
C3          0.496      1.330      0.433      2.733     0.393
C4          0.940      0.060      0.666      5.890     1.243
C5         27.333     38.660     16.496     24.343     1.523
C6          7.033    866.916    508.676    637.530    38.963
C7          4.323      4.173      1.753      3.176     0.003
C8         38.516     40.046      3.126      8.026     0.056
C9          0.466      0.460      0.993      3.803     0.053
C10         0.603      0.036      0.393      0.613     0.016

Now, in order to minimize the computation, we compute the fuzzy proximity relations attribute by attribute. We have designed five relations, one for each attribute, and computed the similarity between each pair of companies using the membership function of table 3. The fuzzy proximity relation R1 corresponding to the attribute 'Mkt.' is given below in table 5.

Table 5. Fuzzy proximity relation for attribute Mkt.

R1     C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C1   1.000  0.602  0.526  0.549  0.901  0.778  0.691  0.822  0.525  0.532
C2   0.602  1.000  0.693  0.812  0.571  0.728  0.824  0.551  0.683  0.725
C3   0.526  0.693  1.000  0.845  0.518  0.566  0.603  0.513  0.984  0.951
C4   0.549  0.812  0.845  1.000  0.533  0.618  0.679  0.524  0.831  0.891
C5   0.901  0.571  0.518  0.533  1.000  0.705  0.637  0.915  0.517  0.522
C6   0.778  0.728  0.566  0.618  0.705  1.000  0.881  0.654  0.562  0.579
C7   0.691  0.824  0.603  0.679  0.637  0.881  1.000  0.601  0.597  0.622
C8   0.822  0.551  0.513  0.524  0.915  0.654  0.601  1.000  0.512  0.515
C9   0.525  0.683  0.984  0.831  0.517  0.562  0.597  0.512  1.000  0.936
C10  0.532  0.725  0.951  0.891  0.522  0.579  0.622  0.515  0.936  1.000
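As a cross-check, the following sketch (our own) reproduces entries of table 5 from the Mkt. column of table 4 using the membership function µR(x, y) = 1 − |x − y| / 2(x + y) of table 3:

    mkt = [18.276, 2.076, 0.496, 0.940, 27.333, 7.033, 4.323, 38.516, 0.466, 0.603]

    def mu(x, y):
        # Membership function from table 3
        return 1 - abs(x - y) / (2 * (x + y))

    R1 = [[round(mu(x, y), 3) for y in mkt] for x in mkt]
    print(R1[0][1])  # 0.602, the (C1, C2) entry of table 5
    print(R1[0][4])  # 0.901, the (C1, C5) entry of table 5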

The fuzzy proximity relation R2 corresponding to attribute 'Advt.' is given below in table 6.

Table 6. Fuzzy proximity relation for attribute Advt.

R2     C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C1   1.000  0.532  0.508  0.500  0.692  0.658  0.525  0.698  0.503  0.500
C2   0.532  1.000  0.698  0.511  0.622  0.506  0.936  0.619  0.579  0.507
C3   0.508  0.698  1.000  0.543  0.533  0.502  0.742  0.532  0.757  0.526
C4   0.500  0.511  0.543  1.000  0.502  0.500  0.514  0.501  0.615  0.875
C5   0.692  0.622  0.533  0.502  1.000  0.543  0.597  0.991  0.512  0.501
C6   0.658  0.506  0.502  0.500  0.543  1.000  0.505  0.544  0.501  0.500
C7   0.525  0.936  0.742  0.514  0.597  0.505  1.000  0.594  0.599  0.509
C8   0.698  0.619  0.532  0.501  0.991  0.544  0.594  1.000  0.511  0.501
C9   0.503  0.579  0.757  0.615  0.512  0.501  0.599  0.511  1.000  0.573
C10  0.500  0.507  0.526  0.875  0.501  0.500  0.509  0.501  0.573  1.000

The fuzzy proximity relation R3 corresponding to attribute 'Dist.' is given below in table 7.

Table 7. Fuzzy proximity relation for attribute Dist.

R3     C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C1   1.000  0.683  0.514  0.522  0.853  0.556  0.555  0.594  0.532  0.513
C2   0.683  1.000  0.560  0.589  0.792  0.513  0.705  0.815  0.628  0.555
C3   0.514  0.560  1.000  0.894  0.526  0.501  0.698  0.622  0.804  0.976
C4   0.522  0.589  0.894  1.000  0.539  0.501  0.775  0.676  0.901  0.871
C5   0.853  0.792  0.526  0.539  1.000  0.531  0.596  0.660  0.557  0.523
C6   0.556  0.513  0.501  0.501  0.531  1.000  0.503  0.506  0.502  0.501
C7   0.555  0.705  0.698  0.775  0.596  0.503  1.000  0.859  0.862  0.683
C8   0.594  0.815  0.622  0.676  0.660  0.506  0.859  1.000  0.741  0.612
C9   0.532  0.628  0.804  0.901  0.557  0.502  0.862  0.741  1.000  0.784
C10  0.513  0.555  0.976  0.871  0.523  0.501  0.683  0.612  0.784  1.000

The fuzzy proximity relation R4 corresponding to attribute 'Misc.' is given below in table 8.

Table 8. Fuzzy proximity relation for attribute Misc.

R4     C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C1   1.000  0.603  0.536  0.575  0.752  0.602  0.542  0.600  0.550  0.508
C2   0.603  1.000  0.748  0.915  0.754  0.513  0.777  0.992  0.814  0.569
C3   0.536  0.748  1.000  0.817  0.601  0.504  0.963  0.754  0.918  0.683
C4   0.575  0.915  0.817  1.000  0.695  0.509  0.850  0.923  0.892  0.594
C5   0.752  0.754  0.601  0.695  1.000  0.537  0.615  0.748  0.635  0.525
C6   0.602  0.513  0.504  0.509  0.537  1.000  0.505  0.512  0.506  0.501
C7   0.542  0.777  0.963  0.850  0.615  0.505  1.000  0.784  0.955  0.662
C8   0.600  0.992  0.754  0.923  0.748  0.512  0.784  1.000  0.821  0.571
C9   0.550  0.814  0.918  0.892  0.635  0.506  0.955  0.821  1.000  0.639
C10  0.508  0.569  0.683  0.594  0.525  0.501  0.662  0.571  0.639  1.000

The fuzzy proximity relation R5 corresponding to attribute 'R&D' is given below in table 9.

Table 9. Fuzzy proximity relation for attribute R&D

R5     C1     C2     C3     C4     C5     C6     C7     C8     C9     C10
C1   1.000  0.540  0.541  0.620  0.643  0.690  0.500  0.506  0.506  0.502
C2   0.540  1.000  0.994  0.736  0.701  0.510  0.508  0.628  0.622  0.540
C3   0.541  0.994  1.000  0.740  0.705  0.510  0.508  0.625  0.619  0.539
C4   0.620  0.736  0.740  1.000  0.949  0.531  0.502  0.543  0.541  0.513
C5   0.643  0.701  0.705  0.949  1.000  0.538  0.502  0.535  0.534  0.510
C6   0.690  0.510  0.510  0.531  0.538  1.000  0.500  0.501  0.501  0.500
C7   0.500  0.508  0.508  0.502  0.502  0.500  1.000  0.551  0.554  0.658
C8   0.506  0.628  0.625  0.543  0.535  0.501  0.551  1.000  0.986  0.722
C9   0.506  0.622  0.619  0.541  0.534  0.501  0.554  0.986  1.000  0.732
C10  0.502  0.540  0.539  0.513  0.510  0.500  0.658  0.722  0.732  1.000

Analysis

Now we derive the classifications for the membership value α ≥ 0.85. The families of equivalence classes corresponding to the attributes A = {Mkt., Advt., Dist., Misc., R&D} are given below.

U/R1α={{C1,C5,C8},{C6,C7},{C2},{C3,C4,C9,C10}}

U/R2α={{C1},{C2,C7},{C3},{C4,C10},{C5,C8},{C6},{C9}}

U/R3α={{C1,C5},{C2},{C3,C4,C7,C8,C9,C10},{C6}}

U/R4α={{C1},{C2,C3,C4,C7,C8,C9},{C5},{C6},{C10}}

U/R5α={{C1},{C2,C3},{C4,C5},{C6},{C7},{C8,C9},{C10}}

Therefore, according to the attribute Expenditure on marketing, C1, C5 and C8 are α-identical; C6 and C7 are α-identical; C3, C4, C9 and C10 are α-identical; and C2 is not identical to any company for α ≥ 0.85. Therefore, the values of the attribute Expenditure on marketing are classified into four categories, namely very low, low, average and high, and hence can be ordered. The values of the attribute Expenditure on advertisement are classified into seven categories, namely poor, very low, low, average, high, very high and outstanding. The values of the attribute Expenditure on distribution are classified into four categories, namely low, average, high and very high. The values of the attribute Expenditure on miscellaneous are classified into five categories, namely very low, low, average, high and very high, and hence can be ordered. Similarly, the values of the attribute Expenditure on research and development are classified into seven categories, namely poor, very low, low, average, high, very high and outstanding. The resulting ordered information table 10 for the business strategies of the different cosmetic companies of table 4 is given below.

Table 10. Ordered information table of business strategies of different cosmetic companies

Company    Mkt.        Advt.         Dist.        Misc.        R&D
C1         High        Very High     High         High         Very High
C2         Low         Average       Average      Low          Average
C3         Very Low    Low           Low          Low          Average
C4         Very Low    Poor          Low          Low          High
C5         High        High          High         Average      High
C6         Average     Outstanding   Very High    Very High    Outstanding
C7         Average     Average       Low          Low          Poor
C8         High        High          Low          Low          Low
C9         Very Low    Very Low      Low          Low          Low
C10        Very Low    Poor          Low          Very Low     Very Low

Mkt. : High ≻ Average ≻ Low ≻ Very Low
Advt. : Outstanding ≻ Very High ≻ High ≻ Average ≻ Low ≻ Very Low ≻ Poor
Dist. : Very High ≻ High ≻ Average ≻ Low
Misc. : Very High ≻ High ≻ Average ≻ Low ≻ Very Low
R&D : Outstanding ≻ Very High ≻ High ≻ Average ≻ Low ≻ Very Low ≻ Poor
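A small sketch of this labelling step is given below; ranking each α-class by the mean attribute value of its members is one plausible choice (our assumption), since the analysis above only states that the classes can be ordered.

    # Mkt. values from table 4 and the alpha-classes U/R1alpha derived above
    mkt = {"C1": 18.276, "C2": 2.076, "C3": 0.496, "C4": 0.940, "C5": 27.333,
           "C6": 7.033, "C7": 4.323, "C8": 38.516, "C9": 0.466, "C10": 0.603}
    classes = [{"C1", "C5", "C8"}, {"C6", "C7"}, {"C2"}, {"C3", "C4", "C9", "C10"}]
    labels = ["High", "Average", "Low", "Very Low"]  # best to worst

    # Order the classes by decreasing mean expenditure and attach the labels
    ordered = sorted(classes, key=lambda c: -sum(mkt[o] for o in c) / len(c))
    for label, cls in zip(labels, ordered):
        print(label, sorted(cls))
    # High ['C1', 'C5', 'C8'], Average ['C6', 'C7'], Low ['C2'],
    # Very Low ['C10', 'C3', 'C4', 'C9'] -- matching the Mkt. column of table 10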

7. MINING ASSOCIATION RULES USING GRANULAR COMPUTING

In this section, we show how granular computing is used in finding the association rules.

In the previous section, an ordered information table has been prepared for the almost indiscernible attribute values of the data set. Now the granular computing approach can be applied to this ordered data. In granular computing, a set of attribute values is called an association rule if it satisfies certain criteria; if a set of attribute values is an association rule, all the attribute values in the set are associated with one another. To find association rules we first form granules based on each attribute, as shown in Table 11.

To check whether a set of attribute values is an association rule or not, we perform the AND operation among the bit representations of the attribute values of this set. If the number of 1's in the result of the AND operation is greater than or equal to the minimum support, then it is an association rule; otherwise it is not.

Table 11: Granules based on individual attributes

Granules Based on                          Attribute Values   Granules as List        Granules as Bits
Expenditure on marketing                   High               {C1,C5,C8}              1000100100
                                           Average            {C6,C7}                 0000011000
                                           Low                {C2}                    0100000000
                                           Very Low           {C3,C4,C9,C10}          0011000011
Expenditure on advertisement               Outstanding        {C6}                    0000010000
                                           Very High          {C1}                    1000000000
                                           High               {C5,C8}                 0000100100
                                           Average            {C2,C7}                 0100001000
                                           Low                {C3}                    0010000000
                                           Very Low           {C9}                    0000000010
                                           Poor               {C4,C10}                0001000001
Expenditure on distribution                Very High          {C6}                    0000010000
                                           High               {C1,C5}                 1000100000
                                           Average            {C2}                    0100000000
                                           Low                {C3,C4,C7,C8,C9,C10}    0011001111
Expenditure on miscellaneous               Very High          {C6}                    0000010000
                                           High               {C1}                    1000000000
                                           Average            {C5}                    0000100000
                                           Low                {C2,C3,C4,C7,C8,C9}     0111001110
                                           Very Low           {C10}                   0000000001
Expenditure on research and development    Outstanding        {C6}                    0000010000
                                           Very High          {C1}                    1000000000
                                           High               {C4,C5}                 0001100000
                                           Average            {C2,C3}                 0110000000
                                           Low                {C8,C9}                 0000000110
                                           Very Low           {C10}                   0000000001
                                           Poor               {C7}                    0000001000

For example, let us assume that the minimum support is 2. To check whether {MktHigh, AdvtHigh} is an association rule, we perform the AND operation between the bit representations of the attribute values MktHigh and AdvtHigh, as shown in Table 12. Since the number of 1's in the result of the AND operation is greater than or equal to the minimum support 2, {MktHigh, AdvtHigh} is an association rule. Next, let us consider the attribute values {MktAvg, AdvtOutstanding, R&DLow}. The AND operation among the bit representations of MktAvg, AdvtOutstanding and R&DLow is shown in Table 13. Since the number of 1's in the result of the AND operation is 0, which is less than the minimum support 2, {MktAvg, AdvtOutstanding, R&DLow} is not an association rule.

Table 12: AND operation between two attribute values

Attribute Values    Granules
MktHigh             1000100100
AdvtHigh            0000100100
AND                 0000100100

Table 13: AND operation among three attribute values

Attribute Values    Granules
MktAvg              0000011000
AdvtOutstanding     0000010000
R&DLow              0000000110
AND                 0000000000
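A minimal sketch of these two checks (our own illustration) is given below, storing the relevant granules of Table 11 as integer bit masks, one bit per company with C1 as the leftmost bit:

    granules = {
        "MktHigh":         0b1000100100,
        "AdvtHigh":        0b0000100100,
        "MktAvg":          0b0000011000,
        "AdvtOutstanding": 0b0000010000,
        "R&DLow":          0b0000000110,
    }

    def is_association_rule(values, min_support=2):
        # AND the bit granules; a rule holds when at least min_support
        # objects share all the attribute values in the set.
        mask = ~0
        for v in values:
            mask &= granules[v]
        return bin(mask & 0b1111111111).count("1") >= min_support

    print(is_association_rule(["MktHigh", "AdvtHigh"]))                  # True
    print(is_association_rule(["MktAvg", "AdvtOutstanding", "R&DLow"]))  # False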

From the analysis, {MktHigh, AdvtHigh} is an association rule, with the bit representations of MktHigh and AdvtHigh being 1000100100 and 0000100100 respectively. The bit representation of the AND operation between them is 0000100100. It means that the attribute values MktHigh and AdvtHigh are present together in two objects, namely {C5, C8}. Also, {MktHigh, AdvtHigh} is an association rule of length two, since it consists of attribute values of two different attributes. First we find all possible association rules of length two, then all possible association rules of length three, and so on. For better understanding, the AND operation among different sets of attribute values is presented in Table 14.

Table 14: AND operation among different attribute values

Set of Attribute Values                      AND Operation among Attribute Values
MktVery Low, DistLow, MiscLow                0011000010
AdvtAvg, MiscVery High, R&DOutstanding       0000000000
DistLow, MiscLow, R&DLow                     0000000110
MiscHigh, AdvtVery Low, DistAvg              0000000000
MktVery Low, AdvPoor, DistLow                0001000001
MktHigh, DistAvg, R&DVery Low                0000000000

If we consider the attribute values {MktVery Low, DistLow, MiscLow}, the AND operation among these attribute values yields 0011000010, which contains three 1's and therefore provides an association rule, as shown in Table 14, with minimum support 2. In the same manner we can identify all the association rules of any length existing between the different attribute values.
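The following sketch (our own) enumerates all association rules of a given length over a subset of the granules of Table 11, in the manner described above; the selection of granules is ours, for brevity:

    from itertools import combinations

    granules = {
        "MktVeryLow": 0b0011000011, "DistLow": 0b0011001111,
        "MiscLow":    0b0111001110, "R&DLow":  0b0000000110,
        "AdvtPoor":   0b0001000001,
    }

    def rules_of_length(k, min_support=2):
        # Try every k-element set of attribute values and keep those whose
        # ANDed granule still covers at least min_support objects.
        found = []
        for combo in combinations(granules, k):
            mask = ~0
            for v in combo:
                mask &= granules[v]
            if bin(mask & 0b1111111111).count("1") >= min_support:
                found.append(combo)
        return found

    print(rules_of_length(3))
    # Includes ('MktVeryLow', 'DistLow', 'MiscLow'), whose support is {C3, C4, C9}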

8. CONCLUSION

The discovery of knowledge about the universe requires classification of the objects of the universe based on an indiscernibility relation. Mining association rules using the granular computing approach cannot be applied directly to find the associations between attribute values that are almost indiscernible in nature. So, in this paper we study and use rough sets on fuzzy approximation spaces and ordering rules to deal with almost indiscernibility, and finally we use granular computing to find the associations between almost indiscernible attribute values. We have taken a real life example of the expenditure of 10 cosmetic companies on different attributes, and we have shown how the analysis can be performed by taking rough set on fuzzy approximation spaces, ordering of objects and granular computing as a model for mining knowledge.