Adopted Comparative Approach An Overview Accounting Essay


ABSTRACT

Opinion mining has become important for meeting the challenges that the World Wide Web creates in fields such as engineering, medicine, finance, production and marketing. Of the different opinion mining tasks, this study takes up feature-based opinion mining. It examines the polarities and the missing values of the features of similar services provided by two online stores. The goal is to present a statistical report on the polarities of the services (referred to here as features) in order to test one hypothesis: BestBuyPcs.com is better than IBuyPlasma.com. The report is used to determine (1) whether both online stores provide a similar quality of service to their customers or (2) whether the services offered by one store are better than those of its rival in terms of polarity. Several procedures are applied step by step to identify which online store offers the better service. Missing values are taken into account and resolved using multiple imputations.

Finally, from the statistical report, a clear conclusion is drawn about which online store offers the better service to its customers.

Conference'2011, Month 1-2, 2011, Manchester, State, Country.

Copyright 2011 ACM 1-58113-000-0/00/0004…$5.00.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Keywords: Feature-based opinion mining, Feature polarity, Statistical analysis, Missing value analysis, Multiple imputations.

1. INTRODUCTION

At present, the World Wide Web is an important place where customers' up-to-date views about products or services can be found in abundance in forums and blogs. A customer's review or blog post about a particular product or service can, directly or indirectly, boost or harm its sales. Hence these internet platforms, also called Web 2.0 (Hu and Liu, 1998; Bodendorf and Kaiser, 2010), are widely welcomed by companies for marketing purposes. Reviews, blogs and forum posts usually take the form of text, that is, unstructured data. This text can be broadly divided into facts and opinions; only the opinions are considered here. On the basis of polarity, an opinion can be classified as negative, neutral or positive.

To illustrate an opinion, consider the following customer review of a camera: "This camera's lens is great. The picture quality is good. But its battery life is short." This review contains the positive sentiments 'great' and 'good' and the negative sentiment 'short'. Opinion mining, also known as sentiment analysis (Ding et al., 2008), was introduced to extract, mine, evaluate, analyse and summarise such opinions. It has come to play a vital role in various profitable and competitive areas of marketing, such as the automotive, aviation, movie and airline industries and online stores.

1.1. Adopted Comparative Approach - An Overview

When a customer review is mentioned, the text entered by the customer generally comes to mind. In practice, however, a customer forum post or review may contain star ratings for specific features of a product or service as well as text. A comparative study is made here on the basis of the star-rated features of the service provided by two online stores. The following steps outline the method adopted for the comparative study:

(1) Two online stores with equal overall star ratings are selected.

(2) The star-rated features of the service provided by the stores are considered. Here, the services provided are referred to as the features of both stores.

(3) The polarity of each feature is classified as negative, neutral or positive.

(4) The dataset is formed by manually entering the values of the star-rated features of the two online stores.

(5) Two datasets are prepared: one with missing values and one without missing values.

(6) In the second dataset, the missing values are filled in automatically with numbers generated randomly by the software employed (SPSS).

(7) A statistical comparative study is made using SPSS [1] software to find the merits and demerits of the services offered by the stores.

(8) From the outcome, a conclusion is drawn as to whether both online stores provide equal service to their customers or whether one store offers a better service than the other.

There is no similar report for this task. However, the idea of a "comparative study on customer opinions" is adopted from the papers by Bodendorf and Kaiser (2010), Liu et al (1998), Liu et al., and Liu (2010). A comparative statistical illustration is presented using SPSS software. As a general rule, a precise result cannot be expected when there are missing values. Hence, in the experiment carried out for this study, missing values are given importance and a precise outcome is generated by employing the multiple imputations procedure.

1.2. Outline of the Upcoming Discussions

The problem statement section discusses how opinion mining came to prominence and the different techniques involved in feature-based opinion mining. The literature review discusses the works most closely related to the selected task, feature-based opinion mining. The methodology adopted for this study is discussed in section 3, where the general steps to be carried out and tested are presented diagrammatically and explained in detail. Section 4 explains how the proposed experiment is carried out and presents the final statistical output. The conclusions and the work that can be carried out in the future are given in section 5.

2. PROBLEM STATEMENT

As mentioned in the introduction, the main idea of this study is to carry out a comparative test based on the blogs or reviews written by customers. The test determines whether both online stores offer the same quality of service or whether one store's services are better than its rival's. Instead of the general review formats classified by Liu et al., the star-rated features of the customer reviews are used here, as illustrated in figure-2.1. The formats described by Liu et al. are: free-form text, where an opinion sentence can contain both negative and positive opinions; Pros and Cons, where Pros lists the advantages (positive opinions) and Cons lists the negative opinions; and a detailed view with Pros and Cons, where customers write the Pros and Cons together with a detailed review of the product. Whichever review format is chosen, it has to be processed before a final decision is made.

Figure-2.1. Adopted Customer review format.

Although opinions can be processed in different ways, using rule-based approaches (Jin et al., 2009), machine-learning approaches and so on, statistical approaches (Zhuang et al., 2006) are adopted for this comparative study. To apply statistical approaches, a customer's opinion has to be processed using some of the available opinion mining tasks. Of these, sentiment classification and opinion summarisation are given importance. Section 2.1 describes the related works. As the topic focuses mainly on the features' polarity, the works reviewed show how customer reviews are classified by sentiment or polarity, i.e. whether an opinion is positive, negative or neutral. Comparative study techniques are discussed to explain opinion summarisation.

2.1. Related Works

The main aim of sentiment classification is to find the semantic orientation of an opinion phrase and classify it. To make the classification more accurate, some authors used subjectivity detectors to remove objective sentences, and Whitelaw et al., (2005) used appraisal theory. In natural language processing methods, sentiment classification is a two-step process: first, phrases containing sentiment clues are marked as polar; next, these polar phrases are classified as positive, negative or neutral. Lee et al., (2008) noted that most methods use a statistical approach for sentiment classification, some use linguistic resources and some use star ratings for feature sentiment classification. This study follows the features' star ratings, and its main goal is likewise to classify the sentiment of the features.

Bodendorf and Kaiser (2010) discussed the feature-based opinion summarisation task, using support vector machines trained on the dataset. Their main target was a comparative study of the merits and demerits of an automotive manufacturer's products against those of its competitors. The association between the product's features was also analysed. Similarly, in Liu et al (1998) the main target was to present to customers a visual view of the strengths and weaknesses of two digital cameras. According to Bodendorf and Kaiser (2010), the opinion summarisation task is done in four steps: selection, extraction, aggregation and analysis. Polarity and intensity are normally used for evaluation. In the analysis step, the associations between the product features were analysed. Finally, a conclusion was drawn using a pictorial representation that highlights the strengths and weaknesses of the product features, which helps the automotive industry to improve product quality. Following this approach, the present study employs the association step and indicates the positives and negatives in the conclusion stage.

A visual comparison of the product features of two different cameras was given by Liu et al., and Liu (2010). For this comparative study, a new analysis technique called 'Opinion Observer' was introduced. Opinion Observer analysed the consumer reviews and presented the results visually to the consumers. The analysis can cover either a single product or two different products. Opinion Observer helped both customers and manufacturers to see the advantages and disadvantages of the product features side by side. Instead of Opinion Observer, SPSS software is employed here to give a visual view of the merits and demerits of the services of two online stores.

3. METHODOLOGY

Mining a customer's opinion is not an easy process. Since opinion mining is a wide field that involves many different tasks, this paper focuses on only one of them: feature-based opinion summarisation.

From an analysis of websites such as CNet.com and Epinions.com and of other works related to opinion mining, it is clear that, with continuing growth in manufacturing, new and more advanced products are introduced every day. This has made customer product reviews available in abundance. Generally, these reviews relate to the product features. The feature-based opinion summarisation task is employed here to mine and summarise the opinions expressed by consumers. The experiment conducted on the basis of this opinion mining task is a feature-based comparative study using consumer reviews. The design of the technique used to implement the comparative study is shown in figure 3.1.

[Figure-3.1 flow: (1) Read the rated features (figure 2.1) from the customer reviews of IBuyPlasma.com and BestBuyPcs.com; (2) Assign the features' polarity (Table 3.1) and form the dataset; (3) Experimental Evaluation]

Figure-3.1. Comparative Study methodology

Referring to figure-3.1, in step-1 the features that have already been rated by the customers of the two websites (BestBuyPcs.com [2] and IBuyPlasma.com [3]) are extracted. Some users may have ignored or failed to rate some features; such values are treated as system missing values. This step is then repeated with values assigned automatically by the SPSS software using multiple imputations. In step-2, the features of both online stores are assigned their respective polarities on the basis of the customers' opinions. As this study is based on five-star ratings, features with one or two stars are given negative polarity, three stars are treated as neutral, and four or five stars are treated as positive. Also in this step, the variables are declared and the dataset is formed by reading a certain number of records, i.e. customer reviews. The variable declarations for the two online stores are shown in table-3.1. The last step includes two sub-steps, implementation and testing, after which the results are evaluated and analysed.
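The star-to-polarity rule described above can be illustrated with a minimal Python sketch. The numeric codes (1, 2, 3), the function name `star_to_polarity` and the sample review are assumptions made for illustration only; the paper states that polarities are stored as ordinal values but does not give the codes used in SPSS.

```python
from typing import Optional

# Assumed ordinal codes; the paper does not state the actual values.
NEGATIVE, NEUTRAL, POSITIVE = 1, 2, 3


def star_to_polarity(stars: Optional[int]) -> Optional[int]:
    """Map a 1-5 star rating to an ordinal polarity code.

    None stands for a feature the customer did not rate (a missing value).
    """
    if stars is None:
        return None
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating must be 1-5, got {stars}")
    if stars <= 2:
        return NEGATIVE   # one or two stars
    if stars == 3:
        return NEUTRAL    # three stars
    return POSITIVE       # four or five stars


# One hypothetical review: three rated features and one left unrated.
review = {"Best_Ease": 5, "Best_Service": 3, "Best_Selection": 1, "Best_Delivery": None}
polarities = {name: star_to_polarity(stars) for name, stars in review.items()}
```

Keeping an unrated feature as `None` rather than coding it as neutral preserves the distinction between "no opinion given" and "neutral opinion", which the later missing value analysis depends on.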

Features | Variables for BestBuyPcs.com | Variables for IBuyPlasma.com
Ease of Ordering | Best_Ease | IBuy_Ease
Customer service | Best_Service | IBuy_Service
Selection | Best_Selection | IBuy_Selection
On-Time Delivery | Best_Delivery | IBuy_Delivery

Table-3.1. Variable declaration

4. EXPERIMENTAL EVALUATION

This section explains the stages in which the experiment is conducted and evaluated. The stages cover the system setup, including the variable selection, and the methods used to implement the experiment. The outcomes of the different procedures are then illustrated and discussed.

4.1. Experimental Setup

To test the stated hypothesis, certain methods are used to implement the experiment. They cover the selection of the online stores, the dataset, the impact of the missing values and the selection of procedures for testing.

Online store selection: Two online stores with similar overall ratings are selected. From an overall view of the customers' reviews, the online store 'BestBuyPcs.com' seemed to have fewer missing values than the other store. To verify this and to test the specified hypothesis, a comparative study based on the feature ratings is made. Figures-4.1 and 4.2 show that both stores have an equal overall rating based on their features.

The dataset: SPSS software is used to form the dataset. The dataset contains nine variables: the customer and the features of the two online stores specified in table-3.1.

Figure-4.1. The overall rating of the online store BestBuyPcs.com

Figure-4.2. The overall rating of the online store IBuyPlasma.com

The dataset consists of the variable view and the data view. In the data view, the variables' polarities are assigned as ordinal values. The customers and the online stores' features can also be grouped into categories, so they are treated as categorical variables. For easy software implementation, the variables have to be divided into independent and dependent variables. Since this paper focuses only on the features' polarity, the customer is taken as the independent variable and the features as the dependent variables.

Impact of the missing values: In this part of the study, a decision is made with the aim of producing a result that is both accurate and precise. As a consequence, the experiment is carried out twice. In stage one, results that are accurate but not precise are produced with the missing values present. In stage two, a precise output is generated using the Multiple Imputations procedure of the SPSS software.

Procedure selection for experiment testing: Different procedures available in the SPSS software are used to make the comparative study. Some procedures cannot be employed because they accept only nominal or scale values and not ordinal values. Since the features' polarities are coded as ordinal numerals, the procedures that accept ordinal values are used. The selected procedures are:

The Frequencies procedure - For the selected variable, this procedure generates frequency tables that display both the number and the percentage of cases.

Crosstabulation procedure (contingency tables) - Used to make a comparative study between two or more categorical variables.

Two-step cluster analysis - Used to display the overall quality of the stores.

Missing value analysis - Used to find the missing value patterns.

Multiple imputations - Apart from missing value analysis, the remaining procedures cannot produce a precise result. This procedure randomly generates values for the missing cases on the basis of the initial seed values already assigned to the variables. It produces a result that is more precise than the results of the other procedures.

Following the setup described in section 4.1, the outcomes of each of the chosen procedures are presented in section 4.2.

4.2. Experimental Results

The procedures listed in section 4.1 are used to generate the statistical results of the experiment. Except for the Multiple Imputation procedure, the procedures are run on the data with missing values, so they produce an accurate but not a precise output. Each procedure is chosen with one fixed hypothesis in mind: "BestBuyPcs.com is better than IBuyPlasma.com". The discussion below explains in detail how the results are obtained with these procedures (refer to section 4.1) and what they show. The outcomes of the procedures are then critically analysed and evaluated in section 4.3.

4.2.1. The Frequencies procedure: This procedure generates (1) a general comparative statistical table and (2) frequency tables of the two online stores using syntax 4.1.

* Note: /FORMAT=NOTABLE suppresses the per-variable frequency tables; omit it to print them.
FREQUENCIES VARIABLES=Best_Ease IBuy_Ease Best_Service IBuy_Service Best_Selection IBuy_Selection Best_Delivery IBuy_Delivery
  /FORMAT=NOTABLE
  /ORDER=ANALYSIS.

Syntax 4.1. Frequencies Procedure

4.2.1.1. Comparative statistical table: Table-4.1 displays a comparative statistical study of the two rival online stores. Here, 'valid' is the number of cases considered for the study and 'missing' is the number of cases omitted from it. The table gives a general overview of the number of valid and missing cases, but it does not classify them by the features' polarity. To study the polarity, the valid percentage calculated in the frequency tables (shown in section 4.2.1.2) is used. The table below does show clearly, however, that BestBuyPcs.com has fewer missing values than its competitor.

Table-4.1. Comparative Statistical table

4.2.1.2. Frequency tables: The frequency tables of all the features of BestBuyPcs.com are shown in tables-4.2, 4.3, 4.4 and 4.5.

Category | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: Negative | 9 | 12.0 | 12.7 | 12.7
Valid: Neutral | 13 | 17.3 | 18.3 | 31.0
Valid: Positive | 49 | 65.3 | 69.0 | 100.0
Valid: Total | 71 | 94.7 | 100.0 |
Missing: System | 4 | 5.3 | |
Total | 75 | 100.0 | |

Table-4.2. Frequency Table for Best_Ease

Category | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: Negative | 12 | 16.0 | 16.9 | 16.9
Valid: Neutral | 16 | 21.3 | 22.5 | 39.4
Valid: Positive | 43 | 57.3 | 60.6 | 100.0
Valid: Total | 71 | 94.7 | 100.0 |
Missing: System | 4 | 5.3 | |
Total | 75 | 100.0 | |

Table-4.3. Frequency Table for Best_Service

Category | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: Negative | 10 | 13.3 | 14.1 | 14.1
Valid: Neutral | 15 | 20.0 | 21.1 | 35.2
Valid: Positive | 46 | 61.3 | 64.8 | 100.0
Valid: Total | 71 | 94.7 | 100.0 |
Missing: System | 4 | 5.3 | |
Total | 75 | 100.0 | |

Table-4.4. Frequency Table for Best_Selection

Category | Frequency | Percent | Valid Percent | Cumulative Percent
Valid: Negative | 11 | 14.7 | 15.9 | 15.9
Valid: Neutral | 6 | 8.0 | 8.7 | 24.6
Valid: Positive | 52 | 69.3 | 75.4 | 100.0
Valid: Total | 69 | 92.0 | 100.0 |
Missing: System | 6 | 8.0 | |
Total | 75 | 100.0 | |

Table-4.5. Frequency Table for Best_Delivery

Only the 'valid percent' is taken into account in this study. The frequency tables above show that the feature 'On-Time-Delivery' has the highest positive polarity (75.4%) of all the features. The next best is 'Ease-of-Ordering', with 69% positive polarity.

The positive polarities of all the features are clearly above average. The average positive polarity over the four features is 67.45%.
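As a worked check of the figures above, the following Python sketch recomputes the percent and valid percent columns of Table-4.2 from its raw counts and then averages the positive valid percentages of the four BestBuyPcs.com features. The function name `frequency_table` and the string labels are illustrative assumptions; this is not the SPSS implementation.

```python
from collections import Counter
from typing import Optional, Sequence


def frequency_table(values: Sequence[Optional[str]]) -> dict:
    """Return frequency, percent (of all cases) and valid percent (of rated cases)."""
    total = len(values)
    counts = Counter(v for v in values if v is not None)
    valid = sum(counts.values())
    table = {}
    for category in ("Negative", "Neutral", "Positive"):
        n = counts.get(category, 0)
        table[category] = {
            "frequency": n,
            "percent": round(100 * n / total, 1),
            "valid_percent": round(100 * n / valid, 1),
        }
    table["missing"] = total - valid
    return table


# Rebuild the Best_Ease column from the counts reported in Table-4.2.
best_ease = ["Negative"] * 9 + ["Neutral"] * 13 + ["Positive"] * 49 + [None] * 4
table = frequency_table(best_ease)

# Positive valid percentages read from Tables 4.2 to 4.5.
positive_valid = [69.0, 60.6, 64.8, 75.4]
average_positive = round(sum(positive_valid) / len(positive_valid), 2)
```

The valid percent divides by the 71 rated cases rather than all 75, which is why the positive share of Best_Ease is 69.0% rather than 65.3%.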

Likewise, the frequency tables for IBuyPlasma.com's features show that 'Ease-of-Ordering' has the highest positive polarity (68.2%). The next best is 'Selection', with 66.1% positive polarity. As with the previous online store, the positive polarities of all the features are clearly above average. The average positive polarity over the four features is 64.76%.

Result: Comparing the average positive polarity of the two online stores, BestBuyPcs.com has better positive opinions than IBuyPlasma.com by roughly 2.7 percentage points (67.45 against 64.76). The average number of missing cases per feature is also lower for BestBuyPcs.com than for its rival, by 6 (4.5 against 10.5, from the counts reported in Table-4.12). From these two observations, a tentative conclusion that 'BestBuyPcs.com is better than IBuyPlasma.com' can be made. The following procedures, except multiple imputations, help towards an accurate solution because they deal only with data that contain missing values.

A conclusion on the hypothesis cannot be drawn after a single procedure, for two reasons: (1) it is not sound to draw an immediate conclusion from one output and (2) the output of one procedure may differ from that of another. To see whether this holds, refer to the contingency tables that follow. As the experiment goes on, the missing values must also be kept in mind. Because of them, one may be led to conclude that "the higher the number of missing values, the lower the positive polarity of the respective online store".

4.2.2. CrossTabulation procedure: As found with the Frequencies procedure, 'On-Time-Delivery' and 'Ease-Of-Ordering' helped BestBuyPcs.com maintain a high positive polarity compared with its other features. Similarly, 'Ease-of-Ordering' and 'Selection' helped IBuyPlasma.com achieve an above-average positive polarity. The CrossTabulation procedure is used to calculate the strength of association between these features and their relationship with the features that have lower positive polarity. In this experiment, the Gamma statistic of the CrossTabulation procedure is used to test the strength of association between the less favoured and the more favoured features of both stores. A value of -1 indicates complete discordance (negative association) and a value of +1 indicates complete concordance (positive association) between two variables. The procedure is first applied separately to each store and then a joint conclusion is drawn.
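The Gamma statistic mentioned above is the Goodman-Kruskal gamma, (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs of cases. The following is a minimal Python sketch of that formula, assuming ordinal codes 1 to 3 and `None` for an unrated feature; the function name and sample data are hypothetical, and the sketch does not reproduce the standard error or significance columns that SPSS reports.

```python
from typing import Optional, Sequence


def goodman_kruskal_gamma(x: Sequence[Optional[int]], y: Sequence[Optional[int]]) -> float:
    """Gamma = (C - D) / (C + D) over all pairs of cases rated on both features."""
    # Keep only the cases in which both features were rated.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    concordant = discordant = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            dx = pairs[i][0] - pairs[j][0]
            dy = pairs[i][1] - pairs[j][1]
            if dx * dy > 0:
                concordant += 1   # both features rank the two cases the same way
            elif dx * dy < 0:
                discordant += 1   # the features rank the two cases in opposite ways
            # pairs tied on either feature are ignored
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical ratings for five customers; the last one left 'ease' unrated.
ease = [1, 2, 3, 3, None]
selection = [1, 2, 3, 2, 3]
gamma = goodman_kruskal_gamma(ease, selection)
```

Dropping cases that lack a rating on either feature is why the number of valid cases in a crosstabulation can be smaller than the valid count of each feature taken alone.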

Crosstabulation in BestBuyPcs.com:

Here, 'Best_Ease' is cross tabulated with 'Best_Selection', 'Best_Service' and 'Best_Delivery'. Table-4.6, which summarises the crosstabs between 'Best_Ease' and the other features, shows the following: the percentage of missing cases for 'Best_Ease' with 'Best_Delivery' is higher than for the other pairs by 2.6 percentage points, but its number of positive opinion cases is higher than the others by a very small margin.

The Gamma statistics for the cross tabulated features of this online store are displayed in tables-4.7, 4.8 and 4.9.

Feature pair | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent
Best_Ease * Best_Service | 67 | 89.3% | 8 | 10.7% | 75 | 100.0%
Best_Ease * Best_Selection | 67 | 89.3% | 8 | 10.7% | 75 | 100.0%
Best_Ease * Best_Delivery | 65 | 86.7% | 10 | 13.3% | 75 | 100.0%

Table-4.6. Summary of the Cross tabulation between Best_Ease and other features.

Statistic | Value | Asymp. Std. Error | Approx. T | Approx. Sig.
Ordinal by Ordinal: Gamma | .516 | .165 | 2.354 | .019
N of Valid Cases | 67 | | |

Table-4.7. Gamma-statistics: Best_Ease * Best_Selection

Statistic | Value | Asymp. Std. Error | Approx. T | Approx. Sig.
Ordinal by Ordinal: Gamma | .445 | .160 | 2.230 | .026
N of Valid Cases | 67 | | |

Table-4.8. Gamma-statistics: Best_Ease * Best_Service

Statistic | Value | Asymp. Std. Error | Approx. T | Approx. Sig.
Ordinal by Ordinal: Gamma | .311 | .209 | 1.303 | .193
N of Valid Cases | 65 | | |

Table-4.9. Gamma-statistics: Best_Ease * Best_Delivery

The tables show that the positive relationship between 'Best_Ease' and 'Best_Delivery' is weaker (Gamma = .311), whereas there is a stronger positive association between 'Best_Ease' and 'Best_Selection' (Gamma = .516). The Frequencies procedure showed that 'Best_Ease' helped to generate positive polarity. So if 'Selection' is improved, 'Ease-of-Ordering' can be expected to attract more positive opinions than the other features.

Crosstabulation in IBuyPlasma.com: Here, 'IBuy_Ease' is cross tabulated with 'IBuy_Selection', 'IBuy_Service' and 'IBuy_Delivery'. As with BestBuyPcs.com, one pair stands out: the percentage of missing cases for IBuy_Ease with IBuy_Selection is higher than for the other pairs by 7%.

The Gamma statistics for the cross tabulated features of this online store are displayed in tables-4.10 and 4.11.

Statistic | Value | Asymp. Std. Error | Approx. T | Approx. Sig.
Ordinal by Ordinal: Gamma | .497 | .170 | 2.356 | .018
N of Valid Cases | 59 | | |

Table-4.10. Gamma-statistics: IBuy_Ease * IBuy_Service

Statistic | Value | Asymp. Std. Error | Approx. T | Approx. Sig.
Ordinal by Ordinal: Gamma | .159 | .237 | .647 | .517
N of Valid Cases | 52 | | |

Table-4.11. Gamma-statistics: IBuy_Ease * IBuy_Selection

Table-4.11 shows little concordance between the two features (Gamma = .159). Table-4.10, in contrast, shows a strong concordance between 'IBuy_Ease' and 'IBuy_Service' (Gamma = .497). Hence, improving 'Ease-of-Ordering' is likely to go together with an improvement in 'Service', which in turn increases the possibility of getting highly positive opinions.

Conclusion: Although the Crosstab procedure does not calculate the stores' overall polarity, no discordance was found between the features in either store. This suggests that the quality of one feature affects the store's overall rating.

4.2.3. Two-Step Cluster Analysis: This procedure can be used to analyse large datasets and helps to discover clusters automatically in a dataset with categorical variables. The Cluster Features (CF) tree summarises the data. Just as the Frequencies procedure is used to find the feature with the highest positive frequency, this procedure, applied separately to each online store, helps to discover the feature with the highest importance. The clusters are formed on the basis of the most important feature of the respective store. Figure-4.1 shows the feature with the highest importance (Selection).

Figure-4.1. Clusters showing the feature importance for BestBuyPcs.com (Selection)

BestBuyPcs.com has two clusters formed on the basis of the feature 'Selection', as shown in figure-4.2. In cluster-1, the customers clearly give a highly positive opinion of this feature. Cluster-2 has no positive polarity, but since cluster-1 has more cases and a high positive percentage, the overall positive rating for 'Selection' is high.

Figure-4.2. Distribution of 'Selection' among the customers in cluster-1.

Similarly for IBuyPlasma.com, figure-4.3 shows three clusters grouped using 'Service' as the most important feature. Comparing the distribution of 'Service' among the customers shows that the overall positive polarity for 'Service' is high.

Figure-4.3. Clusters showing the feature importance for IBuyPlasma.com

Result: Comparing the two online stores, the most important feature for BestBuyPcs.com is 'Selection', while its rival's customers focus mainly on 'Service'. In general, neither store appears to be focusing on improving its features. Also, from figures 4.1 and 4.3, IBuyPlasma.com has 16 more missing cases than its rival. Since its missing cases are higher, the overall positive rating of IBuyPlasma.com may be lower than that of BestBuyPcs.com. This question can be settled by employing multiple imputations, discussed in section 4.2.5. Before employing multiple imputations, it is necessary to understand the missing value patterns of the two online stores, which are covered in the following section.

4.2.4. Missing Value Analysis: This procedure is used to handle situations in which the data are incomplete. The presence of missing values can complicate the entire method, since major decisions are based on the statistical procedures. When the missing value analysis option is selected, it generates the univariate statistics and the tabulated patterns. The univariate statistics show the missing values of both online stores' features; they are displayed in table-4.12 below.

Variable | N | Missing Count | Missing Percent
Best_Ease | 71 | 4 | 5.3
Best_Service | 71 | 4 | 5.3
Best_Selection | 71 | 4 | 5.3
Best_Delivery | 69 | 6 | 8.0
IBuy_Ease | 66 | 9 | 12.0
IBuy_Service | 67 | 8 | 10.7
IBuy_Selection | 59 | 16 | 21.3
IBuy_Delivery | 66 | 9 | 12.0

Table-4.12. Univariate Statistics of the two online-stores

The table above displays the missing value percentage of both online stores. Here, N is the number of non-missing values, and the number and percentage of missing values are given in the 'Missing' columns. The table shows that, for BestBuyPcs.com, the missing value percentage of the feature 'On-Time-Delivery' (8%) is higher than that of its other features. For IBuyPlasma.com, the missing percentage of the feature 'Selection' (21.3%) is higher than that of its other features. The average missing percentage for BestBuyPcs.com (6%) is also lower than that of IBuyPlasma.com (14%).
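The two average missing percentages quoted above can be recomputed from the missing counts in Table-4.12. The short Python sketch below does this arithmetic; the dictionary layout and the helper name `average_missing_percent` are illustrative assumptions.

```python
TOTAL_CASES = 75  # number of customer reviews read per store

# Missing counts per feature, copied from Table-4.12.
missing_counts = {
    "Best_Ease": 4, "Best_Service": 4, "Best_Selection": 4, "Best_Delivery": 6,
    "IBuy_Ease": 9, "IBuy_Service": 8, "IBuy_Selection": 16, "IBuy_Delivery": 9,
}


def average_missing_percent(store_prefix: str) -> float:
    """Mean missing percentage over the features whose name starts with the prefix."""
    counts = [n for name, n in missing_counts.items() if name.startswith(store_prefix)]
    return 100 * sum(counts) / (len(counts) * TOTAL_CASES)


best_avg = average_missing_percent("Best_")  # 18 missing values out of 300
ibuy_avg = average_missing_percent("IBuy_")  # 42 missing values out of 300
```

Working from the counts gives exactly 6% and 14%, which matches the rounded averages reported in the text.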

Tabulated pattern tables are used to determine whether feature values are missing jointly. The output showed that BestBuyPcs.com has no jointly missing values (0%), which can be interpreted as the customers being satisfied overall with the services provided by the store. So, in this case, the missing values can be considered as positive.

In the case of IBuyPlasma.com, however, 2% of values are jointly missing between Service and Delivery, between Delivery and Ease-of-Ordering, and between Ease-of-Ordering and Selection. This suggests that when a customer is not satisfied with the Service, he is not satisfied with the Delivery either, and so on. For this online store, therefore, the missing values can be considered as negative. A final result cannot be stated in this section, because this procedure only analyses and describes the missing value patterns.
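The idea of jointly missing values discussed above can be sketched in Python as the share of cases in which both features of a pair are unrated. The function name `jointly_missing_percent` and the four sample cases are hypothetical and serve only to show the calculation; they are not the study's data.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Case = Dict[str, Optional[int]]


def jointly_missing_percent(cases: List[Case], variables: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Percent of cases in which both features of each pair are missing."""
    result = {}
    for a, b in combinations(variables, 2):
        both = sum(1 for case in cases if case[a] is None and case[b] is None)
        result[(a, b)] = 100 * both / len(cases)
    return result


# Four hypothetical customers; None marks an unrated feature.
cases = [
    {"IBuy_Service": 3, "IBuy_Delivery": 3, "IBuy_Ease": 2},
    {"IBuy_Service": None, "IBuy_Delivery": None, "IBuy_Ease": 3},
    {"IBuy_Service": 1, "IBuy_Delivery": None, "IBuy_Ease": 3},
    {"IBuy_Service": 2, "IBuy_Delivery": 3, "IBuy_Ease": None},
]
pattern = jointly_missing_percent(cases, ["IBuy_Service", "IBuy_Delivery", "IBuy_Ease"])
```

A pair with a non-zero percentage indicates customers who skipped both features, which is the pattern the text reads as a sign of dissatisfaction.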

4.2.5. Multiple Imputations: This procedure produces random values for the missing values and creates one or more complete sets of data. It also creates a complete set with a pooled output that shows what results could have been expected if the original dataset had no missing values. The main advantage is that the pooled result is more precise and accurate than a single imputation. The two major steps of multiple imputations are: analyse the missing value patterns, and impute the missing data values.
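The general idea of imputing several times and pooling can be illustrated with a deliberately simplified Python sketch. It is not the SPSS algorithm: here each missing rating is replaced by a random draw from the ratings observed for the same feature, and the pooled estimate is simply the mean of the positive share over the imputed datasets. The ordinal codes, the seed, the number of imputations and both function names are assumptions made for illustration.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence

POSITIVE = 3  # assumed ordinal code: 1 = negative, 2 = neutral, 3 = positive


def impute_once(values: Sequence[Optional[int]], rng: random.Random) -> List[int]:
    """Fill each missing value with a random draw from the observed ratings."""
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]


def pooled_positive_share(values: Sequence[Optional[int]],
                          imputations: int = 5, seed: int = 2011) -> float:
    """Average the positive share over several independently imputed datasets."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    shares = []
    for _ in range(imputations):
        completed = impute_once(values, rng)
        shares.append(sum(v == POSITIVE for v in completed) / len(completed))
    return mean(shares)


# Best_Delivery rebuilt from the counts in Table-4.5:
# 11 negative, 6 neutral, 52 positive and 6 missing ratings.
best_delivery = [1] * 11 + [2] * 6 + [3] * 52 + [None] * 6
pooled = pooled_positive_share(best_delivery)
```

Whatever values are drawn, the pooled positive share for this feature must lie between 52/75 (no imputed value positive) and 58/75 (all imputed values positive); averaging over several imputations reduces the dependence on any single random draw.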

Step-1: To analyse the missing value patterns of the two online-stores, consider the dataset with missing values. When the variables of this BestBuyPcs.com are chosen, it displays the warning table-4.13.

The Variable Summary table is not displayed because no variable has more than 10 % missing values.

Table-4.13. Warning table showing the less than 10% missing cases for BestBuyPcs.com

This indicates that the percentage of missing values for every feature is not more than 10%. This can be made as a proof that BestBuyPcs.com is a better option for the customers when compared with IBuyPcs.com. The overall summary of missing values for BestBuyPcs.com in figure-4.4 displays the following: (1). The Variables chart shows that out of four analysis variables (features), there is one missing value for a case or a record. (2).The Cases chart specify that 18 out of 75 cases have at least one missing value on a variable and (3).The Values chart displaying that 18 of the 300 values (cases Ã- variables i.e. 75 x 4) are missing.

Figure-4.4. Overall missing values of BestBuyPcs.com
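The three counts behind the Variables, Cases and Values charts can be reproduced with a short sketch. The toy dataset and the function name `summarise_missing` are hypothetical, and `None` is assumed to mark a rating the customer did not give.

```python
def summarise_missing(rows):
    """Return how many variables, cases and values have missing entries."""
    n_vars = len(rows[0])
    # A variable counts if at least one case lacks a value for it
    vars_with_missing = sum(
        any(row[j] is None for row in rows) for j in range(n_vars)
    )
    # A case counts if at least one of its values is missing
    cases_with_missing = sum(any(v is None for v in row) for row in rows)
    # Every individual missing cell is counted
    values_missing = sum(v is None for row in rows for v in row)
    return vars_with_missing, cases_with_missing, values_missing

# Hypothetical toy data: 5 cases x 4 features (Selection, Delivery, Ease, Service)
# 1 = positive, 0 = negative, None = rating not given
rows = [
    [1, 1, None, 1],
    [1, 0, 1, 1],
    [None, 1, 1, 0],
    [1, 1, 1, 1],
    [0, None, 1, 1],
]
variables, cases, values = summarise_missing(rows)
total = len(rows) * len(rows[0])
print(f"{variables} of {len(rows[0])} variables, {cases} of {len(rows)} cases, "
      f"{values} of {total} values have missing data")
```

When the number of incomplete cases equals the number of missing values, as with the 18 cases and 18 values reported for BestBuyPcs.com, each incomplete case lacks exactly one rating.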

To understand the missing value patterns of BestBuyPcs.com, refer to figure-4.5.

Figure-4.5. Missing value pattern for BestBuyPcs.com

From the discussion of missing value patterns in section 4.2.4 and from figure-4.5, it is clear that there are no jointly missing values. This indicates that, overall, the customers of BestBuyPcs.com are highly satisfied.

But for IBuyPlasma.com, every feature has more than 10% missing values, as shown in table-4.14.

Variable          Missing N    Missing Percent    Valid N
IBuy_Selection    16           21.3%              59
IBuy_Delivery     9            12.0%              66
IBuy_Ease         9            12.0%              66
IBuy_Service      8            10.7%              67

Table-4.14. Missing values exceed 10% for IBuyPlasma.com
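The percentages and valid counts in table-4.14 follow directly from the missing counts and the 75 cases per store. The sketch below recomputes them from the table's own figures; only the variable names in the code are illustrative.

```python
N_CASES = 75  # reviews per store, as stated in the text

# Missing counts taken from Table-4.14
missing = {
    "IBuy_Selection": 16,
    "IBuy_Delivery": 9,
    "IBuy_Ease": 9,
    "IBuy_Service": 8,
}

for name, n_missing in missing.items():
    percent = 100 * n_missing / N_CASES
    valid = N_CASES - n_missing
    print(f"{name:<15} missing={n_missing:>2} ({percent:.1f}%) valid={valid}")

# Totals across all four features
total_missing = sum(missing.values())
total_values = N_CASES * len(missing)
print(f"{total_missing} of {total_values} values missing "
      f"({100 * total_missing / total_values:.1f}%)")
```

The total agrees with the 42 of 300 missing values reported for the Values chart.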

Here, since more than 10% of the values are missing, it is unclear whether the customers are satisfied with the online-store or not. In this situation, only a tentative conclusion that the customers are not satisfied with the services provided by the store can be made. The overall summary of missing values shows that, in the Variables chart, out of the 4 analysis variables (features), there is one missing value for a case or a record. The Cases chart shows that 34 out of 75 cases have at least one missing value on a variable, and the Values chart shows that 42 of the 300 values (cases × variables, i.e. 75 × 4) are missing.

Like the missing value patterns for BestBuyPcs.com, figure-4.6 displays the missing value patterns for IBuyPlasma.com.

Figure-4.6. Missing value pattern for IBuyPlasma.com

Comparing the results in section 4.2.4 with figure-4.6, it is found that 2% of the values are jointly missing. From this comparative study, it can be concluded that the customers of IBuyPlasma.com are not much satisfied with any of its services.
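A check for jointly missing values can be sketched as follows. The toy dataset, the feature order and the function name `jointly_missing` are hypothetical; the sketch only shows how pairs of features that are missing in the same case can be counted.

```python
from collections import Counter
from itertools import combinations

FEATURES = ["Selection", "Delivery", "Ease", "Service"]

def jointly_missing(rows):
    """Count, for each pair of features, the cases where both are missing."""
    pairs = Counter()
    for row in rows:
        absent = [name for name, v in zip(FEATURES, row) if v is None]
        pairs.update(combinations(absent, 2))  # every pair missing in this case
    return pairs

# Hypothetical toy data; None marks a rating the customer did not give
rows = [
    [1, None, None, 1],   # Delivery and Ease missing together
    [1, 1, 1, None],      # a single missing value forms no pair
    [None, 1, None, 1],   # Selection and Ease missing together
    [1, 1, 1, 1],
]
result = jointly_missing(rows)
for (a, b), count in result.items():
    print(f"{a} & {b}: {count} of {len(rows)} cases ({100 * count / len(rows):.0f}%)")
```

An empty result corresponds to the BestBuyPcs.com situation, where no two features are missing in the same case.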

Result: In the above procedures, conclusions are reached only by assuming that if a store has more missing values, then its customers are not satisfied with that store's features. On this basis, an accurate conclusion that 'BestBuyPcs.com is better than IBuyPlasma.com' is made. But a precise conclusion can be drawn only after the missing values are imputed. Step-2 illustrates how the missing values are imputed.

Step-2: To demonstrate the option Impute Missing Data Values, consider the dataset with missing values. Here, the missing values are left blank and no value is assigned. To produce a precise result, the missing values need to be replaced with random values. These random numbers are generated using the random number generator option.

After selecting the random number generator option, perform multiple imputations. In this experiment, five imputations are performed and the original and the randomly generated numbers for the missing values are stored in a new dataset. When the original data (with missing values) is imputed, the statistical tables for the features of the two online-stores are generated.
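The imputation step can be sketched as below. This is a simplified stand-in and not the statistical package's procedure: it assumes each missing rating is replaced by a random draw from the observed ratings of the same feature, whereas the software used in the experiment may fit a model instead. The data, the seed and the function names are hypothetical.

```python
import random

def impute_once(rows, rng):
    """Return a copy of rows with each None replaced by a random observed
    value drawn from the same column."""
    n_vars = len(rows[0])
    observed = [
        [row[j] for row in rows if row[j] is not None] for j in range(n_vars)
    ]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(row)]
        for row in rows
    ]

def multiple_imputations(rows, m=5, seed=2011):
    """Create m completed datasets from one dataset with missing values."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [impute_once(rows, rng) for _ in range(m)]

# Hypothetical ratings: 1 = positive, 0 = negative, None = missing
rows = [[1, 1, None, 1], [1, 0, 1, None], [None, 1, 1, 0], [1, 1, 0, 1]]
completed = multiple_imputations(rows)
for i, dataset in enumerate(completed, start=1):
    positive = sum(sum(row) for row in dataset) / (len(dataset) * len(dataset[0]))
    print(f"imputation {i}: {100 * positive:.1f}% positive")
```

The original values are never altered; only the blanks are filled, so the five completed datasets differ only where data was missing.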

Result: From the statistical tables generated for BestBuyPcs.com, the highest positive polarity percentage among the five imputations is taken for each feature, using the complete dataset after imputation. From these, the overall average positive polarity percentage for the four services provided is calculated and is found to be 67.32%. Similarly, the positive polarity percentage for IBuyPlasma.com is found to be 63%. The result from the imputations can be regarded as precise rather than merely accurate.

On the basis of the result after missing value imputation, a precise conclusion is made that the services provided by BestBuyPcs.com are better than those of IBuyPlasma.com, by approximately 5%.
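The calculation described above can be sketched as follows: take the highest positive polarity per feature across the five imputations, then average over the four features. The per-feature percentages are hypothetical, since the text reports only the overall figures; the last lines use the reported 67.32% and 63%.

```python
def overall_positive(per_feature):
    """per_feature maps a feature name to its positive-polarity percentages,
    one per imputation. Take the highest per feature, then average the features."""
    best = {name: max(values) for name, values in per_feature.items()}
    return sum(best.values()) / len(best)

# Hypothetical percentages for illustration only (five imputations each)
example = {
    "Selection": [60.0, 62.7, 61.3, 64.0, 62.7],
    "Delivery":  [70.7, 69.3, 72.0, 70.7, 69.3],
    "Ease":      [73.3, 74.7, 72.0, 73.3, 74.7],
    "Service":   [58.7, 60.0, 61.3, 60.0, 58.7],
}
print(f"overall positive polarity: {overall_positive(example):.2f}%")

# Overall figures reported in the text
best_buy, i_buy = 67.32, 63.0
print(f"difference: {best_buy - i_buy:.2f} percentage points")
```

With the reported figures the gap is 4.32 percentage points, which the text rounds to approximately 5%.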

The results of the chosen procedures are critically evaluated and analysed in the next section.

4.3. Critical Evaluation and Analysis

In this experiment, a comparative study is carried out. Few difficulties were faced while carrying out the study because the features were already decided and finding the polarity was not difficult. But the main drawback was that many statistical procedures could not be applied due to the lack of scale variables. Apart from these problems, the main criticisms of the study and the final evaluation of the results of the different procedures are explained below.

4.3.1. Criticisms: As the features' polarities are entered manually, one cannot prove that the data entered is accurate, and manual handling mistakes can take place. In many cases, the consumers declined or neglected to enter ratings for the features. These are treated as missing values, and the output of tests run on data containing them may be accurate but cannot be shown to be precise. When testing procedures are introduced to perform statistical analysis, some tests such as the t-test cannot be conducted because they accept only scale values, and in this experiment the variables can be assigned only ordinal values.

Random numbers are generated for the missing values using multiple imputations. This helps to produce a precise result. But it is time consuming, because multiple imputations can be carried out only after all the available values have been entered into the dataset.

4.3.2. Analysing the tested results

The results of section 4.2 are analysed here. Different statistical procedures are employed to produce a precise result. But the conclusion becomes precise only after multiple imputations are carried out on the dataset with missing values.

In this experiment, the Frequencies procedure clearly shows that BestBuyPcs.com has a better positive polarity than its rival, and also that 'Ease-of-Ordering' and 'On-Time-Delivery' are rated better than the remaining features. When Crosstabs is used, it is found that there is no strong relationship between these two features, but there is a strong relationship between 'Ease-of-Ordering' and 'Selection', and from Two-step Clustering Analysis it is clear that 'Selection' is given the highest importance. Similarly, for IBuyPlasma.com, there is no strong relationship between 'Ease-of-Ordering' and 'Selection', but there is a strong relationship between 'Ease-of-Ordering' and 'Service'. From Two-step Clustering Analysis, it is clear that 'Service' is given the highest importance. For both stores, there is an overall positive polarity. But, when compared, BestBuyPcs.com has a better polarity than its rival, missing values aside. By employing multiple imputations, it becomes clear that BestBuyPcs.com is better than IBuyPlasma.com by only about 5%.
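The kind of cross-tabulation referred to above can be sketched as a count of rating pairs for two features. The ratings and the function name `crosstab` are hypothetical, and cases with a missing rating on either feature are assumed to be skipped.

```python
from collections import Counter

def crosstab(xs, ys):
    """Count how often each pair of ratings occurs, skipping missing ratings."""
    return Counter(
        (x, y) for x, y in zip(xs, ys) if x is not None and y is not None
    )

# Hypothetical ratings for two features: "pos", "neg" or None (missing)
ease      = ["pos", "pos", "neg", "pos", None, "neg", "pos", "neg"]
selection = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", None]

table = crosstab(ease, selection)
levels = ["pos", "neg"]
print("Ease \\ Selection  " + "  ".join(levels))
for row_level in levels:
    counts = "    ".join(str(table[(row_level, col)]) for col in levels)
    print(f"{row_level:<17} {counts}")
```

A table whose counts concentrate on the matching cells (positive with positive, negative with negative) points to a relationship between the two features, while evenly spread counts point to none.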

5. Conclusions:

Though opinion mining includes different tasks, feature-based opinion mining seemed to be the more innovative. For analysing these features, new software is used. In this paper, the missing values are analysed and a precise output is presented.

In the papers by Liu et al. and Liu (2010), missing values were not given much importance, and pre-defined and rated features were not used. Here, all these points were taken into account. On this basis, a comparative hypothetical study of two online stores is made. After performing statistical analysis with the missing values left in place, it is found that BestBuyPcs.com is much better than IBuyPlasma.com. But after multiple imputations, the difference is smaller and the results produced are precise.

Future Work

In this paper, the polarities are entered manually for the experiment. But, as the literature review in section 2 shows, a feature's polarity could also be assigned automatically. Also, after performing multiple imputations for the missing values, only one procedure is employed for statistical analysis. The remaining procedures can be applied in the near future to obtain more precise outputs.