Outlier Checking Mahalanobis Distance Biology Essay

Published: November 2, 2015 Words: 1428

This section screens the data before analysis and investigates conditions that might affect the findings. Data screening also examines where each observation sits relative to the rest of the data. The 398 responses were entered into SPSS version 20 and analyzed using AMOS version 18.0. Data screening included outlier detection, missing data, descriptive statistics, reliability, univariate normality, multicollinearity, and linearity. Each part of the data screening is discussed in the following sections.

4.2.1 Outlier Checking (Mahalanobis Distance)

Outliers are observations that are numerically distant from the rest of the dataset (Byrne, 2010). Accordingly, a body of literature addresses different methods of detecting outliers, among them classifying data points by their observed (Mahalanobis) distance from the expected values (Hair et al., 2010; Hau & Marsh, 2004). One argument in favor of outlier treatment based on the Mahalanobis distance is that it detects outliers effectively by setting a predetermined threshold that defines whether a point is categorized as an outlier or not (Gerrit et al., 2002).

For this research, the table of chi-square statistics was used to set the threshold value. This decision follows Hair et al. (2010), who emphasize creating a new variable in the SPSS data file, called “response”, that numbers the cases from first to last. The Mahalanobis distance is then obtained by running a linear regression with the newly created response number as the dependent variable and all measurement items, apart from the demographic variables, as independent variables. This produced a new output variable called Mah2, which was compared against the chi-square value given in the table.

Using Mah2, this study identified 6 of the 398 respondents as outliers because their Mah2 exceeded the threshold value in the table of chi-square statistics associated with the 21 measurement items of the independent variables; these cases were subsequently deleted from the dataset. Following the treatment of these outliers, the final regressions in this study were run on the remaining 392 cases.

Multivariate outlier detection examines each observation across all variables jointly rather than one variable at a time. Multivariate outliers can be detected in SPSS by calculating the Mahalanobis distance for each respondent. This measure yields a statistic that allows for significance testing.
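To make the calculation concrete, the squared Mahalanobis distance of a case x from the centroid m of the data, with sample covariance matrix S, is D2 = (x - m)' S^-1 (x - m). The following is a minimal sketch in plain Python, not the SPSS procedure used in this study; the function name `mahalanobis_sq` and the toy `rows` are hypothetical and serve only as illustration.

```python
from statistics import mean


def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    n, p = len(rows), len(rows[0])
    center = [mean(col) for col in zip(*rows)]
    dev = [[x - c for x, c in zip(row, center)] for row in rows]
    # Sample covariance matrix (divisor n - 1)
    cov = [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(p)]
           for i in range(p)]

    def solve(a, b):
        """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
        m = [r[:] + [v] for r, v in zip(a, b)]
        for c in range(p):
            piv = max(range(c, p), key=lambda r: abs(m[r][c]))
            m[c], m[piv] = m[piv], m[c]
            for r in range(p):
                if r != c:
                    f = m[r][c] / m[c][c]
                    m[r] = [x - f * y for x, y in zip(m[r], m[c])]
        return [m[i][p] / m[i][i] for i in range(p)]

    # D2 = d' S^-1 d, computed as d . solve(S, d) to avoid an explicit inverse
    return [sum(di * si for di, si in zip(d, solve(cov, d))) for d in dev]


# Hypothetical toy data: six respondents, two items each
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [1, 5]]
d2 = mahalanobis_sq(rows)
```

A useful property for checking such a routine is that the D2 values of a sample always sum to p(N - 1), where p is the number of variables and N the number of cases.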

Table 4.1: Outlier Detection (Mahalanobis Distance)

| Statistic | Minimum | Maximum | Mean | Std. Deviation | N |
|---|---|---|---|---|---|
| Predicted Value | -6.32 | 451.63 | 199.50 | 77.192 | 398 |
| Std. Predicted Value | -2.666 | 3.266 | .000 | 1.000 | 398 |
| Standard Error of Predicted Value | 5.899 | 59.259 | 28.507 | 6.600 | 398 |
| Adjusted Predicted Value | -31.67 | 463.81 | 198.32 | 78.266 | 398 |
| Residual | -253.756 | 193.111 | .000 | 85.293 | 398 |
| Std. Residual | -2.817 | 2.144 | .000 | .947 | 398 |
| Stud. Residual | -2.911 | 2.314 | .006 | .997 | 398 |
| Deleted Residual | -270.853 | 226.881 | 1.179 | 94.793 | 398 |
| Stud. Deleted Residual | -2.942 | 2.329 | .006 | 1.000 | 398 |
| Mahal. Distance | .706 | 170.845 | 40.897 | 18.576 | 398 |
| Cook's Distance | .000 | .065 | .003 | .005 | 398 |
| Centered Leverage Value | .002 | .430 | .103 | .047 | 398 |
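The Mahalanobis and leverage rows of Table 4.1 can be cross-checked against each other. The sketch below assumes the usual regression-diagnostic definitions, under which the centered leverage equals D2 / (N - 1) and the D2 values sum to p(N - 1) for p predictors; the variable names are illustrative only.

```python
# Values copied from Table 4.1 (N = 398 cases)
N = 398
max_d2, mean_d2 = 170.845, 40.897
max_leverage, mean_leverage = 0.430, 0.103

# Centered leverage is D2 rescaled by the sample size: h = D2 / (N - 1)
leverage_from_max = round(max_d2 / (N - 1), 3)
leverage_from_mean = round(mean_d2 / (N - 1), 3)

# The D2 values sum to p * (N - 1), so their mean is p * (N - 1) / N;
# solving for p gives the number of predictors implied by the table
implied_predictors = mean_d2 * N / (N - 1)
```

Under these assumptions the recomputed leverages agree with the tabled values, and the mean D2 implies about 41 predictors, which corresponds to the 41 measurement items of the nine constructs.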

A case is designated as an outlier when its Mahalanobis distance (D2) is greater than a critical value. The threshold level for the D2/df measure should be set at a conservative level of significance (0.005 or 0.001) (Hair et al., 2010).

For this study, the maximum D2 is 170.845, which is greater than the critical value. The critical chi-square value is 74.745. This means that at least one case lies far beyond the critical value. Once potential outliers are identified, they should perhaps be retained if they represent a large but viable segment of the population. Deleting outliers runs the risk of discarding valid data. This study therefore also examined the outlier status of each flagged observation in order to gain a complementary perspective on the data.
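The comparison against the chi-square critical value can be sketched as follows. This is an illustration under stated assumptions, not the procedure run in SPSS: it assumes 41 degrees of freedom (one per measurement item) and a significance level of 0.001, and it uses the Wilson-Hilferty approximation so that only the Python standard library is needed. The function name `chi2_critical` is hypothetical.

```python
from statistics import NormalDist


def chi2_critical(df, alpha):
    """Approximate upper-tail chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1 - alpha)
    k = 2 / (9 * df)
    return df * (1 - k + z * k ** 0.5) ** 3


# Assumed settings: df = 41 items, alpha = 0.001
threshold = chi2_critical(df=41, alpha=0.001)  # close to the tabled 74.745

max_d2 = 170.845               # maximum D2 reported in Table 4.1
is_outlier = max_d2 > threshold
d2_per_df = max_d2 / 41        # the D2/df measure for the most extreme case
```

Because the approximation differs slightly from the exact tabled value, a statistics package (for example `scipy.stats.chi2.ppf`) would be preferable wherever the decision for a borderline case depends on the third decimal place.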

4.2.2 Missing Data

Once the questionnaires are collected, the first step in data screening is to identify missing data. The concern is the extent to which missing data put the analysis results at risk. Normally, missing data under 10 percent for an individual case or observation is not a problem, except when the missing data occur in a specific nonrandom pattern (Hair et al., 2010). In this study, no questionnaire contained missing data. Thus, every case is complete on all variables, and the full sample remains available for data analysis without any remedies.
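The 10 percent rule of thumb can be illustrated with a short sketch. The respondent identifiers and answers below are hypothetical and are not taken from this study's dataset, in which no values were missing; unanswered items are represented by `None`.

```python
def missing_share(answers):
    """Fraction of items in one questionnaire that were left unanswered."""
    return sum(value is None for value in answers) / len(answers)


# Hypothetical questionnaires with ten items each
cases = {
    "r1": [4, 3, 5, 4, 4, 2, 3, 5, 4, 3],              # complete
    "r2": [4, None, 5, None, 4, 2, 3, 5, 4, 3],        # 20 percent missing
    "r3": [1, 2, 2, 3, 1, 2, 3, 3, 2, 1],              # complete
}

# Flag cases whose missing share reaches the 10 percent level
flagged = [cid for cid, answers in cases.items()
           if missing_share(answers) >= 0.10]
```

Flagged cases would still need to be inspected for nonrandom patterns before deciding on a remedy.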

Table 4.3: Missing Data

Cases

Valid

Missing

Total

N

Percent

N

Percent

N

Percent

CL1

398

100.0%

0

0.0%

398

100.0%

CL2

398

100.0%

0

0.0%

398

100.0%

CL3

398

100.0%

0

0.0%

398

100.0%

ID1

398

100.0%

0

0.0%

398

100.0%

ID2

398

100.0%

0

0.0%

398

100.0%

ID3

398

100.0%

0

0.0%

398

100.0%

TL1

398

100.0%

0

0.0%

398

100.0%

TL2

398

100.0%

0

0.0%

398

100.0%

TL3

398

100.0%

0

0.0%

398

100.0%

ES1

398

100.0%

0

0.0%

398

100.0%

ES2

398

100.0%

0

0.0%

398

100.0%

ES3

398

100.0%

0

0.0%

398

100.0%

EM1

398

100.0%

0

0.0%

398

100.0%

EM2

398

100.0%

0

0.0%

398

100.0%

EM3

398

100.0%

0

0.0%

398

100.0%

SC1

398

100.0%

0

0.0%

398

100.0%

SC2

398

100.0%

0

0.0%

398

100.0%

SC3

398

100.0%

0

0.0%

398

100.0%

SL1

398

100.0%

0

0.0%

398

100.0%

SL2

398

100.0%

0

0.0%

398

100.0%

SL3

398

100.0%

0

0.0%

398

100.0%

OI1

398

100.0%

0

0.0%

398

100.0%

OI2

398

100.0%

0

0.0%

398

100.0%

OI3

398

100.0%

0

0.0%

398

100.0%

OI4

398

100.0%

0

0.0%

398

100.0%

OI5

398

100.0%

0

0.0%

398

100.0%

OI6

398

100.0%

0

0.0%

398

100.0%

OI7

398

100.0%

0

0.0%

398

100.0%

OI8

398

100.0%

0

0.0%

398

100.0%

OI9

398

100.0%

0

0.0%

398

100.0%

OI10

398

100.0%

0

0.0%

398

100.0%

OI11

398

100.0%

0

0.0%

398

100.0%

OP1

398

100.0%

0

0.0%

398

100.0%

OP2

398

100.0%

0

0.0%

398

100.0%

OP3

398

100.0%

0

0.0%

398

100.0%

OP4

398

100.0%

0

0.0%

398

100.0%

OP5

398

100.0%

0

0.0%

398

100.0%

OP6

398

100.0%

0

0.0%

398

100.0%

OP7

398

100.0%

0

0.0%

398

100.0%

OP8

398

100.0%

0

0.0%

398

100.0%

OP9

398

100.0%

0

0.0%

398

100.0%

4.2.3 Descriptive Statistics

The following profile emerged from the data screening process. The descriptive statistics for the latent constructs include the maximum, minimum, mean, standard deviation, mode, and median. The nine latent constructs (continuous learning (CL), inquiry and dialogue (ID), team learning (TL), embedded system (ES), empowerment (EM), system connection (SC), strategic leadership (SL), organizational innovativeness (OI), and organizational performance (OP)) are presented in Table 4.4.

Table 4.4: Descriptive Statistics of Variables

| Statistic | CL | ID | TL | ES | EM | SC | SL | OI | OP |
|---|---|---|---|---|---|---|---|---|---|
| N Valid | 392 | 392 | 392 | 392 | 392 | 392 | 392 | 392 | 392 |
| N Missing | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Mean | 3.169 | 3.379 | 3.304 | 3.401 | 3.480 | 3.098 | 3.361 | 3.342 | 2.998 |
| Std. Error of Mean | .064 | .058 | .062 | .057 | .057 | .065 | .059 | .048 | .052 |
| Median | 3.000 | 3.333 | 3.333 | 3.667 | 3.500 | 3.000 | 3.333 | 3.364 | 3.000 |
| Mode | 4.00 | 4.00 | 3.00 | 3.00 | 3.00 | 3.00 | 4.00 | 4.00 | 4.00 |
| Std. Deviation | 1.263 | 1.147 | 1.223 | 1.134 | 1.129 | 1.294 | 1.174 | .945 | 1.037 |
| Minimum | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Maximum | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 | 5.00 |

Table 4.4 reports the statistics for the nine constructs, which together comprise 41 items. Organizational performance (OP) has the lowest mean (2.998), while empowerment (EM) has the highest (3.480). For the standard deviation, system connection (SC) has the highest value (1.294), followed by continuous learning (CL) at 1.263, team learning (TL) at 1.223, strategic leadership (SL) at 1.174, inquiry and dialogue (ID) at 1.147, embedded system (ES) at 1.134, empowerment (EM) at 1.129, and organizational performance (OP) at 1.037; organizational innovativeness (OI) has the lowest value (0.945). Likewise, the standard error of the mean is highest for system connection (SC) (0.065) and lowest for organizational innovativeness (OI) (0.048). The medians of the nine constructs range from 3.000 to 3.667, and the modes are either 3.00 or 4.00. The maximum and minimum are 5.00 and 1.00 respectively.
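The standard errors follow directly from the standard deviations, since the standard error of the mean is the standard deviation divided by the square root of the sample size. The short check below recomputes them from the tabled values with N = 392; the dictionary names are illustrative only.

```python
from math import sqrt

N = 392  # valid cases after removing the six outliers

# Standard deviations and reported standard errors copied from Table 4.4
sd = {"CL": 1.263, "ID": 1.147, "TL": 1.223, "ES": 1.134, "EM": 1.129,
      "SC": 1.294, "SL": 1.174, "OI": 0.945, "OP": 1.037}
reported_se = {"CL": 0.064, "ID": 0.058, "TL": 0.062, "ES": 0.057, "EM": 0.057,
               "SC": 0.065, "SL": 0.059, "OI": 0.048, "OP": 0.052}

# Standard error of the mean: SD / sqrt(N), rounded to three decimals
computed_se = {name: round(value / sqrt(N), 3) for name, value in sd.items()}

# True when every recomputed value agrees with the table
all_match = all(abs(computed_se[k] - reported_se[k]) < 1e-9 for k in sd)
```

Because every construct shares the same N, the ordering of the standard errors mirrors the ordering of the standard deviations, which is why SC is highest and OI lowest on both measures.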