Understanding of Business Forecasting


The aim of this report is to show my understanding of business forecasting using data drawn from the UK national statistics: a quarterly series of total consumer credit gross lending in the UK from the second quarter of 1993 to the second quarter of 2009.

The report answers four key questions that are relevant to the coursework.

Question 1

Figure 1 - Line graph of credit lending
Figure 2 - ACF graph of credit lending

In this section the data will be examined, looking for seasonal effects, trends and cycles. Each time period represents a single piece of data, which must be split into trend-cycle and seasonal effect. The line graph in Figure 1 identifies a clear upward trend-cycle, which must be removed so that the seasonal effect can be predicted.

Figure 1 displays long-term credit lending in the UK, which has recently been hit by an economic crisis. Figure 2 also provides evidence of a trend, because the ACF values do not come down to zero. Even though the trend is clear in Figures 1 and 2, the seasonal pattern is not. It is therefore important that the trend-cycle is removed so the seasonal effect can be estimated clearly. A process called differencing will remove the trend whilst keeping the pattern.

Drawing scatter plots and calculating correlation coefficients on the data will reveal the pattern repeat.
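As an illustration, the following sketch (assuming the series has been loaded into a pandas Series called `lending`, a hypothetical name) shows one way the lag correlations and the scatter plots behind Figure 3 could be produced.

```python
# A minimal sketch, assuming a quarterly pandas Series `lending` holding the
# credit lending data, of the lag correlations and lag scatter plots.
import pandas as pd
import matplotlib.pyplot as plt

def lag_correlations(series: pd.Series, max_lag: int = 4) -> pd.Series:
    """Correlation between the series and each of its first `max_lag` lags."""
    return pd.Series(
        {lag: series.autocorr(lag=lag) for lag in range(1, max_lag + 1)},
        name="correlation",
    )

def lag_scatter(series: pd.Series, lag: int) -> None:
    """Scatter plot of the series against its own values `lag` quarters earlier."""
    plt.scatter(series.shift(lag), series, s=10)
    plt.xlabel(f"lag {lag}")
    plt.ylabel("credit lending")
    plt.show()

# Example usage:
# print(lag_correlations(lending))   # lag 4 should show the strongest correlation
# lag_scatter(lending, 4)            # the plot behind Figure 3
```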

Scatter Plot correlation

The following diagram (Figure 3) represents the correlation between the original credit lending data and four lags (quarters). A strong correlation is indicated by a straight-line relationship.

Figure 3 - Scatter plot displaying correlation between original credit lending data and the fourth lag.

As depicted in Figure 3, the scatter plot of the credit lending data against lag 4 comes closest to a straight line. Even so, the seasonal pattern is still unclear, so differencing must be used to resolve this issue.

Differencing

Differencing is used to remove a trend-cycle component. Figure 4 displays an ACF graph that indicates a four-point pattern repeat. Figure 5 shows a line graph of the first difference; the graph displays a four-point repeat, but the trend is still clearly apparent. To remove the trend completely the data must be differenced a second time.

Figure 4 - ACF graph of the first difference
Figure 5 - Line graph of the first difference

First differencing is a useful tool for removing non-stationarity. However, first differencing does not always eliminate it, and the data may have to be differenced a second time. In practice it is not essential to go beyond second differencing, because real data generally involve non-stationarity of only first or second order.

Figures 6 and 7 display the second-differenced data. Figure 6 shows an ACF graph of the second difference, which reinforces the idea of a four-point repeat. Figure 7 confirms that the trend-cycle component has been completely removed and that there is indeed a four-point pattern repeat.

Figure 6 - ACF graph of the second difference
Figure 7 - Line graph of the second difference
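For reference, the differencing itself can be reproduced with a short sketch such as the one below, again assuming the hypothetical `lending` Series; the ACF calls are the analogues of Figures 4 and 6.

```python
# A minimal sketch of first and second differencing and their ACF plots.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

first_diff = lending.diff().dropna()          # removes most of the trend (Figures 4-5)
second_diff = lending.diff().diff().dropna()  # removes the remaining trend (Figures 6-7)

plot_acf(first_diff, lags=20)    # spikes at lags 4, 8, ... suggest a four-point repeat
plot_acf(second_diff, lags=20)   # the seasonal repeat is clearer once the trend is gone
plt.show()
```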

Question 2

Multiple regression involves fitting a linear expression by minimising the sum of squared deviations between the sample data and the fitted model. It can be used to fit several kinds of model, both linear and nonlinear in the trend. The following section explains multiple regression using dummy variables.

Dummy variables are used in multiple regression to fit trends and pattern repeats in a holistic way. As the credit lending data is seasonal, a common method for handling the seasonality in a regression framework is to use dummy variables. The following section includes dummy variables to indicate the quarters, which show whether there are any quarterly influences on lending. Three new variables are defined (a sketch of their construction follows this list):

Q1 = 1 if the observation falls in the first quarter, 0 otherwise

Q2 = 1 if the observation falls in the second quarter, 0 otherwise

Q3 = 1 if the observation falls in the third quarter, 0 otherwise
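As noted above, the following sketch shows one way the time trend and the quarterly dummy variables could be constructed, assuming the hypothetical `lending` Series has a quarterly date index.

```python
# A minimal sketch of building the time trend and the three quarterly dummies.
# Assumes `lending` has a quarterly PeriodIndex or DatetimeIndex so `.quarter`
# is available.
import numpy as np
import pandas as pd

quarters = lending.index.quarter                 # 1, 2, 3 or 4 for each observation
design = pd.DataFrame(
    {
        "time": np.arange(1, len(lending) + 1),  # 1, 2, 3, ... linear trend term
        "Q1": (quarters == 1).astype(int),
        "Q2": (quarters == 2).astype(int),
        "Q3": (quarters == 3).astype(int),       # the fourth quarter is the baseline
    },
    index=lending.index,
)
design["time2"] = design["time"] ** 2            # for the quadratic model
design["time3"] = design["time"] ** 3            # for the cubic model
```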

Trend and seasonal models using dummy variables

The following equations are used by SPSS to create different outputs. Each model is judged in terms of its adjusted R2.

Linear trend + seasonal model

Data = a + c1 x time + b1 x Q1 + b2 x Q2 + b3 x Q3 + error

Quadratic trend + seasonal model

Data = a + c1 x time + c2 x time2 + b1 x Q1 + b2 x Q2 + b3 x Q3 + error

Cubic trend + seasonal model

Data = a + c1 x time + c2 x time2 + c3 x time3 + b1 x Q1 + b2 x Q2 + b3 x Q3 + error

Initially, data and time columns were created to capture the trend, and the credit lending data was regressed against time and the dummy variables. Due to multicollinearity (i.e. at least one of the variables being completely determined by the others) there is no need for all four quarterly variables, only Q1, Q2 and Q3; the fourth quarter acts as the baseline.

Linear regression

Linear regression fits a line that comes as close as possible to the original credit lending data: it finds the slope and intercept that minimise the sum of the squared vertical distances between the data points and the line.
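For illustration, a minimal sketch of this fit using statsmodels follows, assuming the `design` DataFrame from the earlier sketch (in the report the model is fitted to the data excluding the eight holdback quarters).

```python
# A minimal sketch of the linear trend + seasonal fit, the Python analogue of
# the SPSS output in Figures 8 and 9.
import statsmodels.api as sm

X = sm.add_constant(design[["time", "Q1", "Q2", "Q3"]])
linear_fit = sm.OLS(lending, X).fit()

print(linear_fit.summary())         # coefficients, t statistics and p values
print(linear_fit.rsquared_adj)      # adjusted R squared (0.939 in Figure 8)
```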

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .971   .943       .939                3236.90933

Figure 8 - SPSS output displaying the adjusted coefficient of determination (R squared)

Coefficients (Model 1)

Variable     Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   17115.816          1149.166                         14.894   .000
time         767.068            26.084       .972                29.408   .000
Q1           -1627.354          1223.715     -.054               -1.330   .189
Q2           -838.519           1202.873     -.028               -.697    .489
Q3           163.782            1223.715     .005                .134     .894

Figure 9

The adjusted coefficient of determination (R squared) is 0.939, which is an excellent fit (Figure 8). The coefficient of the 'time' variable, 767.068, is positive, indicating an upward trend. However, not all of the coefficients are significant at the 5% level (0.05), so variables must be removed. Initially Q3 is removed because it is the least significant variable (Figure 9). Once Q3 is removed, Q2 is the least significant variable and is removed next. Even with Q3 and Q2 removed, Q1 is still not significant, so all the quarterly variables must be removed, leaving time as the only significant variable.
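The elimination procedure described above can be sketched as a simple loop; the sketch below assumes the hypothetical `lending` Series and `design` DataFrame from the earlier sketches.

```python
# A minimal sketch of backward elimination: refit repeatedly, dropping the
# least significant variable until every remaining coefficient is significant
# at the 5% level.
import statsmodels.api as sm

def backward_eliminate(y, X, alpha=0.05):
    X = sm.add_constant(X)
    while True:
        fit = sm.OLS(y, X).fit()
        pvalues = fit.pvalues.drop("const")     # never drop the intercept
        if pvalues.empty:
            return fit
        worst = pvalues.idxmax()
        if pvalues[worst] <= alpha:
            return fit                          # all remaining terms significant
        X = X.drop(columns=worst)               # drop the least significant term

# reduced = backward_eliminate(lending, design[["time", "Q1", "Q2", "Q3"]])
# print(reduced.summary())   # only `time` should survive, as in Figure 10
```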

Coefficients (Model 1)

Variable     Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   16582.815          866.879                          19.129   .000
time         765.443            26.000       .970                29.440   .000

Figure 10

The following table (Table 1) compares the holdback data with forecasts produced from the model in Figure 10. The following equation is used to calculate the predicted values.

Predicted values = 16582.815 + 765.443 * time

Original Data   Predicted Values
50878.00        60978.51
52199.00        61743.95
50261.00        62509.40
49615.00        63274.84
47995.00        64040.28
45273.00        64805.72
42836.00        65571.17
43321.00        66336.61

Table 1

This model is ineffective at predicting future values: the original holdback data decreases each quarter, while the predicted values keep increasing with time.
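For reference, the comparison in Table 1 can be reproduced directly from the equation in Figure 10; the time values 58 to 65 for the eight holdback quarters are inferred from the predicted values in the table.

```python
# A minimal sketch of the holdback comparison behind Table 1.
import pandas as pd

time = pd.Series(range(58, 66))                      # holdback quarters (inferred)
original = pd.Series([50878, 52199, 50261, 49615, 47995, 45273, 42836, 43321])
predicted = 16582.815 + 765.443 * time               # equation from Figure 10

print(pd.DataFrame({"original": original, "predicted": predicted.round(2)}))
```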

Non-Linear regression

Non-linear regression aims to find a relationship between a response variable and one or more explanatory variables in a non-linear fashion.

Non-Linear model (Quadratic)

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .986   .972       .969                2305.35222

Figure 11

Coefficients (Model 1)

Variable     Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   11840.996          1099.980                         10.765   .000
time         1293.642           75.681       1.639               17.093   .000
time2        -9.079             1.265        -.688               -7.177   .000
Q1           -1618.275          871.540      -.054               -1.857   .069
Q2           -487.470           858.091      -.017               -.568    .572
Q3           172.861            871.540      .006                .198     .844

Figure 12

For the quadratic model the adjusted coefficient of determination (R squared) is 0.969 (Figure 11), a slight improvement on the linear model (Figure 8). The coefficient of 'time', 1293.642, is positive, indicating an upward trend, whereas the coefficient of 'time2', -9.079, is negative; together they indicate a curve in the trend.

Not all of the coefficients are significant at the 5% level, so variables must again be removed. Initially Q3 is removed because it is the least significant variable (Figure 12). Once Q3 is removed, Q2 is still the least significant variable and is removed next. With Q2 and Q3 removed, Q1 falls below the 5% level, meaning it is significant (Figure 13).

Coefficients (Model 1)

Variable     Unstandardized B   Std. Error   Standardized Beta   t        Sig.
(Constant)   11698.512          946.957                          12.354   .000
time         1297.080           74.568       1.643               17.395   .000
time2        -9.143             1.246        -.693               -7.338   .000
Q1           -1504.980          700.832      -.050               -2.147   .036

Figure 13

Table 2 compares the holdback data with forecasts produced from the model in Figure 13. The following equation is used to calculate the predicted values:

Quadratic predicted values = 11698.512 + 1297.080 * time - 9.143 * time2 - 1504.980 * Q1

Original Data   Predicted Values
50878.00        56172.10
52199.00        56399.45
50261.00        55103.53
49615.00        56799.29
47995.00        56971.78
45273.00        57125.98
42836.00        55756.92
43321.00        57379.54

Table 2

Compared with Table 1, the predicted values in Table 2 are closer to the holdback data, but they are still not accurate enough.

Non-Linear model (Cubic)

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .997   .993       .992                1151.70013

Figure 14

Coefficients (Model 1)

Variable     Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)   17430.277          710.197                          24.543    .000
time         186.531            96.802       .236                1.927     .060
time2        38.217             3.859        2.897               9.903     .000
time3        -.544              .044         -2.257              -12.424   .000
Q1           -1458.158          435.592      -.048               -3.348    .002
Q2           -487.470           428.682      -.017               -1.137    .261
Q3           12.745             435.592      .000                .029      .977

Figure 15

The adjusted coefficient of determination (R squared) is 0.992, the best fit of the three models (Figure 14). The coefficients of 'time', 186.531, and 'time2', 38.217, are positive, indicating an upward trend, while the coefficient of 'time3', -.544, is negative, indicating a curve in the trend. Not all of the coefficients are significant at the 5% level, so variables must be removed. Initially Q3 is removed because it is the least significant variable (Figure 15). Once Q3 is removed, Q2 is still the least significant variable and is removed next. With Q3 and Q2 removed, Q1 becomes significant, but the 'time' variable does not, so it must also be removed.

Coefficients (Model 1)

Variable     Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)   18354.735          327.059                          56.120    .000
time2        45.502             .956         3.449               47.572    .000
time3        -.623              .017         -2.586              -35.661   .000
Q1           -1253.682          362.939      -.042               -3.454    .001

Figure 16

Table 3 compares the holdback data with forecasts produced from the model in Figure 16. The following equation is used to calculate the predicted values:

Cubic predicted values = 18354.735 + 45.502 * time2 - 0.623 * time3 - 1253.682 * Q1

Original Data   Predicted Values
50878.00        49868.69
52199.00        48796.08
50261.00        46340.25
49615.00        46258.51
47995.00        44786.08
45273.00        43172.89
42836.00        40161.53
43321.00        39509.31

Table 3

The cubic model produces the most accurate predicted values of the three regression models: Table 3 shows that the original data and the predicted values both decrease gradually.
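The cubic predictions in Table 3 can likewise be reproduced from the equation in Figure 16; the time values 58 to 65 and the positions of the first-quarter observations are inferred from the report's own tables.

```python
# A minimal sketch of the cubic holdback prediction behind Table 3.
import pandas as pd

time = pd.Series(range(58, 66), dtype=float)     # holdback quarters (inferred)
q1 = pd.Series([0, 0, 1, 0, 0, 0, 1, 0])         # 1 only for first-quarter observations (inferred)
original = pd.Series([50878, 52199, 50261, 49615, 47995, 45273, 42836, 43321])

predicted = 18354.735 + 45.502 * time**2 - 0.623 * time**3 - 1253.682 * q1
print(pd.DataFrame({"original": original, "predicted": predicted.round(2)}))
```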

Figure 17 Figure 18

Figures 17 and 18 display the original credit lending data, the predicted values and the upper and lower confidence limits. Figure 18 shows the cubic fit, which represents the data better than the quadratic fit in Figure 17 and matches the original data most closely.

Question 3

The Box-Jenkins approach is used to find a suitable formula such that the residuals are as small as possible and exhibit no pattern. The model is built in only a few steps, which may be repeated as necessary, resulting in a specific formula that replicates the patterns in the series as closely as possible and also produces accurate forecasts.

The following section will show a combination of decomposition and Box-Jenkins ARIMA approaches.

For each of the original variables analysed, the Seasonal Decomposition procedure creates four new variables in the modelling data (a sketch of an equivalent decomposition follows this list):

SAF: Seasonal factors

SAS: Seasonally adjusted series, i.e. de-seasonalised data, representing the original series with seasonal variations removed.

STC: Smoothed trend-cycle component, a smoothed version of the seasonally adjusted series that shows both the trend and cyclic components.

ERR: The residual component of the series for a particular observation
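As referenced above, the SPSS procedure itself is not reproduced here, but an equivalent multiplicative decomposition can be sketched with statsmodels, producing analogues of the four series, again assuming the hypothetical `lending` Series.

```python
# A minimal sketch of a multiplicative seasonal decomposition producing
# analogues of SPSS's SAF, SAS, STC and ERR series.
from statsmodels.tsa.seasonal import seasonal_decompose

decomp = seasonal_decompose(lending, model="multiplicative", period=4)

saf = decomp.seasonal             # seasonal factors (SAF)
sas = lending / saf               # seasonally adjusted series (SAS)
stc = decomp.trend                # smoothed trend-cycle component (STC)
err = decomp.resid                # residual / irregular component (ERR)
```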

Figure 19

Autoregressive (AR) models can be effectively coupled with moving average (MA) models to form a general and useful class of time series models called autoregressive moving average (ARMA) models. However, these can only be used when the data is stationary. The class can be extended to non-stationary series by allowing differencing of the data series; the resulting models are called autoregressive integrated moving average (ARIMA) models.

The variable SAS will be used in the ARIMA models because it is the de-seasonalised version of the original credit lending data. As the data in Figure 19 is de-seasonalised, the remaining trend must still be removed to make the series stationary. Therefore, as mentioned before, the data must be differenced to remove the trend.
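A sketch of how the two ARIMA models discussed below could be fitted to the seasonally adjusted series follows, using the `sas` series from the previous sketch (in the report the models are fitted to the data excluding the eight holdback quarters).

```python
# A minimal sketch of fitting ARIMA(3,2,0) and ARIMA(0,2,1) to the seasonally
# adjusted series; d=2 corresponds to the second differencing used above.
from statsmodels.tsa.arima.model import ARIMA

arima_320 = ARIMA(sas, order=(3, 2, 0)).fit()   # AR(3) on the second differences
arima_021 = ARIMA(sas, order=(0, 2, 1)).fit()   # MA(1) on the second differences

print(arima_320.summary())
print(arima_021.summary())

# Forecasts for the eight holdback quarters, for comparison with the tables below.
print(arima_320.forecast(steps=8))
print(arima_021.forecast(steps=8))
```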

Figure 20 displays the autocorrelations after first differencing; a slight trend is still visible in the ACF graph. Figure 21 is a line graph of the first difference.

Figure 20 Figure 21


Model Statistics - ARIMA (3,2,0)

Model: Seasonal adjusted series for creditlending from SEASON, MOD_2, MUL EQU 4 - Model_1
Number of Predictors: 0
Stationary R-squared: .485
Normalized BIC: 14.040
Ljung-Box Q(18): Statistics = 18.693, DF = 15, Sig. = .228
Number of Outliers: 0

Model Statistics - ARIMA (0,2,1)

Model: Seasonal adjusted series for creditlending from SEASON, MOD_2, MUL EQU 4 - Model_1
Number of Predictors: 0
Stationary R-squared: .476
Normalized BIC: 13.872
Ljung-Box Q(18): Statistics = 16.572, DF = 17, Sig. = .484
Number of Outliers: 0

ARIMA (3,2,0)

Original Data   Predicted Values
50878.00        50335.29843
52199.00        50252.00595
50261.00        50310.44277
49615.00        49629.75233
47995.00        49226.60620
45273.00        48941.24113
42836.00        48674.95295
43321.00        48150.91779

ARIMA (0,2,1)

Original Data   Predicted Values
50878.00        50562.03020
52199.00        50226.83433
50261.00        49870.11538
49615.00        49491.87337
47995.00        49092.10829
45273.00        48670.82013
42836.00        48228.00891
43321.00        47763.67462

Question 4

Part A

Business forecasting is used to predict future values, so it is important to know how well the fitted model performs when predicting them. The current economic climate is certain to have a negative effect on future values, so it is also important to know when a fitted model needs to be modified.

A tracking signal can be used to resolve this issue. The tracking signal is a measure that indicates whether the forecast is keeping pace with any genuine upward or downward changes in the forecast variable (demand, sales, etc.). It is defined as the sum of the forecast errors divided by the mean absolute deviation (MAD).

Tracking signal = sum(forecast errors)/MAD
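For completeness, a minimal sketch of the MAD and tracking-signal calculations defined above, assuming `original` and `predicted` are aligned pandas Series covering the holdback period:

```python
# A minimal sketch of MAD and the tracking signal.
import pandas as pd

def mad(original: pd.Series, predicted: pd.Series) -> float:
    """Mean absolute deviation of the forecast errors."""
    return (original - predicted).abs().mean()

def tracking_signal(original: pd.Series, predicted: pd.Series) -> float:
    """Sum of forecast errors divided by the mean absolute deviation."""
    errors = original - predicted
    return errors.sum() / mad(original, predicted)
```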

The following table displays the MAD values for the Quadratic, Cubic, ARIMA (3,2,0) and ARIMA (0,2,1) models.

MAD (Mean Absolute Deviation)

Model           MAD
Quadratic       1805.168
Cubic           874.9475
ARIMA (3,2,0)   836.2912
ARIMA (0,2,1)   829.6187

Part B

The most important assumption to consider is how good the original data is. If the data is accurate, tests can be performed to analyse the accuracy of the forecasts: in regression analysis, the results not only provide an indication of how accurate the overall forecast is, but also allow the reliability of the individual variables in the forecast to be tested. If the quality of the data is poor, however, forecasting may produce results that do not coincide with reality, leading to false forecasting results (Vasigh et al, 2008).

Forecasting models also assume that future values will continue on from the past, ignoring any environmental changes. At present credit lending has suffered due to the current economic climate, but this is likely to improve in the near future. Towards the end of the credit lending graphs there is the beginning of a downward trend, which the data accurately reflect, but the final chosen model is unable to predict how long the economic downturn will last. Economic changes may therefore distort results, and the model must be modified to take external factors into consideration.

Another influential assumption is the choice between additive and multiplicative decomposition. It has been assumed that the credit lending data follows a multiplicative model, which expresses the series as the product of trend, seasonal and irregular components. If additive decomposition were used instead, the results would change, because additive decomposition expresses the series as the sum of the seasonal, trend and irregular components.

In summary, the main limitations are that economic changes may distort the results, that extensive data mining of information is required, and that any forecasting model is only a crude approximation of reality.