The Recent Developments In Portfolio Selection Finance Essay


This paper studies recent developments in portfolio selection with a large number of assets using regularization. One interesting approach is to impose an L1-norm constraint, which effectively encourages sparse portfolios (Brodie et al., 2009; Fan et al., 2009). Markowitz mean-variance portfolio optimization theory (Markowitz, 1952), the cornerstone of modern portfolio theory, is discussed first. In part 2, I discuss the reasons why Markowitz's original model has not gained more popularity among practitioners. In part 3, I review some of the more important results of the L1 model. In part 4, the out-of-sample performance of Markowitz's L2-risk (standard deviation) model is compared with that of the L1 model using several datasets of historical stock returns. Three groups of data are used to examine these models. The application to these datasets suggests a stable performance of the L1-norm portfolio, superior to the other models.

The main purpose of this paper is to demonstrate in detail that the L1-constrained portfolio, a special case of the regularized portfolio, has a stable performance and can remove most of the difficulties of Markowitz's model while maintaining its advantages over equilibrium models.

2. Review of Markowitz's Model

Modern portfolio theory was introduced by Harry Markowitz (1952, 1959) with his paper "Portfolio Selection" in the 1952 Journal of Finance, which laid down the first mathematical model for allocating capital over a number of available assets with the objective of maximizing return while minimizing the risk of the investment. In this paper Markowitz measured the return on a portfolio by the expected value of the portfolio return, while risk is quantified by the variance (dispersion) of returns. Markowitz mean-variance portfolio optimization theory states that a rational investor, as opposed to a speculator, would be willing to diversify and to either maximize the expected return given an upper bound on risk, or minimize risk given a lower bound on the return the investor is willing to accept. An optimal portfolio can therefore be obtained by solving a convex quadratic programming problem. Markowitz's portfolio selection model has had a profound impact, as many important economic models of financial markets are based upon it, for example the celebrated Capital Asset Pricing Model (CAPM), developed primarily by Sharpe (1964) and Lintner (1965).

2.1 Problem Formulation

A portfolio is defined as a collection of assets/investments held by an investor. It is assumed that one unit of capital is available and that this capital is fully invested; that is, the portfolio weights, which represent the relative amount of capital invested in each asset, add up to unity.

Suppose there are N assets; we have the following notations defined:

$r_i$ = the rate of return (per period) of asset $i$, $i = 1, \dots, N$;

$w_i$ = the proportion (weight) of the portfolio invested in asset $i$;

$w$ = the column vector of portfolio weights, i.e. $w = (w_1, \dots, w_N)^T$;

$\mu_i$ = the expected value of $r_i$, i.e. $\mu_i = E[r_i]$;

$\mu$ = the column vector of expected returns, i.e. $\mu = (\mu_1, \dots, \mu_N)^T$;

$\sigma_{ij}$ = the covariance of asset $i$ with asset $j$, $i, j = 1, \dots, N$;

$\Sigma$ = the $N \times N$ covariance matrix $[\sigma_{ij}]$;

$\mu_p$ = the expected return of the portfolio;

$\sigma_p$ = the standard deviation of the portfolio return.

The portfolio expected return (per period) and the variance of the portfolio return are then given by

\[ \mu_p = w^T \mu, \qquad \sigma_p^2 = w^T \Sigma w. \]

In the traditional Markowitz portfolio optimization, the objective is to find a portfolio with minimal variance for a given expected return, i.e. to minimize $\sigma_p^2 = w^T \Sigma w$ for a fixed target return $\rho$, subject to the constraint $w^T \mu = \rho$.

In vector/matrix form, one seeks $w$ satisfying

\[ \min_{w} \; w^T \Sigma w \quad \text{s.t.} \quad w^T \mu = \rho, \quad w^T \mathbf{1}_N = 1, \]  (2.1)

where $\mathbf{1}_N$ denotes the $N$-vector of ones.

Since $\Sigma = E\big[(r - \mu)(r - \mu)^T\big]$, the variance can be written, for any $w$ satisfying $w^T \mu = \rho$, as

\[ w^T \Sigma w = E\big[\, |w^T r - w^T \mu|^2 \,\big] = E\big[\, |\rho - w^T r|^2 \,\big]. \]

The minimization in (2.1) is then equivalent to

\[ \min_{w} \; E\big[\, |\rho - w^T r|^2 \,\big] \quad \text{s.t.} \quad w^T \mu = \rho, \quad w^T \mathbf{1}_N = 1. \]

For the empirical implementation, let $r_t = (r_{1t}, \dots, r_{Nt})^T$ denote the vector of asset returns observed at time $t$, $t = 1, \dots, T$, and define the $T \times N$ matrix $R$ of which row $t$ equals $r_t^T$. Sample averages are used to estimate the expectations:

\[ \hat{\mu} = \frac{1}{T} R^T \mathbf{1}_T \qquad \text{and} \qquad E\big[\, |\rho - w^T r|^2 \,\big] \approx \frac{1}{T} \, \| \rho \mathbf{1}_T - R w \|_2^2, \]

where $\mathbf{1}_T$ is the $T$-vector of ones.

Hence, given this notation, the optimization problem becomes

\[ \hat{w} = \arg\min_{w} \; \frac{1}{T} \, \| \rho \mathbf{1}_T - R w \|_2^2 \]  (2.2)

s.t.

\[ w^T \hat{\mu} = \rho, \]  (2.3)

\[ w^T \mathbf{1}_N = 1, \]  (2.4)

where, for a vector $a$ in $\mathbb{R}^T$, $\|a\|_2^2 = \sum_{t=1}^{T} a_t^2$.

2.2 General Algorithm to the Unconstrained Markowitz Problem

The method of Lagrange multipliers is the standard technique for dealing with equality constraints such as the budget constraint in the Markowitz optimization problem. Since only the budget and return constraints are imposed and the weights are otherwise unrestricted, we consider the unconstrained case in which all values of the weights are permitted. In other words, weights outside the range [0,1], which represent short selling or borrowing (weights smaller than 0) or leverage (weights greater than 1), are allowed.

Below is a brief introduction of this technique.

The technique works by transforming the original constrained minimization problem through the introduction of Lagrange multipliers for the constraints. We then solve the transformed problem by taking partial derivatives, setting them equal to zero, and solving the resulting set of simultaneous linear equations to obtain the optimal weights.

The proof that the obtained w minimizes the objective function subject to the budget constraint is omitted here; John Norstad gives a detailed proof in his article. [i]

The algorithm is explained as follows:

Let $f(w)$ be a target function to be minimized (or maximized) subject to a set of constraints $g_k(w) = 0$, $k = 1, \dots, m$.

Step 1

Construct the Lagrange function

\[ L(w, \lambda) = f(w) + \sum_{k=1}^{m} \lambda_k \, g_k(w). \]

The vector $\lambda = (\lambda_1, \dots, \lambda_m)^T$ is called the vector of Lagrange multipliers.

Step 2

We solve the system of equations:

\[ \frac{\partial L}{\partial w} = 0, \qquad \frac{\partial L}{\partial \lambda_k} = 0, \quad k = 1, \dots, m. \]

For the portfolio optimization problem, the Lagrange function, in matrix notation, is

\[ L(w, \lambda_1, \lambda_2) = w^T \Sigma w + \lambda_1 \big( \rho - w^T \mu \big) + \lambda_2 \big( 1 - w^T \mathbf{1}_N \big). \]

Solving the system of equations, in matrix notation,

\[ \frac{\partial L}{\partial w} = 2 \Sigma w - \lambda_1 \mu - \lambda_2 \mathbf{1}_N = 0, \qquad w^T \mu = \rho, \qquad w^T \mathbf{1}_N = 1, \]

gives the optimal weight vector explicitly as

\[ w^* = \tfrac{1}{2} \, \Sigma^{-1} \big( \lambda_1 \mu + \lambda_2 \mathbf{1}_N \big), \]

where the multipliers are determined by substituting $w^*$ into the two constraints. With the scalars $A = \mu^T \Sigma^{-1} \mu$, $B = \mu^T \Sigma^{-1} \mathbf{1}_N$ and $C = \mathbf{1}_N^T \Sigma^{-1} \mathbf{1}_N$, this yields

\[ \lambda_1 = \frac{2(C\rho - B)}{AC - B^2}, \qquad \lambda_2 = \frac{2(A - B\rho)}{AC - B^2}. \]

This algorithm is used on the datasets of this paper to compute the performance of the traditional Markowitz method. Throughout, $\Sigma$ is assumed to be non-singular and hence to have an inverse.
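To make the closed-form solution concrete, below is a minimal R sketch of the unconstrained (short sales allowed) Markowitz solver derived above; the function name markowitz_weights and the simulated toy data are illustrative and not taken from the paper.

# Minimal sketch: closed-form Markowitz weights via Lagrange multipliers.
# mu    : N-vector of expected returns
# Sigma : N x N covariance matrix (assumed non-singular)
# rho   : target expected portfolio return
markowitz_weights <- function(mu, Sigma, rho) {
  ones <- rep(1, length(mu))
  Sinv <- solve(Sigma)                      # Sigma^{-1}
  A <- drop(t(mu)   %*% Sinv %*% mu)
  B <- drop(t(mu)   %*% Sinv %*% ones)
  C <- drop(t(ones) %*% Sinv %*% ones)
  D <- A * C - B^2
  lambda1 <- 2 * (C * rho - B) / D
  lambda2 <- 2 * (A - B * rho) / D
  0.5 * Sinv %*% (lambda1 * mu + lambda2 * ones)   # optimal weight vector w*
}

# Illustrative usage with simulated monthly returns (T = 120, N = 5)
set.seed(1)
R_sim <- matrix(rnorm(120 * 5, mean = 0.01, sd = 0.05), nrow = 120)
w <- markowitz_weights(colMeans(R_sim), cov(R_sim), rho = 0.01)
sum(w)                            # budget constraint: the weights add up to 1
drop(t(w) %*% colMeans(R_sim))    # return constraint: equals rho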

2.3 Drawbacks of traditional Markowitz Mean-Variance Portfolio Optimization Model

Although the Markowitz quadratic optimization problem is a convenient and useful theoretical framework for portfolio selection, and is easy to solve using the formulae shown above, in practical estimation, contrary to its theoretical reputation, the original form of the Markowitz portfolio does not perform stably, nor is it embraced by financial practitioners as a tool for optimizing large-scale portfolios.

The drawbacks of the Markowitz portfolio optimization model are therefore discussed in detail below.

The complexity/difficulty of computation

The covariance structure of a portfolio is usually estimated by the sample covariance matrix of asset returns, as a standard statistical method. The computational burden becomes heavy for practitioners managing a large number of assets: with n assets, the n expected returns and the n(n+1)/2 entries of the covariance matrix must be estimated from historical data. For example, if 1000 stocks are to be allocated, the covariance matrix involves over 500,000 unknown parameters. Moreover, solving a large-scale dense quadratic programming problem in which almost all of the σij are nonzero is, if not impossible, very difficult once n exceeds, say, 500.

The challenge of dimensionality

The classic Markowitz portfolio optimization is an ill-conditioned inverse problem, caused by a large number of assets relative to few observations. The covariance matrix needs to be inverted; however, this estimator often suffers from the curse of dimensionality. As stated by Pafka, Potters and Kondor [2004], in practice the stock-return time series used for estimation (of length T) is not long enough compared with the number of stocks considered (N). For example, it is troublesome if the number of data points used for estimating the covariance matrix is no more than 400 (about two years of daily data, eight years of weekly data, or thirty years of monthly data) while there are over 500 assets to manage: in that case the sample covariance matrix is not invertible at all. Even when N is less than T but not negligible relative to it, inverting the matrix dramatically amplifies estimation error and extreme weights are produced. Moreover, the computed weights are sensitive to errors in inputs such as the expected returns and the covariance matrix; small changes in the input parameters can result in large changes in the optimized portfolio allocation (Chopra and Ziemba (1993)) [ii].
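As a simple illustration of this point (a hypothetical simulation, not one of the paper's datasets), the following R snippet shows that with N = 500 assets and only T = 400 observations the sample covariance matrix is rank-deficient and therefore cannot be inverted.

# Illustration with simulated data: when T < N the sample covariance is singular.
set.seed(1)
T_obs <- 400; N_assets <- 500
R_sim <- matrix(rnorm(T_obs * N_assets), nrow = T_obs)
S <- cov(R_sim)                  # 500 x 500 sample covariance matrix
qr(S)$rank                       # at most T_obs - 1 = 399 < 500
# solve(S) would fail here, since S is singular and has no inverse.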

The ignorance of Transaction/Management cost in practice

An optimal solution of a large-scale quadratic program usually has many nonzero weights. It has been noted [iii] that at least 100-200 positive weights typically result for a portfolio of over 1000 assets. The inconvenience in practice is that investors have to pay significant transaction costs to buy many different stocks in very small amounts. Moreover, the periodic adjustments of the portfolio also involve unavoidable transaction costs, proportional to the difference between the purchase and sale prices, which are not considered by the classic Markowitz model as a factor in selecting assets.

Finally, the model is very sensitive to errors in the estimates of its inputs, namely the expected returns and the covariance matrix.

2.4 Improvements on Markowitz Mean-Variance optimization Model

For the reasons above, several techniques have been suggested to reduce the sensitivity of Markowitz-optimal portfolios to input uncertainty.

Among the many proposed methods, Ledoit and Wolf (2004) introduced an improved estimator of the covariance matrix based on the statistical principle of shrinkage. The idea is to form an optimal linear combination (a weighted average) of the sample covariance matrix S and a highly structured estimator F. By determining an optimal shrinkage intensity according to a quadratic loss function (one that does not depend on the inverse of the covariance matrix), shrinkage pulls the most extreme coefficients towards more central values, thereby systematically reducing estimation error where it matters most. An empirical study demonstrates that shrinkage results in a significantly higher realized information ratio for the active manager compared with the sample covariance matrix. [iv]

In a nutshell, the method shrinks the sample covariance matrix towards a highly structured target, such as the covariance matrix implied by a factor model, the identity matrix, or a constant-correlation matrix in which the correlation between the returns of any two stocks is assumed to be the same.
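A minimal R sketch of this idea, shrinking the sample covariance matrix towards the identity scaled by the average sample variance, is given below; the fixed shrinkage intensity delta = 0.3 is an arbitrary illustrative choice, whereas Ledoit and Wolf derive an optimal, data-driven intensity.

# Sketch of linear shrinkage: Sigma_shrunk = delta * F + (1 - delta) * S,
# where S is the sample covariance matrix and F a highly structured target
# (here the identity scaled by the average sample variance).
shrink_cov <- function(returns, delta = 0.3) {
  S <- cov(returns)
  target <- mean(diag(S)) * diag(ncol(returns))
  delta * target + (1 - delta) * S
}

# Illustrative usage: shrinkage improves the conditioning of the estimator
set.seed(1)
R_sim <- matrix(rnorm(60 * 40, mean = 0.01, sd = 0.05), nrow = 60)   # T = 60, N = 40
kappa(shrink_cov(R_sim))   # much smaller condition number than kappa(cov(R_sim))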

Other papers, such as Frost and Savarino (1986), suggested Bayesian estimation of the means and the covariance matrix. Specifically, one can incorporate estimation risk directly into the decision process, either by asserting a 'non-informative' or 'invariant' prior, or by specifying an informative prior under which all securities have identical expected returns, variances and pairwise correlation coefficients. This empirical Bayes method reduces estimation error by drawing the posterior estimate of each security's expected return, variance and pairwise correlations towards the average return, average variance and average correlation of all the securities in the population. It is shown to select portfolios whose performance is superior to that achieved under the assumption of a non-informative prior or by using classical sample estimates.

However, as noted by Disatnik and Benninga (2007) [v], a significant drawback of many modified covariance matrices is that they generate minimum-variance portfolios with significant short-sale positions. Short selling is widely prohibited in practice (mutual funds, for example, are in many cases not allowed to sell short), so to some extent short sales are considered an undesirable feature of portfolio optimization. Moreover, the above techniques, while reducing the sensitivity to the input vectors in the mean-variance allocation, do not fully address the adverse effect of the accumulation of estimation errors, particularly when the portfolio size is large.

There have also been efforts to modify the Markowitz unconstrained mean-variance optimization problem itself to improve the stability of the weights. For example, Goldfarb and Iyengar (2003) proposed alternative deterministic models for selecting portfolios, namely a robust convex optimization formulation, and showed that it not only reduces the sensitivity of the portfolio composition to the parameter estimates but also provides better risk-return performance.

3. Regularized Markowitz Portfolio

In the succeeding paragraphs, a regularization of Markowitz's portfolio construction, as proposed by Brodie et al. (2009) and Fan, Zhang and Yu (2009), is discussed as a way to tame these troublesome instabilities and to avoid accumulating substantial estimation errors, especially for vast portfolios where N is large.

The regularization makes the resulting allocation depend less sensitively on the input vectors, so that the optimized portfolios are able to withstand noisy data considerably better than classical portfolios.

Brodie et al. (2009) and Fan, Zhang and Yu (2009) extended Jagannathan and Ma (2003)'s work of imposing no-short-sales constraints on the Markowitz mean-variance optimization problem. Both adopt regularization: by adding a gross-exposure constraint, one obtains an optimized portfolio allocation with sparsity and stability.

Moreover, Fan et al. (2009) also provide a theoretical explanation of why the constraint on gross exposure prevents the risks or utilities of selected portfolios from accumulating statistical estimation errors. Another prominent contribution of that paper is the mathematical insight it provides into the utility approximations with the gross-exposure constraint in portfolio selection, tracking and improvement. As the gross-exposure parameter relaxes from 1 to infinity, the optimization problem progressively transforms from the no-short-sale constrained problem to the problem with no constraint on short sales.

The gross-exposure constraint is, in essence, a norm constraint on the allocation vector. As the importance of this penalty can be adjusted with a "tunable" coefficient, it not only makes the Markowitz problem more practical, but also bridges the gap between the no-short-sale optimization problem of Jagannathan and Ma (2003) and the unconstrained optimization problem of Markowitz (1952, 1959). To be specific, for large values of the gross-exposure bound, optimizing the penalized objective function turns out to be equivalent to solving the original (unpenalized, with no constraint on short sales) problem; as the bound decreases, the optimal solutions penalize assets with short positions more heavily, and when it reaches 1 the problem is essentially one in which no short sales are allowed.

The discussion below of the regularization of Markowitz's portfolio construction is restricted to the traditional Markowitz mean-variance approach; factor models and utility optimization problems could be incorporated as variations.

In the discussion below I stay close to Brodie et al. (2009) in problem formulation and data analysis, as their formulation is also the closest to Markowitz's original form.

Fan et al. (2009) proved that for a wide range of the constraint parameters, the optimal portfolio does not depend sensitively on the estimation errors of the input vectors. Moreover, the empirical and theoretical risks are approximately the same for any allocation vector satisfying the gross-exposure constraint. In this sense the gross-exposure framework is a generalization of the work by Markowitz (1952) and Jagannathan and Ma (2003).

3.1 Problem formulation for the ℓ1-constrained portfolio

Given the notation above, recall from Part 2 the problem

\[ \hat{w} = \arg\min_{w} \; \| \rho \mathbf{1}_T - R w \|_2^2 \quad \text{such that} \quad w^T \hat{\mu} = \rho, \quad w^T \mathbf{1}_N = 1. \]

We augment this objective function with a penalty term; the original Markowitz objective function with an ℓ1 penalty becomes

\[ \hat{w} = \arg\min_{w} \; \| \rho \mathbf{1}_T - R w \|_2^2 + \tau \| w \|_1 \]  (3.1)

s.t.

\[ w^T \hat{\mu} = \rho, \]  (3.2)

\[ w^T \mathbf{1}_N = 1, \]  (3.3)

where $\|w\|_1 = \sum_{i=1}^{N} |w_i|$ is the ℓ1-norm term and the factor $1/T$ from (2.2) is absorbed into the choice of the adjustable parameter $\tau$.

3.2 L1 norm

Regularization is defined as introducing additional information in order to solve an ill-posed problem. It improves the conditioning of the problem, thus enabling a numerical solution. Typical examples of regularization in statistical machine learning include ridge regression and the lasso. The lasso can also be used for model selection, by penalizing models on the size of their parameters. For an ℓp penalty on the unconstrained problem, p = 1 gives lasso regression and p = 2 gives ridge regression.
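In symbols, for a generic response vector $y$ and design matrix $X$, the ℓp-penalized (unconstrained) least-squares problem reads

\[ \hat{w} = \arg\min_{w} \; \| y - X w \|_2^2 + \tau \sum_{i=1}^{N} |w_i|^p, \qquad \tau > 0, \]

so that $p = 1$ gives the lasso penalty $\tau \|w\|_1$ and $p = 2$ gives the ridge penalty $\tau \|w\|_2^2$.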

Although penalization leads to a biased regression method, the lasso provides a solution path of selected variables that we can use at our disposal. Moreover, regularization helps improve overall prediction accuracy by introducing bias in exchange for variance reduction, as discussed in Hastie, Tibshirani and Friedman (2001). Unlike ridge regression, which is not useful for variable selection, the lasso allows simultaneous model fitting (parameter shrinkage) and variable selection by setting some coefficients to exactly zero.

The particular problem of minimizing an (unconstrained) objective function of the type (3.1) was named lasso regression by Tibshirani (1996). We therefore focus on this particular regularization method, the ℓ1 norm.

3.3 Why Lasso ---special features of L1

The advantages of adopting an ℓ1 norm on Markowitz's optimal allocation vector can be summarized as follows:

It promotes sparsity.

The ℓ1-norm penalty has a sparsifying effect: as the coefficient τ increases, the ℓ1-norm term in the penalized objective function is weighted more and more heavily, shrinking the weights, so that the solution contains only a few active positions and a small set of assets is selected. In the practical world of formulating investment portfolios, sparse solutions are also of vital importance, as investors frequently need to limit the number of positions they must create, monitor and liquidate, which can be achieved by considering suitably large values of τ in (3.1). Brodie et al. (2009) also illustrate geometrically how the addition of an ℓ1 term to the unconstrained volatility minimization encourages sparse solutions.

It regulates the amount of shorting in the portfolio designed by the optimization process.

Based on the constraint (3.3), we can rewrite the objective function in (3.1) as

\[ \hat{w} = \arg\min_{w} \; \| \rho \mathbf{1}_T - R w \|_2^2 + 2\tau \sum_{i:\, w_i < 0} |w_i| + \tau. \]  (3.4)

From (3.4) we can see that, since the last term is a constant that does not depend on the choice of the weight vector, the only penalization acts on assets with negative weights, i.e. short positions. In other words, under the constraint (3.3), the ℓ1 penalty is equivalent to a penalty on short positions. Therefore, with an extremely large τ, most of the components converge to their zero limit and the optimal solution is a portfolio with no short positions and only a limited number of active positions. As we decrease τ in the ℓ1-penalized objective function, the constraint is relaxed but not removed completely; it then no longer imposes positivity absolutely, but still penalizes overly large negative weights.
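The rewriting relies on the following identity, which holds for any $w$ satisfying the budget constraint (3.3):

\[ \|w\|_1 \;=\; \sum_{i:\, w_i \ge 0} w_i \;-\; \sum_{i:\, w_i < 0} w_i \;=\; \Big( 1 - \sum_{i:\, w_i < 0} w_i \Big) - \sum_{i:\, w_i < 0} w_i \;=\; 1 + 2 \sum_{i:\, w_i < 0} |w_i|, \]

so that $\tau \|w\|_1 = 2\tau \sum_{i:\, w_i < 0} |w_i| + \tau$: a constant plus a penalty that acts only on the short positions.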

A restriction to non-negative-weights-only can have a regularizing effect on Markowitz's portfolio construction.

The proof can be done as below:

Suppose $w_1$ and $w_2$ are minimizers of (3.1), corresponding to the values $\tau_1$ and $\tau_2$ respectively, and both satisfy the two linear constraints (3.2) and (3.3). By the optimality of each $w_i$ for its own $\tau_i$ we have

\[ \|\rho \mathbf{1}_T - R w_1\|_2^2 + \tau_1 \|w_1\|_1 \le \|\rho \mathbf{1}_T - R w_2\|_2^2 + \tau_1 \|w_2\|_1, \]

\[ \|\rho \mathbf{1}_T - R w_2\|_2^2 + \tau_2 \|w_2\|_1 \le \|\rho \mathbf{1}_T - R w_1\|_2^2 + \tau_2 \|w_1\|_1. \]

Hence, adding the two inequalities and cancelling the quadratic terms,

\[ (\tau_1 - \tau_2)\big( \|w_1\|_1 - \|w_2\|_1 \big) \le 0. \]

If some of the entries of $w_1$ are negative, but the entries of $w_2$ are all non-negative, then $\|w_1\|_1 > 1 = \|w_2\|_1$ by the budget constraint. This shows $\tau_1 \le \tau_2$.

It follows that the optimal portfolio with non-negative entries corresponds to the largest values of τ, and typically is the sparsest solution, as the penalty term is then weighted most heavily, which promotes sparsity.

It stabilizes the problem.

By imposing a penalty on the size of the coefficients of w in an appropriate way, we reduce the sensitivity and stabilize the optimization problem, as the penalization/regularization helps alleviate the effects of possible multicollinearity when the estimated covariance matrix is badly conditioned, as noted in the drawbacks of Markowitz's original problem. The stability induced by the ℓ1 penalization makes practical, empirical work possible with only limited training data. In fact, Daubechies, Defrise and De Mol (2004) prove (for the unconstrained case) that any ℓp penalty on w, with 1 ≤ p ≤ 2, suffices to stabilize the minimization of the empirical risk (2.2) by regularizing the inverse problem. On the other hand, Brodie et al. (2009) show geometrically that sparsity is encouraged when 0 ≤ p ≤ 1. Therefore p = 1, the ℓ1 penalty discussed in our optimization, has both desirable features.


It provides a natural, practical proxy for transaction costs.

In reality, investors not only care about the choice of the securities they trade, but are also concerned with the transaction costs they will incur when acquiring and liquidating the positions they select. Transaction costs in a liquid market are usually incurred through brokers' commission fees, and the amount charged is typically proportional to the transacted amount multiplied by the bid-ask spread applicable to the size of the transaction. Although there is a minimum fixed fee, this amount is usually negligible for moderate to large investors. Hence the total transaction cost can be seen as proportional to the absolute sum of the capital invested in each asset; that is, the transaction cost is effectively captured by an ℓ1 penalty.

3.4 General solution to the L1 constrained problem -- LARS-LASSO algorithm for Constrained Risk Minimization

The LARS algorithm (a homotopy method), presented by Efron, Hastie, Johnstone and Tibshirani (2004), allows us to compute portfolios involving only a small number of securities. A nice feature of LARS is that it does not require separate computations to find solutions for each value of τ; rather, by exploiting the piecewise linear dependence of the solution on τ, it obtains, in one run, the weight vectors for all values of τ (i.e. for all numbers of selected assets) in a prescribed range.

To put it simply, the LARS procedure starts with all coefficients equal to zero and finds the predictor most correlated with the response, say x1. The largest possible step is taken in the direction of this predictor until another predictor, say x2, has as much correlation with the current residual. At this point LARS proceeds in a direction equiangular between the two predictors until a third variable x3 enters the "most correlated" set, by having the same absolute correlation with the working residual as x1 and x2. LARS then proceeds equiangularly between x1, x2 and x3, that is, along the "least angle direction", until a fourth variable enters, and so on. The LARS algorithm can be adapted to deal with a general ℓ1-penalized minimization problem, in which the slopes have to be recomputed at every breakpoint and a single index j is increased or decreased at a time, so that the full set of lasso solutions can be generated by the modified LARS algorithm.
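As an illustration (not the code used for the paper's results), the lars package in R, which implements the Efron et al. (2004) algorithm, can trace out the full lasso path on a simulated regression problem:

# Sketch: computing the full lasso solution path with the LARS algorithm.
# Requires the 'lars' package (install.packages("lars")).
library(lars)

set.seed(1)
X <- matrix(rnorm(100 * 10), nrow = 100)         # 100 observations, 10 predictors
beta_true <- c(3, -2, 1.5, rep(0, 7))            # only three truly active variables
y <- drop(X %*% beta_true) + rnorm(100)

fit <- lars(X, y, type = "lasso", intercept = FALSE, normalize = FALSE)
coef(fit)    # one row per breakpoint of the path: variables enter (or leave) one at a time
plot(fit)    # the usual lasso coefficient-path plot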

The graph below illustrates how LASSO computes a solution path for variable selections with varying weight values.

However, we still have two linear constraints (3.2) and (3.3) to take into consideration.

The lasso routine available in R handles only problems of the standard (unconstrained) form shown in (3.5) below; we therefore need to incorporate the two linear constraints into the quadratic term.

\[ \hat{w} = \arg\min_{w} \; \| y - X w \|_2^2 + \tau \| w \|_1. \]  (3.5)

The constrained problem (3.1)-(3.3) can then be approximated by the penalized form below:

\[ \hat{w} = \arg\min_{w} \; \| \rho \mathbf{1}_T - R w \|_2^2 + \gamma_1 \big( w^T \hat{\mu} - \rho \big)^2 + \gamma_2 \big( w^T \mathbf{1}_N - 1 \big)^2 + \tau \| w \|_1. \]  (3.6)

By choosing γ1 and γ2 sufficiently large, we can enforce the linear constraints by forcing the two quadratic terms to be (approximately) zero; these quadratic terms are just a transformation of the two linear constraints.

Hence we enforce relatively large values of γ1 and γ2, so that whenever the two quadratic terms are non-zero they are penalized harshly, a case that therefore should not occur. The next step is to adapt equation (3.6) to the form built into the R routine implementing the LARS-lasso algorithm, which deals only with equation (3.5); hence we need to combine the three quadratic forms in (3.6) into the single quadratic form in (3.5).

To do this we modify the response vector $y$ and the input matrix $X$. Expanding (3.5) and (3.6) and comparing the terms in the common weight vector $w$, the modified inputs to be used in R are

\[ \tilde{y} = \begin{pmatrix} \rho \mathbf{1}_T \\ \sqrt{\gamma_1}\, \rho \\ \sqrt{\gamma_2} \end{pmatrix}, \qquad \tilde{X} = \begin{pmatrix} R \\ \sqrt{\gamma_1}\, \hat{\mu}^T \\ \sqrt{\gamma_2}\, \mathbf{1}_N^T \end{pmatrix}, \]

so that (3.6) becomes $\|\tilde{y} - \tilde{X} w\|_2^2 + \tau \|w\|_1$, which is exactly of the form (3.5).
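A minimal R sketch of this augmentation and of the subsequent call to the LARS-lasso routine is shown below; the function name augmented_lasso_portfolio, the simulated returns and the value gamma = 1000 are illustrative assumptions rather than the paper's exact implementation.

# Sketch: fold the two linear constraints into the lasso design, then run LARS.
# R_mat : T x N matrix of asset returns; rho : target return; gamma : constraint penalty.
library(lars)

augmented_lasso_portfolio <- function(R_mat, rho, gamma = 1000) {
  N <- ncol(R_mat)
  mu_hat <- colMeans(R_mat)
  X_tilde <- rbind(R_mat,
                   sqrt(gamma) * matrix(mu_hat, nrow = 1),       # enforces w' mu_hat = rho
                   sqrt(gamma) * matrix(1, nrow = 1, ncol = N))  # enforces w' 1 = 1
  y_tilde <- c(rep(rho, nrow(R_mat)), sqrt(gamma) * rho, sqrt(gamma))
  lars(X_tilde, y_tilde, type = "lasso", intercept = FALSE, normalize = FALSE)
}

# Illustrative usage on simulated data
set.seed(1)
R_mat <- matrix(rnorm(60 * 10, mean = 0.01, sd = 0.05), nrow = 60)
fit <- augmented_lasso_portfolio(R_mat, rho = mean(R_mat), gamma = 1000)
W <- coef(fit)      # each row is a candidate weight vector along the lasso path
rowSums(W)          # rows of interest should be close to 1 (budget constraint)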

The next problem then boils down to choosing proper values of γ1 and γ2. Without loss of generality we assume they are chosen to be equal, γ1 = γ2 = γ, although this is not necessary, as long as γ is sufficiently large to ensure that both quadratic terms are fully penalized to zero.

Based on the nature of the available datasets, a program was run to test the tracking behaviour of the model while varying the value of γ, since our objective is to choose a γ arbitrarily large compared with τ, to ensure that the two terms associated with γ are fully penalized before the ℓ1-norm term.

A portfolio with ten assets (described in detail under dataset 1) was used to fit the lasso model with only non-negative positions as the portfolio selection criterion. As proved earlier, the optimal portfolio with non-negative entries corresponds to the largest value of τ; we therefore vary γ over the range [1, 2000] while keeping the portfolio fixed, and compute the fitted model's required return. We would want the fitted lasso model not to be affected by the variation of γ.

The prediction error (expected tracking error) is measured as $\frac{1}{T} \, \| \rho \mathbf{1}_T - R \hat{w} \|_2^2$.

Below is a graph of how the fitted model's required return gradually stabilizes, in particular once γ increases to around 250. As a small sample size was used, we set γ to 1000 for conservative reasons.

The table below shows how the computed weights are allocated almost identically, with little influence from the value of γ.

γ      w1       w2   w3       w4   w5       w6       w7       w8       w9       w10
500    0.05535  0    0.40316  0    0.20207  0.09627  0.00314  0.07135  0.03002  0.13863
1000   0.05535  0    0.40316  0    0.20208  0.09627  0.00314  0.07135  0.03003  0.13863

4. DATA AND METHODOLOGY

4.1 Specification of the three datasets

Three hundred monthly returns, from January 1986 through December 2010, on stocks traded on the New York Stock Exchange (NYSE) and on the Nikkei 225 are employed; the returns are extracted from the Bloomberg database. The first dataset consists of the ten largest stocks traded on the NYSE. In the second example, we use 40 stocks selected from the top 100 Nikkei 225 stocks by market capitalization, while the third dataset consists of 50 stocks selected from the top 100 NYSE stocks.

The criteria used to select the stocks in our examination are as follows:

1. Companies which were not on the list at the starting point and entered the Nikkei 225 or NYSE at different dates afterwards are excluded.

2. Large companies are given preference over small and medium-sized companies; only stocks with a market capitalization larger than 10 billion are included.


4.2 Performance comparison criteria

In the following section we compare the performance of the regularized portfolio selection, in particular the ℓ1-norm criterion, with the classical Markowitz mean-variance approach on real market data. A parallel comparison is also made against another benchmark, a naïve, equal-weighted portfolio consisting of a large number N of individual stocks. Although the 1/N strategy is simply an equal investment in each available security, it is a tough benchmark, since it has been shown to outperform a host of optimal portfolio strategies constructed with existing optimization procedures, including at times the traditional mean-variance strategy (DeMiguel, Garlappi, and Uppal, 2007) [vi].

The out-of-sample performance of the ℓ1-penalized portfolio and the classic Markowitz portfolio relative to that of the 1/N portfolio is compared across the three empirical datasets of monthly returns, using two performance criteria: (i) the out-of-sample Sharpe ratio and (ii) the out-of-sample standard deviation, which is computed as a by-product of the Sharpe ratio.
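For clarity, the out-of-sample Sharpe ratio used below is computed from the series of n monthly out-of-sample returns $r^{\text{oos}}_t$ generated by each strategy as

\[ \hat{S} = \frac{\hat{m}}{\hat{\sigma}}, \qquad \hat{m} = \frac{1}{n} \sum_{t=1}^{n} r^{\text{oos}}_t, \qquad \hat{\sigma}^2 = \frac{1}{n-1} \sum_{t=1}^{n} \big( r^{\text{oos}}_t - \hat{m} \big)^2. \]

(No risk-free rate is subtracted in this sketch; if excess returns were preferred, $\hat{m}$ would be replaced by $\hat{m} - r_f$.)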

4.3 Computation procedures

The analysis is based on a "rolling-sample" approach. Specifically, given a T-month-long dataset of asset returns, an estimation window of length M = 60 or M = 120 months is chosen. In each month t, starting from t = M + 1, the data from the previous M months are used to estimate the portfolio weights, targeting a return equal to the mean of the equally weighted portfolio over the same M months. Based on either the classical Markowitz optimization strategy or the ℓ1-norm constrained optimization strategy, the weight vector is computed and used to compute the realized returns over the following 12 months, assuming the portfolio is rebalanced every 12 months. After each 12-month period, the weights are recursively recomputed using the preceding M months' returns, until the end of the dataset is reached. The outcome of this rolling-window approach is a series of T − M monthly out-of-sample returns generated by each portfolio strategy on each empirical dataset. We thus obtain the time series of monthly out-of-sample returns generated by each strategy and then measure their out-of-sample Sharpe ratio as well as their out-of-sample standard deviation.
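The rolling-sample procedure can be sketched in R as follows. This is a simplified skeleton (annual rebalancing, no transaction costs) that assumes a user-supplied function solve_portfolio(R_window, rho) returning a weight vector, e.g. the closed-form Markowitz solver or the augmented LARS-lasso solver sketched earlier; the naive 1/N solver shown in the usage example is purely illustrative.

# Skeleton of the rolling-window backtest on a T x N matrix of monthly returns.
rolling_backtest <- function(R_all, M = 60, solve_portfolio) {
  Tn <- nrow(R_all)
  oos <- c()                                       # out-of-sample monthly returns
  for (t0 in seq(M + 1, Tn - 11, by = 12)) {       # rebalance every 12 months
    R_window <- R_all[(t0 - M):(t0 - 1), ]         # previous M months of returns
    rho <- mean(rowMeans(R_window))                # target: mean return of the 1/N portfolio
    w <- solve_portfolio(R_window, rho)            # Markowitz, lasso, 1/N, ...
    oos <- c(oos, R_all[t0:(t0 + 11), ] %*% w)     # realized returns over the next 12 months
  }
  c(mean = mean(oos), sd = sd(oos), sharpe = mean(oos) / sd(oos))
}

# Illustrative usage: evaluate the naive 1/N strategy on simulated data
set.seed(1)
R_all <- matrix(rnorm(300 * 10, mean = 0.01, sd = 0.05), nrow = 300)
equal_weights <- function(R_window, rho) rep(1 / ncol(R_window), ncol(R_window))
rolling_backtest(R_all, M = 60, solve_portfolio = equal_weights)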

For the ℓ1-norm constrained optimization strategy, one possible selection criterion is to select portfolios with non-negative weights, i.e. no short positions. As the LARS-lasso technique provides a series of weight vectors for different values of τ, there may be a few candidate weight vectors; in that case the two linear constraints are double-checked, to make sure that the weights of the selected non-negative portfolio sum to a value near 1 and that the estimated return is as close to the required return as possible. Another selection criterion for the ℓ1-norm strategy is to target a particular number of assets, so that only the portfolio with the targeted number of assets and the smallest tracking error is selected in each rebalancing period.

4.4 Simulation result and analysis

The out-of-sample performance evaluation of our sparse portfolio

Three portfolios are compared: the sparse no-short-positions portfolio, the classic Markowitz portfolio, and the 1/N-strategy portfolio.

From our simulation results, we conclude that portfolio strategies from the optimizing models are expected to outperform the 1/N benchmark if: (i) the estimation window is long; (ii) the ex ante (true) Sharpe ratio of the mean-variance efficient portfolio is substantially higher than that of the 1/N portfolio; and (iii) the number of assets is small. The first two conditions are intuitive. The reason for the last condition is that a smaller number of assets implies fewer parameters to be estimated and, therefore, less room for estimation error. Moreover, other things being equal, a smaller number of assets makes naive diversification less effective relative to optimal diversification.

Example 1: US10

Table 1: Performance of the sparse portfolio with no short-selling, for US10

In Table 1, three portfolio strategies are tested for their performance over the 12 consecutive months immediately following their construction; the out-of-sample returns based on the different strategies (hence different allocation weights) are pooled over 5-year periods to compute the monthly mean return m, the standard deviation of monthly returns σ, and the Sharpe ratio S (expressed in %).

The table shows that the Sharpe ratios of the 1/N-strategy portfolio, for the whole sample period as well as for consecutive five-year sub-periods, are consistently higher than those of the optimal no-short-positions portfolio, while the classic Markowitz portfolio performs the worst, with almost all Sharpe ratios negative; the only exception is the period 01/96-12/00, in which the Markowitz portfolio's Sharpe ratio is higher than that of the no-short-positions portfolio. Also, the standard deviations of the no-short-positions portfolio are not always the smallest of the three; this is because the sample size is small, so the regularization effect is not significant.

The limited penalization can also be seen in Figure 1 below. The number of assets in the optimal portfolio without short positions can sometimes reach 10, which is the whole sample size. In that case the sparsity and stability (in terms of standard deviation) effects are limited, and hence the other two datasets are presented to illustrate the advantages of using the ℓ1-norm strategy.

Figure 1: Number of assets without short positions, for US10. It ranges from 6 to 10 for the optimal portfolio without short positions from year to year. The average over 25 years is around 8.

Example 2: Japan40

Table 2: Performance of the sparse portfolio with no short-selling, for Japan40

In Table 2, following the same construction methodology as in Table 1, we observe that in the periods 01/91-12/95, 01/96-12/00 and 01/01-12/05 the no-short-positions portfolio has the best performance in terms of Sharpe ratio, higher than the other two portfolios. In the period 01/06-12/10 its Sharpe ratio was negative due to a negative mean return; it has been noted that Japanese stock returns dropped significantly from 2000 onwards, so the negative values are understandable. In terms of standard deviation, the no-short-positions portfolio again outperforms the other two portfolios throughout the whole period. This can be attributed to the lasso's stability property: it limits the estimation error and hence delivers a more stable performance than the other two strategies.

From Figure 2 below it is also obvious that the regularized portfolio reduces the number of assets from a total sample size of 40 down to an average of about 11, which demonstrates the sparsity of the ℓ1-norm penalized portfolio. Thus, for the Japan40 dataset, the lasso appears to be a rather superior portfolio selection strategy.

Figure 2: Number of assets without short positions, for Japan40. It ranges from 6 to 17 for the optimal portfolio without short positions from year to year. The average over 25 years is around 11.

Figure 3: The Sharpe ratio, for the full period 1986-2010, for various optimal sparse portfolios. It is based on the second criterion of choosing an optimal sparse portfolio with different fixed numbers of assets. The no-short-positions optimal portfolio is indicated by a horizontal blue line, stretching from 6 to 17 (its minimum to maximum number of assets; see also Figure 2).

It clearly shows that when targeting a portfolio with a larger number of assets, the selected portfolio has an inferior performance due to instability, worse than the performance of the sparse portfolios without short positions.

Example 3: US50

Table 3: Performance of the sparse portfolio with no short-selling, for US50

In Table 3, following the same construction methodology as in Table 1, we observe that throughout the evaluation period the 1/N strategy cannot be outperformed. The no-short-positions portfolio has the second-best performance in terms of Sharpe ratio, while the classic Markowitz portfolio performs the worst; this suggests that the naïve strategy is indeed a robust benchmark. However, in terms of standard deviation, the non-negative-weight portfolio has a relatively small standard deviation with far fewer active positions than the other two. Based on Tables 1, 2 and 3, the not-so-good Sharpe ratio performance arises mainly because the no-short-positions portfolio has inferior mean returns compared with the other two; although its standard deviation is limited, its performance may still not be able to beat the equal-weighted portfolio. It can be concluded that whether the non-negative-weight portfolio outperforms the equal-weighted portfolio also depends on the nature of the dataset: Table 2 shows that the non-negative-weight portfolio can deliver a satisfying performance, while in the US case the fluctuation within the data itself may have prevented the ℓ1-norm penalized portfolio from outperforming the benchmark portfolio.

Figure 5: Number of assets without short positions, for US50. It ranges from 11 to 22 for the optimal portfolio without short positions from year to year. The average over 25 years is around 17.


Figure 4: The Sharpe ratio, for the full period 1986-2010, for various optimal sparse portfolios. It is based on the second criterion of choosing an optimal sparse portfolio with different fixed numbers of assets. The no-short-positions optimal portfolio is indicated by a horizontal blue line, stretching from 11 to 22 (its minimum to maximum number of assets; see also Figure 5). It is observable that many of the Sharpe ratios are below zero, which indicates the unstable performance obtained by including more assets. This suggests that the data variation is so large that estimating the future allocation weights from historical data may not be sufficient at all; rather, keeping a naïve portfolio in which equal investments are made achieves higher Sharpe ratios.

To further confirm that the inferior performance is not caused by the implemented strategy in terms of the rolling-window length and the rebalancing period, we also varied the rolling window, extending it from 60 months (5 years) to 7-year and 9-year periods. Keeping everything else fixed, the portfolio was also rebalanced at different frequencies: every 24 months, 12 months (the original choice), 6 months, 3 months and 1 month. The resulting table is shown in the Appendix. It suggests that the result is consistent in terms of portfolio performance: the 1/N strategy outperforms the ℓ1-norm strategy with no short positions, with the classic Markowitz portfolio performing worst.

4.5 Summary of the Analysis

In the three datasets considered, it is noticeable that the sparse portfolio with no short positions works well in limiting the number of active positions while at the same time having a lower standard deviation, and hence it stabilizes the out-of-sample portfolio performance. However, due to the limited number of positions, its smaller standard deviation sometimes cannot compensate for its inferior mean return, so it does not always achieve higher Sharpe ratios than the other two strategies; hence the benchmark portfolio sometimes outperforms both the no-short-positions portfolio and the Markowitz portfolio.

Also, the Markowitz portfolio did not perform well in any of the three datasets, mainly because of its ill-conditioned covariance matrix, which results in extreme weights; hence it has many negative Sharpe ratio values.

5. Possible extension, limitation and caution points

5.1 The empirical results' limitations:

The discussion of the empirical results is restricted to the traditional Markowitz mean-variance approach. In fact, a variety of modifications of the structure, such as the adoption of factor models, could be considered, and similar ideas could also be applied to the different portfolio construction frameworks considered in the literature. In addition, targeting the mean return of the 1/N portfolio may not be an ideal choice of target return.

It can be noticed that the Markowitz portfolio performs worst partly because it targets the return fixed by the 1/N strategy; this may produce extreme weights that behave badly, resulting in an unsatisfactory performance. If the target were instead the return of the minimum-variance portfolio, the results might favour Markowitz more.

5.2 Partial index tracking

Investors may wish to track an index's performance by pursuing a passive investment strategy, believing the market cannot be beaten. However, as an index is composed of a large number of assets, it would be inefficient and unrealistic to buy and sell all of them for a full replication of the index; hence the ℓ1 penalty can be generalized in a practical way to track an index effectively with a smaller set of assets. We would have an objective function similar to (3.1),

\[ \hat{w} = \arg\min_{w} \; \| y - R w \|_2^2 + \tau \| w \|_1, \]

where $y$ is the time series of index returns to be tracked. This seeks to minimize the expected tracking error while at the same time enforcing sparsity and stabilizing the problem, which may involve collinear assets. The index could be any existing financial index or another abstract financial time series, such as an investor-sentiment series.

5.3 Portfolio Adjustment

The constructions in consecutive years are not meant to represent the behaviour of a single investor; rather, they are the results obtained by different investors who follow the same strategy to build their portfolios. For a single investor, it is possible to adopt a sparse portfolio adjustment strategy in subsequent years; with a small modification of our original constrained minimization formulation, the problem can easily be formulated.

5.4 L2 norm Regularization

The L2 norm, i.e. ridge regression, as introduced earlier, is not used for variable selection, but it also has a shrinkage property which stabilizes the original problem. Hence we would not expect it to produce a sparse solution; rather, it is a good method for regularizing the covariance matrix and deriving weights with less extreme values.

6. Conclusion

Appendix 1 Stock List

Appendix 2 Completed Data Results