This paper studies recent developments in portfolio selection with a large number of assets using regularization. One interesting constraint is the L1 norm, which effectively encourages sparse portfolios (Brodie et al., 2009; Fan et al., 2009). Markowitz mean-variance portfolio optimization theory (Markowitz, 1952), the cornerstone of modern portfolio theory, is reviewed first. After a discussion of the reasons why Markowitz's original model never gained much popularity among practitioners, a brief introduction to improvements using various modification methods is given. In part 3, important attributes of the L1 norm method are scrutinized, and in part 4 an out-of-sample performance comparison among the classic Markowitz model, the L1 norm model and an equal-weighted portfolio is presented. Three groups of historical stock return data are used to examine these models.
The main purpose of this paper is to demonstrate in detail that the L1-constrained portfolio, a special case of regularized portfolios, is sparse and stable in performance, and hence avoids the difficulties faced by the classic Markowitz model.
2. Review of the Traditional Markowitz Mean-Variance Model
Modern portfolio theory was introduced by Harry Markowitz (1952, 1959) with his paper "Portfolio Selection" in the 1952 Journal of Finance, which laid down the first mathematical model for allocating capital over a number of available assets with the objective of maximizing return while minimizing risk. In this paper Markowitz measured the return on a portfolio by the expected value of the portfolio return, while risk is quantified by the variance of returns. Markowitz mean-variance portfolio optimization theory states that a rational investor, as opposed to a speculator, would be willing to diversify and either maximize the expected return for a given risk level, or minimize the risk while maintaining a fixed return. An optimal portfolio can therefore be obtained by solving a convex quadratic programming problem. The Markowitz portfolio selection approach has had a profound impact, as many important economic models of financial markets are built upon it, for example the well-known Capital Asset Pricing Model (CAPM), developed by Sharpe (1964), Lintner (1965) and Black (1972).
2.1 Problem Formulation for the Traditional Markowitz Mean-Variance Portfolio
A portfolio is defined as a collection of assets/investments held by an investor. It is assumed that one unit of capital is available and that the capital must be fully invested; that is, the portfolio weights, which represent the relative amounts of capital invested in each asset, add up to unity.
Suppose there are N assets; we define the following notation:
$r_i$ = the rate of return (per period) of asset $i$, $i = 1, \ldots, N$;
$w_i$ = the weight of the portfolio invested in asset $i$, $i = 1, \ldots, N$;
$w$ = the column vector of portfolio weights, i.e. $w = (w_1, \ldots, w_N)^T$;
$\mu_i$ = the expected value of $r_i$, i.e. $\mu_i = E[r_i]$;
$\mu$ = the column vector of expected returns, i.e. $\mu = (\mu_1, \ldots, \mu_N)^T$;
$\sigma_{ij}$ = the covariance of asset $i$ with asset $j$, $i = 1, \ldots, N$ and $j = 1, \ldots, N$;
$\Sigma$ = the $N \times N$ covariance matrix $[\sigma_{ij}]$;
$\mu_p$ = the expected return of the portfolio;
$\sigma_p$ = the standard deviation of the portfolio return.
The portfolio expected return and variance are given by $\mu_p = w^T \mu$ and $\sigma_p^2 = w^T \Sigma w$.
In the traditional Markowitz portfolio optimization, the objective is to find a portfolio that has minimal variance for a given expected return, i.e. to minimize $w^T \Sigma w$ for any fixed $\mu_p$ subject to the constraints $w^T \mu = \mu_p$ and $w^T \mathbf{1}_N = 1$.
To represent the objective function in vector/matrix form, one seeks $w$ satisfying:

$\min_w \; w^T \Sigma w \quad \text{s.t.} \quad w^T \mu = \mu_p, \; w^T \mathbf{1}_N = 1. \qquad (2.1)$

As $w^T \mu = \mu_p$, the variance $w^T \Sigma w$ can thus be written as

$w^T \Sigma w = E\,[\,|\mu_p - w^T r|^2\,],$

where $r = (r_1, \ldots, r_N)^T$. The minimization in (2.1) is then equivalent to:

$\min_w \; E\,[\,|\mu_p - w^T r|^2\,] \quad \text{s.t.} \quad w^T \mu = \mu_p, \; w^T \mathbf{1}_N = 1.$

For the empirical implementation, a $T \times N$ matrix $R$ is defined, of which row $t$ equals $r_t^T$, the vector of returns observed in period $t$. Also, sample averages are used to estimate the expectations. That is, $\hat{\mu} = \frac{1}{T} R^T \mathbf{1}_T$, where $\mathbf{1}_T$ denotes the $T$-vector of ones, and $E\,|\mu_p - w^T r|^2$ is estimated by $\frac{1}{T}\,\| \mu_p \mathbf{1}_T - R w \|_2^2$. Hence, given the notation, the optimization problem becomes

$\min_w \; \| \mu_p \mathbf{1}_T - R w \|_2^2 \qquad (2.2)$

$\text{s.t.} \quad w^T \hat{\mu} = \mu_p, \qquad (2.3)$

$\quad\;\; w^T \mathbf{1}_N = 1, \qquad (2.4)$

where, for a vector $a$ in $\mathbb{R}^T$, $\|a\|_2^2 = \sum_{t=1}^T a_t^2$.
2.2 General Algorithm to the Unconstrained Traditional Markowitz Problem
Lagrange multipliers are the standard technique for dealing with equality constraints such as the budget constraint in the Markowitz optimization problem. Since the sum of the weights must equal 1, the otherwise unconstrained case is considered: apart from the budget constraint, all values for the weights are permitted. In other words, asset weights outside the range [0, 1] are allowed; a weight smaller than 0 represents short-selling (borrowing), and a weight greater than 1 represents leverage.
Below is a brief introduction to this technique.
The technique transforms the original minimization problem by introducing Lagrange multipliers, λ, to absorb the constraints. The resulting transformed minimization problem is then solved by taking partial derivatives, setting them equal to zero, and solving the resulting set of simultaneous linear equations to obtain the optimal weights.
The proof that the weights obtained in this way minimize the objective function subject to the budget constraint is omitted; John Norstad gives a detailed proof in his article.
The algorithm is explained as follows:
Let $f(w)$ be the target function to be minimized/maximized subject to a set of constraints $g_k(w) = 0$, $k = 1, \ldots, K$.
Step 1
Construct the Lagrange function

$L(w, \lambda) = f(w) + \sum_{k=1}^K \lambda_k\, g_k(w).$

The vector $\lambda = (\lambda_1, \ldots, \lambda_K)^T$ is called the vector of Lagrange multipliers.
Step 2
Solve the system of equations:

$\frac{\partial L}{\partial w} = 0, \qquad \frac{\partial L}{\partial \lambda} = 0.$

For the portfolio optimization problem, the Lagrange function, in matrix notation, is

$L(w, \lambda_1, \lambda_2) = w^T \Sigma w + \lambda_1 (\mu_p - w^T \mu) + \lambda_2 (1 - w^T \mathbf{1}_N).$

Solve the system of equations, in matrix notation:

$2 \Sigma w - \lambda_1 \mu - \lambda_2 \mathbf{1}_N = 0, \qquad w^T \mu = \mu_p, \qquad w^T \mathbf{1}_N = 1.$

The solutions are

$w = \frac{\lambda_1}{2} \Sigma^{-1} \mu + \frac{\lambda_2}{2} \Sigma^{-1} \mathbf{1}_N, \qquad \lambda_1 = \frac{2 (C \mu_p - A)}{D}, \qquad \lambda_2 = \frac{2 (B - A \mu_p)}{D},$

with $A = \mathbf{1}_N^T \Sigma^{-1} \mu$, $B = \mu^T \Sigma^{-1} \mu$, $C = \mathbf{1}_N^T \Sigma^{-1} \mathbf{1}_N$ and $D = BC - A^2$. Therefore, the optimal weight vector $w$ is solved explicitly once $\lambda_1$ and $\lambda_2$ are computed.
This algorithm is used on the datasets in the paper to compute the performance of the traditional Markowitz method. Throughout, $\Sigma$ is assumed to be nonsingular, so that its inverse exists.
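As an illustration, the closed-form solution above can be implemented directly. The following is a minimal R sketch, assuming a T × N matrix `returns` of historical asset returns (one column per asset) and a target return `mu_p`; the function name and inputs are illustrative, not the code of Appendix 3:

```r
# Minimal sketch of the closed-form Markowitz solution via Lagrange multipliers.
# 'returns' is an assumed T x N matrix of asset returns; 'mu_p' a target return.
markowitz_weights <- function(returns, mu_p) {
  mu    <- colMeans(returns)         # sample estimate of expected returns
  Sigma <- cov(returns)              # sample covariance matrix, assumed nonsingular
  ones  <- rep(1, ncol(returns))
  Sinv  <- solve(Sigma)
  A <- drop(t(ones) %*% Sinv %*% mu)
  B <- drop(t(mu)   %*% Sinv %*% mu)
  C <- drop(t(ones) %*% Sinv %*% ones)
  D <- B * C - A^2
  lambda1 <- 2 * (C * mu_p - A) / D  # multipliers solving the two linear constraints
  lambda2 <- 2 * (B - A * mu_p) / D
  drop((lambda1 / 2) * Sinv %*% mu + (lambda2 / 2) * Sinv %*% ones)
}
```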
2.3 Drawbacks of the Traditional Markowitz Portfolio Optimization Model
Although the Markowitz quadratic optimization problem established a useful framework for portfolio selection and is easy to solve, as the formulae above show, in practical estimation, contrary to its theoretical reputation, the original form of the Markowitz portfolio does not deliver stable performance, nor is it embraced by financial practitioners as a tool for optimizing large-scale portfolios.
The drawbacks of the Markowitz portfolio optimization model are therefore discussed in detail below.
The complexity/difficulty of the computation
The covariance structure of a portfolio is usually estimated, as a standard statistical method, by the sample covariance matrix of asset returns. The computational burden becomes heavy for practitioners managing a large number of assets: with n assets, the n expected returns and the n(n+1)/2 entries of the covariance matrix must be estimated from historical data. For example, if 1000 stocks are to be allocated, the covariance matrix involves over 500,000 unknown parameters to be estimated. Moreover, solving a large-scale dense quadratic programming problem, in which almost all the $\sigma_{ij}$ are nonzero, is very difficult, if not impossible, when n exceeds, say, 500 (Konno and Yamazaki, 1991).
The challenge of dimensionality
The classic Markowitz portfolio optimization is an ill-conditioned inverse problem, caused by a large number of assets but few observations. The covariance matrix must be inverted; however, this estimator often suffers from the curse of dimensionality. As stated by Pafka, Potters and Kondor (2004), in practice the stock return time series used for estimation (of length T) is not long enough compared to the number of stocks considered (N). For example, it would be troublesome if the number of data points used for estimating the covariance matrix is no more than 400 (about two years of daily data, eight years of weekly data, or thirty years of monthly data) while over 500 assets are to be managed (Fan et al., 2009). In that case the sample covariance matrix is not invertible at all. Even when N is less than T but not negligible relative to it, inverting the matrix dramatically amplifies estimation error, and extreme weights are computed. Moreover, the computed weights are sensitive to errors in the inputs of the expected returns and covariance matrix; small changes in the input parameters can result in large changes in the optimized portfolio allocation (Chopra and Ziemba, 1993).
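The singularity problem is easy to reproduce. Below is a small simulated illustration in R (the sizes are arbitrary and chosen only to make T smaller than N):

```r
# When the number of observations T is smaller than the number of assets N,
# the sample covariance matrix is rank-deficient and cannot be inverted.
set.seed(1)
T_obs <- 40; N <- 50
R <- matrix(rnorm(T_obs * N), nrow = T_obs)  # simulated return matrix
S <- cov(R)
qr(S)$rank     # at most T_obs - 1 = 39, strictly less than N = 50
# solve(S) fails here with a "computationally singular" error; and even when
# T_obs is only slightly above N, the inverse wildly amplifies estimation noise.
```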
The neglect of transaction/management costs in practice
An optimal solution of a large-scale quadratic programming problem usually consists of optimal weights with many nonzero elements. Konno and Yamazaki (1991) note that at least 100-200 positive weights typically result for a portfolio of over 1000 assets. The inconvenience in practice is that investors must pay significant transaction costs to buy many different stocks in very small amounts. Moreover, the periodic adjustments of the portfolio also involve unavoidable transaction costs, proportional to the difference between the purchase and sale prices, which the classic Markowitz model does not consider as a factor in selecting assets.
2.4 Improvements on Markowitz Mean-Variance Optimization Model
For the reasons above, many methods have been proposed to reduce the sensitivity of the classic Markowitz optimization method to input uncertainty.
Ledoit and Wolf (2004) suggested an improved estimator of the covariance matrix based on the statistical principle of shrinkage. The idea is to find an optimal weighted average of the sample covariance matrix S and a highly structured estimator F. By determining an optimal shrinkage intensity according to a quadratic loss function that does not depend on the inverse of the covariance matrix, shrinkage pulls the most extreme coefficients towards more central values, thereby systematically reducing estimation error where it matters most. An empirical study demonstrates that shrinkage results in a significantly higher realized information ratio for the active manager than the sample covariance matrix does.
In a nutshell, the method shrinks the sample covariance matrix towards a highly structured target, such as the covariance matrix implied by a factor model, the identity matrix, or a constant-correlation matrix that assumes the same correlation between the returns of any two stocks.
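As a simplified sketch of the idea, the following R fragment shrinks the sample covariance matrix toward a scaled identity target with a fixed intensity `delta`; Ledoit and Wolf derive an optimal, data-driven estimator of this intensity, which is not reproduced here:

```r
# Sketch of covariance shrinkage: a convex combination of the sample covariance
# matrix S and a highly structured target F, here a scaled identity matrix.
shrink_cov <- function(returns, delta = 0.2) {  # delta: fixed intensity (illustrative)
  S <- cov(returns)
  F <- mean(diag(S)) * diag(ncol(returns))      # identity target, scaled to average variance
  delta * F + (1 - delta) * S                   # shrunk estimator: better conditioned
}
```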
Other papers, such as Frost and Savarino (1986), suggested Bayesian estimation of the means and covariance matrix, which has been shown to select portfolios with superior performance. To be specific, one can incorporate estimation uncertainty directly into the decision process, either by asserting a 'non-informative' or 'invariant' prior, or by specifying an informative prior under which all assets have identical expected returns, variances, and pairwise correlation coefficients. This empirical Bayes method reduces estimation error by drawing the posterior estimates of each security's expected return, variance, and pairwise correlations toward the respective averages of return, variance and correlation for all securities in the population.
However, as noted by Disatnik and Benninga (2007), a significant drawback of many modified covariance matrices is that they generate minimum-variance portfolios incorporating significant short positions. Short-selling, in practice, is widely prohibited (mutual funds, for example, are in many cases not allowed to short sell); therefore, to some extent, short sales are considered an undesirable feature of portfolio optimization. Moreover, the above techniques, while reducing the sensitivity of the mean-variance allocation to the input vectors, do not fully address the adverse effect of accumulated estimation errors, particularly when the portfolio size is large.
Efforts have also been made to modify the Markowitz unconstrained mean-variance optimization problem itself to improve weight stability. For example, Goldfarb and Iyengar (2003) proposed an alternative deterministic model for selecting portfolios, a robust convex optimization formulation, and showed that it not only reduces the sensitivity of the portfolio composition to the parameter estimates, but also provides better risk-return performance.
3. Regularized Markowitz Portfolio
In the succeeding paragraphs, a regularization of Markowitz's portfolio construction, as proposed by Brodie et al. (2009) and Fan et al. (2009), is discussed; it tames the troublesome instabilities and prevents the accumulation of substantial estimation errors, especially for vast portfolios where N is large.
Brodie et al. (2009) and Fan et al. (2009) extended Jagannathan and Ma (2003)'s work of imposing no-short-sales constraints on the Markowitz mean-variance optimization problem. Both adopt regularization: by adding a gross exposure constraint, one obtains an optimized portfolio allocation with sparsity and stability.
Moreover, Fan et al. (2009) also provide a theoretical explanation of why the constraint on gross exposure prevents the risks or utilities of selected portfolios from accumulating statistical estimation errors. Another prominent contribution of that paper is its mathematical insight into utility approximations with the gross-exposure constraint in portfolio selection, tracking, and improvement. As the gross exposure parameter relaxes from 1 to infinity, the optimization problem progressively transforms from one with no short sales allowed to one with no constraint on short sales.
The gross exposure constraint is, in essence, a norm constraint on the allocation vector. As the importance of this penalty can be adjusted with a "tunable" coefficient, it not only makes the Markowitz problem more practical, but also bridges the gap between the no-short-sale optimization problem of Jagannathan and Ma (2003) and the unconstrained optimization problem of Markowitz (1952, 1959). To be specific, for large values of this coefficient, optimizing the penalized objective function is equivalent to solving the original problem with no constraint on short sales. As the coefficient decreases, the optimal solutions penalize assets with short positions more heavily, and when it reaches 1 the problem is essentially one with no short sales allowed.
The discussion below of the regularization of Markowitz's portfolio construction is restricted to the traditional Markowitz mean-variance approach, in both problem formulation and data analysis; factor models and utility optimization problems could be incorporated as variations.
3.1 Problem Formulation for Constrained L1 Norm Portfolio
Given the notation, recall the formulation (2.2)-(2.4) from part 2:

$\min_w \; \| \mu_p \mathbf{1}_T - R w \|_2^2 \quad \text{s.t.} \quad w^T \hat{\mu} = \mu_p, \; w^T \mathbf{1}_N = 1.$

The objective function of interest is augmented with a penalty term, and the original Markowitz objective function with an L1 penalty becomes:

$\min_w \; \| \mu_p \mathbf{1}_T - R w \|_2^2 + \tau \| w \|_1 \qquad (3.1)$

$\text{s.t.} \quad w^T \hat{\mu} = \mu_p, \qquad (3.2)$

$\quad\;\; w^T \mathbf{1}_N = 1, \qquad (3.3)$

where $\| w \|_1 = \sum_{i=1}^N |w_i|$ is the L1 norm term, and the factor $1/T$ is absorbed into the choice of the adjustable parameter τ.
3.2 Regularization and the L1 Norm
Regularization means introducing additional information in order to solve an ill-conditioned problem. Typical examples of regularization in statistical machine learning include ridge regression and lasso regression (the type of penalty appearing in (3.1)). For an Lp penalty on the unconstrained problem, p = 1 gives lasso regression and p = 2 gives ridge regression.
Although penalization leads to a biased regression method, the Lasso provides a solution path of selected variables for us to use at our disposal. Moreover, regularization helps improve overall prediction accuracy by introducing bias in exchange for variance reduction, as discussed in Hastie, Tibshirani and Friedman (2001). Unlike ridge regression, which is not useful for variable selection, the Lasso allows simultaneous model fitting (parameter shrinkage) and variable selection by penalizing the model's parameters.
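The contrast can be illustrated with a toy R example on simulated data (the `lars` and `MASS` packages are assumed to be installed; all variables here are hypothetical):

```r
# Toy contrast between the two penalties: ridge shrinks all coefficients but
# leaves them nonzero, whereas the lasso sets some exactly to zero.
library(lars)   # LARS-LASSO implementation, used again in part 3.4
library(MASS)   # for lm.ridge
set.seed(2)
X <- matrix(rnorm(100 * 8), 100, 8)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(100)  # only 2 of 8 predictors matter
coef(lm.ridge(y ~ X, lambda = 10))       # ridge: all 8 slopes shrunk, none zero
fit <- lars(X, y, type = "lasso")
coef(fit, s = 0.5, mode = "fraction")    # lasso: several coefficients exactly 0
```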
Therefore, we focus on this particular regularization method, the L1 norm.
3.3 Special Features of the L1 Norm Model
The advantages of adopting the L1 norm of Markowitz's optimal allocation vector can be summarized as follows:
It promotes sparsity.
The penalizing L1 norm has a sparsifying effect: as the coefficient τ increases, the L1 norm term in the penalized objective function is weighted more and more heavily, shrinking the values of the weights so that the solution contains only a few active positions, resulting in a small set of selected assets. In practice, when formulating investment portfolios, sparse solutions are also desirable, as investors frequently need to limit the number of positions they maintain in a portfolio. By considering suitably large values of τ in (3.1), Brodie et al. (2009) also illustrate geometrically how the addition of the L1 term to the unconstrained risk minimization encourages sparse solutions.
It regulates the amount of short positions in the portfolio in the optimization process.
Based on the constraint (3.3), we can rewrite the objective function in (3.1) as

$\min_w \; \| \mu_p \mathbf{1}_T - R w \|_2^2 + 2\tau \sum_{i\,:\,w_i < 0} |w_i| + \tau, \qquad (3.4)$

using the identity $\| w \|_1 = \sum_{i: w_i < 0} |w_i| + \sum_{i: w_i \geq 0} w_i = 2 \sum_{i: w_i < 0} |w_i| + w^T \mathbf{1}_N$. From (3.4) we see that, as the last term is a constant that does not depend on the choice of the weight vector, the only penalization falls on assets with negative weights, i.e. short positions. In other words, under the constraint (3.3), the L1 penalty is equivalent to a penalty on short positions. Therefore, with a considerably large τ chosen, most of the components converge to their zero limit, and the optimal solution is a portfolio with no short positions, a sparse one with few active positions. As τ in the L1-penalized objective function decreases, the constraint is relaxed but not removed completely; it no longer imposes positivity absolutely, but still penalizes overly large negative weights (Brodie et al., 2009).
That the restriction to portfolios with non-negative weights corresponds to the largest value of τ, and hence typically results in the sparsest solution, can be proved as below.
Suppose $w^{(1)}$ and $w^{(2)}$ are minimizers of (3.1) corresponding to the values $\tau_1$ and $\tau_2$ respectively, and both satisfy the two linear constraints (3.2) and (3.3). By the minimizing property of each, we have the following:

$\| \mu_p \mathbf{1}_T - R w^{(1)} \|_2^2 + \tau_1 \| w^{(1)} \|_1 \leq \| \mu_p \mathbf{1}_T - R w^{(2)} \|_2^2 + \tau_1 \| w^{(2)} \|_1,$

$\| \mu_p \mathbf{1}_T - R w^{(2)} \|_2^2 + \tau_2 \| w^{(2)} \|_1 \leq \| \mu_p \mathbf{1}_T - R w^{(1)} \|_2^2 + \tau_2 \| w^{(1)} \|_1.$

Hence, adding the two inequalities,

$(\tau_1 - \tau_2)\,\left( \| w^{(1)} \|_1 - \| w^{(2)} \|_1 \right) \leq 0.$

If some of the $w^{(2)}_i$ are negative, but the $w^{(1)}_i$ are all non-negative, then by (3.3) we have $\| w^{(1)} \|_1 = 1 < \| w^{(2)} \|_1$. This shows $\tau_1 \geq \tau_2$: the optimal portfolio with non-negative entries corresponds to the largest value of τ, and typically results in the sparsest solution, as the sparsity-promoting penalty term is weighted most heavily.
It stabilizes the problem.
By imposing a proper penalty on the size of the coefficients of w, the resulting allocation depends less sensitively on the input vectors, and hence the optimization problem is stabilized. To be specific, the regularization helps alleviate the effects of possible multicollinearity when the design is badly conditioned, as noted among the drawbacks of Markowitz's original problem. In fact, Daubechies, Defrise, and De Mol (2004) prove that any Lp penalty on w, with 1 ≤ p ≤ 2, suffices to stabilize the minimization in (3.1) by regularizing the inverse problem. On the other hand, Brodie et al. (2009) show geometrically that sparsity is encouraged when 0 ≤ p ≤ 1. Therefore p = 1, the L1 penalty discussed in our optimization, has both desirable features. Moreover, Fan et al. (2009) prove that, for a wide range of the constraint parameters, the optimal portfolio does not depend sensitively on the estimation errors of the input vectors; the empirical and theoretical risks are also approximately the same for any allocation vector satisfying the gross-exposure constraint.
It provides a natural proxy for transaction costs.
In reality, investors not only care about the choice of the assets they trade, but are also concerned with the transaction costs incurred when creating and liquidating the positions in their portfolios. Transaction costs in a liquid market are usually incurred through brokers' commission fees, with the amount charged proportional to the transacted amount multiplied by the bid-ask spread applicable to the size of the transaction. Although there is a minimum fixed fee, it is usually negligible for moderate to large investors. Hence, the total transaction cost can be seen as proportional to the sum of the absolute amounts of capital invested in each asset, i.e. the transaction cost is effectively captured by an L1 penalty.
3.4 LARS-LASSO Algorithm to the L1 Constrained Risk Minimization Problem
The LARS (least angle regression) algorithm, a homotopy method presented by Efron, Hastie, Johnstone, and Tibshirani (2004), allows us to compute portfolios with the L1 penalty. The nice feature of LARS is that it does not require separate computations to find the solution for each value of τ; rather, by exploiting the piecewise linear dependence of the solution on τ, it obtains, in one run, the weight vectors for all values of τ (i.e. for all numbers of selected assets) in a prescribed range.
Put simply, the LARS procedure starts with all coefficients equal to zero and finds the predictor most correlated with the response, say x1. The largest step possible is taken in the direction of this predictor until another predictor, say x2, has as much correlation with the current residual. At this point, LARS proceeds in a direction equiangular between the two predictors until a third variable, x3, enters the "most correlated" set by having the same absolute correlation with the working residual as x1 and x2. LARS then proceeds equiangularly between x1, x2 and x3, that is, along the "least angle direction," until a fourth variable enters, and so on. The LARS algorithm can be adapted to the general ℓ1-penalized minimization problem: the slopes have to be recomputed at every breakpoint, where the active set grows or shrinks by a single index j at a time, so that the full set of Lasso solutions can be generated by the modified LARS algorithm.
The graph below illustrates how the Lasso computes a solution path for variable selection with varying weight values (US10 in Example 1; the total number of assets is 10).
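A solution path of this kind can be generated along the following lines in R; this is only a sketch, with simulated returns standing in for the actual US10 data:

```r
# Sketch of producing a Lasso solution-path plot with the lars package;
# simulated data are used here in place of the US10 return matrix.
library(lars)
set.seed(3)
R_mat <- matrix(rnorm(60 * 10, mean = 0.005, sd = 0.05), 60, 10)  # placeholder returns
y     <- rep(0.005, 60)            # constant target-return response
path  <- lars(R_mat, y, type = "lasso", intercept = FALSE, normalize = FALSE)
plot(path)                         # coefficient paths as the L1 bound varies
```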
However, we still have two linear constraints (3.2) and (3.3) to take into consideration.
The LARS-LASSO routine available in R solves only the Lasso problem in the standard form

$\min_w \; \| y - X w \|_2^2 + \tau \| w \|_1. \qquad (3.5)$

We therefore need to incorporate the two linear constraints into the quadratic form. Equations (3.1), (3.2) and (3.3) are together equivalent to the form below:

$\min_w \; \| \mu_p \mathbf{1}_T - R w \|_2^2 + \gamma_1 \left( \mu_p - w^T \hat{\mu} \right)^2 + \gamma_2 \left( 1 - w^T \mathbf{1}_N \right)^2 + \tau \| w \|_1. \qquad (3.6)$

By choosing $\gamma_1$ and $\gamma_2$ sufficiently large, we can ensure that the linear constraints are realized by forcing the two quadratic terms to be zero; the quadratic terms are just transformations of the two linear constraints.
Hence we enforce relatively large values of $\gamma_1$ and $\gamma_2$, so that whenever the two quadratic terms are nonzero, they are penalized harshly. The next step is to adapt (3.6) to the equivalent form (3.5) built into R for the LARS-LASSO algorithm, i.e. to combine the three quadratic terms in (3.6) into the single quadratic term in (3.5). This amounts to modifying the response vector $y$ and the input matrix $X$. Expanding (3.5) and (3.6), comparing terms and collecting the common weight vector, the modified inputs in R are

$\tilde{y} = \begin{pmatrix} \mu_p \mathbf{1}_T \\ \sqrt{\gamma_1}\, \mu_p \\ \sqrt{\gamma_2} \end{pmatrix} \in \mathbb{R}^{T+2}, \qquad \tilde{X} = \begin{pmatrix} R \\ \sqrt{\gamma_1}\, \hat{\mu}^T \\ \sqrt{\gamma_2}\, \mathbf{1}_N^T \end{pmatrix} \in \mathbb{R}^{(T+2) \times N},$

so that $\| \tilde{y} - \tilde{X} w \|_2^2$ reproduces the three quadratic terms of (3.6).
The next problem boils down to choosing proper values of $\gamma_1$ and $\gamma_2$. Without loss of generality, we assume they are chosen to be equal (although this is not necessary), as long as the common value γ is sufficiently large to ensure that both quadratic terms are fully penalized to zero.
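The following minimal R sketch puts the pieces together; `R_mat`, `mu_p` and `gamma` are illustrative placeholders rather than the actual code of Appendix 3:

```r
# Sketch: fold the two linear constraints into the design so that the standard
# LARS-LASSO routine solves the constrained problem (3.6); gamma1 = gamma2 = gamma.
library(lars)
constrained_lasso <- function(R_mat, mu_p, gamma = 1000) {
  T_obs  <- nrow(R_mat); N <- ncol(R_mat)
  mu_hat <- colMeans(R_mat)
  # Augmented response and design: the last two rows encode the target-return
  # and budget constraints, enforced approximately when gamma is large.
  y_tilde <- c(rep(mu_p, T_obs), sqrt(gamma) * mu_p, sqrt(gamma))
  X_tilde <- rbind(R_mat, sqrt(gamma) * mu_hat, sqrt(gamma) * rep(1, N))
  lars(X_tilde, y_tilde, type = "lasso", intercept = FALSE, normalize = FALSE)
}
# Each point on the returned path corresponds to one value of tau, i.e. one
# candidate portfolio; the selection rules of part 4.3 then pick among them.
```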
Based on the nature of the available datasets, a program is run to test the tracking behaviour of the model while varying the value of γ, since our objective is to choose a γ arbitrarily large compared to τ, to ensure that the two quadratic terms associated with γ are fully penalized before the L1 norm term is penalized.
A portfolio with ten assets (US10; detailed description in Example 1) was used to fit the Lasso model, with non-negative positions as the only portfolio selection criterion. As proved earlier, the optimal portfolio with non-negative entries corresponds to the largest value of τ; therefore γ is varied over the range [1, 2000], keeping other conditions fixed, to compute the fitted model's required return. We want the computed weights of the fitted Lasso model to be unaffected by the variation of γ.
Below is a graph of how the fitted model's required return gradually stabilizes, especially once γ increases to around 250. As a small sample size was used, γ is conservatively set to 1000 in the R programs.
The table shows that the computed weight proportions are similar, with little influence from the value of γ as it changes from 500 to 1000.
            w1        w2   w3        w4   w5        w6        w7        w8        w9        w10
γ = 500     0.05535   0    0.40316   0    0.20207   0.09627   0.00314   0.07135   0.03002   0.13863
γ = 1000    0.05535   0    0.40316   0    0.20208   0.09627   0.00314   0.07135   0.03003   0.13863
4. Data and Methodology
4.1 Specification of the Three Datasets
300 monthly returns, from January 1986 through December 2010, on stocks traded on the New York Stock Exchange (NYSE) and in the Nikkei 225 are employed. The stock returns are extracted from the Bloomberg database. The first dataset consists of the 10 largest stocks traded on the NYSE. In the second example we use 40 stocks selected from the Nikkei 225 top 100 stocks by market capitalization, while the third dataset consists of 50 stocks selected from the NYSE top 100 stocks.
The criteria used to select the stocks are as follows:
1. Companies which were not on the list at the starting point and entered the Nikkei 225 or NYSE at later dates are excluded.
2. Large companies, with a market capitalization above 10 billion, are preferred over small and medium-sized companies.
4.2 Performance Comparison Criteria
In the following section we compare the performance of regularized portfolio selection, in particular the L1 norm constrained optimization approach, with the classical Markowitz MV portfolio approach on real market data. A parallel performance comparison is also made against another benchmark, a naïve, equal-weighted portfolio (the 1/N portfolio) consisting of a large number N of individual stocks. Although the 1/N strategy simply invests equally in each available security, it is a tough benchmark: it has been shown to outperform a host of optimal portfolio construction strategies, including the traditional Markowitz strategy, most of the time (DeMiguel, Garlappi, and Uppal, 2007).
The out-of-sample performance of the L1 penalized portfolio and the classic Markowitz portfolio relative to that of the 1/N portfolio is compared across the three empirical datasets of monthly returns, using two performance criteria: (i) the out-of-sample Sharpe ratio and (ii) the out-of-sample standard deviation, which is computed in the course of deriving the Sharpe ratio. In the tables below, the Sharpe ratio is computed as S = m/σ, the ratio of the mean monthly out-of-sample return to its standard deviation, and is expressed in percent.
4.3 Computation Procedures
The analysis is based on a "rolling-sample" approach. Specifically, given a T-month-long dataset of asset returns, an estimation window of length M = 60 months is chosen. In each month t, starting from t = M + 1, the data in the previous M months are used to estimate the portfolio weights, targeting a return equal to the mean return of the 1/N portfolio over the same M months. The weight vector computed under either the classical Markowitz strategy or the L1 norm constrained strategy is then used to compute the portfolio returns for the following 12 months, assuming the portfolio is rebalanced every 12 months. After each 12 months, the weights are recomputed recursively using the preceding M months' returns, until the end of the dataset is reached. The outcome of this rolling-window approach is a series of (T − M) monthly out-of-sample returns generated by each portfolio strategy on each empirical dataset. From this time series of monthly out-of-sample returns we then measure each strategy's out-of-sample Sharpe ratio and out-of-sample standard deviation.
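An outline of this rolling-sample procedure in R is sketched below; `compute_weights` is a placeholder for either the Markowitz closed form of part 2.2 or the constrained Lasso of part 3.4:

```r
# Sketch of the rolling-sample backtest: estimate weights on the past M months,
# hold them for the 12-month rebalance period, then roll the window forward.
rolling_backtest <- function(R_mat, compute_weights, M = 60, hold = 12) {
  T_obs <- nrow(R_mat)
  oos <- c()                              # out-of-sample portfolio returns
  t <- M + 1
  while (t + hold - 1 <= T_obs) {
    window <- R_mat[(t - M):(t - 1), ]
    mu_p   <- mean(rowMeans(window))      # target: 1/N mean return over the window
    w      <- compute_weights(window, mu_p)
    oos    <- c(oos, R_mat[t:(t + hold - 1), ] %*% w)
    t      <- t + hold
  }
  c(m = mean(oos), sigma = sd(oos), sharpe_pct = 100 * mean(oos) / sd(oos))
}
```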
For the L1 norm constrained optimization strategy, one selection criterion is to choose a portfolio with non-negative weights, i.e. no short positions. As the LARS-LASSO technique provides a series of weight vectors for different values of τ, there may be several candidate weight vectors; in that case, the two linear constraints are double-checked, to make sure that the weights of the selected non-negative portfolio sum to nearly 1 and that the estimated return is as close to the required return as possible. Another selection criterion for the L1 norm strategy is to target a particular number of assets, so that in each rebalance period only the portfolio with the targeted number of assets and the smallest tracking error, $\| \mu_p \mathbf{1}_T - R w \|_2^2$, is selected.
4.4 Empirical Results Evaluation and Analysis
Example 1: US10
Table 1: Performance of the sparse portfolio with no short-selling, for US10

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/91-12/10         0.00484    0.04355    11.10988    0.00085    0.04346     1.94565    -0.00085   0.04689    -1.81457
01/91-12/95         0.00778    0.03681    21.13767    0.00007    0.03937     0.18903    -0.00099   0.04088    -2.41997
01/96-12/00         0.00870    0.05037    17.27623    0.00428    0.05489     7.79185     0.00607   0.05585    10.87444
01/01-12/05         -0.00157   0.03816    -4.11915    -0.00246   0.03805    -6.47533    -0.00457   0.04273   -10.69184
01/06-12/10         0.00444    0.04764     1.01815    0.00150    0.04021     3.71773    -0.00392   0.04708    -8.32542
In Table 1, the three portfolio strategies are tested on their performance over the 12 consecutive months immediately following construction; the out-of-sample returns of each strategy (hence of each set of allocation weights) are pooled over 5-year periods to compute the monthly mean return m, the standard deviation of monthly returns σ, and the Sharpe ratio S (expressed in %).
The table shows that the Sharpe ratios of the 1/N portfolio, for the whole sample period as well as for the consecutive 5-year sub-periods, are consistently higher than those of the optimal no-short-positions portfolio, while the classic Markowitz portfolio performs worst, with almost all Sharpe ratios negative. The exception is the period 01/96-12/00, in which the Markowitz portfolio's Sharpe ratio is higher than that of the no-short-positions portfolio. Also, the standard deviations of the no-short-positions portfolio are not always the smallest of the three; this is because the sample size is small and the regularization effect is not significant.
The limited penalization effect can also be seen in Figure 1 below. The number of assets in the optimal portfolio without short positions can sometimes reach 10, the whole sample size. In that case the sparsity and stability (in terms of standard deviation) effects are not obvious, and hence the other two datasets are presented to illustrate the advantages of the L1 norm strategy.
Figure 1: Number of assets with no short positions, for US10. It ranges from 6 to 10 for the optimal non-negative-weights portfolio from year to year. The average over 25 years is around 8.
Example 2: Japan40
Table 2: Performance of the sparse portfolio with no short-selling, for Japan40

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/91-12/10         0.00462    0.05636     8.18922    0.00330    0.04289     7.68297     0.00586   0.09220     6.35890
01/91-12/95         0.00455    0.05979     7.61124    0.00499    0.05450     9.15635     0.00772   0.12494     6.17982
01/96-12/00         0.00523    0.05070    10.31515    0.00568    0.03849    14.74671     0.00741   0.11243     6.59445
01/01-12/05         0.01091    0.05008    21.78394    0.00733    0.03352    21.85830     0.00951   0.05163    18.41268
01/06-12/10         -0.00223   0.06422    -3.47201    -0.00481   0.04223   -11.39433    -0.00119   0.05872    -2.02736
In Table 2, following the same construction methodology as Table 1, we observe that in the periods 01/91-12/95, 01/96-12/00 and 01/01-12/05 the no-short-positions portfolio has the best performance in terms of Sharpe ratio, higher than both other portfolios. In the period 01/06-12/10 its Sharpe ratio is negative, due to a negative mean return; it has been noted that returns in Japan dropped significantly from 2000 onwards, so the negative values are understandable. In terms of standard deviation, the no-short-positions portfolio outperforms the other two portfolios throughout the whole period, suggesting that the Lasso's stability property helps limit estimation error and hence gives a more stable performance than the other two strategies.
From the figure below it is observed that the regularized portfolio reduces the number of assets from a total sample size of 40 to an average of around 11, which demonstrates the sparsity of the L1-norm penalized portfolio. Thus, on the Japan40 dataset, the Lasso appears to be a superior strategy for portfolio selection.
Figure 2: Number of assets with no short positions, for Japan40. It ranges from 6 to 17 for the optimal non-negative-weights portfolio from year to year. The average over 25 years is around 11.
Figure 3: The Sharpe ratio, for the full period 1986-2010, for various optimal sparse portfolios. It is based on the second criterion of choosing an optimal sparse portfolio with different fixed numbers of assets. The no-short-positions optimal portfolio is indicated by a horizontal blue line, stretching from 6 to 17 (its minimum to maximum number of assets; see also Figure 2).
It clearly shows that when a larger number of assets is targeted, the selected portfolio performs worse due to instability, and is inferior to the sparse portfolios without short positions.
Example 3: US50
Table 3: Performance of the sparse portfolio with no short-selling, for US50

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/91-12/10         0.00664    0.04917    13.49430    0.00105    0.04225     2.49362    -0.00537   0.11694    -4.59295
01/91-12/95         0.00896    0.03482    25.74913    0.00167    0.04202     3.96372    -0.00359   0.13194    -2.72345
01/96-12/00         0.00688    0.04780    14.38383    0.00583    0.04753    12.26397     0.00660   0.13915     4.74411
01/01-12/05         0.00209    0.04011     5.20889    -0.00163   0.03699    -4.41330    -0.01339   0.10117   -13.23334
01/06-12/10         0.00861    0.06825    12.61897    -0.00165   0.04244    -3.88247    -0.01110   0.09023   -12.30698
In Table 3, following the same construction methodology as Table 1, we observe that throughout the evaluation period the 1/N strategy cannot be outperformed. The no-short-positions portfolio has the second best performance in terms of Sharpe ratio, while the classic Markowitz portfolio performs worst; this suggests that the naïve strategy is quite robust. In terms of standard deviation, however, the non-negative weight portfolio achieves a relatively small standard deviation with only a few active positions, compared to the other two. It is observable from Tables 1, 2 and 3 that the unsatisfactory Sharpe ratio performance of the L1 norm strategy arises because the no-short-positions portfolio has inferior out-of-sample mean returns compared to the other two; therefore, although its standard deviation is limited, it may still fail to beat the equal-weighted portfolio. This seems to suggest that whether the non-negative weight portfolio can outperform the equal-weighted portfolio also depends on the nature of the dataset: Table 2 shows that the non-negative weight portfolio does deliver a satisfactory performance, while in the US case the fluctuation within the data itself may have prevented the L1 norm penalized portfolio from outperforming the benchmark.
Figure 4: Number of assets with no short positions, for US50. It ranges from 11 to 22 for the optimal non-negative portfolios from year to year. The average over 25 years is around 17.
Figure 5: The Sharpe ratio, for the full period 1986-2010, for various optimal sparse portfolios. It is based on the second criterion of choosing an optimal sparse portfolio with different fixed numbers of assets. The no-short-positions optimal portfolio is indicated by a horizontal blue line, stretching from 11 to 22 (its minimum to maximum number of assets; see also Figure 4). It is observable that many of the Sharpe ratios are below zero, indicating the unstable performance of including more assets. This suggests that the data variation is so large that allocating weights from historical estimates may simply not be adequate for the future; rather, keeping a naïve, equally invested portfolio achieves a higher Sharpe ratio (in this example the 1/N portfolio's mean monthly return is 0.00664, slightly above zero).
To further confirm that the inferior performance is not caused by the implemented strategy's rolling window or rebalance period, these are tested in R on the US50 dataset: the rolling window is extended from the original 5-year period to 7-year and 9-year periods, and, keeping other things fixed, the portfolio is also rebalanced at different frequencies, every 24 months, 12 months (the original), 6 months, 3 months and 1 month. The detailed result tables are presented in Appendix 2. They show that the results are consistent in terms of portfolio performance: the 1/N strategy outperforms the L1 norm strategy with no short positions, and the classic Markowitz portfolio performs worst most of the time. Moreover, it is worth noticing that the standard deviation of the L1 norm strategy is consistently small relative to the other two strategies.
4.5 Summary of the Empirical Analysis
Across the three datasets, it is noticeable that the sparse portfolio with no short positions works well at limiting the number of active positions while, at the same time, producing a relatively small standard deviation, and hence it stabilizes the out-of-sample portfolio performance. However, due to its limited positions in a few assets, its inferior mean return cannot always be justified by its small standard deviation so as to achieve a higher Sharpe ratio than the other two strategies; hence the benchmark portfolio outperforms both the no-short-positions portfolio and the Markowitz portfolio most of the time.
The empirical data also show that the Markowitz portfolio did not perform well, with many negative Sharpe ratios in all three datasets. Its standard deviation (equivalently, its variance) is also the largest among the three methods, indicating an unstable performance and high volatility. This may be attributed mainly to the ill-conditioned covariance matrix, which results in extreme weights and hence poor performance.
5. Limitations and Possible Extension
5.1 The Empirical Results Limitations
The discussion of the empirical results is restricted to the traditional Markowitz mean-variance approach. In fact, a variety of modifications of the portfolio structure, such as the adoption of factor models, could also be applied.
It should also be noticed that the unsatisfactory performance of the Markowitz portfolio among the three methods might be attributed to its targeting the mean return fixed by the 1/N strategy, which may lie on the efficient frontier but comes with extreme computed weights, and hence high variance and poor performance. If the targeted return were instead that of the minimum-variance portfolio, the results might be more favorable to Markowitz.
5.2 Partial Index Tracking
Investors may wish to track an index's performance by pursuing a passive investment strategy, believing that the market cannot be beaten. However, as an index is composed of a large number of assets, it would be inefficient and unrealistic to buy and sell all of them for a full replication of the index. The L1 penalty can therefore be generalized in a practical way to track an index effectively with a smaller set of assets. We would have an objective function similar to the one in (3.1):

$\min_w \; \| y - R w \|_2^2 + \tau \| w \|_1,$

where $y \in \mathbb{R}^T$ is the time series of index returns. This seeks to minimize the expected tracking error while at the same time enforcing sparsity and stabilizing a problem that may contain collinear assets. The index could be any existing financial index or another abstract financial time series, such as an investor sentiment series.
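A minimal sketch of this variant, reusing the augmentation of part 3.4 and keeping the budget constraint as one possible choice (all names are illustrative):

```r
# Sketch of sparse index tracking: the constant target-return response of (3.6)
# is replaced by the index return series y; the budget row is kept (optional).
library(lars)
track_index <- function(R_mat, y, gamma = 1000) {
  N <- ncol(R_mat)
  y_tilde <- c(y, sqrt(gamma))                      # index returns + budget constraint
  X_tilde <- rbind(R_mat, sqrt(gamma) * rep(1, N))
  lars(X_tilde, y_tilde, type = "lasso", intercept = FALSE, normalize = FALSE)
}
```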
5.3 Portfolio Adjustment
The constructions in consecutive years specified in 4.3 Computation Procedures are not meant to model the behavior of a single investor. Rather, they model how optimal portfolios are built for different investors who follow the same strategy in different years. For a single investor, it is possible to adopt a sparse portfolio adjustment strategy in subsequent years. With a small modification of our original constrained minimization formulation, the problem can be stated as

$\min_w \; \| \mu_p \mathbf{1}_T - R w \|_2^2 + \tau \| w - w_0 \|_1$

$\text{s.t.} \quad w^T \hat{\mu} = \mu_p, \quad w^T \mathbf{1}_N = 1,$

where $w_0$ denotes the currently held portfolio, so that the penalty now falls on the adjustments $w - w_0$ and only a small number of positions are changed.
5.4 The L2 Norm Regularization
The L2 norm (ridge regression), as introduced in 3.2 Regularization, is not used for variable selection, but it also has a shrinkage property which stabilizes the original problem. We would therefore not expect it to produce a sparse solution; rather, it is a good method for regularizing the covariance matrix and deriving weights with less extreme values.
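A minimal sketch of this idea for the minimum-variance weights, using the standard equivalence between an L2 penalty on w and adding λI to the covariance matrix before inversion (the global minimum-variance form is assumed here):

```r
# Sketch of L2 (ridge) regularization of minimum-variance weights: the penalty
# lambda * ||w||_2^2 amounts to inverting (S + lambda * I) instead of S.
ridge_min_variance <- function(returns, lambda = 0.01) {
  S <- cov(returns)
  N <- ncol(returns)
  w <- solve(S + lambda * diag(N), rep(1, N))  # (S + lambda I)^{-1} 1
  w / sum(w)                                   # rescale to satisfy the budget constraint
}
```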
6. Conclusion
In summary, the traditional Markowitz portfolio, its computation algorithm and its drawbacks in practical implementation have been presented. The classic Markowitz mean-variance optimization strategy is theoretically influential; however, the matrix inversion required during estimation leads to accumulated estimation error and extreme values in the computed weights, especially in the case of a dense covariance matrix and a vast number of assets. The unsatisfactory performance of the Markowitz portfolio has led to many modification methods. Among them, the preceding sections have shown with empirical examples that the L1 norm regularized portfolio strategy not only produces a sparse portfolio, but also limits variation and estimation error. Although in terms of Sharpe ratio the L1 norm strategy does not outperform the naïve, equal-weighted portfolio most of the time, the data application shows that it behaves consistently in promoting the portfolio's sparsity and stability.
7. Appendix
1. The List of the Companies
Group 1: The list of 10 assets in New York Stock Exchange:
XOM US Equity, GE US Equity, IBM US Equity, WMT US Equity, CVX US Equity, PG US Equity, JPM US Equity, WFC US Equity, JNJ US Equity, T US Equity.
Group 2: The list of 40 assets in Nikkei 225:
7203 JP Equity, 7267 JP Equity, 7751 JP Equity, 8058 JP Equity, 7201 JP Equity, 9501 JP Equity, 4502 JP Equity, 6954 JP Equity, 6758 JP Equity, 6902 JP Equity, 8031 JP Equity, 6752 JP Equity, 6301 JP Equity, 8802 JP Equity, 6501 JP Equity, 6502 JP Equity, 6503 JP Equity, 5401 JP Equity, 4063 JP Equity, 9503 JP Equity, 8604 JP Equity, 6971 JP Equity, 9502 JP Equity, 8053 JP Equity, 4503 JP Equity, 4901 JP Equity, 8801 JP Equity, 8001 JP Equity, 5108 JP Equity, 5201 JP Equity, 4452 JP Equity, 7011 JP Equity, 6326 JP Equity, 8002 JP Equity, 2503 JP Equity, 6702 JP Equity, 7269 JP Equity, 5405 JP Equity, 9531 JP Equity, 8035 JP Equity.
Group 3: The list of 50 assets in New York Stock Exchange:
XOM US Equity, GE US Equity, IBM US Equity, WMT US Equity, CVX US Equity, PG US Equity, JPM US Equity, WFC US Equity, JNJ US Equity, T US Equity, PFE US Equity, BAC US Equity, KO US Equity, SLB US Equity, HPQ US Equity, COP US Equity, VZ US Equity, PEP US Equity, MRK US Equity, DIS US Equity, MCD US Equity, OXY US Equity, UTX US Equity, AIG US Equity, ABT US Equity, MMM US Equity, CAT US Equity, HD US Equity, F US Equity, AXP US Equity, USB US Equity, BA US Equity, MO US Equity, DD US Equity, UNP US Equity, UNH US Equity, EMR US Equity, CVS US Equity, HON US Equity, BMY US Equity, DOW US Equity, MDT US Equity, APA US Equity, NKE US Equity, LLY US Equity, TXN US Equity, DE US Equity, HAL US Equity, WAG US Equity, BK US Equity.
2. Complementary Data Results
Extended US50 tables, with various rolling windows and rebalance periods
Rolling Window: 5 Years, 7 Years, 9 Years
Rolling Window: 60 Months (5 Years)

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/91-12/95         0.00896    0.03482    25.74913    0.00167    0.04202     3.96372    -0.00359   0.13194    -2.72345
01/96-12/00         0.00688    0.04780    14.38383    0.00583    0.04753    12.26397     0.00660   0.13915     4.74411
01/01-12/05         0.00209    0.04011     5.20889    -0.00163   0.03699    -4.41330    -0.01339   0.10117   -13.23334
01/06-12/10         0.00861    0.06825    12.61897    -0.00165   0.04244    -3.88247    -0.01110   0.09023   -12.30698
Rolling Window: 84 Months (7 Years)

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/93-12/97         0.008429   0.031908   26.41535    0.004797   0.038906   12.33002     0.00867   0.051353   16.8835
01/98-12/02         -0.00106   0.052933   -1.99638    -0.00415   0.051059   -8.13157    -0.00652   0.067833   -9.60981
01/03-12/07         0.008277   0.026281   31.49204    0.005713   0.029383   19.44188     0.003827  0.044917    8.520213
Rolling Window: 108 Months (9 Years)

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/95-12/99         0.009639   0.044066   21.87411    0.005345   0.044904   11.90241     0.013995  0.056107   24.94337
01/00-12/04         0.002719   0.044167   6.156981    -0.00358   0.044491   -8.04525    -0.00941   0.05273   -17.8393
01/05-12/09         0.005848   0.064555   9.058432    0.00128    0.039053   3.276905    -0.00769   0.059653  -12.8849
Rebalance Frequency: Every 24 Months, 12 Months, 6 Months, 3 Months, 1 Month
Rebalance Frequency: Every 24 Months

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/91-12/95         0.008965   0.034815   25.74913    0.003893   0.039347   9.893747     0.011173  0.09468    11.8013
01/96-12/00         0.006875   0.047797   14.38383    0.007495   0.047695   15.71409    -0.00556   0.132852   -4.18347
01/01-12/05         0.00209    0.040115   5.208886    -0.00312   0.038178   -8.17121    -0.00825   0.087193   -9.46207
01/06-12/10         0.008612   0.068249   12.61897    0.000412   0.043633   0.9445      -0.02241   0.111465  -20.103
Rebalance Frequency: Every 12 Months

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/91-12/95         0.00896    0.03482    25.74913    0.00167    0.04202     3.96372    -0.00359   0.13194    -2.72345
01/96-12/00         0.00688    0.04780    14.38383    0.00583    0.04753    12.26397     0.00660   0.13915     4.74411
01/01-12/05         0.00209    0.04011     5.20889    -0.00163   0.03699    -4.41330    -0.01339   0.10117   -13.23334
01/06-12/10         0.00861    0.06825    12.61897    -0.00165   0.04244    -3.88247    -0.01110   0.09023   -12.30698
Rebalance Frequency: Every 6 Months

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/91-12/95         0.008965   0.034815   25.74913    0.002488   0.041819   5.949139     0.005761  0.144146    3.996422
01/96-12/00         0.006875   0.047797   14.38383    0.004807   0.048891   9.831344     0.0004    0.12892     0.31051
01/01-12/05         0.00209    0.040115   5.208886    -0.00248   0.03714    -6.67739    -0.00343   0.084453   -4.05855
01/06-12/10         0.008612   0.068249   12.61897    -0.00198   0.041944   -4.71574    -0.01588   0.100724  -15.7692
Rebalance Frequency: Every 3 Months

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/91-12/95         0.008965   0.034815   25.74913    0.00222    0.041619   5.334534     0.002258  0.125809    1.795044
01/96-12/00         0.006875   0.047797   14.38383    0.003851   0.049559   7.770152    -0.00253   0.10967    -2.30574
01/01-12/05         0.00209    0.040115   5.208886    -0.00334   0.036298   -9.21262    -0.00532   0.071033   -7.48274
01/06-12/10         0.008612   0.068249   12.61897    -0.00126   0.040797   -3.09689    -0.02472   0.179795  -13.7502
Rebalance Frequency: Every 1 Month

                    Equal Weight                      Non-negative Weight               Markowitz Weight
Evaluation Period   m          σ          S(in %)     m          σ          S(in %)     m          σ          S(in %)
01/91-12/95         0.008965   0.034815   25.74913    0.002125   0.040327   5.269144     0.002125  0.11939     1.779849
01/96-12/00         0.006875   0.047797   14.38383    0.003971   0.047967   8.27851     -0.00407   0.109531   -3.71967
01/01-12/05         0.00209    0.040115   5.208886    -0.00416   0.036617   -11.3502    -0.0057    0.070638   -8.06433
01/06-12/10         0.008612   0.068249   12.61897    -0.00152   0.040544   -3.74152    -0.02963   0.127725  -23.1996
One exemplary LASSO graph for Japan40
One exemplary LASSO graph for US50
3. Program Codes