It is impossible to start a review of previous literature on fund performance without first mentioning Jensen (1968). This early paper argues strongly against any realistic added value from fund managers' stock-picking abilities. Jensen names this skill "predictability" and defines it as the capacity of a manager to correctly anticipate the future value of risky assets in order to gain above-normal returns. His research was a cornerstone in fund performance analysis, as his paper gave birth to the widely used "alpha" measure. Jensen's alpha is defined by:
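$$\alpha_p = E(R_p) - \left[ R_f + \beta_p \left( E(R_m) - R_f \right) \right]$$
where $E(R_p)$ is the expected portfolio return, $R_f$ is the risk-free rate, $\beta_p$ is the portfolio beta based on the Capital Asset Pricing Model, and $E(R_m)$ represents the expected market return.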
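As an illustration of how Jensen's alpha might be estimated in practice, the following minimal Python sketch regresses fund excess returns on market excess returns by ordinary least squares; the intercept is the alpha estimate and the slope is the beta. The function name `jensen_alpha` and all return figures are hypothetical and chosen only for illustration; they are not taken from Jensen (1968).

```python
from statistics import mean


def jensen_alpha(fund_returns, market_returns, risk_free):
    """Return (alpha, beta) from an OLS fit of fund excess returns
    on market excess returns (single-index CAPM regression)."""
    y = [r - f for r, f in zip(fund_returns, risk_free)]
    x = [m - f for m, f in zip(market_returns, risk_free)]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical monthly returns in decimals (illustrative assumption).
market = [0.02, -0.01, 0.03, 0.00, 0.01]
rf = [0.001] * 5
# A synthetic fund constructed to have alpha = 0.002 and beta = 1.2.
fund = [0.001 + 0.002 + 1.2 * (m - 0.001) for m in market]

alpha, beta = jensen_alpha(fund, market, rf)
print(f"alpha per period: {alpha:.4f}, beta: {beta:.2f}")
```

Because the synthetic fund is built without noise, the regression recovers the alpha and beta used to construct it; with real data the estimate would come with a standard error that should be tested for significance.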
Using alpha, Jensen analysed 115 mutual funds between 1955 and 1964 and showed that US mutual fund managers were, on average, unable to use their predictability skills to outperform a "buy-the-market-and-hold" strategy. Henriksson (1984) found similar results when investigating the market timing performance of 116 mutual funds between 1968 and 1980. He used both parametric and non-parametric tests to analyse monthly return data inclusive of dividends and fund management fees. Ultimately, both tests showed that fund managers failed to follow a consistently successful strategy, and that they were unable to accurately predict either small or large movements of the market portfolio.
However, one arguable shortcoming of Henriksson's (1984) study was that it considered fund returns as a whole rather than looking at their components. Grinblatt and Titman (1993) therefore took a different approach: instead of analysing the returns that investors make from holding the fund, they looked at the performance of the individual stocks within the funds' portfolios. Doing so has several advantages, notably that working at the level of individual stocks allows them to create custom benchmarks that better capture each fund manager's investment style. It also means that the returns measured do not include any fees, trading costs, or expenses which would otherwise alter their findings. Though it could be argued that this overestimates the true value of managers' predictability skills, the result remains meaningful since the benchmarks created also exclude fees and expenses. Their paper found that between 1975 and 1984, aggressive-growth funds outperformed their custom benchmarks by 2 to 3 percent (before expenses). It is important to note that Grinblatt and Titman's findings were somewhat criticized, as the sample of mutual funds they used was relatively small (around 275). Furthermore, their research covered only a 10-year period. Finally, some question the fact that their paper did not take into account anomaly factors such as book-to-market, size or momentum.
Yet another approach to fund performance was developed later by Daniel, Grinblatt, Titman and Wermers (1997), who used "Characteristic Timing" and "Characteristic Selectivity" to measure portfolio performance in 2500 equity funds between 1975 and 1994. Characteristic timing is defined as the ability of the fund manager to alter portfolio weights appropriately at times of potential abnormal return. Characteristic selectivity, on the other hand, is the manager's aptitude for picking specific stocks that deliver superior performance. Their research showed that though some aggressive-growth funds exhibited partial selectivity benefits, most funds failed to convert characteristic timing into above-normal returns. It is interesting to note that though their approach and measures differed from those of Grinblatt and Titman (1993), their findings did partially match. As a whole, however, Daniel et al. found that mutual funds do on average tend to beat their passive benchmarks, though usually by less than 100 basis points, which would then be absorbed by management fees. This seems to corroborate what Grinblatt and Titman found in 1993.
Brands, Brown & Gallagher (2005) discuss the common perception that increased portfolio concentration in specific asset classes or sectors is a valid way of trying to outperform the market. The extent to which the fund manager's asset allocation strategy deviates from passive benchmarks should therefore be a valid measure of skill, as well as of the value of active management. Their paper found a positive relation between higher manager input (i.e. portfolio concentration as well as additional stock picking) and superior performance in Australian equity funds. This seems particularly the case for portfolios that are both overweight and hold stocks outside popular benchmarks such as the S&P 500. Wermers (2003) supports this research by finding a similar relation in a sample of US mutual funds.
More recently, Petajisto (2010) studied US domestic all-equity mutual funds and divided them into several style categories using "Active Share" and "Tracking Error". Tracking error is defined as the volatility of the difference between the returns of a fund and those of its assigned benchmark, typically a market index. Active share is the extent to which the portfolio manager's holdings differ from those of the passive index. Petajisto particularly focuses on "closet indexing", a phenomenon that is increasingly present in the mutual fund sphere. This occurs when funds sell active management at a premium while their portfolio holdings are in fact nearly identical to those of a low-cost index fund. He finds that there was a significant increase in closet indexing between 2007 and 2009, which now encompasses about a third of all mutual funds. Petajisto believes that this can in part be attributed to above-normal market volatility from the credit crisis, as similar growth was found between 1999 and 2002, which corresponds to the "dot com" bubble. Overall, Petajisto (2010) finds an average mutual fund performance of -0.41% compared to low-cost index fund benchmarks. As a number of funds simply replicated their benchmarks, their performance after fees and expenses lagged behind by approximately the amount of those fees. Interestingly, the funds that were most active in stock picking added the most value to investor capital: on average they exceeded their benchmark performance by 1.26% net of all fees and expenses. Using a multivariate regression, Petajisto (2010) found similar results in each fund category, and also found that active share has the most predictive power for alpha in small-cap funds, though it retains statistical and economic significance in large-cap funds.
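The two measures discussed above can be sketched in a few lines of Python. The active share formula used here, half the sum of absolute differences between fund and index weights, is the commonly cited definition from Cremers and Petajisto and is an assumption in the sense that the text above does not spell it out. The portfolios, tickers and returns are hypothetical.

```python
from statistics import stdev


def active_share(fund_weights, index_weights):
    """Half the sum of absolute weight differences between fund and index.
    0 means identical holdings, 1 means no overlap at all."""
    names = set(fund_weights) | set(index_weights)
    return 0.5 * sum(
        abs(fund_weights.get(n, 0.0) - index_weights.get(n, 0.0)) for n in names
    )


def tracking_error(fund_returns, benchmark_returns):
    """Sample standard deviation of the fund-minus-benchmark return."""
    diffs = [f - b for f, b in zip(fund_returns, benchmark_returns)]
    return stdev(diffs)


# Hypothetical three-stock index and two hypothetical funds.
index = {"A": 0.50, "B": 0.30, "C": 0.20}
closet_indexer = {"A": 0.48, "B": 0.32, "C": 0.20}
stock_picker = {"A": 0.10, "D": 0.90}

closet_share = active_share(closet_indexer, index)
picker_share = active_share(stock_picker, index)

# Hypothetical monthly returns for the benchmark and the closet indexer.
bench_ret = [0.010, -0.020, 0.015, 0.005]
closet_ret = [0.011, -0.019, 0.014, 0.005]
closet_te = tracking_error(closet_ret, bench_ret)

print(round(closet_share, 4), round(picker_share, 4), round(closet_te, 5))
```

In this toy setting the closet indexer has an active share close to zero and a very small tracking error, while the concentrated stock picker has an active share close to one.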
In the same year, Fama & French (2010) studied a sample of 3,156 funds focused on U.S. equity investments. They used a perspective they call "equilibrium accounting", which implies that aggregate alpha is equal to zero before fund costs are taken into account. They found that between 1984 and 2006, most mutual funds underperformed the CAPM as well as three- and four-factor model benchmarks, mainly because of fund fees and expenses. This means that if some portfolio managers within the sample were actually able to produce returns similar to their benchmarks, these were hidden by the weight of fund costs. At a more individual level, they found it difficult to provide any meaningful evidence of managerial skill versus luck in excess returns. However, using a distribution of Jensen's alpha estimates based on net returns, Fama & French (2010) find that very few funds have enough active management skill to cover their costs. When gross returns are used, the evidence for skill in winner funds, and for the lack of it in loser funds, becomes much clearer. But again, for the majority of funds within the $5 million Assets Under Management (AUM) sample, few had enough skill to beat their benchmarks.
ii.ii UK Evidence
The main limiting factor in previous research on fund performance is that most of it emanated from the United States, or at least studied funds based in the U.S. Fletcher (1997), however, looks at the performance of 85 UK mutual funds that invest in North American securities between 1985 and 1996. Fund excess returns were calculated on the basis of the Jensen (1968) measure as follows:
$$r_{it} = \alpha_i + \sum_{j=1}^{K} b_{ij} r_{jt} + e_{it}$$
"where $r_{it}$ is the excess return on trust $i$ in period $t$, $r_{jt}$ is the excess return on the $j$th benchmark portfolio in period $t$ for $j = 1, \ldots, K$, $b_{ij}$ is the beta of asset $i$ relative to factor $j$, $K$ is the number of portfolios in the benchmark, and $e_{it}$ is a random error term with $E(e_{it}) = 0$ and $E(e_{it} r_{jt}) = 0$ for $j = 1, \ldots, K$" (p. 456). These excess returns are then benchmarked against two sets of portfolios: one constructed as the excess return from the S&P 500 index, the second as a three-index model. In order to appropriately capture each fund's investment style, the funds were then divided into four categories: Growth, Income, Special Situations/Smaller Companies, and General. The study revealed that there was no evidence of superior performance by the funds compared to their benchmark, and that no significant predictability in performance was possible. Quigley and Sinquefield (1998) use a similar methodology, though they consider 752 UK equity unit trusts between 1978 and 1997. They run their tests both before and after adjusting for risk, using Jensen's alpha as well as a three-factor model based on market risk, value and size. Quigley and Sinquefield find that after trading costs are paid, the returns of top-earning funds are not significantly higher than the sample average, whereas low-earning funds appear markedly worse.
Blake and Timmermann (1998) also studied UK mutual fund performance through a large sample of 2300 UK open-ended funds over the period 1972 to 1995. Using multiple regressions, they found signs of persistence in performance (positive and negative) in both the best and worst performing funds. Blake and Timmermann argue that these findings are surprising, as spreads for UK mutual funds are typically higher than those found in the United States, which would make it more costly for investors to transfer their capital from a poorly performing fund to a top-performing one. Allen and Tan (1999) also found partial evidence of positive performance persistence within a sample of 131 UK equity mutual funds between 1989 and 1995. Their approach to performance analysis differed from that of previous studies in that it compared the relative performance of a fund against the sample itself rather than against an index benchmark (FTSE 100 or S&P 500). To complete this comparison, Allen and Tan used four separate empirical tests: an Ordinary Least Squares (OLS) regression of excess returns, a table analysis of winners and losers, a Pearson's chi-squared test on these tables, and a Spearman rank correlation coefficient analysis. These tests are executed on two groups of funds divided by low and high variance. Ultimately, the tests indicated that superior performance persists over long time horizons but not in the very short term. Furthermore, both the low and high variance groups exhibited persistent "winners", which indicates that risk-taking is not a necessity for superior performance. Heffernan (2001) finds similar results examining 288 UK investment trusts divided into eight categories between 1994 and 1999. Similarly to Allen & Tan, Heffernan benchmarked the average annual performance of each fund against its respective category.
Heffernan found that there was no relation between higher fees and better fund performance, and that there was some evidence of persistent success over the longer term in terms of both performance and variance.
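The winner and loser table analysis with a Pearson's chi-squared test, as attributed to Allen and Tan (1999) above, can be illustrated with the short sketch below. It assumes, purely for illustration, that a "winner" is a fund whose return is above the sample median in a given period; the original study may define the groups differently. Fund names and returns are hypothetical.

```python
from statistics import median


def persistence_table(first_period, second_period):
    """Count funds that are winners (W) or losers (L) in each of two periods.
    A winner is a fund with a return above the sample median (assumption)."""
    m1 = median(first_period.values())
    m2 = median(second_period.values())
    table = {"WW": 0, "WL": 0, "LW": 0, "LL": 0}
    for fund, r1 in first_period.items():
        key = ("W" if r1 > m1 else "L") + ("W" if second_period[fund] > m2 else "L")
        table[key] += 1
    return table


def chi_squared(table):
    """Pearson chi-squared statistic for the 2x2 winner/loser table."""
    n = sum(table.values())
    row_w = table["WW"] + table["WL"]
    row_l = table["LW"] + table["LL"]
    col_w = table["WW"] + table["LW"]
    col_l = table["WL"] + table["LL"]
    expected = {
        "WW": row_w * col_w / n,
        "WL": row_w * col_l / n,
        "LW": row_l * col_w / n,
        "LL": row_l * col_l / n,
    }
    return sum((table[k] - expected[k]) ** 2 / expected[k] for k in table)


# Hypothetical annual returns for eight funds in two consecutive periods.
period_1 = {"F1": 0.10, "F2": 0.08, "F3": 0.07, "F4": 0.06,
            "F5": 0.03, "F6": 0.02, "F7": 0.01, "F8": -0.01}
period_2 = {"F1": 0.09, "F2": 0.07, "F3": 0.08, "F4": 0.01,
            "F5": 0.06, "F6": 0.02, "F7": 0.00, "F8": -0.02}

table = persistence_table(period_1, period_2)
stat = chi_squared(table)
print(table, stat)
```

With one degree of freedom, the statistic would need to exceed roughly 3.84 to reject the hypothesis of no persistence at the 5% level, so a sample this small shows a tendency towards persistence without statistical significance.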
Finally, Cuthbertson, Nitzsche, & O'Sullivan (2008) apply a cross-sectional bootstrap approach to UK equity mutual funds to determine whether the persistence in returns they display is due to above-normal stock-picking abilities or is simply a consequence of luck. They found that only 5 to 10% of top-performing funds display superior performance attributable to stock-picking abilities, whereas small-stock funds did not perform well at all. There also seems to be a contrast between onshore and offshore funds, as the former's performance is due to skill and the latter's to luck.
ii.iii. R² and style analysis
Sharpe's (1992) return-based style analysis provides a solid framework that enables investors to compare the fund manager's asset allocation strategy to that of its benchmark. He justifies this methodology by asserting that "if it acts like a duck, assume it's a duck". Essentially, this approach consists of a regression of the fund's historical returns on the returns of the identified benchmarks. These returns are calculated using the Capital Asset Pricing Model (CAPM).
Furthermore, the benchmark portfolio needs to represent a realistic passively managed alternative. By using this approach, an individual investor can evaluate how much return is due to the diversification skills of the portfolio manager, and how much is simply due to the replicating features of the fund. It is important to note that an appropriate choice of style benchmark is vital to ensure proper analysis. For example, if the mutual fund focuses on a particular asset class (e.g. equity or money market) or sector (e.g. technology, retail or healthcare), the style benchmark should reflect these specific limitations. Because of the wide spectrum of mutual funds in the UK, it is impossible to match each asset allocation strategy exactly, creating a necessity for custom-made benchmarks. This is achieved by blending indices into the combination that generates the highest possible R², R² being the proportion of the variance of fund returns that is explained by the benchmark returns.
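To make the idea of blending benchmarks for the highest R² concrete, the sketch below searches over long-only weights on two hypothetical style indices and keeps the blend with the highest R². A coarse grid search is used here only as a simple stand-in for the constrained optimisation normally used in style analysis; the index names, the return series and the function names are illustrative assumptions.

```python
from statistics import pvariance


def r_squared(fund, blend):
    """Share of fund return variance explained by the blended benchmark."""
    residuals = [f - b for f, b in zip(fund, blend)]
    return 1.0 - pvariance(residuals) / pvariance(fund)


def best_two_index_blend(fund, index_a, index_b, steps=100):
    """Grid-search the weight w on index_a (1 - w on index_b), with
    0 <= w <= 1, that maximises R-squared against the fund's returns."""
    best_w, best_r2 = 0.0, float("-inf")
    for k in range(steps + 1):
        w = k / steps
        blend = [w * a + (1 - w) * b for a, b in zip(index_a, index_b)]
        r2 = r_squared(fund, blend)
        if r2 > best_r2:
            best_w, best_r2 = w, r2
    return best_w, best_r2


# Hypothetical monthly returns for two style indices.
growth = [0.04, -0.02, 0.03, 0.01, -0.01, 0.05]
value = [0.01, 0.00, 0.02, -0.01, 0.02, 0.01]
# Synthetic fund: 70% growth, 30% value, plus a constant 0.1% per month.
fund = [0.7 * g + 0.3 * v + 0.001 for g, v in zip(growth, value)]

best_w, best_r2 = best_two_index_blend(fund, growth, value)
print(f"weight on growth: {best_w:.2f}, R-squared: {best_r2:.4f}")
```

Because the synthetic fund is an exact 70/30 mix, the search recovers a growth weight of 0.70 with an R² of essentially one; a real fund would leave an unexplained residual.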
Ben Dor & Jagannathan (2002) define R² as:
$$R^2 = 1 - \frac{\mathrm{Var}(e_t)}{\mathrm{Var}(R_t)}, \qquad R_t = \sum_{k} w_k F_{kt} + e_t$$
where $R_t$ is the fund return in period $t$, $F_{kt}$ is the return on style benchmark $k$ and $w_k$ is its weight, with $e_t$ being the amount of return attributable to the portfolio manager's stock picking ability, also called "selection", and $\sum_k w_k F_{kt}$ the amount of return associated with the replicating features of the fund to its benchmark, also called "style". In their study, Ben Dor & Jagannathan estimate the fund's style by using returns in a 36-month timeframe. This is mainly to reduce potential "noise" from shorter time horizons, but also to increase the accuracy of the fund's style exposure description. By computing the fund's return and subtracting from it the benchmark's return for that period, they can calculate the "selection" return.
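The split of a fund's return into "style" and "selection" described above can be sketched as follows. The style weights are taken as given (in practice they would be estimated over a window such as the 36 months mentioned above), and R² is computed as one minus the ratio of selection variance to fund return variance, which is the usual definition in return-based style analysis. All figures and names are hypothetical.

```python
from statistics import pvariance


def decompose(fund_returns, benchmark_returns, weights):
    """Split each period's fund return into a style part (weighted benchmark
    return) and a selection part (the remainder), and compute R-squared."""
    style = [
        sum(w * r for w, r in zip(weights, period)) for period in benchmark_returns
    ]
    selection = [f - s for f, s in zip(fund_returns, style)]
    r2 = 1.0 - pvariance(selection) / pvariance(fund_returns)
    return style, selection, r2


# Hypothetical per-period returns on two style benchmarks.
benchmarks = [(0.02, 0.01), (-0.01, 0.00), (0.03, 0.02), (0.00, -0.01)]
weights = (0.6, 0.4)  # assumed style weights, taken as already estimated
fund = [0.020, -0.004, 0.024, -0.006]

style, selection, r2 = decompose(fund, benchmarks, weights)
for f, s, e in zip(fund, style, selection):
    print(f"fund {f:+.3f} = style {s:+.3f} + selection {e:+.3f}")
print(f"R-squared: {r2:.3f}")
```

A high R² here means that most of the variation in the fund's return is accounted for by its style exposure rather than by the manager's selection.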
Goyenko & Amihud (2008) argue that R² is a strong predictor of mutual fund performance. They obtain R² by regressing fund returns on those of multi-factor benchmark models, mainly that of Fama and French (1993). The Carhart (1997) multi-factor model is also widely used to analyse persistence in mutual fund performance. Essentially, it is based on the same parameters as the Fama-French model, except that Carhart adds a momentum factor, defined as the difference in return between an equally weighted portfolio of the 30% best-performing stocks over a 12-month period and one of the 30% worst-performing stocks over the same period. In their research, Goyenko & Amihud found a negative relation between R² and mutual fund performance as measured by the CAPM's alpha. Ultimately, this means that a low R² indicates a large input into the asset allocation strategy by the portfolio manager compared to the passive benchmark. They believe that this relation removes the need to use benchmarking as per Return-Based Style Analysis (RBSA). Though many researchers in finance have no reservations about using multi-factor models such as Fama-French (1993) or Carhart (1997), it is important to note that such models have been criticized as being too easily fitted in the broad context of such research. For instance, Daniel and Titman (2012) believe that using a wider range of test portfolios than the "traditional" 25 size and book-to-market portfolios would make the Fama-French (1993) model a much more powerful testing tool. This is seconded by Lewellen et al. (2010), who find that expanding the set of benchmark portfolios to include industry portfolios improves the validity of findings from regression analysis using those factors. After controlling for style using control variables, Amihud & Goyenko (2012) firstly found that the funds within their sample which had the lowest R² produced the highest excess returns, with a maximum associated alpha of 3.8%.
Secondly, they found that some of their control variables, such as fund size or fund manager tenure, were correlated with R² closely enough to explain approximately 40% of the variation between funds. This helps them establish that R² is indeed a predictor of persistence in fund performance. Finally, the relation held true for mutual funds that invested in products other than equity, such as corporate bonds. These findings are significant for financial research. If they were tested in a varied range of markets and proven to be consistently correct, investors would have an accessible new tool to forecast the future performance of a selection of mutual funds.
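The momentum factor mentioned earlier in connection with the Carhart (1997) model can be illustrated with a short sketch: stocks are ranked on their return over the prior 12 months, and the factor return is the equally weighted return of the top 30% minus that of the bottom 30%. The stock universe and returns are hypothetical, and details such as rebalancing frequency or skipping the most recent month are deliberately left out.

```python
def momentum_factor(prior_12m_returns, next_period_returns, percent=30):
    """Equally weighted return of past winners minus past losers.

    prior_12m_returns: stock -> return over the prior 12 months (ranking)
    next_period_returns: stock -> return over the holding period
    percent: share of stocks placed in each of the two extreme groups
    """
    ranked = sorted(prior_12m_returns, key=prior_12m_returns.get)
    n = max(1, len(ranked) * percent // 100)
    losers, winners = ranked[:n], ranked[-n:]

    def equal_weighted(names):
        return sum(next_period_returns[s] for s in names) / len(names)

    return equal_weighted(winners) - equal_weighted(losers)


# Hypothetical universe of ten stocks, S0 to S9.
prior = {f"S{i}": 0.05 * i - 0.10 for i in range(10)}
following = {f"S{i}": 0.001 * i for i in range(10)}

factor_return = momentum_factor(prior, following)
print(f"momentum factor return: {factor_return:.4f}")
```

In this constructed example past winners continue to outperform past losers, so the factor return is positive; a fund's loading on this factor would then be estimated alongside the market, size and value factors.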
ii.iv. Biases
Because the choice of style benchmark is so crucial to a successful style analysis, further research has examined the categories that funds assign to themselves and how accurate these are.
Dibartolomeo and Witkowski (1997) found that 9% of equity mutual funds were highly misclassified, while 31% were somewhat out-of-category. According to their research, this could be due to the ambiguity of existing classification systems, but also to the competitive nature of the open-end fund industry, which pressures managers into rebalancing their portfolios outside their original mission statement. Kim, Shukla and Tomas (2000) support Dibartolomeo and Witkowski's research by finding that only 46% of the 1043 funds analysed had investment attributes that matched their original mission statements. Over the three years of their study, 57% of the funds changed their investment approach at least once.
For the above reasons, Dibartolomeo and Witkowski (1997) also argue that return-based style analysis should prevail over similar approaches such as Holding-Based Style Analysis (HBSA), which rely solely on the fund manager's stated objectives and declared investment style. To carry out their style analysis, they used quadratic programming to measure the influence of different investment styles on each fund within their sample.
Elton & Gruber (2012) also pointed out biases that might negatively impact any return-based analysis. In their paper, they explain that at their inception many mutual fund families enter an "incubation" period. In this period, funds are opened with different styles and limited capital. When the incubation phase is over, the underperforming funds are either merged or closed completely, leaving only the successful ones to be opened to the public. Because only the funds that make the cut will have historical data available, this creates an upward bias in the information available. Evans (2010) supports this by applying the Fama-French four-factor model to US equity mutual funds, finding that incubated funds show a risk-adjusted outperformance of 3.5% compared to non-incubated funds. Though this means that they attract higher flows at their opening, any outperformance seems to disappear over time. Evans finds that using a fund age control variable is a suitable way of suppressing this bias.
Survivorship in mutual fund performance is another bias, discussed by Elton, Gruber & Blake (1996). When funds perform poorly over a certain period of time, or if their total market value is not large enough to be deemed worthy of the management's efforts, they are usually closed completely. In some cases they can also be merged into another fund within the same family. Elton, Gruber & Blake assert that this is done so that fund managers keep receiving their fees on the investors' capital, but also to erase any poor performance from both hard-copy and digital databases. Ultimately, their paper finds that when α is examined for the largest and smallest funds within their sample, smaller funds have much worse performance, with a negative α twice as large as that of the larger funds. This seems to be consistent with the fact that a large number of smaller mutual funds fail to survive compared to the larger ones, and that in turn, those that fail have poorer performance than those that survive. Brown, Goetzmann, & Ibbotson (1992) find similar adverse effects of survivorship bias. According to their research, surviving funds have an apparent persistence in positive performance solely due to the dilution of risk among different fund managers. Moreover, they find that it would have a negative effect on the volatility and return relationship (namely CAPM's β) in risk-adjusted historical data.
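A small numerical sketch may help show why survivorship inflates measured performance: if poorly performing funds are dropped from the database, the average over the remaining funds overstates the average over all funds that actually existed. The five funds and their returns below are entirely hypothetical.

```python
from statistics import mean


def survivorship_bias(funds):
    """Difference between the mean return of surviving funds and the mean
    return of all funds, including those later closed or merged."""
    all_mean = mean(f["annual_return"] for f in funds)
    survivor_mean = mean(f["annual_return"] for f in funds if f["survived"])
    return survivor_mean - all_mean


# Hypothetical fund universe; the two worst performers were closed.
funds = [
    {"name": "Fund A", "annual_return": 0.08, "survived": True},
    {"name": "Fund B", "annual_return": 0.06, "survived": True},
    {"name": "Fund C", "annual_return": 0.05, "survived": True},
    {"name": "Fund D", "annual_return": -0.02, "survived": False},
    {"name": "Fund E", "annual_return": -0.07, "survived": False},
]

bias = survivorship_bias(funds)
print(f"overstatement of average return: {bias:.4f}")
```

In this toy universe the survivors average a little over 6% a year against 2% for the full set, so a database containing only survivors would overstate average performance by more than four percentage points.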
Interim trading can also have a negative impact on any fund performance analysis. This bias, first described by Fama (1972), arises when portfolio managers trade in or out of the fund in a way that affects perceived stock-picking ability over several periods of time. For example, suppose a study considers two separate time periods of a fund's performance. If a significant macroeconomic event occurs between the two periods, this would increase market volatility, but also imply that expected risk-return ratios are readjusted to be less favourable. Because the data calculated would be an average over both periods and would not cover the specific market-shifting event, it would appear as if the fund manager had partly anticipated the increase in risk. Under a return-based style analysis approach, this would be perceived as superior performance. Weight-based style analysis, on the other hand, would consider the difference in weighting between the two periods and would therefore not create a bias. For this reason, a conditional weight-based approach would be more suitable.
Finally, Ferson & Aragon (2006) describe costs as playing a major role in creating biases. These can include initial joining fees, transaction fees, portfolio managers' fees, income tax, marketing, and more. When considering mutual fund performance, returns are always net of all these expenses. Additional costs can be incurred at the level of the individual investor, such as load fees or redemption costs when buying or selling the fund's units. This means that comparing these fund returns against a passive benchmark's cost structure can quickly become complicated. Most previous studies in fund performance consider the benchmark portfolio to be free of any costs, while fund returns are net of all costs and expenses. Most investors would consider that any portfolio manager should be able to earn back trading costs, and would therefore regard the manager as adding value to the fund only past that threshold. The treatment of costs, and how they should be accounted for, is therefore crucial to getting a true perspective on fund performance.
ii.v Conclusion
At both a US and UK level, previous evidence of sustainable superior performance in mutual funds is not clear-cut. However, style analysis and the correlation between R² and Jensen's alpha found by Amihud and Goyenko (2008) are significant tools which can help filter the noise in an attempt to distinguish portfolio managers that bring true added value to their funds from consistent underperformers. Moreover, careful consideration should be given to factors that could create biases when analysing the data set.