In this study, the well-known pairs trading strategy, a typical market-neutral strategy, is modified to utilize high-frequency equity data and is applied to the constituent shares of the Nifty 50 index. This study is distinguished from most previous work on the traditional pairs trading strategy by the use of high-frequency data in strategy modeling instead of daily closing prices, which allows the performance of the strategy to be analyzed in the high-frequency domain. The trading signal is generated from the spread between the stocks of a pair, with time-adaptive regression coefficients estimated using a Double Exponential Smoothing Process.
Major findings include that arbitrage profitability is in fact present irrespective of market conditions, even when conservative transaction costs are taken into account.
Furthermore, an enhanced version of the strategy is presented which selects high-ranking pairs to trade in the next time period based on a set of in-sample statistics. The analysis of the data series reveals that the extent to which daily data are cointegrated provides a good indicator of the profitability of the pair in the high-frequency domain. For each series, the in-sample winning ratio, average returns and number of trading opportunities generated are also good indicators of future profitability. It is verified that the enhanced strategy has better profitability and reliability than the basic strategy.
CHAPTER 1: INTRODUCTION
Statistical arbitrage, often abbreviated as StatArb, is a popular trading strategy adopted by hedge funds. As a trading strategy, statistical arbitrage is a heavily quantitative and computational approach to equity trading. It involves statistical methods as well as automated trading systems.
The traditional approach to statistical arbitrage is to bet on the temporal convergence and divergence of price movements of pairs and baskets of assets, using statistical methods. A more academic definition of statistical arbitrage is to spread the risk across thousands to millions of trades with very short holding times, hoping to profit in expectation through the law of large numbers.
Statistical arbitrage first evolved in the mid-1980s with Nunzio Tartaglia, who assembled a team of physicists, mathematicians and computer scientists at Morgan Stanley to uncover statistical mispricing in the equity markets, Gatev et al. (2006). StatArb evolved out of the simpler pairs trade strategy, in which stocks are put into pairs by fundamental or market-based similarities. When one stock in a pair outperforms the other, the poorer performing stock is bought long with the expectation that it will climb towards its outperforming partner, while the other is sold short. This hedges the risk from market movements, and the strategy is therefore also known as a beta-neutral strategy.
Currently StatArb is not limited to pairs of stocks but extends to portfolios of a hundred or more stocks, some long, some short, that are carefully matched by sector and region to eliminate exposure to beta and other risk factors. Strategies such as mean reversion are used to predict entry and exit points in the trade.
With the development of better IT infrastructure and low-latency networks, a new trading technique called High-Frequency Trading (HFT) has been gaining popularity. In high-frequency trading, programs analyze market data to capture trading opportunities that may open up for only a fraction of a second to several hours. HFT uses computer programs, and sometimes specialized hardware, to hold short-term positions in equities, options, futures, ETFs, currencies, and other financial instruments that possess electronic trading capability. HFT strategies are usually very sensitive to the processing speed of markets and of their own access to the market, David Bowen et al. (2010).
High-frequency trading started around 1999, after the U.S. Securities and Exchange Commission (SEC) authorized electronic exchanges in 1998. In the early 2000s HFT trades had execution times of several seconds, whereas by 2010 this had decreased to milliseconds and even microseconds. High-frequency trading grew by about 164% between 2005 and 2009. In the U.S., high-frequency trading firms represent 2% of the approximately 20,000 firms operating today, but account for 73% of all equity order volume. The Tabb Group, a consultancy based in Westborough, MA, estimates that high-frequency automated trading now accounts for 61 percent of the more than 10 billion shares traded daily across the numerous exchanges that make up the U.S. market. Tabb estimates profits from high-frequency trading in the first nine months of 2010 at $8 billion or more, Urstadt (2010). Most high-frequency trading strategies fall into one of four groups:
Market making
Ticker tape trading
Event arbitrage
High-frequency statistical arbitrage
High-frequency trading has become controversial, with critics charging that traders are manipulating the market, taking advantage of the little guy, and even courting a full-scale financial meltdown.
The increasing dominance of algorithmic trading and the growing speed of execution could cause tiny price changes to snowball, rolling downhill at exponentially increasing speed, either because the machines are trading too fast or because too many funds are trading in the same style, Urstadt (2010).
HFT is a technical means to implement established trading strategies. HFT is not a trading strategy as such but applies the latest technological advances in market access, market data access and order routing to maximize the returns of established trading strategies. Therefore, the assessment and the regulatory discussion about HFT should focus on the underlying strategies rather than on HFT as such.
HFT is a natural evolution of the securities markets rather than a completely new phenomenon. There is a clear evolutionary process in the adoption of new technologies triggered by competition, innovation and regulation. Like all other technologies, algorithmic trading (AT) and HFT enable sophisticated market participants to achieve legitimate rewards on their investments, especially in technology, and compensation for their market, counterparty and operational risk exposures.
The majority of HFT-based strategies contribute to market liquidity (market making strategies) or to price discovery and market efficiency (arbitrage strategies).
Academic literature mostly shows positive effects of HFT-based strategies on market quality.
The majority of papers focusing on HFT do not find evidence for negative effects of HFT on market quality. On the contrary, the majority argues that HFT generally contributes to market quality and price formation, and finds positive effects on liquidity and short-term volatility. Only one paper critically points out that under certain circumstances HFT might increase an adverse selection problem, and in the case of the flash crash one study documents that HFT exacerbated volatility. As empirical research is restricted by a lack of accessible and reliable data, further research is highly desirable.
In contrast to internalization or dark pool trading, HFT market making strategies face relevant adverse selection costs as they provide liquidity on lit markets without knowing their counterparties. In internalization systems or dark venues in the OTC space, banks and brokers know the identity of their counterparty and are able to "cream skim" uninformed order flow. In contrast, HFTs on lit markets are not informed of the toxicity of their counterparts and face the traditional adverse selection problems of market makers.
Any assessment of HFT-based strategies has to take a functional rather than an institutional approach. HFT is applied by different groups of market players, from investment banks to specialized boutiques. Any regulatory approach focusing on specialized players alone risks (i) undermining a level playing field and (ii) excluding a relevant part of HFT strategies.
The high penetration of HFT-based strategies underscores the dependency of players in today's financial markets on reliable and thoroughly supervised technology. Therefore, (i) entities running HFT strategies need to be able to log and record algorithms' input and output parameters for supervisory investigations and back-testing, (ii) markets have to be able to handle peak volumes and must be capable of protecting themselves against technical failures in members' algorithms, and (iii) regulators need a full picture of potential systemic risks triggered by HFT and require people with specific skills as well as regulatory tools to assess trading algorithms and their functionality. Any regulatory interventions in Europe should try to preserve the benefits of HFT while mitigating the risks as far as possible by assuring that (i) a diversity of trading strategies prevails and artificial systemic risks are prevented, (ii) economic rationale rather than obligations drives the willingness of traders to act as liquidity providers, (iii) co-location and proximity services are implemented on a level playing field, and (iv) instead of market making obligations or minimum quote lifetimes, the focus is on the alignment of volatility safeguards among European trading venues that reflect the HFT reality and ensure that all investors are able to react adequately in times of market stress.
The market relevance of HFT requires supervision, but also transparency and open communication, to assure confidence and trust in securities markets.
Given the public sensitivity to innovations in the financial sector after the crisis, it is the responsibility of entities applying HFT to proactively communicate on their internal safeguards and risk management mechanisms. HFT entities act in their own interest by contributing to an environment where objectivity rather than perception leads the debate: they have to draw attention to the fact that they are an evolution of securities markets, supply liquidity and contribute to price discovery for the benefit of markets.
In India, SEBI approved algorithmic trading in 2009. The exchanges have offered co-location services by which brokers can place their servers inside the exchange, thereby reducing the earlier transaction time from 40 milliseconds to about 4 milliseconds. Stock exchanges have also come up with ultra-low latency tools with which it is possible to run proprietary strategies. NSE has come out with AlgoStudio, an event-driven, ultra-low latency, scalable and high-frequency algorithmic trading platform that facilitates automated trading with increased profitability and competitive advantage. The deployment of AlgoStudio will help reduce transaction time further, to less than a millisecond.
Statistical arbitrage strategies attained much popularity in the 1980s because of the high returns they provided with low risk, but their profitability subsequently deteriorated, Gatev et al. (2006). Since the early 2000s, however, with the use of high-frequency trading, statistical arbitrage strategies have again become profitable and are being used extensively by institutions in the US and Europe.
The industry practice for selecting stock pairs is to use daily closing prices and basic cointegration techniques, Gatev et al. (2006). The purpose of this research is to apply a statistical arbitrage technique of pairs trading to high-frequency equity data and compare its profit potential to the standard sampling frequency of daily closing prices in the Indian equity markets.
If it is found that high-frequency data provide greater returns with lower volatility than daily prices, the research can be extended to more complex algorithms beyond the basic pairs trading strategy.
CHAPTER 2: REVIEW OF LITERATURE
Vidyamurthy, G. (2004), "Pairs Trading - Quantitative Methods and Analysis", John Wiley & Sons, Inc., New Jersey.
Pairs trading is a well-known technique widely used by hedge fund managers for finding statistical arbitrage opportunities. The strategy is widely documented in the current literature, including Vidyamurthy (2004) and Lin et al. (2006). Pairs trading relies on the principle of equilibrium pricing for near-equivalent shares. In efficient markets, capital asset pricing model-based valuation theory and the law of one price require price equality for equivalent financial assets over time. The price spreads of near-equivalent assets should also conform to a long-term stable equilibrium over time. When a sufficiently large deviation of the price spread from the long-run norm is identified, a trade is opened by simultaneously buying (going long) the under-valued share and selling (shorting) the over-valued share. The trade is closed out when prices return to their equilibrium spread levels by selling the long position and off-setting the short position. Net trading profit sums the profits from the long and short positions, calculated as the difference between the opening and closing prices (net of trading costs less interest on short sale receipts), Lin et al. (2006). As we simultaneously enter a long-short position, it is considered a market-neutral strategy. The strategy is not completely risk-free, however, as the price spread may escalate instead of reversing, or the equilibrium level may shift from its historical position. Still, this strategy has in the past provided good returns with very low volatility compared to equity index returns.
Christian L. Dunis, Gianluigi Giorgioni, Jason Laws, and Jozef Rudy, "Statistical Arbitrage and High-Frequency Data with an Application to Eurostoxx 50 Equities", March 2010, Liverpool Business School, Working paper
The motivation for this paper is to apply a statistical arbitrage technique of pairs trading to high-frequency equity data and compare its profit potential to the standard sampling frequency of daily closing prices. We use a simple trading strategy to evaluate the profit potential of the data series and compare information ratios yielded by each of the different data sampling frequencies. The frequencies observed range from a 5-minute interval to prices recorded at the close of each trading day.
The analysis of the data series reveals that the extent to which daily data are cointegrated provides a good indicator of the profitability of the pair in the high-frequency domain. For each series, the in-sample information ratio is a good indicator of the future profitability as well.
Conclusive observations show that arbitrage profitability is in fact present when applying a novel diversified pairs trading strategy to high-frequency data. In particular, even once very conservative transaction costs are taken into account, the suggested trading portfolio achieves very attractive information ratios.
Joao F. Caldeira, Guilherme V. Moura, "Selection of a Portfolio of Pairs Based on Cointegration: The Brazilian Case", March 2012, Federal University of Rio Grande do Sul, Brazil
Pairs trading is a statistical arbitrage strategy designed to exploit short-term deviations from a long-run equilibrium between two stocks. Traditional methods of pairs trading have sought to identify trading pairs based on correlation and other non-parametric decision rules. Here, trading pairs are selected based on the presence of a cointegrating relationship between the two stocks. Cointegration enables us to combine the two stocks in a certain linear combination so that the combined portfolio is a stationary process. If two cointegrated stocks share a long-run equilibrium relationship, then deviations from this equilibrium are only short-term and are expected to die out in future periods. To profit from this relative mispricing, a long position in the portfolio is opened when its value falls sufficiently below its long-run equilibrium and is closed out once the value of the portfolio reverts to its expected value. Similarly, profits may be earned when the portfolio is trading sufficiently above its equilibrium value by shorting the portfolio until it reverts to its expected value. The approach uses simple regression to first find the error term, and the ADF (Augmented Dickey-Fuller) test is then used to check the error term's stationarity. If one stock is cointegrated with another stock in a given time period, it means the spread between the stock prices is bounded around an equilibrium level in that period, not wandering off to infinity. The cointegration technique is the more popular approach to finding stock pairs for market-neutral strategies and is also considered the industry standard technique, Gatev et al. (2006). There have also been studies of market-neutral arbitrage strategies based on the cointegration approach: Alexander and Dimitriu (2002), Gatev et al. (2006), Lin et al. (2006).
Balvers, Ronald; Wu, Yangru; Gilliland, Erik. Mean Reversion across National Stock Markets and Parametric Contrarian Investment Strategies. Journal of Finance, April 2000, Vol. 55, Issue 2, pp. 745-772.
Mean reversion is the tendency of asset prices to return to a long-term equilibrium or mean value. The findings show a significantly positive speed of reversion in US stocks, with a half-life of three to three and one-half years. This result is robust to alternative specifications and data. Parametric contrarian investment strategies that fully exploit mean reversion across national indexes outperform buy-and-hold and standard contrarian strategies. This property of stock prices is very important for implementing a pairs trading strategy.
Gatev, Evan; Goetzmann, William N.; Rouwenhorst, K. Geert. Pairs Trading: Performance of a Relative-Value Arbitrage Rule. Review of Financial Studies, Fall 2006, Vol. 19, Issue 3, pp. 797-827.
Stocks are matched into pairs with minimum distance between normalized historical prices. A simple trading rule yields average annualized excess returns of up to 11% for self-financing portfolios of pairs. The profits typically exceed conservative transaction-cost estimates. Robustness of the excess returns indicates that pairs trading profits from temporary mispricing of close substitutes. We link the profitability to the presence of a common factor in the returns, different from conventional risk measures.
CHAPTER 3: RESEARCH METHODOLOGY
Pair Selection:
A pairs trading strategy requires the stocks in a pair to be near-equivalent in fundamentals, so stock pairs were selected from the same industry, using the industry classification provided by NSE in forming the Nifty index. In addition, pairs with a beta spread, calculated based on the CAPM (Capital Asset Pricing Model), larger than 0.2 were discarded in order to realize a market-neutral strategy. Based on these criteria, 23 stock pairs were formed.
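As an illustrative sketch, the beta-spread filter might be implemented as follows, assuming return series for each stock and for the Nifty index are available; the function names and DataFrame layout are hypothetical, not the actual spreadsheet procedure used in this study.

```python
import numpy as np
import pandas as pd

def capm_beta(stock_returns: pd.Series, market_returns: pd.Series) -> float:
    """CAPM beta: OLS slope of the stock's returns on the index returns."""
    cov = np.cov(stock_returns, market_returns)
    return cov[0, 1] / cov[1, 1]

def beta_spread_ok(r_x: pd.Series, r_y: pd.Series, r_mkt: pd.Series,
                   max_spread: float = 0.2) -> bool:
    """Keep a same-industry candidate pair only if its beta spread
    (the difference between the two CAPM betas) is at most 0.2."""
    spread = abs(capm_beta(r_x, r_mkt) - capm_beta(r_y, r_mkt))
    return spread <= max_spread
```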
The pairs were then passed through a cointegration test, and only those pairs which are cointegrated, i.e. which exhibit a long-term equilibrium, were selected. The 2-step approach proposed by Engle and Granger (1987) is used for the estimation of the long-run equilibrium relationship: first, an OLS regression is performed; in the second step, the residuals of the OLS regression are tested for stationarity using the Augmented Dickey-Fuller unit root test at the 95% confidence level.
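A minimal sketch of the two-step Engle-Granger procedure in Python (using statsmodels, as an alternative to the EViews workflow described later); the 5% significance threshold corresponds to the 95% confidence level above.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def engle_granger(py: pd.Series, px: pd.Series, alpha: float = 0.05):
    """Step 1: OLS regression of Y on X. Step 2: ADF unit-root test on
    the residuals. The pair is treated as cointegrated when the
    residuals are stationary at the chosen significance level."""
    ols = sm.OLS(py, sm.add_constant(px)).fit()
    adf_stat, p_value, *_ = adfuller(ols.resid)
    hedge_ratio = ols.params.iloc[1]  # long-run beta of the pair
    return p_value < alpha, adf_stat, hedge_ratio
```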
Based on the above criteria finally 14 pairs were selected. Appendix B gives a list of the stock pairs selected.
Trading Signal Generation:
The spread between the prices of the two stocks which form a pair was used to generate trading signals. The spread is calculated as
Zt = PYt - βt PXt
where Zt is the value of the spread at time t, PXt is the price of share X at time t, PYt is the price of share Y at time t and βt is the adaptive coefficient beta at time t.
To estimate the adaptive coefficient βt, a smoothing technique, Double Exponential Smoothing-based Prediction (DESP), was used.
Double exponential smoothing-based prediction (DESP) models:
Double exponential smoothing-based prediction (DESP) models are defined by two series of simple exponential smoothing equations. The method derives the forecasts through the following five steps:
1. Compute the single exponentially smoothed (SES) values:
SESt = α At + (1 - α) SESt-1, where At is the observed value at time t and 0 < α < 1
2. Compute the doubly smoothed (DES) series:
DESt = α SESt + (1 - α) DESt-1
3. Compute the level estimate at from the SES and DES values:
at = 2 SESt - DESt
4. Compute the trend estimate bt:
bt = [α / (1 - α)] (SESt - DESt)
5. Derive the forecasts m periods ahead:
Ft+m = at + bt m
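A compact sketch of the five DESP steps, assuming the smoothing is applied to the series from which the adaptive coefficient βt is derived; the choice of α and the initialization are illustrative, as the text does not specify them.

```python
import numpy as np

def desp_forecast(series, alpha: float = 0.8, m: int = 1) -> np.ndarray:
    """Double Exponential Smoothing-based Prediction.

    Implements the five steps above and returns the m-step-ahead
    forecasts F_{t+m} = a_t + b_t * m for every t.
    """
    a_obs = np.asarray(series, dtype=float)
    ses = np.empty_like(a_obs)
    des = np.empty_like(a_obs)
    ses[0] = des[0] = a_obs[0]  # simple initialization (an assumption)
    for t in range(1, len(a_obs)):
        ses[t] = alpha * a_obs[t] + (1 - alpha) * ses[t - 1]  # step 1: SES
        des[t] = alpha * ses[t] + (1 - alpha) * des[t - 1]    # step 2: DES
    level = 2 * ses - des                      # step 3: a_t
    trend = alpha / (1 - alpha) * (ses - des)  # step 4: b_t
    return level + trend * m                   # step 5: F_{t+m}
```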
Trading Strategy:
The spread calculated for each stock pair was normalized by subtracting its mean and dividing by its standard deviation. When the normalized spread falls 2 standard deviations below the mean, the spread is bought, and when it rises 2 standard deviations above the mean, it is sold short; the position is liquidated once the spread comes back within 0.5 standard deviations of the mean.
The investment strategy being money-neutral, the amount of rupees invested on the long and short sides of the trade is the same. As the prices of the stocks forming the pair change in different proportions, the position will stop being money-neutral, but no rebalancing is done once a position is entered. Only two types of transactions are made: entry into a new position, and total liquidation of held positions.
For high-frequency trading, all positions built during a day are liquidated by the end of that day, i.e. all positions are squared off at end of day even if they are not profitable. For pairs trading using daily data, the maximum time for which a position is held is 90 days.
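The entry and exit rules can be sketched as a small state machine over the normalized spread. This assumes, as one plausible reading of the text, that the mean and standard deviation come from the in-sample period; the end-of-day square-off is omitted for brevity.

```python
import numpy as np

def positions(spread: np.ndarray, mu: float, sigma: float,
              entry: float = 2.0, exit_band: float = 0.5) -> np.ndarray:
    """+1 = long the spread, -1 = short, 0 = flat.

    Enter when the normalized spread crosses +/-2 standard deviations;
    liquidate once it returns within 0.5 standard deviations of the mean.
    """
    z = (spread - mu) / sigma
    pos = np.zeros(len(z))
    for t in range(1, len(z)):
        if pos[t - 1] == 0:            # flat: look for an entry signal
            if z[t] <= -entry:
                pos[t] = 1             # spread too low -> buy the spread
            elif z[t] >= entry:
                pos[t] = -1            # spread too high -> short the spread
        elif abs(z[t]) <= exit_band:
            pos[t] = 0                 # spread has reverted -> liquidate
        else:
            pos[t] = pos[t - 1]        # otherwise hold the open position
    return pos
```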
As an illustration, Figure 1 below shows the normalized spread and the times when positions are open. When the dotted line is equal to 1 (-1), the investor is long (short) the spread.
Figure 1. The normalized spread of the pair consisting of HDFC Bank and Punjab National Bank sampled at 5-minute interval
Figure 2. Cumulative return curve for out-of-sample data of the pair trading strategy using HDFC Bank and Punjab National Bank sampled at a 5-minute interval
In-sample indicators:
All the indicators are calculated in the in-sample period. The objective is to find the indicators with high predictive power of the profitability of the pair in the out-of-sample period.
The first indicator is the t-stat from the ADF test (on the residuals of the OLS regression of the two shares). It indicates the degree of cointegration between the two stocks.
The second indicator is the number of trading opportunities generated in the in-sample period: the higher the number of trading opportunities, the higher the cumulative return is expected to be.
Out-of-sample performance measurement:
Return calculation:
The return in each period is calculated as
Rett = ln(Pxt/Pxt-1) - ln(Pyt/Pyt-1) - transaction cost
where Px is the price of the share we are long, Py is the price of the share we are short and transaction cost is the total transaction cost during the purchase and sale of the spread.
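In code, the returns of an open position over a holding window might look as follows; deducting the full round-trip transaction cost once, at liquidation, is an assumption about how the formula above is applied in practice.

```python
import numpy as np

def spread_returns(px: np.ndarray, py: np.ndarray,
                   tc_round_trip: float = 0.0004) -> np.ndarray:
    """Per-period returns of an open pair position.

    px: prices of the leg held long, py: prices of the leg held short,
    over one holding window. The 0.04% round-trip cost (for the
    high-frequency case) is charged in the final period, when the
    spread is liquidated.
    """
    rets = np.diff(np.log(px)) - np.diff(np.log(py))  # ln-return long minus short
    rets[-1] -= tc_round_trip
    return rets
```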
Transaction cost:
The transaction cost for high-frequency trading is taken as a conservative figure of 0.04% for one complete trade, i.e. the purchase and sale of the spread. The average retail intraday transaction cost in India is 0.01% one way for one scrip, so for the purchase and sale of two stocks the transaction cost is taken to be 0.04%.
For daily closing data the transaction cost is taken as 0.2% for one complete trade. The average retail transaction cost in India is 0.05% one way for one scrip, so for the buy and sell of a pair of stocks it is taken as 0.2%.
Tools used for data collection and analysis:
The tools used for the research are:
Microsoft Excel
EViews
MS Excel has been used extensively for data collection, formatting and applying the trading strategy. A VBA macro in Excel has been used to implement the trading algorithm, generate trading signals and calculate the returns. Appendix C has the macro code for implementing the trading strategy.
EViews has been used for running cointegration test on the stock pairs. The t-stats of the ADF test are further used to find the correlation between returns and t-stat values.
Hypothesis
Hypothesis: To test whether applying high-frequency data to the pairs trading strategy is more profitable than using daily data.
H0: The pairs trading strategy using high-frequency sampling at a 5-minute interval is not more profitable than using a sampling period of 1 day.
Ha: The pairs trading strategy using high-frequency sampling at a 5-minute interval is more profitable than using a sampling period of 1 day.
Data
The data used for the research is 5-minute tick data and daily closing data for the 50 stocks included in the Nifty index. A list of the stocks and their industry classification is given in Appendix A. The data provided by NSE has been adjusted for stock splits and bonus issues. The 5-minute tick data spans from 1st September 2008 to 30th April 2011. The daily closing data ranges from 1st January 2007 to 30th April 2011.
The data set includes bullish, bearish, and flat market periods within this horizon. By including the global financial crisis in the second half of 2008, it is possible to highlight the reliability of the strategy during a time when stock markets melted down and global financial systems were under severe stress.
Table 1 gives the in-sample and out-of-sample periods for the high-frequency data (5-minute tick data) and the daily frequency data. The start of the out-of-sample period is not aligned between daily and high-frequency data; if the out-of-sample period for daily data started on the same date as for high-frequency data, it would not contain enough data points for out-of-sample testing.
Table 1. Specification of the in- and out-of-sample periods and number of data points contained in each
                 In-sample                               Out-of-sample
                 Start        End           Data points  Start        End
5-min data       9/1/2008     3/31/2010     27100        4/1/2010     4/29/2011
Daily data       1/1/2007     12/31/2009    739          1/1/2010     4/29/2011
CHAPTER 4: ANALYSIS
Preliminary out-of-sample results:
Table 2 presents the summary of average annualized trading statistics for the out-of-sample period. The statistics include average return per year both including and excluding transaction costs, number of trades, average holding period, average rate of return per trade, volatility, and winning ratio. The results clearly show that the strategy gives much higher returns with 5-minute data than with daily data. Thus we can reject the null hypothesis and infer that high-frequency data can be used with the pairs trading strategy to generate higher returns than daily data.
For high-frequency data the average return per year including transaction costs is 23.41%, with volatility as low as 0.76%. The average rate of return per trade including transaction costs is only 0.08%, but the average number of trading opportunities per year was around 370. The winning ratio is 53.59%. The average holding period is around 5.45 periods, i.e. about 27.25 minutes with 5-minute tick data.
For the daily data frequency the average return including transaction costs comes out negative. Although volatility is low at 4.02%, the returns are very poor. The winning ratio was just 30.63%, i.e. only about 3 out of 10 trades generated positive returns. The average holding period for a pair was around 11 days.
Table 2. Summary of out-of-sample annualized trading statistics for pair trading strategy
Average values                               5-min data   Daily data
Avg return per year incl. tran. cost (%)     23.41%       -2.40%
Avg return per year excl. tran. cost (%)     38.20%       -0.88%
No. of trades                                369.50       10.15
Avg return per trade incl. tran. cost (%)    0.08%        -0.77%
Avg return per trade excl. tran. cost (%)    0.12%        -0.57%
Volatility (%)                               0.76%        4.02%
Avg holding period (periods)                 5.45         11.04
Winning ratio (%)                            53.59%       30.63%
Tables 3 and 4 give the complete results for the out-of-sample and in-sample periods for 5-minute data, respectively. Tables 5 and 6 give the out-of-sample and in-sample results for daily data, respectively. Appendix D gives the complete output for each stock pair using both daily and 5-minute data.
Table 3. Out-of-sample annualized trading statistics for 5-min tick data
Table 4. In-sample annualized trading statistics for 5 min data
Table 5. Out-of-sample annualized trading statistics for daily data
Table 6. In-sample annualized trading statistics for daily data
Relationship between the in-sample t-stats and the out-of-sample returns and number of trading opportunities:
This test examines whether the in-sample cointegration of a given trading pair implies better out-of-sample performance. One can logically assume that a higher stationarity of the residual from the cointegration equation implies higher confidence that the pair will revert to its mean. Thus we would expect a significant positive correlation between the t-stat of the ADF test on the OLS residuals and the out-of-sample performance parameters. Table 7 gives the correlation matrix for the in-sample t-stats against the out-of-sample parameters, namely average return, number of trading opportunities and winning ratio, for 5-minute data.
Table 7. Correlation matrix for in-sample t-stats and the out-of-sample average return, number of trading opportunities and winning ratio
                                       t-stat   No. of trades   Avg return per year   Winning ratio
                                                                incl. tran. cost
t-stat                                 1
No. of trades                          0.1268   1
Avg return per year incl. tran. cost   0.2467   0.6159          1
Winning ratio                          0.2332   0.1595          0.3443                1
The in-sample t-stat seems to have some predictive power for the out-of-sample performance parameters. There is a positive correlation between the average returns and the t-stat value. The winning ratio is also positively correlated with the in-sample t-stat, i.e. the higher the t-stat, the greater the proportion of profitable trades. The correlation between the number of trades and the t-stat value, however, is weak.
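The correlation analysis in Table 7 can be reproduced along these lines, given one row of statistics per pair; the column names are illustrative, not the thesis's actual EViews workbook layout.

```python
import pandas as pd

def indicator_correlations(per_pair: pd.DataFrame) -> pd.DataFrame:
    """per_pair holds one row per stock pair with the in-sample ADF
    t-stat alongside the out-of-sample statistics. Returns the
    Pearson correlation matrix, as in Table 7."""
    cols = ["t_stat", "n_trades", "avg_return", "winning_ratio"]
    return per_pair[cols].corr()
```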
Relationship between the in-sample performance parameters and the out-of-sample returns and number of trading opportunities:
This test examines whether the stock pairs that show better performance in the in-sample period also perform better in the out-of-sample period. This will help in determining in-sample parameters with which the stock pairs can be filtered to find the best pairs.
Table 8. Correlation matrix for in-sample performance parameters and the out-of-sample average return and winning ratio
                                                      In-sample parameters
                                                      Avg return per year   No. of trades
                                                      incl. tran. cost
Out-of-sample avg return per year incl. tran. cost    0.4463                0.3725
Out-of-sample winning ratio                           0.2280                0.6132
The out-of-sample average return shows a high correlation with the in-sample average return and winning ratio. The out-of-sample winning ratio is highly correlated with the number of trading opportunities in the in-sample period and the in-sample winning ratio.
Thus the pairs which performed better in the in-sample period also performed well in the out-of-sample period, and these three in-sample parameters can be used to filter the best performing stock pairs.
Building a diversified portfolio based on the in-sample parameters
The in-sample parameters t-stat, average return per year, winning ratio and number of trades are equally weighted and a combined rank is generated. The top five ranking stock pairs will be selected.
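A sketch of the equally weighted ranking, again with illustrative column names; it assumes a larger t-stat indicates stronger cointegration, consistent with the positive correlations reported above (flip the ordering for raw negative ADF statistics if needed).

```python
import pandas as pd

def top_pairs(in_sample: pd.DataFrame, top_n: int = 5) -> pd.Index:
    """Rank each pair on the four in-sample parameters (rank 1 = best),
    average the ranks with equal weights, and return the top_n pairs."""
    cols = ["t_stat", "avg_return", "winning_ratio", "n_trades"]
    ranks = in_sample[cols].rank(ascending=False)  # higher value -> better rank
    combined = ranks.mean(axis=1)                  # equally weighted combined rank
    return combined.nsmallest(top_n).index         # the top five stock pairs
```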
Table 9. Combined rank for stock pairs calculated using the in-sample parameters
Table 10. Out-of-sample performance for the portfolio formed of the top 5 stock pairs
The output in Table 10 clearly shows that the portfolio performance is better than the average performance of all 14 stock pairs. The average return is 33% compared to 23.41%, and the return standard deviation is also lower. The winning ratio has also increased.
CHAPTER 5: CONCLUSION
This study applies a pairs trading strategy to the constituent shares of the Nifty 50 index. It implements a basic long-short trading strategy which is used to trade shares sampled at 5-minute and daily intervals.
First, the shares are divided into industry groups, and pairs are formed from shares that belong to the same industry and have a low beta spread. The DESP approach is used to calculate an adaptive beta for each pair.
Subsequently, the spread between the shares is calculated and trading activity is simulated based on two simple trading rules. A position (long or short) is entered whenever the spread is more than 2 standard deviations away from its long-term mean. All positions are liquidated when the spread reverts towards the long-term mean, defined as its distance from the mean falling below 0.5 standard deviations.
As such, standalone pairs trading results using daily data are not very attractive, but better results can be obtained using high-frequency data. In-sample parameters such as the t-stat of the ADF test, average return, number of trading opportunities and winning ratio can be used to select the best performing stock pairs.
A diversified pairs trading portfolio based on the 5 trading pairs with the best in-sample indicator values was formed. This approach is able to produce very attractive results, with high returns and low volatility, especially when compared to the performance of the Nifty 50 index. Thus the combination of high-frequency data and the pairs trading strategy can be profitably used in the Indian equity markets.
In a general sense, statistical arbitrage is only demonstrably correct as the amount of trading time approaches infinity and the liquidity, or size of an allowable bet, approaches infinity.
Statistical arbitrage is also subject to model weakness as well as stock- or security-specific risk. The statistical relationship on which the model is based may be spurious, or may break down due to changes in the distribution of returns on the underlying assets. Factors to which the model is unknowingly exposed could become significant drivers of price action in the markets, and the inverse also applies. The existence of investment based on the model may itself change the underlying relationship, particularly if enough entrants invest on similar principles. Moreover, the exploitation of arbitrage opportunities itself increases the efficiency of the market, thereby reducing the scope for arbitrage, so continual updating of models is necessary.