Assignment - Quantitative Finance Final Project
Topics - To complete the project, you must implement one topic from this list, according to the tasks in the up-to-date Brief. If you are continuing from a previous cohort, please review the topic description, because tasks are revised regularly. It is not possible to submit past topics.
1. Portfolio Construction with Views and Robust Covariance (PC)
2. Deep Learning for Time Series (DL)
3. Long/Short Trading Strategy Design & Backtest (TS)
4. Credit Spread for a Basket Product (CR)
Portfolio Construction with Robust Covariance
Part I: Robust Covariance
1. Implement Portfolio Choice based on your approach to optimal diversification: introduce an exogenous asset, check for less-correlated assets, consider long/short positions. See Q&A.
2. Decide which method you will use to make a covariance matrix robust:
- Marcenko-Pastur denoising is arguably the best method to deal with noise-induced instability. The denoising and detoning recipes (see de Prado, 2020) are beneficial over shrinkage because of the superior preservation of the signal carried by the top eigenvectors. While no ready code is provided, implementing the recipes should not be a problem for a quant (a minimal sketch follows this list).
- Ledoit-Wolf nonlinear shrinkage has ready code that can be applied directly to asset data. In addition to that recipe, explore the trace and the minimum covariance determinant (both are in Python's sklearn package). One can use the trace of a sample covariance matrix, or even take a Factor Covariance, such as the diagonal matrix of covariances between each asset and the S&P 500.
3. The poor man's choice of EGARCH plus correlations is no longer acceptable.
4. Produce supporting representations: heatmaps/3D plots of covariance matrices, and plots of eigenvalues of the naive sample covariance vs. the robust covariance.
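A minimal sketch of the robust-covariance step, assuming daily returns in a (T × N) array (here a synthetic stand-in). Note two simplifications relative to the brief: sklearn's LedoitWolf is linear shrinkage rather than the nonlinear variant, and the Marcenko-Pastur clipping below assumes unit residual variance instead of the KDE fit described in de Prado (2020).

```python
import numpy as np
from sklearn.covariance import LedoitWolf, MinCovDet

def mp_denoised_cov(returns):
    """Denoise the correlation matrix by flattening eigenvalues below the
    Marcenko-Pastur upper edge (constant-residual-eigenvalue recipe)."""
    T, N = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)            # eigenvalues in ascending order
    lambda_plus = (1 + np.sqrt(N / T)) ** 2          # MP upper edge, sigma^2 = 1 assumed
    noise = eigval < lambda_plus
    if noise.any():
        eigval[noise] = eigval[noise].mean()         # flatten the noise eigenvalues
    corr_d = eigvec @ np.diag(eigval) @ eigvec.T
    d = np.sqrt(np.diag(corr_d))
    corr_d = corr_d / np.outer(d, d)                 # rescale back to a correlation matrix
    vols = returns.std(axis=0, ddof=1)
    return corr_d * np.outer(vols, vols)             # robust covariance estimate

# Illustrative synthetic stand-in for your (T x N) daily return matrix
returns = np.random.default_rng(0).standard_normal((500, 10)) * 0.01

robust_cov = mp_denoised_cov(returns)
lw_cov = LedoitWolf().fit(returns).covariance_       # ready sklearn shrinkage estimator
mcd_cov = MinCovDet(random_state=0).fit(returns).covariance_
sample_cov = np.cov(returns, rowvar=False)           # naive benchmark for the eigenvalue plots
```

The eigenvalues of `sample_cov` vs. `robust_cov` feed directly into the supporting plots requested in item 4.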
Part II: Imposing Views. Comparative Analysis
1. Plan your Black-Litterman application. Find a ready benchmark or construct the prior: equilibrium returns can come from a broad-enough market index. Implement a computational version of the BL formulae for the posterior returns (see the sketch after this list).
2. Imposing too many views will make it difficult to see the impact of each individual view.
3. Describe analytically and compute optimisation of at least two kinds. Optimisation is improved by using sensible constraints, e.g., a budget constraint or 'no short positions in bonds', but such inequality constraints (∀ wi ≥ 0) trigger numerical computation of allocations.
4. You will end up with multiple sets of optimal allocations, even for a classic mean-variance optimisation (one of your two kinds). Please make your own selection of which results to focus your Analysis and Discussion on: the most feasible and illustrative comparisons.
- Naive covariance vs. robust: remember to compute allocations for the sample covariance too (pre-transformation).
- The BL views themselves are not affected by the covariance matrix; whether to compute the allocations shifted by views (through the Black-Litterman model) with the naive or the robust covariance is therefore your choice.
- Three levels of risk aversion: this is an optional addition to the BL results, which should not get in the way of proper analysis (BL vs benchmarks).
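A minimal sketch of the BL posterior-return computation and a long-only mean-variance optimisation, assuming a market-cap weight vector `w_mkt`, a view-pick matrix `P`, view returns `Q`, and illustrative values of τ and the risk aversion δ (all hypothetical names and defaults, not prescribed by the brief):

```python
import numpy as np
from scipy.optimize import minimize

def bl_posterior(Sigma, w_mkt, P, Q, tau=0.05, delta=2.5, Omega=None):
    """Black-Litterman posterior expected returns.
    Prior (equilibrium) returns Pi = delta * Sigma @ w_mkt;
    Omega defaults to diag(tau * P Sigma P') if no view uncertainty is supplied."""
    Pi = delta * Sigma @ w_mkt
    if Omega is None:
        Omega = np.diag(np.diag(tau * P @ Sigma @ P.T))
    tS_inv = np.linalg.inv(tau * Sigma)
    O_inv = np.linalg.inv(Omega)
    post_cov = np.linalg.inv(tS_inv + P.T @ O_inv @ P)
    mu_post = post_cov @ (tS_inv @ Pi + P.T @ O_inv @ Q)
    return mu_post

def long_only_mv(mu, Sigma, delta=2.5):
    """Mean-variance allocation with a budget constraint and no short positions;
    the inequality constraints force a numerical solution, as the brief notes."""
    n = len(mu)
    obj = lambda w: -(w @ mu - 0.5 * delta * w @ Sigma @ w)
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]   # budget constraint
    res = minimize(obj, np.full(n, 1 / n), bounds=[(0.0, None)] * n,
                   constraints=cons, method="SLSQP")
    return res.x
```

Running `long_only_mv` once with the prior (equilibrium) returns and once with `bl_posterior` output gives the view-shifted vs. benchmark allocations for the comparison above.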
Part III: Backtesting OPTIONAL
1. Running P&L is simply: optimal allocations × price series. If a future holdout dataset is not available, then backtest against a period in the past! Backtesting packages allow historic simulation (cross-validation over many periods or even shuffled samples) and distributional analysis.
2. The basic but insightful comparison is: your allocations vs the simple 1/N portfolio (a minimal sketch follows this list). You can also compare to a Diversification Ratio portfolio.
3. Alternatively, naive Risk Parity-style portfolio allocations are easily computed (refer to the Risk Budgeting Elective).
4. What phenomena have you encountered, e.g., why would investors end up allocating into riskier (high-beta) assets? Does equal risk contribution work? Would 'the most diversified portfolio' kind of approach work?
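A minimal holdout backtest sketch, assuming a pandas DataFrame `prices` of daily closes for the portfolio assets and a fixed weight vector `weights` (both hypothetical names):

```python
import numpy as np
import pandas as pd

def holdout_backtest(weights, prices, freq=252):
    """Compare a fixed optimal allocation against the 1/N benchmark on a holdout period."""
    rets = prices.pct_change().dropna()
    port = rets @ weights                          # daily portfolio returns
    naive = rets.mean(axis=1)                      # 1/N benchmark
    curves = pd.DataFrame({"optimal": (1 + port).cumprod(),
                           "equal_weight": (1 + naive).cumprod()})
    sharpe = np.sqrt(freq) * port.mean() / port.std()
    return curves, sharpe
```

The same routine can be looped over shifted past periods if no future holdout data is available.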
Deep Learning for Time Series
Part I: Features Engineering
Please revisit ML Lab II (ANNs) for the basic discussion of feature scaling. Be careful with sklearn feature selection by F-test.
1. Past moving averages of the price, simple or exponentially weighted (decaying in time), i.e., SMA and EMA. Technical indicators, such as RSI, Stochastic %K, MACD, CCI, ATR, Acc/Dist. Interestingly, Bollinger Bands stand out as a good predictor. Remember to vary the lookback period 5D/10D/21D/50D, even 200D, for features as appropriate; a sketch of such feature construction follows this list. Non-overlapping periods mean you need data over a long history.
Volume information and the Volume Weighted Average Price appear to be immediate-term signals, while we aim at longer-horizon prediction.
2. Use of features across assets is permitted, but be tactical about the design: e.g., features from a commodity price impacting an agricultural stock (but not the oil futures price on an integrated oil major), or features from a cointegrated equity pair. Explore distance metrics among features (KNN) and potential K-means clustering as yet another alternative to SOM.
3. Balance the needs of longer-term prediction vs. short-term heteroskedastic volatility. Yang & Zhang (2000) provide an excellent indicator: a drift-independent volatility that takes into account the overnight jump, but it might not be as useful for 5D/long-term prediction because it cannot be re-scaled to non-daily jumps. A smoothed volatility estimate (EWMA/EGARCH) can be scaled by √t, but it is not intended as a medium-term prediction indicator, and at the same time it risks being dominated by the long-term average variance σ̄².
4. OPTIONAL Interestingly, credit spreads (CDS) can be a good predictor of price direction. Think out of the box: what other securities have a 'credit spread' affecting their price?
5. OPTIONAL Historical data for financial ratios is good if you can obtain the data via your own professional subscription. Other than that, the history of dividends, purchases/disposals by key stakeholders (director dealings) or by large funds, or Fama-French factor data is more readily available.
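A minimal pandas sketch of the price-based features listed above (SMA/EMA distances, RSI, Bollinger %B), assuming `close` is a daily closing-price Series; the indicator definitions follow the common textbook forms:

```python
import pandas as pd

def make_features(close, lookbacks=(5, 10, 21, 50)):
    """Lookback-parametrised features from a daily close series."""
    feats = pd.DataFrame(index=close.index)
    delta = close.diff()
    for n in lookbacks:
        feats[f"sma_{n}"] = close / close.rolling(n).mean() - 1        # distance from SMA
        feats[f"ema_{n}"] = close / close.ewm(span=n).mean() - 1       # distance from EMA
        gain = delta.clip(lower=0).rolling(n).mean()                   # RSI components
        loss = (-delta.clip(upper=0)).rolling(n).mean()
        feats[f"rsi_{n}"] = 100 - 100 / (1 + gain / loss)
        ma, sd = close.rolling(n).mean(), close.rolling(n).std()
        feats[f"bollinger_b_{n}"] = (close - (ma - 2 * sd)) / (4 * sd) # %B within the 2-sigma band
    return feats.dropna()

# Target example: sign of the forward 5-day return (non-overlapping labels need a long history)
# y = (close.shift(-5) / close - 1 > 0).astype(int)
```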
Part II: Pipeline Formation (considerations)
Your implementation is likely to be folded into some kind of ML Pipeline, to allow you to re-use code (e.g., on train/test data) and to aggregate the tasks. Ensemble Methods present an example of such a pipeline: Bagging Classifier is an umbrella name for the process of trying several parametrisations of a specific classifier (e.g., Logistic Regression). AdaBoost over a Decision Tree Classifier is another case. However, please do not use these for the DL Topic.
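For illustration of the pipeline structure only (as stated above, these ensembles should not be used for the DL topic), a minimal sklearn sketch in which scaling is fitted on the train split and re-applied to the test split; `X_train`, `y_train`, `X_test`, `y_test` are hypothetical arrays from your own feature set:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

# Bagging = several bootstrapped fits of one parametrised base classifier
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("bag", BaggingClassifier(LogisticRegression(max_iter=1000),
                              n_estimators=25, random_state=0)),
])
# pipe.fit(X_train, y_train)
# accuracy = pipe.score(X_test, y_test)
```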
Empirical work might find that RNNs/Reinforcement Learning work better WITHOUT past returns! Alternatively, if you are predicting a 5D/10D move there will be a significant autocorrelation effect: your prediction will work regardless of whether it is a good model or not.
Please limit your exploration to 2-3 assets and focus on features, their SOM (if possible), and an LSTM Classifier to make the direction prediction (a minimal sketch follows the substitution list below). If you are interested in the approach of choosing a few assets from a large set, you can adopt a kind of diversified portfolio selection (see Portfolio Construction topic Q&A).
You are free to make study design choices to make the task achievable. Substitutions:
- present relationship between features with simple scatterplots (vs SOMs) or K-means clustering;
- use an MLP classifier if recurrent neural nets or LSTM are a particular challenge;
- re-define task and predict Momentum sign (vs return sign) or direction of volatility.
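A minimal LSTM direction-classifier sketch in Keras (an assumed framework choice; the window length and layer sizes are purely illustrative):

```python
from tensorflow import keras

def build_lstm(n_lookback, n_features):
    """Binary direction classifier: a window of engineered features -> P(next move up)."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_lookback, n_features)),
        keras.layers.LSTM(32),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# X has shape (samples, n_lookback, n_features); y holds 0/1 direction labels
# model = build_lstm(n_lookback=21, n_features=X.shape[2])
# model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.2)
```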
Pairs Trading Strategy Design & Backtest
You can utilise ready multivariate cointegration routines (R package urca) to identify your cointegrated cases first, especially if you operate with a system such as four commodity futures (of different expiry, but over the period when all traded). Use 2-3 pairs if analysing separate pairs by Engle-Granger.
Part I: 'Learning' and Cointegration in Pairs. Trade Design
1. Even if you work with pairs, re-code regression estimation in matrix form as your own OLS implementation which you can re-use (see the sketch after this list). Regression between stationary variables (such as the DF test regression/difference equations) has OPTIONAL model specification tests for (a) identifying the optimal lag p with AIC/BIC criteria and (b) a stability check.
2. Implement the Engle-Granger procedure for each of your pairs. For Step 1 use the Augmented DF test for a unit root with lag 1. For Step 2, formulate both error-correction equations and decide which one is more significant.
3. Decide signals: the common approach is to enter at the bounds μe ± Z·σeq and exit when et reverts to about the level μe.
4. At first, assume Z = 1. Then change Z slightly upwards and downwards, computing P&L for each case of widened and tightened bounds that gives you a signal. Alternatively, run an optimisation that varies Zopt for μe ± Zopt·σeq and maximises the cumulative P&L or another criterion.
Be cautious of the trade-off: wider bounds might give you the highest P&L and the lowest Ntrades; however, consider the risk of the cointegration breaking apart.
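A minimal sketch of the matrix-form OLS and the Engle-Granger Step 1, assuming `price_a` and `price_b` are aligned NumPy arrays of the pair's prices (hypothetical names). Note that statsmodels' `coint` applies the proper Engle-Granger critical values; `adfuller` on the residuals is only indicative:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def ols(y, x):
    """Matrix-form OLS with an intercept: beta = (X'X)^-1 X'y."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta, y - X @ beta

def engle_granger_step1(price_a, price_b, z=1.0):
    """Cointegrating regression, ADF test (lag 1) on the residual spread, trading bounds."""
    beta, spread = ols(price_a, price_b)
    adf_stat, pvalue = adfuller(spread, maxlag=1, autolag=None)[:2]
    mu_e, sigma_eq = spread.mean(), spread.std(ddof=1)
    bounds = (mu_e - z * sigma_eq, mu_e + z * sigma_eq)   # enter at the bounds, exit near mu_e
    return beta, spread, adf_stat, bounds
```

Varying `z` and recomputing P&L gives the widened/tightened-bound comparison described in item 4.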
Part II: Backtesting
It is your choice as a quant to decide which elements you need to argue successfully that your trading strategy (a) will not fall apart and (b) provides 'uncorrelated return'.
4. Industry backtesting practice includes splitting data into train/test subsets. For your forward-testing periods you can use the Quantopian platform to produce drawdown plots, rolling SR and rolling beta vs chosen factors.
5. OPTIONAL To test whether there is a structural change from μe,Old to μe,New, the Likelihood Ratio (LR) test applies to the cointegrating regressions for Period 1 and Period 2.
6. Industry backtesting relies on rolling betas, while scientific research tests for breaks using the LR test. On the one hand, the cointegrated relationship is supposed to persist and β'Coint should stay the same, delivering a stationary spread over, say, 3-6 months without the need to be updated. On the other hand, Kalman filter/particle filter adaptive estimation of the cointegrating regression will give an updated β'Coint and μe.
Alternatively, you can simply re-estimate the cointegrated relationships by shifting the data by 1-2 weeks (remember to reserve some future data), and report not only the rolling β'Coint, but also Engle-Granger Step 2: the history of the test statistic for the coefficient in front of the EC term.
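A minimal sketch of that rolling re-estimation, assuming `price_a` and `price_b` are pandas Series sharing a DatetimeIndex (hypothetical names); the window and step sizes are illustrative:

```python
import numpy as np
import pandas as pd

def rolling_coint(price_a, price_b, window=252, step=10):
    """Re-estimate the cointegrating regression on a window shifted by `step` trading days
    to monitor the stability of beta_Coint and mu_e."""
    rows = []
    for end in range(window, len(price_a) + 1, step):
        ya = price_a.iloc[end - window:end].values
        yb = price_b.iloc[end - window:end].values
        slope, intercept = np.polyfit(yb, ya, 1)        # cointegrating regression ya ~ yb
        resid = ya - (intercept + slope * yb)
        rows.append({"date": price_a.index[end - 1],
                     "beta_coint": slope, "mu_e": resid.mean()})
    return pd.DataFrame(rows).set_index("date")
```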
Part III - Multivariate Cointegration OPTIONAL
Your project can take another turn from the start: look into the Johansen Procedure for multivariate cointegration and apply it to futures, rates, etc. Five 'deterministic trend' specifications for the cointegrating residual are possible, but in practice you only need a constant inside the residual et-1.
Interpret the Maximum Eigenvalue and Trace statistical tests, both based on the Likelihood Ratio principle: e.g., how did you decide the number of cointegrating relationships?
An efficient implementation is outlined in Jang & Osaki (2001), but you might need Ch. 12 from the Zivot (2002) book. If you code the Johansen Procedure yourself, validate it using R/Matlab libraries.
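If you prefer to validate against Python rather than R/Matlab, statsmodels also provides the Johansen procedure; a minimal sketch, assuming `levels` is a (T × N) array of price levels for the system (hypothetical name):

```python
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def johansen_rank(levels, k_ar_diff=1, cv_col=1):
    """Trace and maximum-eigenvalue tests; det_order=0 keeps a constant in the
    cointegrating relationship, cv_col=1 picks the 95% critical values."""
    res = coint_johansen(levels, det_order=0, k_ar_diff=k_ar_diff)
    r_trace = int(np.sum(res.lr1 > res.cvt[:, cv_col]))    # trace statistics vs critical values
    r_maxeig = int(np.sum(res.lr2 > res.cvm[:, cv_col]))   # max-eigenvalue statistics
    # strictly, testing should stop at the first non-rejection when fixing the rank
    return r_trace, r_maxeig, res
```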
Credit Spread for a Basket Product
1. For each reference name, bootstrap implied default probabilities from quoted CDS and convert them to a term structure of hazard rates, τ ∼ Exp(λ^1Y, …, λ^5Y).
2. Estimate default correlation matrices (near and rank) and the d.f. parameter (i.e., calibrate the copulae). You will need to implement pricing by the Gaussian and t copulae separately.
3. Using the sampling-from-copula algorithm, repeat the following routine (simulation; a minimal sketch follows this list):
(a) Generate a vector of correlated uniform random variables.
(b) For each reference name, use its term structure of hazard rates to calculate exact time of default (or use semi-annual accrual).
(c) Calculate the discounted values of premium and default legs for every instrument from 1st to 5th-to-default. Conduct MC separately or use one big simulated dataset.
4. Average premium and default legs across simulations separately. Calculate the fair spread.
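A minimal sketch of the simulation routine for the Gaussian-copula case, assuming `hazards` is an (n_names × 5) array of bootstrapped annual hazard rates and `corr` a default correlation matrix; the recovery rate, discount rate and annual (rather than semi-annual) premium dates are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def sample_default_times(hazards, corr, n_sims=100_000, seed=0):
    """Correlated default times via a Gaussian copula and piecewise-constant hazards."""
    rng = np.random.default_rng(seed)
    n_names, n_years = hazards.shape
    z = rng.standard_normal((n_sims, n_names)) @ np.linalg.cholesky(corr).T
    u = norm.cdf(z)                                           # correlated uniforms
    taus = np.full((n_sims, n_names), np.inf)
    for i in range(n_names):
        cum_h = np.concatenate([[0.0], np.cumsum(hazards[i])])  # cumulative hazard at year ends
        target = -np.log(1.0 - u[:, i])                          # Exp(1) draw via inverse CDF
        year = np.searchsorted(cum_h, target) - 1                # last full year survived
        hit = year < n_years
        taus[hit, i] = year[hit] + (target[hit] - cum_h[year[hit]]) / hazards[i, year[hit]]
    return taus

def kth_to_default_spread(taus, k=1, recovery=0.4, r=0.02, maturity=5.0):
    """Fair spread = E[default leg] / E[premium leg per unit spread]; annual premiums, no accrual."""
    tau_k = np.sort(taus, axis=1)[:, k - 1]                   # k-th default time per simulation
    pay_dates = np.arange(1.0, maturity + 1e-9)
    premium = sum(np.exp(-r * t) * (tau_k > t) for t in pay_dates).mean()
    default = ((1 - recovery) * np.where(tau_k <= maturity, np.exp(-r * tau_k), 0.0)).mean()
    return default / premium
```

Calling `kth_to_default_spread` with k = 1, …, 5 on the same simulated `taus` gives the 1st- to 5th-to-default spreads from one big simulated dataset; the t-copula case replaces the Gaussian sampling step.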
Attachment:- Assignment File - Quantitative Finance Project.rar