Reference no: EM132408683
MULTIAGENT REINFORCEMENT LEARNING LEAGUE OF OPTIONS TRADING MODELS
-MAKE SURE THE CODE IS WELL COMMENTED AND ADD A CONCLUSION WITH YOUR RESULTS.
-> Start with two agents: one option seller and one option buyer. Then add more agents, i.e. make the setup truly multi-agent.
-> Please use a simulation-based setup to generate your own data, i.e. Monte Carlo simulation. Stick to just Black-Scholes for the entire assignment, so the underlying stock follows a geometric Brownian motion (GBM). This way, you know the theoretical value of the option and the hedging strategy in a frictionless world. However, remember that price impact on the stock price will be ad hoc in this setup.
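As a starting point, the data-generation step above can be sketched as follows. This is a minimal sketch (function names and parameter choices are my own, not prescribed by the brief): exact log-normal GBM paths plus the closed-form Black-Scholes call price that serves as the frictionless benchmark.

```python
import math
import numpy as np

def simulate_gbm_paths(s0, mu, sigma, T, n_steps, n_paths, seed=0):
    """Simulate GBM paths S_t with the exact log-normal scheme:
    log S_{t+dt} = log S_t + (mu - sigma^2/2) dt + sigma sqrt(dt) Z."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    log_returns = (mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z
    log_paths = np.cumsum(log_returns, axis=1)
    # Prepend the initial price so each path has n_steps + 1 points.
    return s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))

def bs_call_price(s, k, r, sigma, tau):
    """Black-Scholes European call price; the hedge ratio (delta) is N(d1)."""
    if tau <= 0:
        return max(s - k, 0.0)
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return s * N(d1) - k * math.exp(-r * tau) * N(d2)
```

Because the data come from GBM, the learned hedging policies can be compared directly against the Black-Scholes delta hedge on the same simulated paths.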
-> Defining the right incentives (utility functions) for your agents will be key. The seller makes money by selling and hedging the option. The buyer MUST have some external willingness to buy an option, up to a certain reservation price.
-> The seller should probably have some risk aversion; otherwise you may end up with an agent that does not hedge effectively because it focuses only on the "average" P&L, which does not take large losses into account.
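One possible way to encode the two incentives above (this is an illustrative assumption, not the required design) is a reservation-price reward for the buyer and a mean-variance objective for the seller, where the dispersion penalty is what pushes the seller to hedge:

```python
import numpy as np

def buyer_reward(option_payoff, price_paid, willingness_to_pay):
    """Buyer only trades if the quoted price is at or below their external
    reservation price; the reward is then realized payoff minus price."""
    if price_paid > willingness_to_pay:
        return 0.0  # no trade takes place
    return option_payoff - price_paid

def seller_utility(pnl_samples, risk_aversion=1.0):
    """Mean-variance utility: E[PnL] - lambda * Std[PnL].
    An unhedged seller has a wide P&L distribution, so the penalty term
    makes hedging strictly preferable even at equal average P&L."""
    pnl = np.asarray(pnl_samples, dtype=float)
    return pnl.mean() - risk_aversion * pnl.std()
```

Under this utility, two strategies with the same mean P&L are ranked by their volatility, which is exactly the behaviour the bullet above asks for.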
-> Check the JP Morgan deep-hedging slides/paper. There, it is crucial that the agent optimizes the α-CVaR. That approach is policy-based, hence different from Q-learning.
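The α-CVaR mentioned above is just the average of the worst (1 - α) fraction of losses, which can be estimated empirically from simulated P&L. A minimal sketch (the discretization choice `ceil` is my own assumption):

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Empirical alpha-CVaR: the mean of the worst (1 - alpha) fraction of
    losses (losses are positive numbers; larger means worse)."""
    losses = np.sort(np.asarray(losses, dtype=float))
    n = len(losses)
    k = max(1, int(np.ceil((1.0 - alpha) * n)))  # number of tail samples
    return losses[-k:].mean()
```

Using the negative of this tail average as the training objective makes the agent focus on avoiding large losses rather than on the mean P&L alone.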
-> Start introducing frictions such as transaction costs only once your "simplest" setup starts giving reasonable results.
So, BRIEFLY,
Make data -> do Q-learning on it -> get results -> do Fitted Q-Iteration -> get results -> policy-based approach -> get results -> make the cumulative-reward plots -> CLEARLY show all results
1) Q-learning
2) Fitted Q-Iteration
3) Policy based/JP Morgan..deep hedging
4) Cumulative rewards plot, or any other better way(s) to compare their efficiency
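For step 1 of the pipeline above, the core tabular update is standard and worth getting right before moving to Fitted Q-Iteration. A minimal sketch (the toy two-state example is my own, only there to show the update converging):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, lr=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    with no bootstrap term on terminal transitions."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += lr * (target - Q[s, a])

# Toy check: from state 0, action 1 pays 1.0 and terminates, so Q[0, 1]
# should converge to 1.0 after repeated updates.
Q = np.zeros((2, 2))
for _ in range(300):
    q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, done=True)
```

In the hedging problem, the state would be a discretization of (stock price, time to maturity, current hedge position) and the action a change in the hedge, with the reward derived from the risk-adjusted P&L; the same update rule applies unchanged.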
Attachment:- MULTIAGENT REINFORCEMENT LEARNING LEAGUE.rar