CAIS++ Spring 2019 Project: Building an Agent to Trade with Reinforcement Learning
-
February 3rd:
- In meeting:
- First Meeting, Environment Set-Up, DQN Explanation, Project Planning
- Homework:
- Read the first three chapters of Spinning Up
- Watch the first half of the Stanford RL lecture
- Code up your own LSTM on the AAPL data (a minimal sketch follows this list). Check out each other's work for inspiration, find online resources, and ask questions in the Slack. This should be a useful exercise for everyone!
- (Optional: Watch LSTM Intro Video)
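As a reference point for the homework, here is a minimal PyTorch sketch of an LSTM trained to predict the next close from a window of past closes. The `AAPL.csv` filename and `Close` column are assumptions for illustration, not anyone's submitted model.

```python
# Minimal sketch of the homework exercise: an LSTM that predicts the next
# close from a window of past closes. Assumes a hypothetical AAPL.csv with a
# 'Close' column; illustrative only.
import numpy as np
import pandas as pd
import torch
import torch.nn as nn

closes = pd.read_csv("AAPL.csv")["Close"].values.astype(np.float32)
closes = (closes - closes.mean()) / closes.std()        # normalize

window = 30
X = np.stack([closes[i:i + window] for i in range(len(closes) - window)])
y = closes[window:]
X = torch.from_numpy(X).unsqueeze(-1)                   # (samples, window, 1)
y = torch.from_numpy(y).unsqueeze(-1)                   # (samples, 1)

class PricePredictor(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])                      # use last time step

model = PricePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):                                 # full-batch training, kept simple
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```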
-
February 10th: Working LSTM Model
- State:
- Current Stock Price
- Percent change from the (n-1)th open to the nth open
- Action Space: Buy, Sell, Hold (3-dimensional continuous) as a percentage of bankroll
- Reward:
- Life Span (define maximum length agent can interact with environment)
- Receives reward based on profit/loss at the end
- Sparse reward, harder to train
- Punishment
- Test extra severe punishment for losses
- Set a threshold time before the agent can trade again, based on the punishment (see the environment sketch below)
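A minimal sketch of how the state, action space, and reward above could map onto a classic gym environment. The class name, constructor arguments, the normalization of the action vector, and the interpretation of the sell fraction (applied to current holdings) are illustrative assumptions, not the project's actual series_env.

```python
# Minimal sketch of the spec above using the classic gym API (reset -> obs,
# step -> obs, reward, done, info). Names and details are assumptions.
import gym
import numpy as np
from gym import spaces

class TradingEnv(gym.Env):
    """State: [current price, % change from previous open to current open].
    Action: [buy, sell, hold] fractions (3-d continuous).
    Reward: sparse profit/loss at the end of the life span, with losses
    punished more severely via loss_penalty."""

    def __init__(self, opens, prices, life_span=390, loss_penalty=2.0):
        super().__init__()
        self.opens, self.prices = opens, prices
        self.life_span = life_span
        self.loss_penalty = loss_penalty
        self.action_space = spaces.Box(0.0, 1.0, shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)

    def _obs(self):
        pct = (self.opens[self.t] - self.opens[self.t - 1]) / self.opens[self.t - 1]
        return np.array([self.prices[self.t], pct], dtype=np.float32)

    def reset(self):
        self.t = 1
        self.bankroll = self.start_bankroll = 1.0
        self.shares = 0.0
        return self._obs()

    def step(self, action):
        buy, sell, _hold = action / (np.sum(action) + 1e-8)  # normalize to fractions
        price = self.prices[self.t]
        spend = buy * self.bankroll                # buy with a fraction of bankroll
        self.shares += spend / price
        self.bankroll -= spend
        sold = sell * self.shares                  # sell a fraction of holdings
        self.shares -= sold
        self.bankroll += sold * price

        self.t += 1
        done = self.t >= min(self.life_span, len(self.prices) - 1)
        reward = 0.0
        if done:                                   # sparse end-of-episode reward
            profit = self.bankroll + self.shares * self.prices[self.t] - self.start_bankroll
            reward = profit if profit >= 0 else self.loss_penalty * profit
        return self._obs(), reward, done, {}
```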
- Architecture of model (sketched after the model dimensions)
- One day of encoding LSTM
- Observation, no actions taken
- Second day of Policy Net
- Based on the encoding learned from day one
- Actions are taken
- Two-day batches
- Model Dimensions
- Encoding LSTM
- #layers of LSTM, #layers of FCs
- input size, hidden units size, encoding vector size
- Policy LSTM
- input size (state space size)
- output size (action space size: 3d continuous)
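A rough PyTorch sketch of the two-day batch described above: an encoding LSTM observes day one without acting, and a policy LSTM acts on day two conditioned on that encoding. Class names, layer counts, and dimensions are illustrative assumptions rather than the team's implementation; the knobs correspond to the "Model Dimensions" list above.

```python
# Sketch of the two-day batch architecture: encode day one, act on day two.
# Names (DayEncoder, PolicyNet) and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DayEncoder(nn.Module):
    """Encoding LSTM: observes one full day of data, takes no actions."""
    def __init__(self, input_size=2, hidden_size=64, encoding_size=32, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, encoding_size)

    def forward(self, day_one):                 # (batch, steps, input_size)
        _, (h_n, _) = self.lstm(day_one)
        return self.fc(h_n[-1])                 # (batch, encoding_size)

class PolicyNet(nn.Module):
    """Policy LSTM: acts on day two, conditioned on the day-one encoding."""
    def __init__(self, state_size=2, encoding_size=32, hidden_size=64, action_size=3):
        super().__init__()
        self.lstm = nn.LSTM(state_size + encoding_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, action_size)

    def forward(self, day_two, encoding):       # (batch, steps, state_size), (batch, encoding_size)
        enc = encoding.unsqueeze(1).expand(-1, day_two.size(1), -1)
        out, _ = self.lstm(torch.cat([day_two, enc], dim=-1))
        # Buy / sell / hold as fractions that sum to 1.
        return torch.softmax(self.head(out), dim=-1)

# Usage: a two-day batch of 390 one-minute bars per day, 2 features per bar.
encoder, policy = DayEncoder(), PolicyNet()
day_one = torch.randn(8, 390, 2)
day_two = torch.randn(8, 390, 2)
actions = policy(day_two, encoder(day_one))     # (8, 390, 3)
```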
- Homework:
- Jincheng and Yang: Begin building Encoding / Policy Net Models
- Chris: Look through Andrew's current LSTM model
- Grant: Do the data preprocessing
- Tomas: Continue working on the RL architecture; make a graph of prices + volume over a batch; visualize price gaps
- Pre-Process Data
- Visualization
- Gym Trading Environment
- Integrate LSTM into DDDQN (see the dueling-head sketch below)
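For the "Integrate LSTM into DDDQN" item, here is a sketch of an LSTM feature extractor feeding a dueling Q-head, assuming discrete buy/sell/hold actions (a DQN variant needs a discrete action space). The double-DQN target, where the online network selects the action and the target network evaluates it, is shown as a helper. All names are hypothetical, not the project's code.

```python
# LSTM feature extractor + dueling Q-head (the "dueling" part of DDDQN).
# The "double" part lives in the target computation below. Illustrative only.
import torch
import torch.nn as nn

class LSTMDuelingDQN(nn.Module):
    def __init__(self, state_size=2, hidden_size=64, n_actions=3):
        super().__init__()
        self.lstm = nn.LSTM(state_size, hidden_size, batch_first=True)
        self.value = nn.Linear(hidden_size, 1)               # state value V(s)
        self.advantage = nn.Linear(hidden_size, n_actions)   # advantages A(s, a)

    def forward(self, states):                               # (batch, steps, state_size)
        out, _ = self.lstm(states)
        h = out[:, -1]                                       # last time-step features
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)           # dueling aggregation

def double_dqn_target(online, target, next_states, rewards, dones, gamma=0.99):
    """Double-DQN target: online net picks the action, target net evaluates it."""
    with torch.no_grad():
        best_actions = online(next_states).argmax(dim=1, keepdim=True)
        next_q = target(next_states).gather(1, best_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```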
-
February 18th: Working DQN
- Done for homework
- Built first policy gradient model (Jincheng)
- Worked on data pre-processing (Tomas)
- Today's plan
- Data pre-processing
- Use data as input into the gym
- Finalize the model
-
February 24th: Work day
- Finish pre-processing
- Finish trading gym
- simulate.py
- Change action 'quit' to quit when the time series ends
- Change the time series to remove seconds
- series_env.py
- In class seriesenv
- Do not need daily_start_time, daily_end_time
- Remove randomization of start index (in def 'seed')
- Finish pipelining
-
February 28th: Hackathon
- TODO
- Review the current reward function in series_env
- Finish building the dataset
- Merge dataset with environment and test
- Begin building the model
- Create sine wave CSV for testing (see the sketch below)
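One way to generate the sine-wave test CSV; the filename, column names, and the 390-bar "day" length are assumptions for illustration.

```python
# Generate a synthetic sine-wave price series for sanity-checking the
# environment and model. Filename and column names are assumptions.
import numpy as np
import pandas as pd

steps = 390 * 5                                   # e.g. five 390-minute "days"
t = np.arange(steps)
price = 100 + 10 * np.sin(2 * np.pi * t / 390)    # one full cycle per "day"
price += np.random.normal(0, 0.1, steps)          # small noise

pd.DataFrame({"time": t, "open": price, "close": price,
              "volume": np.full(steps, 1000)}).to_csv("sine_wave.csv", index=False)
```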
-
March 3rd: Working Actor-Critic Model
- Work on implementing LSTM (Chris & Caroline)
- Create test datasets (Grant)
- Integrate dataset with gym (Yang & Jincheng)
-
March 31st:
- Finish LSTM
- Working Actor-Critic Model
- Add details like trading costs, slippage, and ask-bid spread; compute performance statistics (see the stats sketch below); data visualization
- Build backtesting environment
- Integrate NLP sentiment analysis as feature
- Add more indicators to model
- Clean up README
- Do we hold positions overnight? I think initially no. There are also weird jumps over holidays and weekends.
- Take into account high, low, close, volume data
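A sketch of the performance statistics mentioned above (total return, annualized Sharpe ratio, maximum drawdown), computed from a series of per-step portfolio values. The function name and annualization factor are assumptions.

```python
# Performance statistics over a series of portfolio values (one per step).
# Function name and the 252-period annualization are assumptions.
import numpy as np

def performance_stats(portfolio_values, periods_per_year=252):
    values = np.asarray(portfolio_values, dtype=float)
    returns = np.diff(values) / values[:-1]

    total_return = values[-1] / values[0] - 1.0
    sharpe = np.sqrt(periods_per_year) * returns.mean() / (returns.std() + 1e-12)

    running_max = np.maximum.accumulate(values)
    max_drawdown = np.max((running_max - values) / running_max)

    return {"total_return": total_return, "sharpe": sharpe,
            "max_drawdown": max_drawdown}

# Example: stats for a slowly rising portfolio with one dip.
print(performance_stats([100, 102, 101, 105, 103, 108]))
```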
- OpenAI Spinning Up
- UCL + DeepMind Lecture Series
- Stanford CS234: Reinforcement Learning (Winter 2019)
- Deep RL Bootcamp
- David Silver: Intro to Reinforcement Learning
- Stanford CS330: Deep Multi-Task and Meta Learning