Milestone (expired)
Official dates: Dec 11, 2019 – Jun 8, 2020
Unstarted Issues (open and unassigned): 0
Ongoing Issues (open and assigned): 1
Completed Issues (closed): 103
- Fix return calculation
- Effect of scaling input to [0, 1]
- Fix post processing
- Test Kostrikov's implementation
- Effect of setting model in training mode
- Effect of Kostrikov's third convolutional layer with 32 instead of 64 channels
- Parameters are not using max gradient norm
- Change batch size from percentages to absolute numbers
- Logging COSY screened scripts' output
- Tensorboard does not like large event files
- Effect of orthogonal initialization
- Effect of epsilon annealing on policy loss
- Bash script for mass-starting on COSY machines
- Tensorboard file naming scheme does not allow proper evaluation
- Policy performance evaluation
- Game Selection
- Update docker setup for tensorboard logging
- Readme Update
- Setup COSY-Lab
- Verify annealing of ɛ and α
- Effect of entropy bonus
- Tensorboard integration
- Use baselines environment wrappers
- Use grad norms to evaluate stability
- Use smaller value loss coefficient (c_1 = 0.5)
- Use smaller clip range (ɛ = 0.1)
- Use KL divergence to evaluate stability
- ReLU vs tanh activation
- Returns with gamma and lambda vs returns without lambda
- Return calculation doesn't bootstrap from last value
- Minimum vs maximum in value function loss calculation
- Double check loss calculation
- The great bug hunt of 2020
- Batch size once again
- Parallelization with shared memory
- Curiosity
- Logging mean_stats to console with no terminated episodes
- Fake done from EpisodicLifeEnv triggers attempt at logging episode data
- Mocking and deleting in Logger test_save and test_save_not_mocked do not work
- Effect of evaluation over the last 100 episodes vs last 100 time steps with terminating episodes
- Effect of reward clipping vs reward binning
- Font choice
- Environment parallelization with MPI or subprocessing
- Flatten multiple values per time step for batch forward pass
- Reset on done
- Goal Review #2
- Masking terminal states
- Effect of advantage normalization
- Flatten rollout time steps for batch determination
- Rollout generation with horizon time steps
- Multiple epochs without retain_graph=True
- Advantages should be normalized
- Faulty Probability Ratio Calculation
- PPO Batch Size
- EpisodicLifeEnv not resetting properly on loss of final life
- Logging Losses across multiple episodes
- CUDA 10.1 on Tesla VM
- Docker Image
- Effect of reward scaling
- Write Thesis
- Thesis Structure
- Reward Scaling Breaks CUDA
- Global Gradient Clipping
- Observation Normalization and Clipping
- Orthogonal Initialization
- Adam Annealing
- Reward Scaling
- Exploding Value Function
- PPO Optimizations
- PPO and Rollout integration tests
- Goal review #3
- PPO scaling epsilon
- Mismatch in number of states and number of actions in Rollout
- Performance Review
- Rollout/Storage Class
- Return calculation is backwards
- Refactor Policy Tests
- Evaluation
- Logging
- Entropy Bonus
- Postprocessing Implementation
- Include feedback from meeting 1 in README
- Value Function Loss
- Agent Parallelization
- Negative action head output breaks categorical initialization
- Experiment Setup
- PPO CLIP
- PPO KLPEN
- Inverse Dynamics Features
- Variational Autoencoder
- Random Features
- Gym State Channel Order
- REINFORCE Atari Test
- Background Lectures
- Goal Specification
- Common Architecture
- Shared Value Function + Policy Parameters
- Value Function Implementation
- Feature Extraction
- Grayscale Conversion
- Preprocessing Implementation
- GAE Implementation
- PPO Implementation
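
The items above name most of the moving parts of the PPO implementation: PPO CLIP, GAE Implementation, return bootstrapping, advantage normalization, the value loss coefficient c_1 = 0.5, and the clip range ɛ = 0.1. For reference, a minimal PyTorch sketch of those two pieces follows; it is not the project's code, and all function and variable names are illustrative assumptions.

```python
import torch


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a rollout of horizon time steps."""
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    next_value = last_value  # bootstrap from the value of the last observed state
    for t in reversed(range(rewards.shape[0])):
        # Mask out the next value at terminal states (cf. "Masking terminal states").
        mask = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
        next_value = values[t]
    returns = advantages + values
    return advantages, returns


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, values, returns,
                  clip_range=0.1, value_coef=0.5, entropy=None, entropy_coef=0.01):
    """Clipped surrogate objective (PPO CLIP), illustrative only."""
    # Normalize advantages over the batch (cf. "Advantages should be normalized").
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space
    # (cf. "Faulty Probability Ratio Calculation").
    ratio = torch.exp(new_log_probs - old_log_probs)

    # Take the minimum of the unclipped and clipped surrogate terms
    # (cf. "Minimum vs maximum in value function loss calculation").
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Value function loss, weighted by c_1 = 0.5.
    value_loss = value_coef * (returns - values).pow(2).mean()

    # Optional entropy bonus to encourage exploration (cf. "Entropy Bonus").
    entropy_bonus = entropy_coef * entropy.mean() if entropy is not None else 0.0

    return policy_loss + value_loss - entropy_bonus
```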