Foundations of Deep Reinforcement Learning: Theory and Practice in Python
Addison-Wesley Professional, Nov 20, 2019 - 416 pages

The Contemporary Introduction to Deep Reinforcement Learning that Combines Theory and Practice

Deep reinforcement learning (deep RL) combines deep learning and reinforcement learning to build artificial agents that learn to solve sequential decision-making problems. In the past decade, deep RL has achieved remarkable results on a range of problems, from single- and multiplayer games (such as Go, Atari games, and Dota 2) to robotics. Foundations of Deep Reinforcement Learning is an introduction to deep RL that uniquely combines theory and implementation. It starts with intuition, then carefully explains the theory of deep RL algorithms, discusses implementations in its companion software library, SLM Lab, and finishes with the practical details of getting deep RL to work. This guide is ideal for computer science students and software engineers who are familiar with basic machine learning concepts and have a working understanding of Python.
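The sequential decision-making loop described above (an agent observes a state, acts, and receives a reward until the episode ends) can be sketched in a few lines of Python. The toy environment and random policy below are illustrative stand-ins of my own, not part of SLM Lab or any library the book uses.

```python
import random

class CoinFlipEnv:
    """Toy episodic environment: reward 1 for guessing a coin flip; ends after 10 steps."""
    def reset(self):
        self.t = 0
        return 0  # dummy state; this environment is stateless

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == random.randint(0, 1) else 0.0
        done = self.t >= 10  # episode terminates after 10 steps
        return 0, reward, done

def run_episode(env, policy):
    """The standard agent-environment loop: observe state, act, collect reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total

# A random policy earns an episode return between 0.0 and 10.0.
ret = run_episode(CoinFlipEnv(), policy=lambda s: random.choice([0, 1]))
print(ret)
```

Deep RL algorithms such as those covered in the book (SARSA, DQN, PPO) replace the random `policy` with a neural network trained from the rewards this loop produces.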
Contents
Policy-Based and Value-Based Algorithms
SARSA
Deep Q-Networks (DQN)
Improving DQN
Combined Methods
Proximal Policy Optimization (PPO)
Parallelization Methods
Algorithm Summary
SLM Lab
Network Architectures
Hardware
Environment Design
Actions
Rewards
Transition Function
Epilogue
B Example Environments
References
Index
Other editions
Foundations of Deep Reinforcement Learning: Theory and Practice in Python, by Laura Graesser and Wah Loon Keng, 2020