Notes from the Deep Reinforcement Learning Course by HuggingFace

UNIT 1. INTRODUCTION TO DEEP REINFORCEMENT LEARNING

Definition

Reinforcement learning is a framework in which an agent learns by interacting with an environment: it takes actions and receives rewards as feedback.

Return function

Let $r_t$ be the reward received at time step $t$. The cumulative reward (the return) $R(\tau)$, collected from time step $t$ onward along the trajectory $\tau$ (a sequence of states and actions) with a discount rate $\gamma \in [0, 1]$, is defined by:

$R(\tau)=\sum_{k=0}^{\infty}\gamma^k r_{t+k+1}$
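As a minimal sketch, assuming an episodic task (the sum stops when the episode ends, i.e. all later rewards are zero), the return can be computed from the list of rewards $r_{t+1}, r_{t+2}, \dots$. The function name `discounted_return` is illustrative, not taken from the course:

```python
def discounted_return(rewards, gamma):
    """Compute R(tau) = sum_k gamma^k * r_{t+k+1} for a finite episode.

    rewards[k] plays the role of r_{t+k+1}: the rewards received after
    time step t, in order. Rewards after the episode ends are assumed zero.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    total = 0.0
    # Walk backwards using G = r + gamma * G, so no explicit powers of gamma.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = [1.0, 0.0, 2.0]  # r_{t+1}, r_{t+2}, r_{t+3}
print(discounted_return(rewards, gamma=0.9))
```

With `gamma=0.9` this gives $1 + 0.9 \cdot 0 + 0.81 \cdot 2 = 2.62$, up to floating-point rounding.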

Policy-based and Value-based methods

The policy $\pi$ maps a state $s$ to either:

  • An action $a = \pi(s)$: deterministic policy
  • A probability distribution over actions, $\pi(a \mid s)$: stochastic policy
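The two kinds of policy can be sketched in Python as follows. The states `s0`, `s1`, the actions and the probabilities are hypothetical, chosen only for illustration: a deterministic policy is a lookup from state to action, while a stochastic policy stores a distribution per state and samples from it.

```python
import random

# Deterministic policy: a = pi(s), exactly one action per (hypothetical) state.
deterministic_pi = {"s0": "right", "s1": "left"}

# Stochastic policy: pi(a | s), a probability distribution over actions per state.
stochastic_pi = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.5},
}


def act_deterministic(state):
    """Return the single action prescribed for `state`."""
    return deterministic_pi[state]


def act_stochastic(state, rng=random):
    """Sample an action from pi(. | state)."""
    probs = stochastic_pi[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
print(act_deterministic("s0"))    # always right
print(act_stochastic("s0", rng))  # right with probability 0.8
```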

In policy-based methods, we learn $\pi$ directly, i.e. which action to take (or with what probability) in each state.

In value-based methods, we learn a value function that maps a state to its expected return; the policy is then chosen to maximize this value (for example, by moving towards the state with the highest value).
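A minimal sketch of how a policy can be read off a state-value table, assuming (hypothetically) a tiny corridor whose transitions and rewards are known, and value estimates that have already been learned. In each state the agent picks the action maximizing the immediate reward plus the discounted value of the next state. The corridor, `V`, `step` and `greedy_action` are illustrative names, not part of the course:

```python
GAMMA = 0.9
GOAL = 4            # hypothetical corridor with states 0..4; state 4 is the goal
ACTIONS = (-1, +1)  # -1: move left, +1: move right

# Assumed, already-learned state values v(s). For this corridor they equal
# the values of always moving right with gamma = 0.9 (goal is terminal: 0).
V = {0: 0.729, 1: 0.81, 2: 0.9, 3: 1.0, 4: 0.0}


def step(state, action):
    """Hypothetical known model: deterministic move, reward 1 on reaching the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def greedy_action(state):
    """Pick the action maximizing r + gamma * v(s') under the known model."""
    def score(action):
        next_state, reward = step(state, action)
        return reward + GAMMA * V[next_state]
    return max(ACTIONS, key=score)


print([greedy_action(s) for s in range(GOAL)])  # [1, 1, 1, 1]
```

Here the greedy policy moves right (`+1`) from every non-goal state, towards higher-valued states. This one-step lookahead relies on the model being known; without it, the selection cannot be done exactly as written.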

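Let $v_{\pi}$ be the value function obtained when following the policy $\pi$, and let $R_t$ and $S_t$ be random variables denoting, respectively, the reward and the state at time $t$. The value of a state $s$ is the expected return when starting in $s$ and following $\pi$, and we choose $\pi$ to maximize it: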

$v_{\pi}(s)=E_{\pi}[\sum_{k=0}^{\infty}\gamma^k R_{t+k+1}|S_t=s]$
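Since $v_{\pi}(s)$ is an expectation, one way to approximate it (a Monte Carlo estimate, used here purely as an illustration) is to average the discounted returns of many episodes sampled from $s$ while following $\pi$. The two-state environment, the policy `P_RIGHT = 0.8` and all function names below are hypothetical:

```python
import random

GAMMA = 0.9
P_RIGHT = 0.8  # assumed stochastic policy: pi(right | s) = 0.8 in every state


def policy(state, rng):
    """Stochastic policy pi(a | s): 'right' with prob. P_RIGHT, else 'left'."""
    return "right" if rng.random() < P_RIGHT else "left"


def step(state, action):
    """Hypothetical 2-state episodic environment, returns (next_state, reward, done).

    From s0, 'right' moves to s1 with reward 0; from s1, 'right' ends the
    episode with reward 1. 'left' always ends the episode with reward 0.
    """
    if action == "left":
        return None, 0.0, True
    if state == "s0":
        return "s1", 0.0, False
    return None, 1.0, True


def sample_return(start, rng):
    """Roll out one episode from `start`; return sum_k gamma^k r_{t+k+1}."""
    state, g, discount, done = start, 0.0, 1.0, False
    while not done:
        action = policy(state, rng)
        state, reward, done = step(state, action)
        g += discount * reward
        discount *= GAMMA
    return g


def mc_value(start, n_episodes=100_000, seed=0):
    """Monte Carlo estimate of v_pi(start): the mean of sampled returns."""
    rng = random.Random(seed)
    return sum(sample_return(start, rng) for _ in range(n_episodes)) / n_episodes


print(mc_value("s1"), mc_value("s0"))
```

By hand, $v_{\pi}(s_1) = 0.8$ and $v_{\pi}(s_0) = 0.8 \cdot 0.8 \cdot 0.9 = 0.576$ (the reward of 1 arrives one step later, so it is discounted once). The printed estimates should be close to these values, with sampling noise shrinking as `n_episodes` grows.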
