Commit 434ae3ca authored by Daniel Lukats

added list of symbols and definitions

parent 962e9da4
\begin{longtable}{l l l}
$\doteq$ & defined to be & \\
$\sum_{s', r}$ & shorthand for $\sum_{s'\in\mathcal{S}}\sum_{r\in\mathcal{R}}$ & \\
$\propto$ & proportional to & \\
$a,s\sim\pi$ & $a,s$ observed by following $\pi$ & \\
$\mathbb{E}$ & expected value & \\
$\mathbb{E}_{a,s\sim\pi}$ & expected value over $a, s$ observed by following $\pi$ & \\
\\
$t$ & discrete time step & TODO pageref \\
$T$ & final time step, horizon & TODO pageref \\
$s, s'$ & states & \\
$a$ & action & \\
$r$ & reward & \\
$S_t$ & state at time $t$ & TODO pageref \\
$A_t$ & action at time $t$ & TODO pageref \\
$R_{t+1}$ & reward at time $t+1$ & TODO pageref \\
$\mathcal{S}$ & set of states & TODO pageref \\
$\mathcal{A}$ & set of actions & TODO pageref \\
$\mathcal{R}$ & set of rewards & TODO pageref \\
\\
$p(s', r \mid s, a)$ & dynamics function & TODO pageref \\
$\pi(a\mid s)$ & policy & TODO pageref \\
$\mu(s)$ & stationary distribution of states & TODO pageref \\
\\
$\alpha$ & learning rate & TODO pageref \\
$\gamma$ & discount factor & TODO pageref \\
$\lambda$ & variance tuning factor & TODO pageref \\
$\epsilon$ & clipping parameter & TODO pageref \\
\\
$G_t$ & return & TODO pageref \\
$G_t^\lambda$ & $\lambda$-return & TODO pageref \\
$v_\pi(s)$ & value function & TODO pageref \\
$a_\pi(s, a)$ & advantage function & TODO pageref \\
$\delta_t$ & advantage estimator & TODO pageref \\
$\delta_t^{\text{GAE}(\gamma, \lambda)}$ & generalized advantage estimator (sketched below) & TODO pageref \\
\\
$\boldsymbol\omega$ & value function parameter vector & TODO pageref \\
$\hat{v}_\pi(s, \boldsymbol\omega)$ & parameterized value function & TODO pageref \\
$\hat{a}_\pi(s, a, \boldsymbol\omega)$ & parameterized advantage function & TODO pageref \\
$\overline{\text{VE}}(\boldsymbol\omega)$ & mean squared value error & TODO pageref \\
\\
$\boldsymbol\theta$ & policy parameter vector & TODO pageref \\
$\pi(a\mid s, \boldsymbol\theta)$ & parameterized policy & TODO pageref \\
$J(\boldsymbol\theta)$ & fitness function & TODO pageref \\
$\hat{g}$ & gradient estimator & TODO pageref \\
\\
$\rho_t(\boldsymbol\theta)$ & likelihood ratio & TODO pageref \\
$\text{clip}(\rho_t(\boldsymbol\theta), \epsilon)$ & clipping function & TODO pageref \\
$\text{clip}_v(\boldsymbol\omega, \boldsymbol\omega_\text{old}, \epsilon, S_t)$ & value clipping function & TODO pageref \\
\\
$\mathcal{L}(\boldsymbol\theta)$ & loss & TODO pageref \\
$\mathcal{L}^\text{CLIP}(\boldsymbol\theta)$ & PPO clipped loss (sketched below) & TODO pageref \\
$\mathcal{L}^\text{VF}(\boldsymbol\omega)$ & value function loss & TODO pageref \\
$\mathcal{L}^\text{VFCLIP}(\boldsymbol\omega)$ & clipped value function loss & TODO pageref \\
\\
$c_1$ & value function loss coefficient & TODO pageref \\
$c_2$ & entropy bonus coefficient & TODO pageref \\
$S$ & entropy bonus & TODO pageref \\
$\mathcal{L}^{\text{CLIP}+\text{VFCLIP}+S}(\boldsymbol\omega)$ & PPO loss with shared parameters & TODO pageref \\
\\
$\tau$ & rollout & TODO pageref \\
$\phi$ & post-processing function & TODO pageref \\
$k$ & number of frames skipped & TODO pageref \\
$N$ & number of parallel actors & TODO pageref \\
$I$ & number of training iterations & TODO pageref \\
$K$ & number of epochs & TODO pageref \\
$M$ & minibatch size & TODO pageref \\
\end{longtable}
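
\bigskip
\noindent For quick reference, the following equations sketch the standard formulations behind several of the symbols listed above, following the generalized advantage estimation and PPO papers by Schulman et al. Here $\hat{\mathbb{E}}_t$ denotes the empirical average over timesteps and $\boldsymbol\theta_\text{old}$ the policy parameters before the current update (neither is listed above); sign conventions and truncation details may differ from the definitions in the body of this document, which remain authoritative.
\begin{align*}
G_t &\doteq \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1}\\
\delta_t &\doteq R_{t+1} + \gamma\,\hat{v}_\pi(S_{t+1}, \boldsymbol\omega) - \hat{v}_\pi(S_t, \boldsymbol\omega)\\
\delta_t^{\text{GAE}(\gamma, \lambda)} &\doteq \sum_{l=0}^{T-t-1} (\gamma\lambda)^l\,\delta_{t+l}\\
\rho_t(\boldsymbol\theta) &\doteq \frac{\pi(A_t \mid S_t, \boldsymbol\theta)}{\pi(A_t \mid S_t, \boldsymbol\theta_\text{old})}\\
\mathcal{L}^\text{CLIP}(\boldsymbol\theta) &\doteq \hat{\mathbb{E}}_t\left[\min\left(\rho_t(\boldsymbol\theta)\,\delta_t^{\text{GAE}(\gamma, \lambda)},\; \text{clip}(\rho_t(\boldsymbol\theta), \epsilon)\,\delta_t^{\text{GAE}(\gamma, \lambda)}\right)\right]
\end{align*}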