Commit bf2e1011 authored by Daniel Lukats's avatar Daniel Lukats

minor typo and grammar fixes

parent f0ade70d
......@@ -40,9 +40,9 @@ only game that is not affected by this change is Pong, as the set of rewards in
Furthermore, no player will score multiple times within $k = 4$ frames (cf.~chapter \ref{sec:04:postprocessing}).
On the remaining games, rewards can achieve much larger magnitudes, so reward binning or clipping has a notable effect on
the rewards. Experiment \emph{reward\_clipping} shows a significant drop in performance on all games but Pong.
This is echoed in the reward graphs and final scores with all agents performing much worse once rewards are no longer
subject to reward binning. As a consequence, rewards should be binned rather than clipped.
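The distinction between the two schemes can be made concrete with a minimal sketch. The function names `bin_reward` and `clip_reward` are hypothetical, chosen here for illustration; binning maps every reward to its sign, whereas clipping bounds it to $[-1, 1]$ while preserving intermediate magnitudes:

```python
import numpy as np

def bin_reward(reward):
    # Reward binning: map every reward to its sign, i.e. -1, 0, or +1.
    return float(np.sign(reward))

def clip_reward(reward):
    # Reward clipping: bound the reward to [-1, 1]; magnitudes inside
    # that interval are kept unchanged.
    return float(np.clip(reward, -1.0, 1.0))
```

For a large reward such as 250 both schemes yield 1, but a fractional reward such as 0.5 is binned to 1 while clipping leaves it at 0.5.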
\paragraph{Value function loss clipping.}
......@@ -55,7 +55,7 @@ TODO
\subsubsection{Outliers}
Figures \ref{fig:graph_seaquest} and \ref{fig:graph_penalty_beamrider} to \ref{fig:graph_paper_beamrider} contain
obvious outliers, some of which perform a lot better than other runs in that game, whereas others perform a lot worse.
Among all experiments conducted for this thesis, about $35\%$ of the graphs generated contain an obvious outlier.
Similar inconsistency can be seen in the plots published by \citeA{ppo} for a variety of games.
......
......@@ -10,7 +10,7 @@ publication. These optimizations are not part of the initial PPO code \cite[base
not mentioned in an update on the repository either \cite{ppo_blog}.
Debugging can be troublesome due to the frameworks used. Even when multiple implementations are available, ensuring
identical results is anything but simple. The Arcade Learning Environment and the deep learning framework may rely on
randomness, for example when sampling actions. Thus, one must use identical seeds and interface with the random number
generator in the same way, so that its state evolves identically across all algorithms under
comparison. If the different algorithms utilize different deep learning frameworks, this issue may
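The requirement above can be sketched as follows. This is a minimal illustration, not the thesis' actual setup; the helper name `seed_everything` is hypothetical, and a deep learning framework would additionally need its own RNG seeded (e.g. `torch.manual_seed` in PyTorch):

```python
import random
import numpy as np

def seed_everything(seed):
    # Seed every random number generator the training loop touches,
    # so two implementations draw the same sequence of random numbers.
    random.seed(seed)
    np.random.seed(seed)
    # A deep learning framework's own RNG must be seeded as well,
    # e.g. torch.manual_seed(seed) when using PyTorch (assumption).

# Two runs with the same seed sample identical action sequences.
seed_everything(42)
first = [int(np.random.randint(0, 4)) for _ in range(5)]
seed_everything(42)
second = [int(np.random.randint(0, 4)) for _ in range(5)]
```

Equally important is that both implementations call the generator in the same order; a single extra draw desynchronizes the RNG states even with identical seeds.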
......