Foundations of Reinforcement Learning - A Personal Journey

Sutton & Barto, in their book “Reinforcement Learning: An Introduction”, beautifully bring out the stark difference between how complex a task is for a human and for a machine. They take the simple example of preparing breakfast: a mundane effort for a human, who draws on early trial and error and well-rehearsed motor control, yet a rather complex set of actions for a machine to perform.

This simple thought parallels a scene in the movie “The Imitation Game”, where Alan Turing says the following:

“Of course machines can’t think as people do. A machine is different from a person. Hence, they think differently. The interesting question is, just because something, uh… thinks differently from you, does that mean it’s not thinking?”

Leaving aside the debate over whether a machine can develop consciousness, which I believe is misdirected, we can hark back to the fascinating and intricate question: how could we make a machine think?

Reinforcement learning, which grew out of work on adaptive systems and has existed for some six decades, continues to lure curious minds to explore how learning can be translated to machines. If you are an amateur like me exploring this domain, here’s my take on it.

Breaking Down Reinforcement Learning

Let’s break down my journey using the building blocks of reinforcement learning: State, Action, Reward and Value.

1. State

My current state is an encapsulation of my experience with stochastic systems, Monte Carlo methods, and distributed and adaptive systems. While there is no hard prerequisite for getting immersed in RL, a good understanding of the underlying systems and the stochastic numerics involved helps you follow the formulation. Building on the foundations of state estimation methods, linear algebra, probability and random variables will give you a head start.

2. Action

The picture below depicts the action I took in order to maximise my reward. (I am cheating here, since in a traditional RL setting I wouldn’t know a priori whether it would lead to a gain.) Your action is a mere optimisation: reduce the amount of content you are consuming, pick a book like I did and, in tandem, a study partner who is undergoing the same rigour. And yes, please set up a space just for the learning routine.

3. Reward

Rewards are personal, based on your own projections of your future state. I pick some easy ones as personal rewards:

A chocolate (sweet tooth!)
Play squash or head out for a run
Purchase things - like this book to accelerate my pace to reach the next defined state

Rewards can also be public:

Give a technical talk to your work colleagues or a community forum based on your understanding
Have a lively discussion with experts in the field

4. Value

As the book mentions, the value function is the hardest to quantify and formalise. Eating a chocolate, a low reward for your body, could still lead you to complete your pursuit of understanding the RL framework, which is where the long-term value lies. Here, my personal bias is playing out, but for you it could be about deepening your understanding of machines and working out the answer to Alan Turing’s question.
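To make the difference between reward and value a little more concrete, here is a minimal sketch of a discounted return, which is one common way to put a number on "how good is this path in the long run". The reward numbers, the two scenarios and the discount factor gamma = 0.9 are all made up by me for illustration; they are not from the book.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, shrinking each one by gamma for every step it lies in the future."""
    total = 0.0
    # Work backwards so that each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical reward sequences (assumed values, purely illustrative).
chocolate_then_study = [1.0, 0.0, 0.0, 10.0]  # small treat now, understanding later
skip_study = [3.0, 0.0, 0.0, 0.0]             # bigger immediate reward, nothing after

value_study = discounted_return(chocolate_then_study)
value_skip = discounted_return(skip_study)

print(f"chocolate then study: {value_study:.2f}")
print(f"skip study:           {value_skip:.2f}")
```

With these assumed numbers, the path that starts with the smaller immediate reward ends up with the higher long-run value, which is the point of the chocolate example above. In a real RL problem the future rewards are not known in advance and the value has to be estimated from experience, which is exactly why it is hard.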

Conclusion

Good luck with your learning journey, if you choose to pursue it.

P.S. You can spot a book on Combinatorics next to Sutton’s book. I am trying to work in more mathematics while I pursue this journey. I feel like a child prodigy who is being homeschooled due to a flawed and generalised learning structure. 😄