Which type of machine learning algorithm uses reward based learning system?

What do you do when a dog or child misbehaves? You scold them so that they do not repeat the bad behavior. On the other hand, you reward them when they do something good, to instill good behavior. Believe it or not, this system of positive and negative reinforcement is also used to train machines. It is called reinforcement learning, and it helps us come up with unique solutions. Q-learning is a model-free type of reinforcement learning!

In this article, we will talk about what Q-learning is and how to go about implementing it.

What Is Reinforcement Learning?

In machine learning, a common drawback is the vast amount of data that models need to train. The more complex a model, the more data it may require. Even after all this, the data we get may not be reliable. It may have false or missing values or may be collected from untrustworthy sources.

Reinforcement learning sidesteps much of the data acquisition problem: instead of relying on a pre-collected dataset, the agent generates its own training data by interacting with the environment.

Reinforcement learning is a branch of machine learning that trains a model to reach an optimal solution to a problem by making decisions on its own.

It consists of:

  • An Environment, which the agent interacts with to learn how to reach a goal or perform an action.
  • A Reward, given when the agent's action brings it closer to the goal. This steers training in the right direction.
  • A negative reward, given when the agent's action does not lead to the goal, to keep it from learning in the wrong direction.

Reinforcement learning requires a machine learning model to learn from the problem and come up with an optimal solution by itself. This means that it can also arrive at unique solutions which the programmer might not even have thought of.

Consider the image below. You can see a dog in a room that has to perform an action, which is fetching. The dog is the agent; the room is the environment it has to work in, and the action to be performed is fetching.

Figure 1: Agent, Action, and Environment

If the correct action is performed, we will reward the agent. If it performs the wrong action, we will not give it any reward or give it a negative reward, like a scolding.
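The reward rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `reward_for`, the action labels, and the +1/-1 values are assumptions chosen for the dog example, not something the article specifies.

```python
# Hypothetical reward signal for the dog example: +1 for the desired
# action, -1 (a "scolding") for anything else.
DESIRED_ACTION = "fetch"


def reward_for(action: str) -> int:
    """Return +1 if the agent performed the desired action, else -1."""
    return 1 if action == DESIRED_ACTION else -1


# Reward for each action the dog might try.
rewards = {action: reward_for(action) for action in ("fetch", "sit", "run")}
print(rewards)
```

Returning 0 instead of -1 for a wrong action would model the "no reward" variant mentioned above.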

Figure 2: Agent performing an action

What Is Q-Learning?

Q-Learning is a reinforcement learning algorithm that finds the best next action for a given state. While learning, it sometimes chooses actions at random to explore the environment, and it aims to maximize the total reward over time.
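One common way to balance random action choice against reward maximization is an epsilon-greedy rule. The sketch below is a minimal illustration under that assumption; the article does not say which exploration rule it uses, and the name `epsilon_greedy` and the sample Q-values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: any action
    # exploit: index of the largest Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q_values = [0.1, 0.7, 0.2]  # made-up Q-values for three actions
greedy_choice = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
print(greedy_choice)  # 1, because epsilon=0.0 never explores
```

With `epsilon=1.0` the agent always explores; values in between trade exploration against exploitation.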

Figure 3: Components of Q-Learning

Q-learning is a model-free, off-policy reinforcement learning algorithm that finds the best course of action given the current state of the agent. Depending on where the agent is in the environment, it decides the next action to be taken.

Off-policy means that the agent learns the value of the best (greedy) policy regardless of how it actually chooses actions while exploring. The policy that generates experience and the policy being learned can therefore differ.

Model-free means that the agent does not build or use a model of how the environment responds to its actions. Instead, it learns directly from trial and error, using the rewards it receives.

An example of Q-learning is an Advertisement recommendation system. In a normal ad recommendation system, the ads you get are based on your previous purchases or websites you may have visited. If you’ve bought a TV, you will get recommended TVs of different brands. 

Figure 4: Ad Recommendation System

Using Q-learning, we can optimize the ad recommendation system to recommend products that are frequently bought together. The agent is rewarded when the user clicks on the suggested product.

Figure 5: Ad Recommendation System with Q-Learning

Important Terms in Q-Learning

  1. States: The State, S, represents the current position of an agent in an environment.
  2. Action: The Action, A, is the step taken by the agent when it is in a particular state.
  3. Rewards: For every action, the agent will get a positive or negative reward.
  4. Episodes: An episode ends when the agent reaches a terminating state and can't take a new action.
  5. Q-Values: Used to determine how good an Action, A, taken at a particular State, S, is. Written as Q(S, A).
  6. Temporal Difference: The difference between the current Q-Value and a new estimate built from the reward received plus the discounted value of the next state. It is used to update the Q-Value.

What Is The Bellman Equation?

The Bellman Equation is used to determine the value of a particular state and deduce how good it is to be in that state. The best state is the one with the highest value.

The equation is shown in Figure 6. It combines the current Q-Value, the reward received for the action, and the maximum expected future reward, scaled by a discount rate that determines how much future rewards matter, to produce an updated Q-Value for the current state and action. The learning rate determines how quickly the new estimate replaces the old one.

Figure 6: Bellman Equation   
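Since the figure is an image, here is the standard Q-learning update rule written out for reference. The symbols are the usual textbook ones and are assumed to match the figure: α is the learning rate, γ is the discount rate, R is the reward received after taking action A in state S, and S' is the next state.

```latex
Q(S, A) \leftarrow Q(S, A) + \alpha \left[ R + \gamma \max_{a} Q(S', a) - Q(S, A) \right]
```

The bracketed term is the temporal difference: the gap between the new estimate, R + γ max Q(S', a), and the current estimate, Q(S, A).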

How to Make a Q-Table?

While running our algorithm, we will come across various solutions and the agent will take multiple paths. How do we find out the best among them? This is done by tabulating our findings in a table called a Q-Table.

A Q-Table helps us to find the best action for each state in the environment. We use the Bellman Equation at each state to estimate the expected future reward and save it in the table so it can be compared with the other state-action pairs.

Let us create a Q-Table for an agent that has to learn to run, fetch, and sit on command. The steps taken to construct a Q-Table are:

Step 1: Create an initial Q-Table with all values initialized to 0

When we initially start, all Q-values in the table are 0. Consider the Q-Table shown below, which shows a dog simulator learning to perform actions:

Figure 7: Initial Q-Table      
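A zero-initialized Q-Table like the one in Step 1 could be built as follows. This is a sketch only: the state labels are hypothetical, since the figure's exact rows are not reproduced here, and plain dictionaries are used to keep the example dependency-free (array libraries such as NumPy are also commonly used).

```python
# Hypothetical state labels for the dog simulator; the actions come from
# the run / fetch / sit example in the text.
states = ["start", "idle", "correct_action", "wrong_action", "end"]
actions = ["run", "fetch", "sit"]

# One row per state, one entry per action, every Q-value starting at 0.
q_table = {state: {action: 0.0 for action in actions} for state in states}

for state, row in q_table.items():
    print(state, row)
```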

Step 2: Choose an action and perform it. Update values in the table

This is the starting point. We have performed no other action as of yet. Let us say that we want the agent to sit initially, which it does. The table will change to:

Figure 8: Q-Table after performing an action

Step 3: Get the value of the reward and calculate the Q-Value using the Bellman Equation

For the action performed, we need to calculate the value of the actual reward and the Q(S, A) value.

Figure 9: Updating Q-Table with Bellman Equation
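As a worked illustration of Step 3, the sketch below applies one standard Q-learning update to a small table. The function name `q_update`, the state labels, the reward of 1.0, and the values of `alpha` and `gamma` are all assumptions made for the example.

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-learning update in place and return the new Q-value."""
    best_next = max(q_table[next_state].values())  # max over next actions
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


actions = ["run", "fetch", "sit"]
q_table = {s: {a: 0.0 for a in actions} for s in ["start", "idle"]}

# The dog sits on command in the "start" state and earns a reward of 1.
new_value = q_update(q_table, "start", "sit", reward=1.0,
                     next_state="idle", alpha=0.5, gamma=0.9)
print(new_value)  # 0.5 = 0 + 0.5 * (1.0 + 0.9 * 0 - 0)
```

Because every Q-value starts at 0, the first update depends only on the reward and the learning rate.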

Step 4: Continue the same until the table is filled or an episode ends

The agent continues taking actions and for each action, the reward and Q-value are calculated and it updates the table.

Figure 10: Final Q-Table at end of an episode

Q-Learning With Python

Let's use Q-Learning to find the shortest path between two points. We have a group of nodes and we want the model to automatically find the shortest way to travel from one node to another. We start by importing the necessary modules:

Figure 11: Import necessary modules

Then we define all possible actions or the points/nodes that exist.

Figure 12: Define the actions
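Since the figure is a screenshot, here is a rough sketch of what defining the nodes and actions might look like. It assumes nine nodes labelled L1 to L9 (the labels used later in the route example) and treats each action as "move to node i"; the article's actual code may differ.

```python
# Assumed setup: nine nodes labelled L1..L9.
locations = [f"L{i}" for i in range(1, 10)]

# An action means "move to node i", so there is one action per node.
actions = list(range(len(locations)))

print(locations)
print(actions)
```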

We define the rewards array for every action.

Figure 13: Define the rewards
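A rewards array for a graph of nodes can be sketched as a matrix in which entry (i, j) is 1 when node j is directly reachable from node i. The edge list below is a hypothetical layout, because the article's actual graph is only available as an image.

```python
# Hypothetical layout: pairs of directly connected nodes (0 is L1, 8 is L9).
# The article's real graph is not shown, so these edges are an assumption.
edges = [(0, 1), (1, 2), (1, 4), (2, 5), (3, 6), (4, 7), (6, 7), (7, 8)]
n_states = 9

# rewards[i][j] is 1 if the agent can move from node i to node j, else 0.
rewards = [[0] * n_states for _ in range(n_states)]
for i, j in edges:
    rewards[i][j] = 1
    rewards[j][i] = 1  # connections work in both directions

for row in rewards:
    print(row)
```

A reward of 0 marks a move that is not possible, which lets the training step filter out illegal actions.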

We define our environment by mapping the state to a location and set the discount factor and learning rate:

Figure 14: Create Environment and set variables
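The mapping and hyperparameters might be set up as in the sketch below. The dictionary names and the specific values of `gamma` and `alpha` are illustrative assumptions, not values confirmed by the article.

```python
# Map location labels to state indices and back (assumed labels L1..L9).
location_to_state = {f"L{i}": i - 1 for i in range(1, 10)}
state_to_location = {state: loc for loc, state in location_to_state.items()}

gamma = 0.75  # discount factor (illustrative value)
alpha = 0.9   # learning rate (illustrative value)

print(location_to_state["L9"], state_to_location[0])
```

A discount factor closer to 1 makes the agent value distant rewards more; a higher learning rate makes each update overwrite more of the old estimate.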

We then define our agent class and set its attributes. 

Figure 15: Define Agent

We then define its methods. The first method is training, which trains the agent in the environment.

Figure 16: Define a method for how the agent interacts with the environment                  
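A training routine of this kind could look like the sketch below. It is a plain-function illustration rather than the article's method: the name `train_q_table`, the large goal reward of 999, the default hyperparameters, and the fixed random seed are all assumptions. At each iteration it picks a random state, takes a random legal move, and applies the temporal-difference update.

```python
import random


def train_q_table(rewards, goal, gamma=0.75, alpha=0.9, iterations=1000, seed=0):
    """Learn a Q-table for reaching `goal` on a graph given as a 0/1 reward matrix."""
    rng = random.Random(seed)
    n = len(rewards)
    # Copy the rewards and make staying at the goal far more rewarding.
    r = [row[:] for row in rewards]
    r[goal][goal] = 999
    q = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        state = rng.randrange(n)  # random current node
        moves = [j for j in range(n) if r[state][j] > 0]
        if not moves:
            continue  # isolated node, nothing to learn here
        next_state = rng.choice(moves)  # random legal move
        td = r[state][next_state] + gamma * max(q[next_state]) - q[state][next_state]
        q[state][next_state] += alpha * td
    return q


# Tiny 3-node chain 0 - 1 - 2 with node 2 as the goal.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
q = train_q_table(chain, goal=2)
print([round(v, 1) for v in q[1]])
```

After training, the Q-value for moving from node 1 toward the goal should be larger than the Q-value for moving away from it.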

We then define a method to select the optimal route for the next state.

Figure 17: Define a method to get optimal route
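Given a trained Q-table, a route can be read off by repeatedly moving to the action with the highest Q-value. The sketch below assumes that approach; the function name `greedy_route` and the hand-written Q-table are hypothetical and exist only to show the mechanics.

```python
def greedy_route(q_table, start, goal, max_steps=None):
    """Follow the highest Q-value from `start` until `goal` is reached."""
    max_steps = max_steps or len(q_table)
    route = [start]
    state = start
    # The step limit guards against looping forever on a poorly trained table.
    while state != goal and len(route) <= max_steps:
        state = max(range(len(q_table[state])), key=lambda a: q_table[state][a])
        route.append(state)
    return route


# Hand-written Q-table for a 3-node chain 0 - 1 - 2 whose goal is node 2.
q_table = [
    [0.0, 5.0, 0.0],
    [3.0, 0.0, 9.0],
    [0.0, 6.0, 10.0],
]
print(greedy_route(q_table, start=0, goal=2))  # [0, 1, 2]
```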

Now, let's call our agent and check the shortest route between points L9 and L1:

Figure 18: Find the shortest route between two points
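Putting the pieces together, an end-to-end run might look like the following sketch. Everything here is an assumption made for illustration: the class name `QAgent`, its method names, the edge layout, the goal reward of 999, and the hyperparameters. Under this assumed layout, the only path from L9 to L1 without revisiting a node runs through L8, L5, and L2.

```python
import random


class QAgent:
    """Minimal tabular Q-learning agent for routes on a small graph."""

    def __init__(self, edges, n_states, gamma=0.75, alpha=0.9, seed=0):
        self.n = n_states
        self.gamma = gamma
        self.alpha = alpha
        self.rng = random.Random(seed)
        # 0/1 reward matrix built from the (assumed) undirected edges.
        self.rewards = [[0] * n_states for _ in range(n_states)]
        for i, j in edges:
            self.rewards[i][j] = self.rewards[j][i] = 1

    def train(self, goal, iterations=2000):
        """Return a Q-table learned from random legal moves."""
        r = [row[:] for row in self.rewards]
        r[goal][goal] = 999  # make the goal attractive
        q = [[0.0] * self.n for _ in range(self.n)]
        for _ in range(iterations):
            s = self.rng.randrange(self.n)
            moves = [j for j in range(self.n) if r[s][j] > 0]
            if not moves:
                continue
            nxt = self.rng.choice(moves)
            td = r[s][nxt] + self.gamma * max(q[nxt]) - q[s][nxt]
            q[s][nxt] += self.alpha * td
        return q

    def route(self, start, goal):
        """Train for `goal`, then follow the highest Q-values from `start`."""
        q = self.train(goal)
        path, s = [start], start
        while s != goal and len(path) <= self.n:
            s = max(range(self.n), key=lambda a: q[s][a])
            path.append(s)
        return path


# Hypothetical layout of nine nodes (index 0 is L1, index 8 is L9).
edges = [(0, 1), (1, 2), (1, 4), (2, 5), (3, 6), (4, 7), (6, 7), (7, 8)]
agent = QAgent(edges, n_states=9)
path = agent.route(start=8, goal=0)
print(" -> ".join(f"L{s + 1}" for s in path))
```

Training once per goal keeps the example short; a longer-lived agent would typically cache the Q-table instead of retraining on every call.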

As we can see, the model has found the shortest path between points L9 and L1 by traversing through points L5 and L8.

Conclusion 

In this article titled ‘What is Q-Learning? The best guide to Q-Learning’, we first looked at a sub-branch of machine learning called Reinforcement Learning. We then answered the question, ‘What is Q-Learning?’, which is a type of model-free reinforcement learning. The different terms associated with Q-Learning were introduced and we looked at the Bellman Equation, which is used to update the Q-Values of our agent. We looked at the steps required to make a Q-Table and finally, we saw how to implement Q-Learning in Python with a demo.

We hope this article answered the question which was burning in the back of your mind: ‘What is Q-Learning?’. 

Do you have any doubts or questions for us? Mention them in this article's comments section, and we'll have our experts answer them for you at the earliest!

About the Author

Mayank Banoula

Mayank is a Research Analyst at Simplilearn. He is proficient in machine learning and artificial intelligence with Python.

Which type of learning method learns based on rewards and feedback?

Reinforcement learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing their results.

Which machine learning models are trained to make a sequence of decisions based on rewards?

Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment.

What is reward based learning called?

Reward-based learning is called reinforcement learning.

Which machine learning algorithm rewards the machine on getting favorable outcomes?

Reinforcement learning aims to use observations gathered from interaction with the environment to take actions that maximize the reward or minimize the risk.