How to use Q Learning in Video Games Easily

Hello World! It's Siraj, and today we'll teach a bot to get from point A to point B using a special type of reinforcement learning called Q-learning. Reinforcement learning means learning by interacting with an environment through positive feedback, or reinforcement. It's similar to how you give a dog a treat, but only if it rolls over. The field has evolved over the past few decades. In the late 1950s, an American mathematician named Richard Bellman was trying to solve what he called the optimal control problem: the problem of designing an agent to minimize some measure of a system's behavior over time. Eventually he and his colleagues discovered a possible solution, which was later called the Bellman equation. It describes the value of a decision problem at a certain point in time in terms of the payoff from some initial decisions, plus the value of the remaining decision problem that results from those initial decisions, expressed through the system's various states. In this way, it broke the problem down into simpler subproblems.
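
In modern notation (the video doesn't spell it out), the Bellman equation for a deterministic decision problem can be written as

    V(s) = \max_a \left[ R(s, a) + \gamma \, V(s') \right]

where V(s) is the value of being in state s, R(s, a) is the immediate payoff of taking action a in s, s' is the state that action leads to, and \gamma is a discount factor that weights future payoffs against immediate ones.
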
The Bellman equation is now used in many, many fields. It helps with minimizing flight time for airplanes, maximizing profits for hedge funds, and minimizing the beef that Soulja Boy seems to have with everyone.
While Bellman's squad was making waves throughout the math community, a psychologist named Edward Thorndike was trying to understand how learning works by studying the animal kingdom. He came up with what he called the law of effect, which states that responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation. Thanks, Captain Obvious. But this was actually a pretty important discovery. In one of his experiments, he put a cat in a wooden box and observed it while it tried a bunch of different ways of getting out, until it finally hit the lever that opened the box. When he put the cat back in the box, it immediately knew to hit the lever to get out, and it was able to do that through the process of trial and error, which is what Thorndike was essentially describing in the law of effect.
A few decades later, a British computer scientist named Chris Watkins thought that perhaps these two ideas could be combined to create a new type of learning algorithm: the idea of designing an agent that minimizes some measure of a system's behavior over time, like the Bellman equation, and does so through the process of trial and error, similar to the law of effect. And so he invented a novel reinforcement learning technique he called Q-learning.
So what is Q-learning? Let's say we have five rooms in a building connected by doors, and we'll just think of everything outside the building as one big room: all of space-time is room five. We can think of this system as a graph: each room is a node and each door is a link. Room one, for example, has doors to both room five and room three, so those nodes are connected. Our goal is to put an agent in any room and have it learn, through trial and error, how to get to room five. Room five is the goal room. We can associate a reward value with each door, which is the link between nodes: doors that lead directly to the goal room get an instant reward of 100, and doors not directly connected to the goal room get a reward of zero. In Q-learning, the goal is to reach the state with the highest reward through a set of actions. Each room is a state, each action is represented by an arrow, and the mapping of states to actions is the agent's policy. The agent uses the reward value as a signal to improve its policy over time, and it stores what it has learned through experience in what's called the Q matrix. The rows represent the possible states, and the columns are the possible actions leading to the next state. The agent updates the Q matrix over time as it learns the best actions to maximize the reward.
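
To make this concrete, here's a minimal tabular Q-learning sketch for the five-room example, with the rooms numbered 0 through 4 and the outside world as room 5. The reward matrix follows the door layout described above; the discount factor gamma and the episode count are assumed values, since the video doesn't specify them:

    import numpy as np

    # R[state][action]: taking "action" from "state" means walking through the
    # door into that room; -1 marks a missing door, 100 a door into the goal (room 5)
    R = np.array([[-1, -1, -1, -1,  0,  -1],
                  [-1, -1, -1,  0, -1, 100],
                  [-1, -1, -1,  0, -1,  -1],
                  [-1,  0,  0, -1,  0,  -1],
                  [ 0, -1, -1,  0, -1, 100],
                  [-1,  0, -1, -1,  0, 100]])
    Q = np.zeros(R.shape)  # the Q matrix: one row per state, one column per action
    gamma = 0.8            # discount factor (assumed value)

    for episode in range(1000):
        state = np.random.randint(6)   # drop the agent into a random room
        while state != 5:              # wander until it reaches the goal
            # try a random open door, then update Q with the immediate reward
            # plus the discounted best value obtainable from the next room
            action = np.random.choice(np.flatnonzero(R[state] >= 0))
            Q[state, action] = R[state, action] + gamma * Q[action].max()
            state = action             # walking through a door lands us in that room

    print((Q / Q.max() * 100).round())  # normalized Q matrix

After training, greedily following the largest Q value in each row traces the shortest path to room five from any starting room.
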
Seems pretty useful, right? Q-learning's got to be used everywhere in video games, right? Well, consumer video game bots need to be good, but not so good that a human can't beat them. Bots that used Q-learning to master games like chess, checkers, and most recently Atari games became insanely good at whatever they played. Academic AI learns, while consumer game AI generally just makes educated guesses; it doesn't really learn, and its actions are all scripted. But as different as they are, the two fields are converging as we discover more about machine learning. For example, in Forza Motorsport you can create a Drivatar for yourself: an AI that learns how you drive by observing you and can then imitate your driving style. Adaptive behavior like this will make games more interesting, and there's a lot of potential for more of it.
So let's write a ten-line, high-level Python script that uses Q-learning to train a bot to get from point A to point B. The game is a 5-by-5 grid. Our agent is a yellow square, and the goal is to find its way to either the green square or the red square, which ends the game. Each cell represents a state the agent can be in, and there are four actions: up, down, left, and right. Moving a step gives a reward of negative 0.04, the red cell gives negative one, and the green cell gives positive one, so ideally we want to get to the green cell every time.
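
The step penalty matters: one way to express this reward scheme (the values are from the video, but this helper function is illustrative, not from the repo) is:

    GREEN, RED = "green", "red"  # terminal cell types (illustrative labels)

    def reward(cell):
        if cell == GREEN:
            return 1.0    # reaching the green square wins
        if cell == RED:
            return -1.0   # reaching the red square loses
        return -0.04      # every ordinary step costs a little

Because each step carries a small cost, the highest-scoring strategy is the shortest safe route to the green square.
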
The game world is already built for us, so we'll start off by importing it at the top. Then, in our main function, we create a while loop set to true, because we want our agent to run indefinitely. Next, we initialize our bot's position from the World class and assign it to the agent variable. Now we want our bot to pick the right action to take in the game world, and the question is: how do we decide that?
decide that I got a grid of squares measured five by
five and I’m gonna get to grade in one piece alive I’m gonna make a cube matrix
initialize keeping every single reward AB line archived then I’ma pick an
action straight how to queue then I’m gonna do it yeah you just fresh and new
update queue width reward on B and once I got that I’ll go ahead and repeat yeah
We'll use our bot's position as a parameter for the max_Q function, which will choose an action from our Q matrix as well as a potential reward. Then we can perform that action by passing it as a parameter to the do_action method, which returns our bot, the action we took, the reward we received, and the bot's updated position after taking the action. Now we're ready to update our Q matrix, so we'll use the bot's updated position as the parameter. We'll print out both parameters to the terminal so we can observe the results.
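
Pieced together, the high-level loop looks something like the sketch below. The World module and the max_Q, do_action, and update_Q helpers follow the names mentioned above, but their exact signatures here are assumptions, so treat this as pseudocode rather than the repo's actual Learner.py:

    # Sketch of the ten-line training loop described above; the World module
    # and helper signatures are assumptions based on the narration
    import World  # the prebuilt game world (GUI, grid, rewards)

    def main():
        agent = World.player                            # the bot's starting position
        while True:                                     # run the agent indefinitely
            max_act, max_val = max_Q(agent)             # best known action for this state
            agent, action, reward = do_action(max_act)  # act; observe reward and new state
            update_Q(agent)                             # fold the reward back into the Q matrix
            print(action, reward)                       # log results so we can watch it learn

    main()
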
We'll run the script by typing python Learner.py into the terminal, and it'll pop up as a GUI. The bot will immediately start trying out possible paths to the green square, and we can watch the score in the terminal improve over time. This bot in particular gets really good really fast: within about ten seconds it has already found the ideal path, and it just keeps following it.
So, to break it down: reinforcement learning is the process of learning by interacting with an environment through positive feedback, and Q-learning is a type of reinforcement learning that minimizes some measure of a system's behavior over time through trial and error. It does this by updating its policy, which is a mapping of states to actions, based on a reward.
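
For reference, the general Q-learning update that Watkins introduced is usually written as

    Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

where \alpha is a learning rate, \gamma is the discount factor, r is the reward just received, and s' is the state the action led to. The room example earlier is the special case \alpha = 1.
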
The coding challenge for this video is to modify the code so that the game world is bigger and has more obstacles; let's make it harder for our Q-learning bots to find the optimal strategy. Details are in the README. I'll post the GitHub link in the comments and announce the winner in the next video. For now, I've got to optimize my life, so thanks for watching!

100 thoughts on “How to use Q Learning in Video Games Easily”

  1. Traceback (most recent call last):
    File "Learner.py", line 23, in <module>
    Q[(i, j)][action] = w
    KeyError: (4, 1)

    This is what I get after running the program Siraj shared via the GitHub link. Can anyone help?

  2. Hi @Siraj Rawal,

    Great tutorial, and it surely helped me a lot. I was wondering how this approach can be used for a continuous state space (if that's the right term to use)? E.g., a race car finding its way around a track.
    If possible, can you do a tutorial on that too? 😛

  3. Nice video. Is it correct to call it reward? Isn't it both reward and punishment, like both a carrot AND stick method of learning?

  4. What's the reasoning behind making the value of walk_reward -0.04? I mean, why was this value chosen? Could you write something to optimise this number?

  5. Siraj, your eyebrows are amazing. And I think learning rate selection can be really important depending on your application of the decision process.

  6. At first I was annoyed by the level of excitement, but then I realized it made it way easier to pay attention to than my lectures. Thanks for that extra effort!

  7. Started AI and ML recently. Thank God I found you guys early on. I do have some queries for you. How can I contact you? Any forums? How can I start with Q-learning? What basic environment setup is needed?

  8. Man, this is just amazing. Every single topic of every single subject must be like this… fun, interactive, fascinating.

  9. Hey Siraj, how would you go about recreating the image generation used in https://youtu.be/-wmtsTuHkt0 ?

  10. Hi Siraj, great video as always 🙂 With the help of the internet, I wrote myself a Python program that implements Tic Tac Toe with reinforcement learning (Q-learning); it learns to play by playing against itself, learning from the outcomes, and getting better at it.
    Here's the GitHub link:
    https://github.com/jamesq9/Tic-Tac-Toe-Machine-Learning-Using-Reinforcement-Learning

  11. There are a lot of videos on YouTube that explain this topic really well, but this is not one of them. It doesn't really help you understand the idea behind it; it only briefly outlines it with a few buzzwords and makes the thing seem more complicated than it really is.

    I don't want to spread negativity, but I sincerely believe these videos are just clickbait: buzzwords mixed with forced jokes, without any useful information that you could use to really learn something.

  12. Thanks Siraj for this series of awesome tutorials. The rap is great; I didn't know you had rap talent. Go for America's Got Talent!

  13. I still don't understand how we can set the Q matrix in the case of, e.g., games where we have no idea whether an action is good or not.

  14. Hi Siraj, I have a question. This helps to get from point A to B. Now let's say my goal is to cover every space in the grid; is Q-learning a way to do it? Or what do you suggest? Great videos. 😊😊

  15. print "create siraj! you have points to assign to strength, health, wisdom, and dexterity."
    name=raw_input("Siraj Hair ")
    points=30
    attributes=("python", "tensorflow", "neural network", "iq")
    python=0
    Tensorflow =0
    neural network=0
    iq=180
    while True:
    print
    print "you have", points, "points left."
    print
    """
    1-add points
    2-take points
    3-see points per attribute
    4-exit
    """
    choice=raw_input("bla bla bla ")
    if choice=="1":
    attribute=raw_input("Siraj is bored ")
    if attribute in attributes:
    add=int(raw_input("how many points? "))
    if add<=points and add>0:
    if Siraj=="Python":
    Python+=add
    print name, "now has", strength, "Blabla points."
    elif attribute=="health":
    health+=add
    print name, "now has", health, "health points."
    elif attribute=="wisdom":
    wisdom+=add
    print name, "now has", wisdom, "wisdom points."
    elif attribute=="dexterity":
    dexterity+=add
    print name, "now has", dexterity, "dexterity points."
    points-=add
    else:
    print "invalid number of points."
    else:
    print "invalid attribute."
    elif choice=="2":
    attribute=raw_input("which attribute? strength, health, wisdom, or dexterity? ")
    if attribute in attributes:
    take=int(raw_input("how many points? "))
    if attribute=="strength" and take<=strength and take>0:
    strength-=take
    print name, "now has", strength, "strength points."
    points+=take
    elif attribute=="health" and take<=health and take>0:
    health-=take
    print name, "now has", health, "health points."
    points+=take
    elif attribute=="wisdom" and take<=wisdom and take>0:
    wisdom-=take
    print name, "now has", wisdom, "wisdom points."
    points+=take
    elif attribute=="dexterity" and take<=dexterity and take>0:
    dexterity-=take
    print name, "now has", dexterity, "dexterity points."
    points+=take
    else:
    print "invalid number of points."
    else:
    print "invalid attribute."
    elif choice=="3":
    print "strength -", strength
    print "health -", health
    print "wisdom -", wisdom
    print "dexterity -", dexterity
    elif choice=="4":
    if points==0:
    break
    else:
    print "use all your points!"
    else:
    print "invalid choice."
    print "congrats! you're done designing "+name+'.'
    print name, "has", strength, "strength points,", health, "health points,", wisdom, "wisdom points, and", dexterity, "dexterity points."

  16. The pictures (and some content) are literally copied from https://www.cs.cmu.edu/~epxing/Class/10701/slides/lecture21.pdf

  17. Hi Siraj,

    I have recently been working on a project called a Raspberry Pi cluster computer.
    I have built a small 6-node Raspberry Pi cluster; can I use that cluster for Q-learning to make it learn a Mario game?

    Thank you.

  18. Can I give you an opinion?
    First, your videos are very interesting and helpful.
    But to me it seems like there's a conflict in the audience you target. On the one hand, you're speaking to developers who like to think in a structured and very logical way, but on the other hand, your quick paced style of editing, intercut with funny clips, seems made to attract passive viewers who only stay here because it moves a lot but don't really listen.
    This is fine, but at the same time it can be distracting for the focused viewer who really tries to deeply understand the core concepts. I'm just pointing this out because sometimes it's hard to pinpoint what is good and what is bad from the inside, and an outsider's opinion can help.

  19. Siraj, your videos are so excellent that they have driven me to make my first ever comment on Youtube. Just wanted to tell you to keep it up! You strike the perfect balance between helpfulness and entertainment. And don't listen to the haters, keep rapping 🙂

  20. Bro, you are simply awesome: the stuff you come up with, the presentation style, everything is incredible!!!!! ….

  21. I'm a novice in the field of machine learning and have some experience with different frameworks. I have a simple project in mind that I want to execute. I want to use machine learning to set signal light timers (especially green light duration) based on the number of cars (for a 4-way junction): basically a dynamic traffic light that sets a dynamic timer for each red light based on the number of cars waiting at it. What kind of model should I use for this? A link to relevant resources would be very useful as well.

  22. When the cat was in the box, and it was not being observed, was the cat dead, alive, or did it exist in both states simultaneously?

  23. Meh, why wasn't Markov mentioned?
    Also, I made a blog about a different approach to Q-learning:
    https://planktonfun.github.io/q-learning-js/

  24. It's great! I am just a first-year student trying to study IT. Your video makes my problems easier. I hope you make a lot of videos about IT and learning! (Y) Better luck in the future.

  25. Hey, can anyone tell me where I can find the whole package of the demo? If it is on GitHub, please send me the link, bros! Thanks.

  26. How do we apply Q-learning to a turn-based game (games like tic-tac-toe, chess, etc.)? Does max(s', a') become the max Q value on the opponent's turn?

  27. I am using the Anaconda 3 Python env and was not able to install Tkinter. I tried all different ways. If anyone can help me…

  28. Thx, great video. First YouTube programmer I've seen that uses meme pics. Appreciate that. 🙂

    (PS: thx for putting up the source code too. Helped a lot.)

  29. Cat: It might be this lever (translated from meow)
    Person: puts it back in the box
    Cat: I HAVE NO CLUE (translated from meow)

  30. watching this video series has changed my life 🙂 amazing stuff! coming from an old SQL programmer…

  31. The code is so high-level it made things more confusing until I read a lower-level script. plase stp wit da high levl
