Google DeepMind's Deep Q-learning playing Atari Breakout

Technology

Google DeepMind created an artificial intelligence program that uses deep reinforcement learning, a combination of deep artificial neural networks and reinforcement learning, to play many Atari games and improve itself to a superhuman level. Shortly after DeepMind presented their initial results with the algorithm, Google acquired the company for several hundred million dollars, hence the name Google DeepMind. Please enjoy the footage and let me know if you have any questions regarding deep learning!
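For readers curious about the mechanics: deep Q-learning estimates, for each action, how much future score it leads to. The neural network part of DQN replaces a lookup table with a convolutional network reading raw pixels, but the core update rule is the classic tabular one. Below is a minimal sketch of that tabular update; the state/action names and hyperparameter values are illustrative, not DeepMind's.

```python
import random

# Minimal tabular Q-learning sketch. DeepMind's DQN replaces this table
# with a convolutional neural network mapping raw pixels to Q-values,
# but the update rule is the same in spirit.

ALPHA = 0.1    # learning rate
GAMMA = 0.99   # discount factor for future rewards
EPSILON = 0.1  # exploration probability

ACTIONS = ["left", "right", "stay"]  # illustrative Breakout-style actions
q_table = {}   # (state, action) -> estimated future return

def q(state, action):
    return q_table.get((state, action), 0.0)

def choose_action(state):
    # Epsilon-greedy: mostly pick the best-known action, sometimes explore.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(state, a))

def update(state, action, reward, next_state):
    # Q-learning update: nudge the estimate toward the observed reward
    # plus the discounted value of the best action in the next state.
    best_next = max(q(next_state, a) for a in ACTIONS)
    target = reward + GAMMA * best_next
    q_table[(state, action)] = q(state, action) + ALPHA * (target - q(state, action))
```

In the Atari setting, `state` is a stack of recent screen frames and `reward` is the change in game score, which is how the agent improves with nothing but pixels and the score as input.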


Recommended for you:
1. How DeepMind’s AlphaGo Defeated Lee Sedol –
2. How DeepMind Conquered Go With Deep Learning (AlphaGo) –
3. Google DeepMind’s Deep Q-Learning & Superhuman Atari Gameplays –

Subscribe if you would like to see more content like this:

– Original DeepMind code:

– Ilya Kuzovkin’s fork with visualization:

– This patch fixes the visualization when reloading a pre-trained network. The window will appear after the first evaluation batch is done (typically a few minutes):

– This configuration file will run Ilya Kuzovkin’s version with less than 1GB of VRAM:

– The original Nature paper on this deep learning technique is available here:

– And some mirrors that are not behind a paywall:

Web →
Twitter →


See more posts:

34 thoughts on “Google DeepMind's Deep Q-learning playing Atari Breakout”

  1. Do you think that within the decade Q-learning could figure out how to play Super Mario Bros. on the NES with only visual input? It would have to learn the concept of lives and fail states. Some things could come naturally: if it got to the first castle, it knows it needs to move to the right to progress, and certain actions give you score. When it gets to Bowser, the sprite is moving, so it might be an enemy, or it could be a platform. But you die when you touch it, so it determines that this is a hazard that is mobile. It has figured out that stationary hazards, like reaching the bottom of the screen, can't be killed with fireballs, but a mobile hazard can, up to this point. So it shoots it with fireballs, maybe dying once or twice to the fire before realizing that you can't jump on it. So it either avoids the enemy by jumping over it or going around it, or blasts it with fireballs. Once the enemy is clear, it continues to navigate to the right, and it sees the score going up from the extra time. Probably way harder to do than that, but it could be feasible. Something like Zelda? Maybe later.

  2. I wonder if it could handle another level of abstraction. Specifically, it is only given: 1. the 2D array of pixels, 2. a set of three actions (which map to left/right/no movement), and 3. an on/off state, i.e. whether the game is still going or not. Basically, the task is now not to get the highest score (though that will likely be a side effect of success) but to keep the program running as long as possible.

  3. I’m sure this is obvious, but how do you program an AI to have an open goal like “as many points as possible”?

    Does it just note everything that happened in achieving a higher score and attempt to replicate that with minor changes to leave open the possibility of a better one?

    Does it figure out how the game actually works (such as needing to bounce the thing back) and avoid missing it, or is this a brute force approach where it reaches that end through trial and error?

    I find these things to be so interesting but very confusing lol

  4. One important point with this is that when researchers moved the "paddle" up a pixel, the AI couldn't play the game at all, even though it had been at a superhuman master level. So it was not able to abstract to something that was basically exactly the same. This is an example of a hypersmart computer that lacks the common sense of a mouse.

  5. I remember as a kid my brothers and I were struggling over the same level on a video game. We had all taken a shot at it for an entire day and, frustrated, we went to bed. We woke up the next morning and immediately powered on the PlayStation and took our controllers. Just as we were ready to sit on the couch and use our controls, we suddenly realized that the player was moving without our controlling it. Confused, we looked at one another. I said, "I'm not controlling it, are you?" All of us agreed that none of us were in control. Our confusion slowly turned to awe as we watched the level completed with an exactness and expertise never seen before. Our awe quickly turned to glee and we began shouting triumphantly at the screen, "Go computer! Kick their butts!" and cheering on the A.I., haha. It won the level, and it will forever stay in our minds as a glorious day, when the computer decided to look fondly upon us and give us kids a second chance 🙂

  6. If AI can accomplish all intellectual tasks, the only field left to us human beings is to develop spiritual values and moral virtues: courage, wisdom, justice, temperance.

  7. Can humans ever beat machines?
    If we can't win, shouldn't we stop any further development?

  8. the thing is though, does it really "see" that it has tunneled through and bounced the ball off the back, or did the network simply NOT select against that behavior of tunneling? To test its understanding of delayed gratification, you'd have to introduce a consequence for tunneling that the AI "sees" is worth taking.

  9. Brick by Brick….. piece by piece….Tomato….Tamato what's the difference WatchTower/Deep Mind…😉

  10. What would REALLY be astonishing is if learning algorithms can learn to play games like Mario, which is an NP problem: they could learn to solve NP problems and tell us how, thus leading to a unification of, or differentiation between, P and NP problems in general. Amazing!

  11. It is not technically an algorithm; it's an artificial intelligence that uses Q-learning with a neural network.
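Several of the questions above, how an open goal like “as many points as possible” is encoded, and whether the agent really values delayed gratification like tunneling behind the bricks, come down to the same mechanism: each frame the agent receives the change in game score as a reward, and it is trained to maximize the discounted sum of those rewards. A minimal sketch of that objective follows; the function name and the gamma values are illustrative, not from DeepMind's code.

```python
# The "open goal" is just reward maximization: each frame's reward is the
# score delta (DeepMind clipped it to -1/0/+1), and the agent maximizes
# the discounted sum of rewards. The discount factor gamma controls how
# much a point earned later is worth compared to a point earned now,
# which is exactly what makes delayed-gratification strategies learnable.

def discounted_return(rewards, gamma=0.99):
    # Value of a trajectory of per-frame rewards: later rewards count
    # slightly less, computed from the end of the trajectory backwards.
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With a gamma close to 1, a strategy that sacrifices a few frames now to tunnel through and rack up points later scores a higher discounted return than greedy play, so the training signal itself favors it; no explicit "understanding" of tunneling is required.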

Leave a Reply

Your email address will not be published. Required fields are marked *