In 2020, OpenAI’s GPT-3 machine learning algorithm thrilled people when, after ingesting billions of words scraped from the internet, it began spitting out well-crafted sentences. This year, DALL-E 2, a cousin of GPT-3 trained on text and images, caused a similar stir online when it began conjuring surreal images of astronauts riding horses and, more recently, creating strange, photorealistic faces of people who don’t exist.
Now, the company says its latest AI has learned to play Minecraft after watching about 70,000 hours of video showing people playing the game on YouTube.
Unlike a number of previous Minecraft algorithms that work in much simpler versions of the “sandbox” game, the new AI plays in the same environment as humans, using standard keyboard and mouse commands.
In a blog post and preprint detailing the work, the OpenAI team says the algorithm learned basic skills, such as chopping down trees, making planks, and building crafting tables. They also watched it swim, hunt, cook, and “pillar jump.”
“To the best of our knowledge, there is no published work that operates in the full, unmodified human action space, which includes drag-and-drop inventory management and item crafting,” the authors wrote in their paper.
With fine-tuning (that is, training the model on a smaller, more focused dataset), they found the algorithm not only performed all these tasks more reliably, it also began to advance technologically: crafting wooden and stone tools, building basic shelters, exploring villages, and raiding chests.
After further fine-tuning with reinforcement learning, it learned to craft a diamond pickaxe, a skill that takes human players about 20 minutes and 24,000 actions.
This is a notable result. AI has long struggled with Minecraft’s wide-open gameplay. Games like chess and Go, which AI has already mastered, have clear goals, and progress toward those goals can be measured. To conquer Go, researchers used reinforcement learning, in which an algorithm is given a goal and rewarded for progress toward it. Minecraft, by contrast, has an unlimited number of possible goals, progress is less linear, and deep reinforcement learning algorithms usually flounder.
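To see why a clear, measurable goal matters, here is a minimal sketch of reinforcement learning (a toy illustration, not OpenAI’s actual method): tabular Q-learning on a six-cell corridor, where the agent is rewarded only for reaching the last cell. All names and the task itself are invented for this example.

```python
import random

# Toy illustration of reinforcement learning: an agent on a 6-cell
# corridor is rewarded only for reaching cell 5 -- the kind of
# clear, measurable goal that Go has and open-ended Minecraft lacks.
N_CELLS, GOAL = 6, 5
ACTIONS = (-1, +1)  # step left or right

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    # Q-table: expected future reward for each (state, action) pair.
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), N_CELLS - 1)
            reward = 1.0 if s2 == GOAL else 0.0
            # Standard Q-learning update toward reward + discounted future value.
            best_next = max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
            s = s2
    return q

def greedy_path(q):
    """Follow the learned policy greedily from the start cell."""
    s, path = 0, [0]
    while s != GOAL and len(path) < 20:
        a = max(ACTIONS, key=lambda act: q[(s, act)])
        s = min(max(s + a, 0), N_CELLS - 1)
        path.append(s)
    return path

q_table = train()
print(greedy_path(q_table))  # learned shortest route: [0, 1, 2, 3, 4, 5]
```

Because the reward signal is dense and the state space tiny, this converges in seconds; Minecraft’s difficulty is precisely that no such compact goal-and-reward structure exists for most of what players do.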
In the 2019 MineRL Minecraft competition for AI developers, for example, none of the 660 entries achieved the competition’s relatively simple goal: mining a diamond.
It’s worth noting that, to reward creativity and show that throwing computing power at a problem isn’t always the answer, the MineRL organizers set strict limits on participants: one NVIDIA GPU and 1,000 hours of recorded gameplay. Although the entrants performed admirably, the OpenAI result, achieved with far more data and 720 NVIDIA GPUs, seems to show that computing power still has its advantages.
AI Gets Crafty
With its Video PreTraining (VPT) algorithm for Minecraft, OpenAI returned to the approach used for GPT-3 and DALL-E: pre-training an algorithm on an enormous dataset of human-created content. But computing power and data alone didn’t make the algorithm’s success possible. Training a Minecraft AI on that many videos wasn’t practical before.
Raw video isn’t as useful for behavioral AIs as it is for content generators like GPT-3 and DALL-E. It shows what people do, but it doesn’t explain how they do it. For the algorithm to link video to actions, it needs labels. A video frame showing a player’s collection of objects, for example, should be labeled “inventory” alongside the “E” key command used to open the inventory.
Labeling every frame in 70,000 hours of video by hand would be … insane. So the team paid Upwork contractors to record and label footage of basic Minecraft skills. They used 2,000 hours of this video to train a second algorithm, an inverse dynamics model (IDM), to label Minecraft videos, and that algorithm then labeled all 70,000 hours of YouTube footage. (The team says the IDM was over 90 percent accurate at labeling keyboard and mouse commands.)
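The pipeline described above can be sketched in miniature. This is a toy, pure-Python analogy, not OpenAI’s actual IDM: here a “frame” is a single number and actions are strings, and a simple nearest-centroid classifier stands in for the learned labeler. The real model works on video pixels and keyboard/mouse events.

```python
import random

# Toy sketch of the pseudo-labeling idea behind VPT (not the real IDM):
# a small hand-labeled set trains a labeler, which then labels a much
# larger pool of unlabeled "frames".

def centroid_labeler(labeled):
    """Train: average the feature value seen for each action label,
    then predict by nearest centroid."""
    sums, counts = {}, {}
    for feature, action in labeled:
        sums[action] = sums.get(action, 0.0) + feature
        counts[action] = counts.get(action, 0) + 1
    centroids = {a: sums[a] / counts[a] for a in sums}
    def predict(feature):
        return min(centroids, key=lambda a: abs(centroids[a] - feature))
    return predict

rng = random.Random(42)

# Small hand-labeled "contractor" dataset: features cluster by action.
hand_labeled = [(rng.gauss(0, 1), "mine") for _ in range(50)] + \
               [(rng.gauss(5, 1), "craft") for _ in range(50)]
labeler = centroid_labeler(hand_labeled)

# A much larger unlabeled pool gets pseudo-labels from the trained labeler.
unlabeled = [rng.gauss(0, 1) for _ in range(5000)] + \
            [rng.gauss(5, 1) for _ in range(5000)]
pseudo_labeled = [(f, labeler(f)) for f in unlabeled]

craft_share = sum(1 for _, a in pseudo_labeled if a == "craft") / len(pseudo_labeled)
print(f"{craft_share:.2%} of frames pseudo-labeled 'craft'")
```

The design choice mirrors the article: hand-labeling is expensive, so it is spent on a small set that bootstraps a labeler, and the labeler’s output (imperfect but mostly right) unlocks the huge unlabeled corpus for training.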
This approach, in which people train a data-labeling algorithm to unlock online behavioral datasets, could help AI learn other skills too. “VPT paves the path toward allowing agents to learn to act by watching the vast numbers of videos on the internet,” the researchers wrote. Beyond Minecraft, OpenAI thinks VPT could yield new real-world applications, such as algorithms that operate computers on command (imagine, for example, asking your laptop to find a document and email it to your boss).
Diamonds Aren’t Forever
Perhaps to the chagrin of the MineRL competition’s organizers, the results seem to show that computing power and resources still move the needle on the most advanced AI.
Setting aside the cost of compute, OpenAI said the Upwork contractors alone cost $160,000. Though to be fair, manually labeling the entire dataset would have run into the millions and taken far longer to complete. And while the computing power wasn’t negligible, the model was actually fairly small: VPT’s hundreds of millions of parameters are orders of magnitude fewer than GPT-3’s hundreds of billions.
Still, the urge to find clever new approaches that use less data and computation is well founded. A kid can learn the basics of Minecraft by watching one or two videos; today’s AI needs far more to learn even simple skills. Making AI more efficient is a big, worthwhile challenge.
In any case, OpenAI is in a sharing mood this time. The researchers say VPT isn’t risk-free (they’ve strictly controlled access to algorithms like GPT-3 and DALL-E in part to limit misuse), but the risk is minimal for now. They’ve open-sourced the data, environment, and algorithm, and they’re partnering with MineRL. This year’s contestants are free to use, modify, and fine-tune the latest in Minecraft AI.
Chances are good that this year’s entries will make it well beyond mining diamonds.