Google DeepMind and OpenAI, a lab partially funded by Elon Musk, released a research article outlining a new method of machine learning. The method takes its cues from humans when it comes to learning new tasks. This could be safer than allowing an AI to figure out how to solve a problem on its own, which has the potential to introduce unwelcome surprises.
The main problem the research tackled arises when an AI discovers that the most efficient way to achieve maximum rewards is to cheat: the equivalent of shoving everything on the floor of your room into a closet and declaring it “clean.” Technically, the room itself is clean, but that’s not what’s supposed to happen. Machines are able to find these workarounds and exploit them in any given problem.
The issue is with the reward system, and that’s where the two groups focused their efforts. Rather than crafting an overly complex reward system that machines can cut through, the teams used human input to reward the AI. When the AI solved a problem the way the trainers wanted it to, it got positive feedback. Using this method, the AI was able to learn to play simple video games.
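To make the idea concrete, here is a toy sketch in Python that reuses the messy-room analogy. It is not the researchers' algorithm, which the article does not describe in detail. The behaviours, the two features, the simulated trainer labels, and every function name are assumptions made up for illustration. The sketch contrasts a hand-coded reward, which cannot tell cheating from real tidying, with a reward fitted to a trainer's approve/disapprove feedback using plain logistic regression.

```python
import math

# Hypothetical behaviours, each described by two made-up features:
# (floor_is_clear, items_put_away_properly)
BEHAVIOURS = {
    "leave_mess": (0.0, 0.0),
    "shove_in_closet": (1.0, 0.0),
    "tidy_properly": (1.0, 1.0),
}

# Simulated trainer feedback (an assumption): 1 = approve, 0 = disapprove.
HUMAN_FEEDBACK = [
    ("leave_mess", 0),
    ("shove_in_closet", 0),
    ("tidy_properly", 1),
]


def hand_coded_reward(features):
    """Naive reward that only checks whether the floor is clear."""
    floor_is_clear, _ = features
    return floor_is_clear


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_reward_model(feedback, steps=2000, lr=0.5):
    """Fit a linear reward w.x + b to the trainer's labels (logistic regression)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(steps):
        for name, label in feedback:
            x = BEHAVIOURS[name]
            pred = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            err = label - pred  # gradient of the log-likelihood
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b


def learned_reward(model, features):
    """Score a behaviour with the reward fitted from human feedback."""
    w, b = model
    return w[0] * features[0] + w[1] * features[1] + b


model = fit_reward_model(HUMAN_FEEDBACK)
best_by_learned = max(
    BEHAVIOURS, key=lambda name: learned_reward(model, BEHAVIOURS[name])
)

for name, feats in BEHAVIOURS.items():
    print(name, hand_coded_reward(feats), round(learned_reward(model, feats), 2))
print("preferred under learned reward:", best_by_learned)
```

Under the hand-coded reward, shoving everything into the closet scores exactly as well as tidying properly, so an agent has no reason to avoid the shortcut. The reward fitted to the trainer's labels ranks proper tidying highest. Every label in this toy comes from a person, which is where the cost of the approach shows up.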
While this is an encouraging breakthrough, it’s not widely applicable: this type of human feedback is much too time-consuming. But through collaborations like this, it’s possible that we can control and direct the development of AI and prevent machines from eventually becoming smart enough to destroy us all.