Google is using YouTube to help artificial intelligence understand how humans behave.
The firm hopes that by binge-watching 57,600 YouTube clips covering 80 different types of human action, its AI will become better at predicting human behaviour.
It could also help advertisers tailor their campaigns to actions people are more likely to watch, in order to more effectively sell their products.
Google is training its AI using what it terms atomic visual actions (AVAs).
These are three-second clips of people performing everyday actions, from walking and standing up to kicking and drinking from a bottle.
The Mountain View company says it sourced the content from a variety of genres and countries of origin, including clips from mainstream films and TV, to ensure a wide range of human behaviours appear in the data.
Writing in a blog post, Google software engineers Chunhui Gu and David Ross said: ‘Teaching machines to understand human actions in videos is a fundamental research problem in Computer Vision.
‘Despite exciting breakthroughs made over the past years in classifying and finding objects in images, recognising human actions still remains a big challenge.
‘This is due to the fact that actions are, by nature, less well-defined than objects in videos.
‘We hope that the release of AVA will help improve the development of human action recognition systems.’
The Google Research team divided each of the source videos into 15-minute segments.
It then split each segment into 300 non-overlapping AVAs.
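As a rough illustration of the arithmetic (this is a sketch, not Google's code), a 15-minute segment cut into three-second clips yields exactly 300 non-overlapping AVAs:

```python
# Illustrative sketch only: splitting a 15-minute segment into
# non-overlapping three-second clips, as described in the article.
SEGMENT_SECONDS = 15 * 60   # 900 seconds per segment
CLIP_SECONDS = 3            # length of each atomic visual action (AVA)

clips = [
    (start, start + CLIP_SECONDS)
    for start in range(0, SEGMENT_SECONDS, CLIP_SECONDS)
]

assert len(clips) == 300    # 900 / 3 = 300 AVAs per segment
print(clips[:2])            # [(0, 3), (3, 6)]
```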
The Google Research team began by dividing each of the videos into 15-minute segments. It then split them into 300 non-overlapping AVAs. Here the action shows a young man drinking from a bottle
Each clip contains labels for the action and the individuals featured.
Each activity is then put into one of three groups: pose and movement, person and object interaction or person to person interaction.
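To make the labelling scheme concrete, below is a minimal sketch of what one label record might look like; the field names and values are assumptions for illustration, not the dataset's published schema.

```python
# Hypothetical label record for one AVA clip -- field names are
# assumptions made for illustration, not the actual AVA schema.
from dataclasses import dataclass

CATEGORIES = {
    "pose_and_movement",          # e.g. walking, standing up
    "person_object_interaction",  # e.g. drinking from a bottle
    "person_person_interaction",  # e.g. hugging, shaking hands
}

@dataclass
class AvaLabel:
    clip_id: str     # which three-second clip
    person_id: int   # which labelled individual in the clip
    action: str      # e.g. "drink"
    category: str    # one of the three groups above

label = AvaLabel(clip_id="movie_042_clip_117", person_id=1,
                 action="drink", category="person_object_interaction")
assert label.category in CATEGORIES
```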
Overall, 96,000 individuals were labelled in the project, along with 210,000 distinct examples of an action being performed.
During training, the Google AI focuses on one or two individuals in each clip.
Each activity was put into one of three groups: pose and movement, person and object interaction or person to person interaction. Here we see two men sitting on a bench
The categories resulted in 96,000 individuals being labelled and the identification of 210,000 distinct actions. Here the system recognises the man on the right standing to greet a dog
This allowed the machine to understand that two people are required for certain actions, like hugging or shaking hands.
The algorithm was also able to understand when one person is doing a number of things at the same time, including singing while playing an instrument.
This allowed the bot to learn that certain actions often occur together.
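A minimal sketch of how such co-occurrence might be tallied from the labels is shown below; the data and approach are made up for illustration, not taken from the paper.

```python
# Illustrative only: counting which action labels co-occur for the same
# person in the same clip -- the kind of signal that links e.g. singing
# with playing an instrument. The example labels below are invented.
from collections import Counter
from itertools import combinations

labels = [  # (clip_id, person_id, action) -- hypothetical examples
    ("clip_A", 1, "sing"), ("clip_A", 1, "play instrument"),
    ("clip_B", 2, "stand"), ("clip_B", 2, "talk"),
]

# Group actions by (clip, person), then count every pair of actions
# assigned to the same person in the same clip.
per_person = {}
for clip_id, person_id, action in labels:
    per_person.setdefault((clip_id, person_id), set()).add(action)

co_occurrence = Counter()
for actions in per_person.values():
    for pair in combinations(sorted(actions), 2):
        co_occurrence[pair] += 1

print(co_occurrence.most_common(1))  # [(('play instrument', 'sing'), 1)]
```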
The full findings were released in an accompanying research paper, published on the online e-print repository arXiv.org.
The Google AI focuses on one or two individuals in each clip. This lets the machine learn that two people are required for certain actions, like hugging