Facebook and Instagram Update AI-Generated Image Captions to Provide Better Information

Read more from Author Rachel Maga here: https://globelivemedia.com/author/rachel-maga/

Photos posted on Facebook and Instagram are analyzed by an image-recognition AI, which generates captions for them. This AI has recently been improved. The new system will help blind and visually impaired users, and in the future it will also help ordinary users find photos quickly.

The AI produces captions such as “a person standing next to a horse in a field” and “a dog on a boat” and saves them in the image metadata. This lets people who cannot see an image understand what it shows.

Traditionally, photographers and media outlets have added these accessibility captions by hand. Ordinary users who upload photos to social media, however, rarely write a caption for each one. Meanwhile, technologies that let AI analyze and search images, as Google Photos does, have made great strides in the last few years, so it was clear that bringing this capability to social media would make it dramatically more convenient.

Facebook introduced its Automatic Alt Text system in 2016, which is a very long time ago by machine-learning standards. Since then, the team has made many improvements to speed up the process and make the captions more detailed. The latest update adds the option to generate detailed captions on demand.

The improved system recognizes about 1,200 types of objects and concepts, 10 times as many as the original. Its descriptions are also more detailed. Where a caption used to read “two people next to a building,” it can now read “two people taking a selfie next to the Eiffel Tower.” (The actual captions are hedged with “may be” to avoid overly bold guesses.)

Less obvious, but also useful: in the example below, the AI recognizes the relative positions of people and objects.

[Image: example of a Facebook AI-generated caption]

Image credit: Facebook

If a person is standing next to a drum, they are obviously taller than it, and if they are wearing a hat, it is obviously on their head. In cases like these, there is no need to spell out every positional relationship. But what about an image of “a house, trees, and a mountain”? Is the house on top of the mountain or in the foreground? Are the trees in front of the house or behind it, or are they growing on the distant mountain?

In other words, even when an image can be summarized in a few words, the system needs to generate more detailed information behind the scenes. Sighted users can click an image to enlarge it and see more; the “Generate detailed image description” command plays a similar role for captions (long-press the image in the Android app, or use a custom action in the iOS app).

The resulting description might be “a house and several trees in front of a snowy mountain.” A caption like that makes the image far easier to understand. (This example was made up for illustration, but I expect the feature to improve in that direction.)

The “detailed description” feature will be tested on Facebook first, then on Instagram. Captions can be translated into the other languages the service already supports. However, the feature itself does not appear likely to expand to many more languages for the time being.

Rachel Maga
Rachel Maga is a technology journalist currently working at the Globe Live Media agency. She has worked in technology journalism for over five years. Her life's biggest milestone was an inside tour of Tesla Industries, gifted to her by the legendary Elon Musk himself.