BaM™ Product Highlight: IBM

Mon 05 Nov 2018

PRODUCE BaM™ Award Finalist: IBM Watson Captioning Dramatically Scales Caption Generation

Today, news moves at lightning speed, and consumers are adapting to media in new ways, such as watching content without sound in noisy environments. The pace at which news is delivered and the broadening range of viewing options present challenges for those producing content and raise the need to caption material. However, delivering accurate captions for video is both time- and resource-intensive, which has become increasingly problematic as the amount of content being created grows.

To address this, IBM has developed an automated solution for generating closed captions for live and on-demand video. Called IBM Watson Captioning, the solution leans on IBM Watson’s machine learning capabilities, enabling the creation of accurate captions by training Watson with an expanded vocabulary and teaching it to differentiate words based on context.


Generation process

To automatically generate accurate captions, several different elements are at play. These include speech recognition, the ability to accept audio and convert speech to a machine-readable format such as text, and audio recognition, the ability to separate background noise from actual speech. The technology then works through a vocabulary list, matching the recognized speech against words it knows.
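As an illustration of this transcription step, the sketch below uses IBM’s public Watson Speech to Text Python SDK as a plausible stand-in for the speech recognition behind Watson Captioning; it is not the Captioning product’s own API, and the API key, service URL, model name, and audio file are placeholders.

```python
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials and endpoint.
stt = SpeechToTextV1(authenticator=IAMAuthenticator('YOUR_API_KEY'))
stt.set_service_url('https://api.us-south.speech-to-text.watson.cloud.ibm.com')

# Transcribe an audio clip; word timestamps are needed later to place caption cues.
with open('broadcast_clip.wav', 'rb') as audio_file:
    response = stt.recognize(
        audio=audio_file,
        content_type='audio/wav',
        model='en-US_BroadbandModel',
        timestamps=True,
        smart_formatting=True
    ).get_result()

# Each result holds alternative transcriptions; take the top one per segment.
for result in response['results']:
    best = result['alternatives'][0]
    print(best['transcript'], best.get('confidence'))
```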

Additionally, the technology features a smart layout algorithm to manage how captions are laid out. It automatically segments caption cues at natural breaking points for readability, creating captions that aren’t too long. Content owners can also manually adjust caption length on a per-line basis after the captions have been generated, shortening or combining caption lines as desired.
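A rough sketch of what such a layout step might do is shown below; this greedy segmenter is purely illustrative, not IBM’s actual algorithm, and the 37-character line width is an assumption rather than a product specification.

```python
import re

MAX_CHARS_PER_LINE = 37   # assumed width, not an IBM specification


def segment_captions(transcript: str, max_chars: int = MAX_CHARS_PER_LINE) -> list[str]:
    """Greedily pack words into caption lines, preferring to break after
    sentence-ending punctuation (a 'natural breaking point') rather than mid-clause."""
    lines, current = [], ""
    for word in transcript.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            # Flush the cue early if we just hit sentence-ending punctuation.
            if re.search(r"[.!?]$", word):
                lines.append(current)
                current = ""
        else:
            if current:
                lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


print(segment_captions("News moves fast. Viewers often watch without sound in loud areas."))
```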

Context is another sophisticated element of the automated captioning process where the AI shines: the ability to differentiate between similar or identical-sounding words and phrases. This is achieved through a combination of context, for example choosing “jeans” rather than “genes” when the conversation is about clothing, and what the AI has learned through training.
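A toy example of context-based disambiguation is sketched below; the hand-written keyword sets are purely illustrative, whereas the real system relies on what Watson’s trained language models have learned rather than a keyword list.

```python
# Illustrative only: score each candidate spelling by how many of its
# context keywords appear in the surrounding sentence.
HOMOPHONES = {
    "jeans": {
        "jeans": {"denim", "clothing", "wear", "fashion"},
        "genes": {"dna", "biology", "inherited", "chromosome"},
    },
}


def disambiguate(word: str, context: list[str]) -> str:
    candidates = HOMOPHONES.get(word)
    if not candidates:
        return word
    context_set = {w.lower() for w in context}
    # Pick the spelling whose keywords overlap most with the sentence.
    return max(candidates, key=lambda c: len(candidates[c] & context_set))


sentence = "these jeans pair well with any clothing style".split()
print(disambiguate("jeans", sentence))   # -> "jeans" (clothing context)
```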

AI training

To produce accurate captions, the AI can be trained with an expanded vocabulary, learning names, new acronyms and more. When a word is captioned incorrectly, a quick correction prompts the AI to learn and avoid that mistake in the future, so it gets smarter over time.
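One plausible way such a correction could reach the underlying speech model is through Watson Speech to Text’s custom language model API, sketched below. Whether the Captioning product uses exactly this plumbing is an assumption, and the model name and corrected word are placeholders.

```python
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

stt = SpeechToTextV1(authenticator=IAMAuthenticator('YOUR_API_KEY'))

# Create a custom language model on top of a base broadband model.
custom = stt.create_language_model(
    name='newsroom-vocabulary',
    base_model_name='en-US_BroadbandModel'
).get_result()
customization_id = custom['customization_id']

# Teach the model a term it previously captioned incorrectly,
# including how it sounds and how it should be displayed.
stt.add_word(
    customization_id,
    word_name='IABM',
    sounds_like=['I. A. B. M.'],
    display_as='IABM'
)

# Retrain the custom model so future transcriptions use the new vocabulary.
stt.train_language_model(customization_id)
```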

Through training, this technology can produce hyper-local captions, which is critical for live, local content. To facilitate this, IBM Watson Captioning can undergo intensive training to learn market-specific terminology, familiarizing the solution with vocabulary, text corpora, and acoustic data sets specific to a local market.
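Continuing the same assumption, market-specific training could be sketched as feeding a corpus of local transcripts into the custom language model and retraining it; the customization ID and file name below are placeholders.

```python
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

stt = SpeechToTextV1(authenticator=IAMAuthenticator('YOUR_API_KEY'))
customization_id = 'YOUR_CUSTOMIZATION_ID'  # custom model created earlier

# Supply a corpus of local scripts/transcripts so the model learns
# market-specific names, places, and phrasing.
with open('local_news_transcripts.txt', 'rb') as corpus_file:
    stt.add_corpus(
        customization_id,
        corpus_name='local-market-terms',
        corpus_file=corpus_file,
        allow_overwrite=True
    )

# Retrain to fold the new corpus into the model.
stt.train_language_model(customization_id)
```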

The technology becomes more accurate over time in two ways: by providing past captioned content for IBM Watson to learn from, and by directly editing the captions IBM Watson generates in a web-based editor panel.

Learn more about the process by downloading this white paper.

Example

Below is an example of the technology, which was used to generate captions for a video detailing IBM’s broader AI-driven capabilities for video. In this instance, captions were generated and then edited through the online interface.
