Since its inception in 2006, YouTube has taken strides toward captioning the videos posted on its site. In 2007, YouTube gave users the ability to add their own captions, and in 2009, it introduced automated captioning for videos.
After announcing last month that more than 1 billion videos on YouTube use automatic captioning, which is available in 10 different languages, YouTube has taken another stride toward improving the quality of entertainment by adding automated captions that catch sound effects.
Captions such as [APPLAUSE], [MUSIC] and [LAUGHTER] will now be added to videos that use automated captions. The video below shows how the new captioning works. (Press the CC button first.)
But it doesn’t stop there. According to the Google Research Blog:
Investing in the development of this infrastructure has the added benefit of allowing us to easily incorporate more sound types in the future, as we expand our algorithms to understand a wider vocabulary of sounds (e.g. [RING], [KNOCK], [BARK]). In doing so, we will be able to incorporate the detected sounds into the narrative to provide more relevant information (e.g. [PIANO MUSIC], [RAUCOUS APPLAUSE]) to viewers.
These new automated sound effect captions were made possible by a deep neural network that jumpstarted the machine-learning process. Deep machine learning was recently awarded the “Breakout Trend” award at the SXSW Interactive Innovation Awards.
Automated captions on YouTube haven’t always been a reliable way to get the most out of videos. They often mix up words and display similar-sounding words instead, leaving users to type in their own captions manually to get 100 percent accuracy. YouTubers Rhett and Link have even created an entire series poking fun at YouTube’s automated captions.
However, the steps YouTube is taking toward improving the quality of automated captioning are a bright sign, and viewers can be excited for the future of captioned videos.