Today at AssemblyAI, we are excited to announce the release of three new Summarization AI models.

The new models are:

  1. Informative, which is best for files with a single speaker, like a presentation or lecture
  2. Conversational, which is best for any multi-person conversation, like customer/agent phone calls or interviewer/interviewee calls
  3. Catchy, which is best for creating video, podcast, or media titles

Having been trained on data relevant to a specific use case, each model provides state-of-the-art results for that particular use case. Each model also supports various summary lengths, allowing product teams to further tailor the summaries to their specific use case.

The new AI models in action

Let’s take a look at each of these new AI models in turn.

Informative

The Informative summary model is best for audio in which a single person is speaking, like in a presentation or lecture. Below we can see the ground truth transcript for a news segment along with the summary generated by the Informative summary model:

Summary

A train heading from Queens into Manhattan was stalled underneath the East River around 8.30am Monday morning. Part of the train’s contact shoe is thought to have touched the board instead of the rail, sparking the incident. More than 500 passengers were taken to Manhattan after spending roughly an hour and a half trapped. Service on the 7 line was suspended for almost two hours.

Conversational

The Conversational summary model is best for audio in which two or more speakers are having a conversation. Below we see the ground truth transcript for an interview along with the summary generated by the Conversational summary model:

Summary

Mary Brown comes to Mister Thompson to apply for a secretary. She tells Mister Thompson she can do everything a secretary is expected to do and she expects a salary of around $800 a month. Mister Thompson will let her know the result as soon as possible.

Catchy

The Catchy summary model is best for automatically generating taglines, headlines, etc. Below we see the ground truth transcript for a story about scientists discovering gravitational waves along with the summary generated by the Catchy summary model:

Summary

Scientists Find Gravitational Waves for the First Time (New Study)

Summary types

In addition to the three model types, which are each best used with a specific type of input, each model also offers different summary types, which are used to tailor the desired output.

The summary types are:

  1. gist – 3-10 word summary
  2. headline – about 20 word summary (1-2 sentences)
  3. paragraph – 30-100 word summary (3-5 sentences)
  4. bullets – Bulleted list of paragraph summaries (max of 6)
  5. bullets_verbose – Same as bullets, but there is no limit on the number of bullets

The gist and headline summary types are available for the Catchy model, while the headline, paragraph, bullets, and bullets_verbose summary types are available for the Informative and Conversational models.
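The compatibility rules above can be captured in a small lookup table. The following is a minimal sketch in Python rather than part of the AssemblyAI API: the lowercase model names and the helper `is_valid_combination` are our own assumptions for illustration.

```python
# Summary types supported by each model, as listed in this post.
# The lowercase model names are an assumption made for this sketch.
SUPPORTED_SUMMARY_TYPES = {
    "catchy": {"gist", "headline"},
    "informative": {"headline", "paragraph", "bullets", "bullets_verbose"},
    "conversational": {"headline", "paragraph", "bullets", "bullets_verbose"},
}


def is_valid_combination(summary_model: str, summary_type: str) -> bool:
    """Return True if the given model supports the given summary type."""
    # Unknown model names map to an empty set, so they are never valid.
    return summary_type in SUPPORTED_SUMMARY_TYPES.get(summary_model, set())


print(is_valid_combination("catchy", "gist"))       # True
print(is_valid_combination("catchy", "paragraph"))  # False
```

A check like this can catch an unsupported model and summary type pairing before any audio is submitted.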

The Catchy model generated the summary shown earlier using the headline summary type.

Using the gist summary type, we instead get this summary:

The Informative model generated the summary shown earlier using the paragraph summary type.

Similarly, using the headline summary type, we instead get this summary:

Use cases for summarization

Our customers have already built innovative, high-ROI features using summarization. Our new Summarization models will open the doors to new possibilities and creative solutions. Here are a few use cases for which summarization is well suited:

Conversation Intelligence

  1. Call centers – summarization makes it easy to pass information up the chain of command, make reviews, and monitor calls.
  2. Meetings – easily summarize virtual or in-person meetings, interviews, etc. for people who couldn’t be in attendance or for record-keeping.

Video and Podcasting

  1. Podcasting – summarization lets you automatically generate episode descriptions at scale, for example.
  2. Title Generation – add summarization to your workflow to automatically generate titles for video clips, making it easy to quickly release TikToks, YouTube Shorts, etc.

Media Monitoring

  1. News Aggregation – aggregate news intelligently by using summarization to generate titles and descriptions for news segments across many channels.
  2. Social Monitoring – make it easier to digest and process huge quantities of data across social media platforms and more.

These are just a few ways companies can incorporate AI-powered summarization into their processes to stay ahead of the competition.

Using the Summarization Models

You can check out this Colab notebook to see how to use the new Summarization models with Python, or move on to the next section to test them in a no-code fashion.

Using the new Summarization models is as simple as sending a POST request to the AssemblyAI API.
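As a rough illustration, here is a minimal sketch of how such a POST request could be built with Python's standard library. The endpoint URL, the `authorization` header, and the `summarization`, `summary_model`, and `summary_type` fields reflect our understanding of the API and should be treated as assumptions to check against the current API reference. The helper names `build_summary_request` and `submit` are hypothetical.

```python
import json
import urllib.request

# Assumed transcript endpoint; verify against the current API reference.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_summary_request(api_key, audio_url,
                          summary_model="informative",
                          summary_type="bullets"):
    """Build (but do not send) a POST request that asks for a summary."""
    payload = {
        "audio_url": audio_url,          # publicly reachable audio file
        "summarization": True,           # assumed flag enabling Summarization
        "summary_model": summary_model,  # assumed field: which model to use
        "summary_type": summary_type,    # assumed field: which output format
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key,
                 "content-type": "application/json"},
        method="POST",
    )


def submit(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `submit(build_summary_request(YOUR_API_KEY, YOUR_AUDIO_URL))` would send the request. Under these assumptions, the response includes an identifier for the transcript that can be used to retrieve the results later.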

Once processing is completed, a simple GET request will fetch the results:
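Because processing takes some time, a client typically repeats the GET request until the job has finished. The sketch below shows one way to do this with Python's standard library. The endpoint URL, the `status` values (`completed`, `error`), and the `summary` and `error` response fields are assumptions on our part, and `get_transcript` and `wait_for_summary` are hypothetical helper names.

```python
import json
import time
import urllib.request

# Assumed transcript endpoint; verify against the current API reference.
API_URL = "https://api.assemblyai.com/v2/transcript"


def get_transcript(transcript_id, api_key):
    """Send one GET request for the transcript and return the decoded JSON."""
    request = urllib.request.Request(
        f"{API_URL}/{transcript_id}",
        headers={"authorization": api_key},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_summary(transcript_id, api_key, fetch=get_transcript,
                     poll_seconds=3.0, max_polls=200, sleep=time.sleep):
    """Poll until the job completes, then return the summary text."""
    for _ in range(max_polls):
        result = fetch(transcript_id, api_key)
        status = result.get("status")       # assumed response field
        if status == "completed":
            return result.get("summary")    # assumed response field
        if status == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        sleep(poll_seconds)                 # still queued or processing
    raise TimeoutError("summary was not ready after the maximum number of polls")
```

The `fetch` and `sleep` parameters exist only so the polling logic can be exercised without network access; in normal use the defaults are enough.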

The corresponding summary for the audio file we used can be seen below:

Test our Summarization models today

The easiest way to test our new Summarization models is by going to the AssemblyAI Playground. The Playground is a no-code way to instantly see the results of our models on a local file or YouTube video.

Go to the AssemblyAI Playground and choose your audio source. In this example, we use the AssemblyAI Product Overview video from YouTube.

Next, select the Summarization model from the list of available models, along with any other models you would like to test.

To change which Summarization model you are using and the type of summary to generate, click the three dots in the upper-right corner of the Summarization model card.

After the audio has passed through our models, all results will be displayed in the browser. Below we can see the results from the Informative Summarization model on the above video with the bullets summary type.