Today at AssemblyAI, we are excited to announce the release of three new Summarization AI models.

The new models are:

  1. Informative, which is best for files with a single speaker, like a presentation or lecture
  2. Conversational, which is best for any multi-person conversation, like customer/agent phone calls or interviewer/interviewee calls
  3. Catchy, which is best for creating video, podcast, or media titles

Having been trained on data relevant to a specific use case, each model provides state-of-the-art results for that particular use case. Each model also supports various summary lengths, allowing product teams to further tailor the summaries to their specific use case.

The new AI models in action

Let’s take a look at each of these new AI models in turn.

Informative

The Informative summary model is best for audio in which a single person is speaking, like in a presentation or lecture. Below we can see the ground truth transcript for a news segment along with the summary generated by the Informative summary model:

Summary

A train heading from Queens into Manhattan was stalled underneath the East River around 8.30am Monday morning. Part of the train’s contact shoe is thought to have touched the board instead of the rail, sparking the incident. More than 500 passengers were taken to Manhattan after spending roughly an hour and a half trapped. Service on the 7 line was suspended for almost two hours.

Conversational

The Conversational summary model is best for audio in which two or more speakers are having a conversation. Below we see the ground truth transcript for an interview along with the summary generated by the Conversational summary model:

Summary

Mary Brown comes to Mister Thompson to apply for a secretary. She tells Mister Thompson she can do everything a secretary is expected to do and she expects a salary of around $800 a month. Mister Thompson will let her know the result as soon as possible.

Catchy

The Catchy summary model is best for automatically generating taglines, headlines, etc. Below we see the ground truth transcript for a story about scientists discovering gravitational waves along with the summary generated by the Catchy summary model:

Summary

Scientists Find Gravitational Waves for the First Time (New Study)

Summary types

In addition to the three models, each of which is best suited to a specific type of input, every model offers several summary types that tailor the length and format of the output.

The summary types are:

  1. gist – 3-10 word summary
  2. headline – about 20 word summary (1-2 sentences)
  3. paragraph – 30-100 word summary (3-5 sentences)
  4. bullets – Bulleted list of paragraph summaries (max of 6)
  5. bullets_verbose – Same as bullets, but there is no limit on the number of bullets

The gist and headline summary types are available for the Catchy model, while the headline, paragraph, bullets, and bullets_verbose summary types are available for the Informative and Conversational models.
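The pairing rule above can be checked on the client before a request is sent. The following is a minimal sketch: the table `ALLOWED_SUMMARY_TYPES` and the helper `validate_summary_options` are hypothetical names built only from the pairings listed in this post, and the lowercase model identifiers `catchy` and `conversational` are assumed by analogy with the `informative` value used in the request example later in this post.

```python
# Hypothetical lookup table built from the model/type pairings described in this post.
ALLOWED_SUMMARY_TYPES = {
    "catchy": {"gist", "headline"},
    "informative": {"headline", "paragraph", "bullets", "bullets_verbose"},
    "conversational": {"headline", "paragraph", "bullets", "bullets_verbose"},
}


def validate_summary_options(summary_model: str, summary_type: str) -> None:
    """Raise ValueError if the model/type pair is not one listed in this post."""
    allowed = ALLOWED_SUMMARY_TYPES.get(summary_model)
    if allowed is None:
        raise ValueError(f"unknown summary_model: {summary_model!r}")
    if summary_type not in allowed:
        raise ValueError(
            f"summary_type {summary_type!r} is not available for "
            f"summary_model {summary_model!r}; choose one of {sorted(allowed)}"
        )
```

Calling `validate_summary_options("catchy", "paragraph")` raises a `ValueError`, while `validate_summary_options("informative", "bullets")` returns without error.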

The Catchy summary shown above was generated using the headline summary type:

Scientists Find Gravitational Waves for the First Time (New Study)

Using the gist summary type, we instead get this summary:

Scientists Find Gravitational Wave Signals

The Informative summary shown above was generated using the paragraph summary type:

A train heading from Queens into Manhattan was stalled underneath the East River around 8.30am Monday morning. Part of the train's contact shoe is thought to have touched the board instead of the rail, sparking the incident. More than 500 passengers were taken to Manhattan after spending roughly an hour and a half trapped. Service on the 7 line was suspended for almost two hours.

Using the headline summary type, we instead get this summary:

A train heading from Queens into Manhattan was stalled underneath the East River

Use cases for summarization

Our customers have already built innovative, high-ROI features using summarization. Our new Summarization models will open the doors to new possibilities and creative solutions. Here are a few use cases for which summarization is well suited:

Conversation Intelligence

  1. Call centers – summarization makes it easy to pass information up the chain of command, conduct reviews, and monitor calls.
  2. Meetings – easily summarize virtual or in-person meetings, interviews, etc. for people who couldn’t be in attendance or for record-keeping.

Video and Podcasting

  1. Podcasting – summarization allows you to e.g. automatically generate episode descriptions at scale.
  2. Title Generation – add summarization to your workflow to automatically generate titles for video clips, making it easy to quickly release TikToks, YouTube Shorts, etc.

Media Monitoring

  1. News Aggregation – aggregate news intelligently by using summarization to generate titles and descriptions for news segments across many channels.
  2. Social Monitoring – make it easier to digest and process huge quantities of data across social media platforms and more.

These are just a few ways companies can incorporate AI-powered summarization into their processes to stay ahead of the competition.

Using the Summarization Models

You can check out this Colab notebook to see how to use the new Summarization models with Python, or move on to the next section to test them in a no-code fashion.

Using the new Summarization models is as simple as sending a POST request to the AssemblyAI API.

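import requests

# Replace with your AssemblyAI API token
API_TOKEN = "YOUR-TOKEN-HERE"
ENDPOINT = "https://api.assemblyai.com/v2/transcript"

# Request body: the audio file to process plus the summarization options
payload = {
    "audio_url": "https://bit.ly/3qDXLG8",
    "summarization": True,
    "summary_model": "informative",
    "summary_type": "bullets"
}
headers = {
    "authorization": API_TOKEN,
    "content-type": "application/json"
}

# Submit the file for transcription and summarization
response = requests.post(ENDPOINT, json=payload, headers=headers)
response.raise_for_status()  # fail early on an HTTP error such as an invalid token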

Once processing is completed, a simple GET request will fetch the results:

r = requests.get(f"{ENDPOINT}/{response.json()['id']}", headers=headers)
print(r.json()['summary'])
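The GET request above returns a summary only after processing has finished. One way to wait for that is to poll the same endpoint until the job reaches a final state. The sketch below assumes the response JSON carries a `status` field that ends up as `"completed"` or `"error"`, with an `error` field describing any failure. The function name `wait_for_summary`, its parameters, and the default polling interval are hypothetical. The `fetch` argument is any callable returning the parsed JSON for the transcript, so the loop can be tried without network access.

```python
import time
from typing import Any, Callable, Dict


def wait_for_summary(
    fetch: Callable[[], Dict[str, Any]],
    poll_interval: float = 3.0,
    max_attempts: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `fetch` repeatedly until the job is done, then return its summary.

    Assumes the JSON has a "status" field that ends as "completed" or "error".
    """
    for _ in range(max_attempts):
        result = fetch()
        status = result.get("status")
        if status == "completed":
            return result.get("summary")
        if status == "error":
            raise RuntimeError(f"transcription failed: {result.get('error')}")
        sleep(poll_interval)  # not finished yet; wait before asking again
    raise TimeoutError(f"no result after {max_attempts} attempts")
```

With the variables defined earlier, `fetch` could be `lambda: requests.get(f"{ENDPOINT}/{response.json()['id']}", headers=headers).json()`, and the returned value is the same `summary` field printed above.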

The corresponding summary for the audio file we used can be seen below:

- Neuroscientist Lisa Genova talks about the science of memory. Followed by a Q and A session with Ted science curator David Ballo. Recorded at a Ted Membership exclusive event in 2021.
- There's so many people who experience normal moments of forgetting. Part of the reason is perspective. Memory is very much influenced by context. Go back to the room you were in before you landed in this one and imagine the cues that were there.
- Most people don't think they have any influence over their brain health. What kinds of memory cues would be signs of abnormality? Or you should get further testing and checking. This becomes information that you can be in conversation with your doctor about.
- There are lots of reasons for having issues with retrieving memories. It can be sleep deprivation, it can be B twelve. Doesn't have to be Alzheimer's. It is something that you can hopefully address again, be involved in your brain health.
- Can diet help us to avoid memory loss? And can you kind of exercise your neurons into better memory through crossword puzzles or deeper relationships or anything like that? Exercise the diet. Sleep and stress and learning new things is also helpful.
- Adam Grant: The myth that you only use 10% of your brain is a fallacy. He says as you grow older, you don't lose the information of stuff you've learned. Grant: It's unlimited. There's no reason to think there's a limit to it.

Test our Summarization models today

The easiest way to test our new Summarization models is by going to the AssemblyAI Playground. The Playground is a no-code way to instantly see the results of our models on a local file or YouTube video.

Go to the AssemblyAI Playground and choose your audio source. In this example, we use the AssemblyAI Product Overview video from YouTube.

Next, select the Summarization model from the list of available models (as well as any other models you would like to test).

To change which Summarization model you are using and the type of summary to generate, click the three dots in the upper-right corner of the Summarization model card.

After the audio has passed through our models, all results will be displayed in the browser. Below we can see the results from the Informative Summarization model on the above video with bullets summary type.