Displaying caption transcript with React

In this article, we'll learn how to display caption transcripts of your videos using React.

Video captions are typically written in a common format known as WebVTT. The format is straightforward, with only a couple of rules.

The file starts with WEBVTT, which we can use to confirm we're reading the right kind of file. This line can also contain a bit of text, known as the header. After a blank line, we're going to find either a note or a cue.

A note is prefixed with NOTE. Notes can be either single-line or multiline.

A cue might start with an identifier, but typically starts with a time index, followed by the cue body, which is the text displayed at the described time index.

The WebVTT files are going to look something like this:

vtt
WEBVTT Header text

NOTE A single line note

NOTE
A multiline note,
which spans multiple lines

00:00:00.000 --> 00:00:02.000
This is the first two seconds

00:00:02.001 --> 00:00:04.000
This is the second two seconds

00:00:04.001 --> 00:00:06.000
This is the last of the video
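
To make the rules above concrete, here is a minimal sketch that splits a WebVTT string into blocks and labels each one as a header, note, or cue. The function name `classifyVttBlocks` and the shape of the returned objects are assumptions made for illustration. It is not how the parsing module used later works, and it ignores STYLE blocks and cue settings.

```javascript
// Illustrative only: splits a WebVTT string into blocks and labels each one.
// classifyVttBlocks is a hypothetical helper, not part of any library.
function classifyVttBlocks(raw) {
  const blocks = raw
    .replace(/\r\n?/g, "\n") // normalise line endings
    .trim()
    .split(/\n{2,}/); // blocks are separated by blank lines

  return blocks.map((block, index) => {
    const lines = block.split("\n");
    // The first block must start with WEBVTT, optionally followed by header text
    if (index === 0 && lines[0].startsWith("WEBVTT")) {
      return { type: "header", text: lines[0].slice("WEBVTT".length).trim() };
    }
    // Notes are prefixed with NOTE and can span one or many lines
    if (lines[0] === "NOTE" || lines[0].startsWith("NOTE ")) {
      return { type: "note", text: block.slice("NOTE".length).trim() };
    }
    // A cue may have an identifier line before its timing line
    const timingIndex = lines.findIndex(line => line.includes("-->"));
    if (timingIndex === -1) {
      return { type: "unknown", text: block };
    }
    return {
      type: "cue",
      identifier: timingIndex > 0 ? lines[0] : undefined,
      timing: lines[timingIndex],
      text: lines.slice(timingIndex + 1).join("\n")
    };
  });
}

const sample = [
  "WEBVTT Header text",
  "",
  "NOTE A single line note",
  "",
  "intro",
  "00:00:00.000 --> 00:00:02.000",
  "This is the first two seconds"
].join("\n");

console.log(classifyVttBlocks(sample).map(block => block.type));
// logs: [ 'header', 'note', 'cue' ]
```

Here the cue carries the identifier `intro`, which is the optional line that can appear before the time index.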

WebVTT has many other small options that can be included, for example STYLE. Instead of discussing and implementing all of them, we're going to use a pre-built module to parse and output our captions, meaning all we have to do is implement our transcript component.

For this project we're going to use Codesandbox.io with the React starter template.

Now let's map out what we're going to build. We will need:

A component to display the time component of the captions
A component to display the text component of the captions
A component to display individual captions
A component to consume the captions, and list them
Something to take the raw captions, parse, and then display

We're going to implement our components from the smallest to the "largest", but first we're going to define the model object we'll be consuming. To parse our captions we're going to use subtitle, so our best option is to follow the same model it returns, which looks like:

json
[
  {
    "start": 0,
    "end": 2000,
    "text": "This is the first two seconds"
  },
  {
    "start": 2001,
    "end": 4000,
    "text": "This is the second two seconds"
  }
]
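
The start and end values in this model are milliseconds, while the WebVTT file uses timestamps. As a rough illustration of how the two relate, the sketch below converts a timestamp string into milliseconds. `timestampToMillis` is a hypothetical helper written for this example, not a function from subtitle, and it assumes well-formed `hh:mm:ss.ttt` or `mm:ss.ttt` input.

```javascript
// Illustrative only: converts "hh:mm:ss.ttt" or "mm:ss.ttt" to milliseconds.
// timestampToMillis is a hypothetical helper, not part of any library.
function timestampToMillis(timestamp) {
  const [clock, fraction = "0"] = timestamp.split(".");
  const parts = clock.split(":").map(Number);
  // Hours are optional, so pad the front until we have [hours, minutes, seconds]
  while (parts.length < 3) {
    parts.unshift(0);
  }
  const [hours, minutes, seconds] = parts;
  // Treat the fraction as thousandths of a second
  const millis = Number(fraction.padEnd(3, "0").slice(0, 3));
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

console.log(timestampToMillis("00:00:02.001")); // 2001
console.log(timestampToMillis("00:05.000")); // 5000
```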

Let's start with our time component, which we're going to create in time.js. This component accepts three props: start, end, and format. We're also going to wrap it in memo, which stops unneeded re-renders, as we only want to re-render when the props change.

We're going to allow end to be undefined, which lets us control externally whether to show a time range or just start. Showing only start means less text to read when end would display the same as the next caption's start.

format is going to be used so we can keep a consistent format across all timestamps, which will be calculated by our getFormat function shown below.

We're going to use luxon to format our durations. You will need to add it as a dependency within Codesandbox.

Here is our time component, along with the format generation, giving us a really small component that we can use in our caption component later on.

js
import React, { memo } from "react";
import { Duration } from "luxon";

export function getFormat(total) {
  const totalDuration = Duration.fromMillis(total);
  const format = "yy:MM:dd:hh:mm:ss";

  // Format the total using every unit we might need
  const totalFormatted = totalDuration.toFormat(format);
  // Split the formatted string into its sections, convert each to a number
  // (which drops leading zeros), then find the first non-zero section
  const firstIndexWithoutZero = totalFormatted
    .split(":")
    .map(value => Number(value))
    .findIndex(numeric => numeric > 0);

  const formatSplit = format.split(":");
  // Always keep at least mm:ss, even when the total is zero or under a minute
  const minimumIndex = formatSplit.length - 2;
  const indexToSplitFrom =
    firstIndexWithoutZero === -1
      ? minimumIndex
      : Math.min(firstIndexWithoutZero, minimumIndex);
  return formatSplit.slice(indexToSplitFrom).join(":");
}

export default memo(function CaptionTime({ start, end = undefined, format }) {
  const startDuration = Duration.fromMillis(start),
    endDuration = end && Duration.fromMillis(end);
  const startFormatted = startDuration.toFormat(format),
    endFormatted = end && endDuration.toFormat(format);
  return (
    <div className="caption-time">
      {startFormatted}
      {end ? ` - ${endFormatted}` : undefined}
    </div>
  );
});

We can test our component by adding it to the default component in index.js:

js
// At the top of our file
import CaptionTime, { getFormat } from "./time";

// Within `<div>` of the `App` component:
{getFormat(5000)}
<CaptionTime start={2000} format={getFormat(5000)} />
<CaptionTime start={2001} end={5000} format={getFormat(5000)} />

Within the viewer, we should see:

txt
mm:ss
00:02
00:02 - 00:05
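
The trimming step inside getFormat can also be tried on its own, without luxon. The sketch below takes a format string and an already formatted total, and drops the leading sections that are zero while always keeping at least minutes and seconds. `trimFormat` is a hypothetical name used only for this illustration.

```javascript
// Illustrative only: drops leading zero sections from a colon-separated format.
// trimFormat is a hypothetical helper that mirrors the idea behind getFormat.
function trimFormat(format, formatted) {
  const tokens = format.split(":");
  // Index of the first section whose numeric value is above zero
  const firstNonZero = formatted
    .split(":")
    .findIndex(value => Number(value) > 0);
  // Never trim past the last two sections (mm:ss)
  const minimumIndex = Math.max(tokens.length - 2, 0);
  const index =
    firstNonZero === -1 ? minimumIndex : Math.min(firstNonZero, minimumIndex);
  return tokens.slice(index).join(":");
}

const format = "yy:MM:dd:hh:mm:ss";
console.log(trimFormat(format, "00:00:00:00:00:05")); // mm:ss
console.log(trimFormat(format, "00:00:00:02:00:00")); // hh:mm:ss
```

Every timestamp in a transcript can then share the one format derived from the longest duration.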

Next we're going to create our text component in text.js. This component is super small, and wraps our text in an element we're going to use for styling. It also replaces new lines with <br/>, which allows for multi-line captions. The component accepts a single prop, text.

To know whether we need a line break (<br/>), we compare the index with the length of the split array. If it's not the last element, we add one in.

js
import React, { memo, Fragment } from "react";

export default memo(function CaptionText({ text }) {
  return (
    <div className="caption-text">
      {text.split("\n").map((item, index, array) => (
        <Fragment key={index}>
          {item}
          {/* Add a line break after every line except the last */}
          {index + 1 !== array.length ? <br /> : undefined}
        </Fragment>
      ))}
    </div>
  );
});

The same as with our time component, we can test this by adding it to our App component:

js
// At the top of our file
import CaptionText from "./text";

// Within `<div>` of the `App` component:
<CaptionText text="This is a test" />
<CaptionText text={"This is a test\nWith multiple lines\nThis is another!"} />

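Just a note: when passing \n within a string prop, wrap the string in {} so it is treated as a JavaScript string. In a plain quoted JSX attribute the escape is not processed, so the text contains a literal backslash followed by n instead of a new line.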

Within the viewer, we should see:

txt
This is a test
This is a test
With multiple lines
This is another!
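
The line-break rule and the note about \n can both be checked outside React. The sketch below applies the same split and "not the last element" test to plain strings. `linesWithBreaks` is a hypothetical helper for illustration, and `String.raw` is used here only to stand in for a string whose \n was never turned into a new line, as happens with a plain quoted JSX attribute.

```javascript
// Illustrative only: shows which lines would be followed by a <br />.
// linesWithBreaks is a hypothetical helper, not part of the components above.
function linesWithBreaks(text) {
  return text.split("\n").map((line, index, array) => ({
    line,
    // Every line except the last one gets a break after it
    breakAfter: index + 1 !== array.length
  }));
}

// A JavaScript string literal: \n is a real new line
const escaped = "This is a test\nWith multiple lines";
// A raw string: \n stays as a backslash followed by the letter n
const literal = String.raw`This is a test\nWith multiple lines`;

console.log(linesWithBreaks(escaped).length); // 2
console.log(linesWithBreaks(literal).length); // 1
```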

Now, let's create our caption component in caption.js. It takes the props required by CaptionTime and CaptionText, and we're going to define it as a list item as well:

js
import React, { memo } from "react";
import CaptionTime from "./time";
import CaptionText from "./text";

export default memo(function Caption({ start, end = undefined, format, text }) {
  return (
    <li className="caption">
      <CaptionTime start={start} end={end} format={format} />
      <CaptionText text={text} />
    </li>
  );
});

Now, as with our components before, we'll add it to our App component to see how it works:

js
// At the top of our file
import Caption from "./caption";

// Within `<div>` of the `App` component:
<Caption
  start={2001}
  end={5000}
  format={getFormat(5000)}
  text="This is a test with time"
/>

Within the viewer, we should see:

txt
00:02 - 00:05
This is a test with time

Now, let's create our list component in captions.js. It takes the array we defined earlier, and it is the component we use to find our consistent format. Apart from that, we map our captions directly to caption components, and we define our captions element as a list:

js
import React, { memo } from "react";
import { getFormat } from "./time";
import Caption from "./caption";

function getFormatFromList(captions) {
  if (!captions.length) return "mm:ss";
  return getFormat(captions[captions.length - 1].end);
}

function shouldUseEnd(end, index, captions) {
  const nextCaption = captions[index + 1];
  if (!nextCaption) {
    return false; // The end of the video
  }
  // Captions usually follow the rule that, if they are continuous,
  // the next caption starts at the same or the next millisecond
  return end !== nextCaption.start && end + 1 !== nextCaption.start;
}

export default memo(function Captions({ captions }) {
  const format = getFormatFromList(captions);

  return (
    <ol className="captions">
      {captions.map(({ end, ...rest }, index) => {
        const endToUse = shouldUseEnd(end, index, captions) ? end : undefined;
        return <Caption key={index} end={endToUse} {...rest} format={format} />;
      })}
    </ol>
  );
});

Now, again, let's test!

js
// At the top of our file
import Captions from "./captions";

// Within `<div>` of the `App` component:
<Captions
  captions={
    [
      {
        "start": 0,
        "end": 2000,
        "text": "This is the first two seconds"
      },
      {
        "start": 2001,
        "end": 4000,
        "text": "This is the second two seconds"
      },
      {
        "start": 5000,
        "end": 6000,
        "text": "This is the third caption"
      }
    ]
  }
/>

Now in our viewer we should see:

txt
00:00
This is the first two seconds
00:02 - 00:04
This is the second two seconds
00:05
This is the third caption
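
Only the second caption shows an end time in the output above. As a small sketch of why, the snippet below reruns the same shouldUseEnd rule from the list component over the test captions, outside React, and collects the end value each caption would be given.

```javascript
// Same rule as shouldUseEnd in the list component, repeated here so the
// example is self-contained.
function shouldUseEnd(end, index, captions) {
  const nextCaption = captions[index + 1];
  if (!nextCaption) {
    return false; // The end of the video
  }
  // Hide the end when the next caption starts at the same or next millisecond
  return end !== nextCaption.start && end + 1 !== nextCaption.start;
}

const captions = [
  { start: 0, end: 2000, text: "This is the first two seconds" },
  { start: 2001, end: 4000, text: "This is the second two seconds" },
  { start: 5000, end: 6000, text: "This is the third caption" }
];

const ends = captions.map(({ end }, index) =>
  shouldUseEnd(end, index, captions) ? end : undefined
);

console.log(ends); // [ undefined, 4000, undefined ]
```

The first caption runs straight into the second, and the third is the last one, so only the gap between 4000 and 5000 produces a visible range.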

This is pretty good, we're almost there!

Now let's just add a bit of styling. This couldn't be any easier: all we need to do is add the following to our styles.css file, which auto-sizes our "time column" so our "text column" is nicely aligned.

css
.captions {
  display: grid;
  grid-template-columns: auto 1fr;
  text-align: left;
  border-bottom: 1px dashed rgb(140, 140, 140);
  list-style: none;
  margin: 0;
  padding: 0;
}

.caption {
  display: contents;
}

.caption .caption-text,
.caption .caption-time {
  border-top: 1px dashed rgb(140, 140, 140);
}

.caption .caption-text {
  padding-left: 5px;
}

We can now use this to display our captions anywhere.

Lastly, we'll create a component in captions-raw.js that takes a raw captions string, parses it, and passes the result to the Captions component. We will need to add subtitle as a dependency beforehand.

js
import React, { useMemo } from "react";
import { parse } from "subtitle";
import Captions from "./captions";

export default function CaptionsRaw({ raw }) {
  const parsed = useMemo(() => parse(raw), [raw]);
  return <Captions captions={parsed} />;
}

We can now do our final test. For this we're going to define an additional variable outside of our App component, which we will pass to our CaptionsRaw component:

js
// At the top of our file
import CaptionsRaw from "./captions-raw";

const captions = `
WEBVTT

00:00.000 --> 00:02.000
This is the first two seconds

00:02.000 --> 00:04.000
This is the second two seconds
This is multiple lines
Another

00:05.000 --> 00:06.000
This is the third caption
`.trim();

// Within `<div>` of the `App` component:
<CaptionsRaw
  raw={captions}
/>

We should then see in our viewer:

txt
00:00
This is the first two seconds
00:02 - 00:04
This is the second two seconds
This is multiple lines
Another
00:05
This is the third caption

And that's that!

I hope you enjoyed this article, and get to using captions with your video media, along with being able to provide your users with a full transcript of videos.

Fabian Cook

Software Engineer @ Dovetail

JavaScript Developer.

Sep 22, 2020