Real-time captions in your meetings with OpenVidu

OpenVidu
Feb 10, 2023

Do you need your video app to generate subtitles in real time? OpenVidu makes this task really simple with our Speech To Text feature.

OpenVidu Speech To Text is available in the OpenVidu Pro and OpenVidu Enterprise editions. Detailed instructions for deploying any OpenVidu edition are available in Deploying OpenVidu.

Let’s assume you have your OpenVidu application up and running. If you are still looking for an easy-to-use platform to add videoconference capabilities to your app, just check out our Hello World tutorial and you will have your first OpenVidu app running in a matter of minutes. Integrating the necessary code in your own pre-existing app will be very easy once you grasp the basics. And of course OpenVidu works with the most popular client technologies: JavaScript, React, Vue, Angular, Ionic, React Native…

Now let’s see how to use the Speech To Text feature in your app:

1. Choose the desired Speech To Text engine

OpenVidu supports multiple engines to transcribe real-time audio to text. You can choose from:

  • Azure
  • AWS
  • Vosk

The Azure and AWS options require the developer to have an account with the respective cloud provider, and a user with permissions to manage Azure’s Speech To Text service or the AWS Transcribe service.

Vosk is the open source option provided by OpenVidu. It won’t require any cloud provider key and you won’t incur extra expenses for using Azure or AWS services.

2. Enable Speech To Text in your OpenVidu deployment

Open the .env configuration file of your OpenVidu deployment (the default path is /opt/openvidu/.env) and set the following environment variables, depending on the Speech To Text engine you chose:

For Azure (the credentials must have permissions for Azure’s Speech To Text service):

OPENVIDU_PRO_SPEECH_TO_TEXT=azure
OPENVIDU_PRO_SPEECH_TO_TEXT_AZURE_KEY=<AzureKey> ## e.g. rywfyDIAL5BM70ErU9O1XSIFzWk2QQhP
OPENVIDU_PRO_SPEECH_TO_TEXT_AZURE_REGION=<AzureRegion> ## e.g. westeurope

For AWS (the credentials must have permissions for the AWS Transcribe service):

OPENVIDU_PRO_SPEECH_TO_TEXT=aws
OPENVIDU_PRO_AWS_ACCESS_KEY=<AWS_ACCESS_KEY_ID> ## e.g. AKIAIOSFODNN7EXAMPLE
OPENVIDU_PRO_AWS_SECRET_KEY=<AWS_SECRET_ACCESS_KEY> ## e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
OPENVIDU_PRO_AWS_REGION=<AWS_DEFAULT_REGION> ## e.g. eu-west-1

For Vosk:

OPENVIDU_PRO_SPEECH_TO_TEXT=vosk

Restart your OpenVidu deployment for those variables to take effect, and you’ll be ready to go!
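As a quick sanity check on the variables above, here is a minimal sketch that reads the text of a .env file and reports which Speech To Text variables are still missing for the chosen engine. The helper names (parseEnv, missingSpeechToTextVars) are hypothetical and not part of OpenVidu. The check only looks at whether each variable has a value, not whether the credentials are valid.

```javascript
// Variables each engine needs, taken from the lists above.
const REQUIRED_VARS = {
    azure: [
        "OPENVIDU_PRO_SPEECH_TO_TEXT_AZURE_KEY",
        "OPENVIDU_PRO_SPEECH_TO_TEXT_AZURE_REGION",
    ],
    aws: [
        "OPENVIDU_PRO_AWS_ACCESS_KEY",
        "OPENVIDU_PRO_AWS_SECRET_KEY",
        "OPENVIDU_PRO_AWS_REGION",
    ],
    vosk: [],
};

// Hypothetical helper: parse KEY=VALUE lines, skipping blank lines and comments.
function parseEnv(text) {
    const vars = {};
    for (const rawLine of text.split(/\r?\n/)) {
        const line = rawLine.trim();
        if (line === "" || line.startsWith("#")) continue;
        const eq = line.indexOf("=");
        if (eq === -1) continue;
        vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
    }
    return vars;
}

// Hypothetical helper: return the names of the variables that still need a value.
function missingSpeechToTextVars(envText) {
    const vars = parseEnv(envText);
    const engine = vars["OPENVIDU_PRO_SPEECH_TO_TEXT"];
    const known = Object.prototype.hasOwnProperty.call(REQUIRED_VARS, engine);
    if (!known) {
        // Not set to one of the three engines named in this article.
        return ["OPENVIDU_PRO_SPEECH_TO_TEXT"];
    }
    return REQUIRED_VARS[engine].filter(name => !vars[name]);
}
```

In Node.js you could feed it the file contents, for example with fs.readFileSync("/opt/openvidu/.env", "utf8"), and print the returned list before restarting the deployment.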

3. Use Speech To Text in your app

Receiving Speech To Text events in your application’s client code is a piece of cake. Take, for example, this simple code, which is enough to connect users to an OpenVidu Session and to send and receive video:

var OV = new OpenVidu();
var session = OV.initSession();
var subscriberStream;
var publisherStream;

// Subscribe to each remote stream and show it in the HTML element with id "subscribers"
session.on("streamCreated", event => {
    subscriberStream = event.stream;
    session.subscribe(subscriberStream, "subscribers");
});

await session.connect(TOKEN);

var publisher = OV.initPublisher("publisher");
await session.publish(publisher);

// publisherStream was already declared above, so assign it instead of redeclaring it
publisherStream = publisher.stream;

You get a TOKEN to connect to the OpenVidu Session from your application’s server. See Application Server for further info.
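To make the TOKEN step concrete, here is a minimal sketch of a client-side helper that asks an application server for a token. Everything about the server is an assumption: the two endpoints (/api/sessions and /api/sessions/<sessionId>/connections), the customSessionId field and the plain-text responses describe a hypothetical application server, and getToken is an illustrative name. Adapt all of them to whatever your own server exposes.

```javascript
// Sketch only: the endpoints, request body and plain-text responses are
// ASSUMPTIONS about a hypothetical application server, not a fixed OpenVidu API.
async function getToken(serverUrl, sessionId, fetchImpl = globalThis.fetch) {
    // Step 1: ask the server to create (or reuse) the session.
    const sessionResponse = await fetchImpl(serverUrl + "/api/sessions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ customSessionId: sessionId }),
    });
    if (!sessionResponse.ok) {
        throw new Error("Could not create session: HTTP " + sessionResponse.status);
    }
    const createdSessionId = await sessionResponse.text();

    // Step 2: ask the server for a connection in that session.
    // In this sketch the response body is assumed to be the token itself.
    const connectionUrl = serverUrl + "/api/sessions/" +
        encodeURIComponent(createdSessionId) + "/connections";
    const connectionResponse = await fetchImpl(connectionUrl, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({}),
    });
    if (!connectionResponse.ok) {
        throw new Error("Could not create connection: HTTP " + connectionResponse.status);
    }
    return connectionResponse.text();
}
```

With such a helper, the connect call could become await session.connect(await getToken(APP_SERVER_URL, "SessionA")), where APP_SERVER_URL is a placeholder for your server's address. The optional fetchImpl parameter only exists so the function can be exercised with a stub instead of a real network call.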

Let’s add the final piece to receive Speech To Text events:

session.on("speechToTextMessage", event => {
    if (event.reason === "recognizing") {
        console.log("User " + event.connection.connectionId + " is speaking: " + event.text);
    } else if (event.reason === "recognized") {
        console.log("User " + event.connection.connectionId + " spoke: " + event.text);
    }
});

await session.subscribeToSpeechToText(subscriberStream, "en-US");
await session.subscribeToSpeechToText(publisherStream, "en-US");

And voilà, that’s all. Adding a handler for the Session event speechToTextMessage and calling the subscribeToSpeechToText method for every stream we want to transcribe is all we need. Events with reason recognizing carry partial text while the user is still speaking, and events with reason recognized carry the final text once the phrase is complete. You can choose the language too: in this case people will be speaking US English (en-US). If you open your browser’s console and start talking, you will see your transcription in real time.
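If you want on-screen captions rather than console output, one possible approach is to keep a small caption state per connection and update it from each event. The sketch below is illustrative only: applySpeechEvent and captionText are hypothetical helpers, not part of openvidu-browser. They rely solely on the event fields used above (event.reason, event.text and event.connection.connectionId).

```javascript
// Hypothetical helper: return a new captions object updated with one event.
// Each entry holds the finished phrases plus the phrase still being spoken.
function applySpeechEvent(captions, event) {
    const id = event.connection.connectionId;
    const previous = captions[id] || { finalLines: [], interim: "" };
    if (event.reason === "recognizing") {
        // Partial result: replace the in-progress text.
        return { ...captions, [id]: { finalLines: previous.finalLines, interim: event.text } };
    }
    if (event.reason === "recognized") {
        // Final result: store the phrase and clear the in-progress text.
        return {
            ...captions,
            [id]: { finalLines: [...previous.finalLines, event.text], interim: "" },
        };
    }
    return captions; // Unknown reason: leave the state untouched.
}

// Hypothetical helper: build the caption string to display for one connection.
function captionText(captions, connectionId) {
    const entry = captions[connectionId];
    if (!entry) return "";
    return [...entry.finalLines, entry.interim].filter(Boolean).join(" ");
}
```

Inside the speechToTextMessage handler you would call captions = applySpeechEvent(captions, event) and then write captionText(captions, event.connection.connectionId) into whichever element shows that user's subtitles.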

That was easy, right? You have a working tutorial available: openvidu-speech-to-text. And our open source flagship app OpenVidu Call comes with Speech To Text built-in. Give those a try!
