Convert Speech to Text using OpenAI Whisper in Power Apps

OpenAI has released a new neural network called Whisper, which is an open-source model that can convert speech to text with impressive accuracy. This model is specifically designed to transcribe spoken language into text with high precision and speed, making it an ideal tool for a variety of applications, such as virtual assistants and video captioning. Whisper relies on advanced machine learning algorithms to analyze audio signals from multiple languages and convert them into written text. OpenAI has recently made API endpoints available to the public since March 1, 2023, allowing developers to easily integrate this powerful technology into their own applications.

The Speech to Text Open API can

  • Transcribe audio into whatever language the audio is in.
  • Translate and transcribe the audio into English.

As of the date I am writing this post, this model is not available in Azure. In this blog post, I will cover how to use the Microphone control and File Upload control to convert speech to text using the OpenAI Whisper API in a Power Automate flow.

Download Link to the Sample App: https://github.com/ashiqf/powerplatform/blob/main/OpenAI-SpeechtoText.msapp. Replace the API Key in the Power Automate flow HTTP Action Authorization Header.

OpenAPI Speech to Text API:

The speech to text API provides two endpoints, transcriptions and translations. At present, the maximum file size allowed for uploads is 25 MB and the supported audio formats are mp3, mp4, mpeg, mpga, m4a, wav, and webm. In this blog post, I utilized the Translation API to demonstrate its capability to convert English audio into text, it can understand other languages as well

POST https://api.openai.com/v1/audio/translations

If you have not yet created an API key, please sign up/login for OpenAI and obtain it from there.

Body:

Integration with Power Apps:

I have used a Power Automate flow with the Power Apps trigger to invoke the Speech to Text API via the HTTP connector in Power Automate. Alternatively, you can achieve the same outcome by constructing a Custom Connector. This sample app can be downloaded from this github link.

Microphone Control:

The audio control captures audio input through the device’s microphone and will be sent to the Power Automate flow for conversion into text using the Whisper API. The audio format of the recording depends on the type of device being used

  • 3gp format for Android.
  • AAC format for iOS.
  • Webm format for web browsers.

I’ve tested this control from the app accessed through the web browser. If you encounter an unsupported audio format for OpenAI, you can use utilities such as FFMpeg. Additionally, a .Net version of the control is available for download which can be used in Azure Function. John Liu (MVP) has written a sample Azure function that handles the conversion of audio formats using the aforementioned utility.

Step 1: To add a microphone control to the canvas, insert the Microphone control from the command bar. To preview the recorded audio from the Microphone control, add an Audio control

Step 2: Add a button to convert and to trigger the Power Automate flow. Find below the Power FX code

//Generates a JSON Text with the binary of the Audio file or Recorded audio
Set(varJson,JSON(Microphone1.Audio,JSONFormat.IncludeBinaryData));
Set(strB64Audio, Last(Split(varJson, ",")).Value);
Set(strB64AudioContent, Left(strB64Audio, Len(strB64Audio) - 1));
//Extract Audio Format
Set(varAudioFileType,Mid(varFileContent,Find(":",varFileContent)+1,Find(";",varFileContent)-Find(":",varFileContent)-1));
//Call the Power Automate Flow
Set(audioText,'SpeechtoText-OpenAIWhisper'.Run(strB64AudioContent,varAudioFileType).audiotext);

The Power FX code performs the following task

  • Stores the audio captured by a Microphone control in a variable as JSON data, including binary data.
  • Extracts the base64-encoded audio content from the JSON data using the string manipulation functions Split, Left, Mid.
  • Determines the audio file type by parsing a string variable.
  • Uses the extracted audio content and file type to call the Power Automate flow ‘SpeechtoText-OpenAIWhisper’ to obtain the corresponding text transcription which comes in later section of this post.
  • Assigns the resulting text transcription to a variable named ‘audioText’, this is assigned to a Text Label to display the converted text from the OpenAI Whisper API.

Step 3: Add a Label control to display the converted Text set to the variable audioText

File Upload Control

As of the day I am writing this post there is no file control that can handle all types of files in Power Apps, I have created a custom component utilizing the Attachment control to create a file attachment control. For further details, please refer to blogpost Uploading Files Made Easy: A Guide to Using the Attachment Control in Power Apps to add the control to the app.

Step 1: Add the file attachment control to the app from the component library. Set the input property for Maximum Attachments to 1 from the component.

Step 2: To extract the binary content of an audio file, add an Image control to the app. The Image control is capable of working with any type of file to extract its content.

Step 3: Add a Button control to convert the Audio from the uploaded file. Find the PowerFX below

//Generates a JSON Text with the binary of the Audio file using the Image control
Set(varFileContent,JSON(Image1.Image,JSONFormat.IncludeBinaryData));
//Extract Base64 content
Set(varExtractedFileContent,Last(Split(varFileContent,",")).Value);
//Remove the last character " from the string
Set(varExtractedFileContent,Left(varExtractedFileContent,Len(varExtractedFileContent)-1));
//Extract Audio Format
Set(varAudioFileType,Mid(varFileContent,Find(":",varFileContent)+1,Find(";",varFileContent)-Find(":",varFileContent)-1));
//Call the Power Automate Flow
Set(audioText,'SpeechtoText-OpenAIWhisper'.Run(varExtractedFileContent,varAudioFileType).audiotext);

Step 4: Add a Label control to display the converted Text set to the variable audioText

Power Automate Flow

Now, let’s create a Power Automate flow with the Trigger type Power Apps to invoke the OpenAI Whisper API and convert speech to text. Step 1: Add two compose action (input parameters) to receive the audio format and content from either the recorded audio captured by the Microphone control or the uploaded audio file from the file attachment control in the Power Apps

{
  "$content-type": @{outputs('Compose-AudioFormat')},
  "$content": @{triggerBody()['Compose-FileContent_Inputs']}
}

Step 2: Add a HTTP connector to make a request to the Whisper API endpoint. Refer to the blog post How to use form-data and form-urlencoded content type in Power Automate or Logic Apps HTTP action for handling multipart/form-data in the HTTP action

Request Body:

{
  "$content-type": "multipart/form-data",
  "$multipart": [
    {
      "headers": {
        "Content-Disposition": "form-data; name=\"model\""
      },
      "body": "whisper-1"
    },
    {
      "headers": {
        "Content-Disposition": "form-data; name=\"file\";filename=\"audiofile.webm\""
      },
      "body": @{outputs('Compose-FileContent')}
    }
  ]
}

Step 3: Add the Respond to a PowerApp or a flow action to pass the converted text back to the app. To get the converted text, use the following expression

body('HTTP-CallaOpenApiModel')['Text']

The expression was constructed based on the response of the Whisper API call. In the event that the response property changes in the future, please ensure to update the expression accordingly.

Summary:

In this post, I’ve outlined a step-by-step guide on how to develop a basic app with Speech to Text functionality using Power Apps and a Power Automate flow leveraging the OpenAI’s Whisper API. The possibilities for using this technology are endless, from creating virtual assistants to generating audio captions and translations. Furthermore, the Whisper API can also be used to transcribe video files, adding even more versatility to its capabilities. It’s worth noting that while Azure offers its own Speech to Text service, it currently does not rely on the OpenAI Whisper Model. However, it’s possible that the two services will eventually integrate in the future. Hope you have found this informational & thanks for reading. If you are visiting my blog for the first time, please do look at my other blogposts.

Do you like this article?

Subscribe to my blog with your email address using the widget on the right side or on the bottom of this page to have new articles sent directly to your inbox the moment I publish them.

Advertisement

Convert Outlook Email with embedded images to PDF using PowerAutomate

Recently I’ve came across a business case with need to automate the conversion of Outlook email messages with embedded images to PDF document. This could be done manually on Outlook client using Microsoft Print to PDF or browser Print if opened using Outlook on the Web. This process can be automated with the help of PowerAutomate trigger When a new email arrives and actions Export Email, Convert File, Create file but if an email has an embedded image or HTML content it will not work as of now. There are Third party connectors in Power Automate from Muhimbi, Plumsail which might have this functionality but I’ve not tested those yet. PowerAutomate action Export Email converts the email to .eml file.

An EML file is an email message containing the content of the message, along with the subject, sender, recipient(s), and date of the message in plain text format. Once you have the .eml file change the file extension from .eml to .txt where you can see the content. If there is any embedded image it will stored in the Base64 format. You can also change the .eml file extension to .mht and open it directly in Internet Explorer

For this blogpost I’ve used third party API service from ConvertAPI to convert Email message to PDF, they have REST API endpoints to convert Word, Excel, PowerPoint, HTML, PDF and Image formats. There is also a Free Plan with ConvertAPI where you get 1500 seconds API execution time if you sign up.

You can also create your own API service hosted in Azure for conversion with the .NET libraries like iTextSharp, GroupDocs, PDFSharp etc. Let’s go ahead & create flow to

  1. Convert Email to PDF – Without Embedded image
  2. Convert Email to PDF – With Embedded image

The above two flows packages can be downloaded from Github repo.

Convert Email to PDF – Without Embedded image:

Power Automate connector OneDrive for Business has an action Convert file (preview) converts files to different formats such as PDF, HTML, JPG etc. This connector can be used to convert a simple email with out an embedded image.

Step 1: Create a flow with Automated trigger When a new email arrives & configure the trigger parameters by clicking Show advanced options.

Step 2: Add the action Export email with Message Id from the output of the previous action. This action creates the .eml file

Step 3: Add the action Create file from the connector OneDrive for Business. Select the Folder path from your One drive, Enter the File Name for the .eml file & the File Content should be Body from the output of the action Export email (Previous). Find the screenshot below

Step 4: Add the action Convert file from the connector OneDrive for Business with Id from the output of the previous action Create File.

Step 5: Add the action Create file from the connector OneDrive for Business. This step is for storing the PDF file back to the OneDrive. Select the Folder path from your One drive to store the PDF file, Enter the File Name for the PDF file & the File Content should be File content from the output of the action Convert file. Find screenshot below

Note: The storage location I’ve chosen is Onedrive, you can choose SharePoint, Azure blob etc. Based on the need you can choose to delete the .eml files after the file conversion is done.

Convert Email to PDF – With Embedded image:

As already said the previous flow will not convert an email with embedded image as expected. Be ready with the API endpoint from ConvertAPI to convert email to PDF. The endpoint will have the secret as a query string shown as below

https://v2.convertapi.com/convert/eml/to/pdf?Secret=yoursecretkeyfromconvertapi

Note: On this flow I will be using the .eml file generated from the previous flow.

Step 1: Create a flow with Instant trigger Manually trigger a flow.

Step 2: Add the action Get file content from the connector OneDrive for Business. Select the .eml file which has the embedded image from the storage location i.e the file from OneDrive.

Step 3: Add the action Compose from the connector Data Operation. This step is to convert in to base64 representation a requirement for the convert API to work. On the Inputs file go to the expression editor and add the function base64(file content from the previous action get file) for converting .eml to base64.

Step 4: Add the action HTTP (Premium) from the connector HTTP to make a POST request to the API convert API endpoint.

Method: POST

URI: https://v2.convertapi.com/convert/eml/to/pdf?Secret=yoursecretkeyfromconvertapi

Headers:

Key: Content-Type

Value: application/json

Body: You can generate this from the ConvertAPI site by uploading a .eml file on the site. Once this data is added to the HTTP action Body parameter change the Data parameter should be the Output of the previous action Compose – Convert to Base64

{
  "Parameters": [
    {
      "Name": "File",
      "FileValue": {
        "Name": "myemailfile.eml",
        "Data": "@{outputs('Compose_-_Convert_to_Base64')}"
      }
    }
  ]
}

Step 5: Add the action Parse JSON from the connector Data Operation. This step is to parse the response of the HTTP POST action to the ConverAPI endpoint. You can generate the scheme by copying from the Flow run history for the HTTP action output. The schema will be look like

{
    "type": "object",
    "properties": {
        "ConversionCost": {
            "type": "integer"
        },
        "Files": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "FileName": {
                        "type": "string"
                    },
                    "FileExt": {
                        "type": "string"
                    },
                    "FileSize": {
                        "type": "integer"
                    },
                    "FileData": {
                        "type": "string"
                    }
                },
                "required": [
                    "FileName",
                    "FileExt",
                    "FileSize",
                    "FileData"
                ]
            }
        }
    }
}

Step 6: Add the Compose action to convert the base64 data to binary to create the PDF from the HTTP request response. Select the filedata from the Output of the Parse JSON action which will automatically create a Apply to each since the Files is an array. Then add the following to the inputs of the of the compose action

base64toBinary(items(‘Apply_to_each’)?[‘FileData’]).

Now add the Create file action from the connector OneDrive for Business as shown below. The parameter File content should be output of the Compose action. PFB the screenshot of the flow actions

Now its time to test the flow, run the flow & check your OneDrive for the PDF file. PFB the screenshot of the PDF file with embedded image

Summary: I am not vouching to use the ConvertAPI service for converting the email to PDF. Just a sample for a use case where you get some knowledge on different actions usage & some information on the .eml file which Microsoft has used for storing email content. If its going to be heavily used or if the data is secure, then I advise you to create a REST API endpoint of your own hosted in Azure for the conversion. Hope you find this post useful & informational. Let me know if there is any comments or feedback by posting a comment below.