Speech service quotas and limits - Azure Cognitive Services (2023)

This article contains a quick reference and a detailed description of the quotas and limits for the Speech service in Azure Cognitive Services. The information applies to all pricing tiers of the service. It also contains some best practices to avoid request throttling.

Quotas and limits reference

The following sections provide you with a quick guide to the quotas and limits that apply to Speech service.

Speech-to-text quotas and limits per resource

In the following tables, a quota that isn't followed by an Adjustable row can't be adjusted in any pricing tier.

Online transcription

You can use online transcription with the Speech SDK or the speech-to-text REST API for short audio.

| Quota | Free (F0)¹ | Standard (S0) |
|---|---|---|
| Concurrent request limit - base model endpoint | 1 | 100 (default value) |
| Adjustable | No² | Yes² |
| Concurrent request limit - custom endpoint | 1 | 100 (default value) |
| Adjustable | No² | Yes² |

Batch transcription

| Quota | Free (F0)¹ | Standard (S0) |
|---|---|---|
| Speech-to-text REST API limit | Not available for F0 | 300 requests per minute |
| Max audio input file size | N/A | 1 GB |
| Max input blob size (for example, can contain more than one file in a zip archive). Note the file size limit from the preceding row. | N/A | 2.5 GB |
| Max blob container size | N/A | 5 GB |
| Max number of blobs per container | N/A | 10000 |
| Max number of files per transcription request (when you're using multiple content URLs as input) | N/A | 1000 |
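
The batch transcription limits can be checked on the client before a request is submitted. The following is a minimal sketch, not part of the Speech SDK: the function name `check_batch_request` is hypothetical, and treating 1 GB as 1024³ bytes is an assumption, because the table doesn't say whether the limit is binary or decimal.

```python
GB = 1024 ** 3  # assumption: binary gigabytes; the table doesn't specify

# Standard (S0) batch transcription limits, copied from the table above.
MAX_AUDIO_FILE_BYTES = 1 * GB
MAX_FILES_PER_REQUEST = 1000


def check_batch_request(file_sizes_bytes):
    """Return a list of problems; an empty list means the request fits the limits.

    file_sizes_bytes holds one size (in bytes) per content URL in the request.
    """
    problems = []
    if len(file_sizes_bytes) > MAX_FILES_PER_REQUEST:
        problems.append(
            f"{len(file_sizes_bytes)} files exceed the limit of {MAX_FILES_PER_REQUEST}"
        )
    for index, size in enumerate(file_sizes_bytes):
        if size > MAX_AUDIO_FILE_BYTES:
            problems.append(f"file {index} is {size} bytes, over the 1 GB limit")
    return problems
```

An empty result means the request is within the two limits that are checked here. Blob and container sizes would need a similar check against your storage account.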

Model customization

| Quota | Free (F0)¹ | Standard (S0) |
|---|---|---|
| REST API limit | 300 requests per minute | 300 requests per minute |
| Max number of speech datasets | 2 | 500 |
| Max acoustic dataset file size for data import | 2 GB | 2 GB |
| Max language dataset file size for data import | 200 MB | 1.5 GB |
| Max pronunciation dataset file size for data import | 1 KB | 1 MB |
| Max text size when you're using the text parameter in the Models_Create API request | 200 KB | 500 KB |

1 For the free (F0) pricing tier, see also the monthly allowances at the pricing page.
2 See additional explanations, best practices, and adjustment instructions.

Text-to-speech quotas and limits per Speech resource

In the following tables, a quota that isn't followed by an Adjustable row can't be adjusted in any pricing tier.

General

| Quota | Free (F0)³ | Standard (S0) |
|---|---|---|
| Max number of transactions per certain time period | | |
| Real-time API. Prebuilt neural voices and custom neural voices. | 20 transactions per 60 seconds | 200 transactions per second (TPS) (default value) |
| Adjustable | No⁴ | Yes⁵, up to 1000 TPS |
| HTTP-specific quotas | | |
| Max audio length produced per request | 10 min | 10 min |
| Max total number of distinct <voice> and <audio> tags in SSML | 50 | 50 |
| WebSocket-specific quotas | | |
| Max audio length produced per turn | 10 min | 10 min |
| Max total number of distinct <voice> and <audio> tags in SSML | 50 | 50 |
| Max SSML message size per turn | 64 KB | 64 KB |
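
The SSML limits in this table can be checked before a request is sent. The sketch below uses only the Python standard library. The function name `check_ssml` is hypothetical, and two details are assumptions: it counts every <voice> and <audio> element (the table says "distinct" without defining it), and it applies the 64 KB WebSocket message limit to the UTF-8 encoded document.

```python
import xml.etree.ElementTree as ET

MAX_VOICE_AUDIO_TAGS = 50      # from the table above
MAX_SSML_BYTES = 64 * 1024     # WebSocket limit per turn, from the table above


def local_name(tag):
    """Strip an XML namespace such as '{http://...}voice' down to 'voice'."""
    return tag.rsplit("}", 1)[-1]


def check_ssml(ssml):
    """Return a list of problems; an empty list means the SSML fits the limits."""
    problems = []
    size = len(ssml.encode("utf-8"))
    if size > MAX_SSML_BYTES:
        problems.append(f"SSML is {size} bytes, over the 64 KB limit")
    root = ET.fromstring(ssml)
    # Assumption: every <voice> and <audio> element counts toward the limit.
    count = sum(
        1 for element in root.iter()
        if local_name(element.tag) in ("voice", "audio")
    )
    if count > MAX_VOICE_AUDIO_TAGS:
        problems.append(f"{count} <voice>/<audio> tags exceed the limit of 50")
    return problems
```

If a document fails the check, one option is to split it into several requests, each within the limits.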

Custom Neural Voice

| Quota | Free (F0)³ | Standard (S0) |
|---|---|---|
| Max number of transactions per second (TPS) | Not available for F0 | See General |
| Max number of datasets | N/A | 500 |
| Max number of simultaneous dataset uploads | N/A | 5 |
| Max data file size for data import per dataset | N/A | 2 GB |
| Upload of long audios or audios without script | N/A | Yes |
| Max number of simultaneous model trainings | N/A | 3 |
| Max number of custom endpoints | N/A | 50 |

Audio Content Creation tool

| Quota | Free (F0) | Standard (S0) |
|---|---|---|
| File size | 3,000 characters per file | 20,000 characters per file |
| Export to audio library | 1 concurrent task | N/A |

3 For the free (F0) pricing tier, see also the monthly allowances at the pricing page.
4 See additional explanations and best practices.
5 See additional explanations, best practices, and adjustment instructions.
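
A client can stay under a transaction quota, such as the F0 limit of 20 transactions per 60 seconds in the General table, by counting its own requests. The following is a minimal sketch of a sliding-window limiter. The class name `SlidingWindowLimiter` is hypothetical, and whether the service itself measures a sliding or a fixed window is not stated in this article, so treat the window model as an assumption.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds-long interval."""

    def __init__(self, max_calls=20, window_seconds=60.0):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._stamps = deque()  # timestamps of the calls that were allowed

    def try_acquire(self, now):
        """Return True and record the call if it fits in the window.

        `now` is the current time in seconds, for example time.monotonic().
        """
        # Forget calls that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False
```

Call `try_acquire(time.monotonic())` before each request, and delay or queue the request when it returns False.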

Detailed description, quota adjustment, and best practices

Before requesting a quota increase (where applicable), ensure that it's necessary. Speech service uses autoscaling technologies to bring in the required computational resources on demand. At the same time, Speech service tries to keep your costs low by not maintaining an excessive amount of hardware capacity.

Let's look at an example. Suppose that your application receives response code 429, which indicates that there are too many requests. Your application receives this response even though your workload is within the limits defined by the Quotas and limits reference. The most likely explanation is that Speech service is scaling up to your demand and hasn't reached the required scale yet. Therefore, the service doesn't immediately have enough resources to serve the request. In most cases, this throttled state is transient.

General best practices to mitigate throttling during autoscaling

To minimize issues related to throttling, it's a good idea to use the following techniques:

  • Implement retry logic in your application.
  • Avoid sharp changes in the workload. Increase the workload gradually. For example, let's say your application is using text-to-speech, and your current workload is 5 TPS. The next second, you increase the load to 20 TPS (that is, four times more). Speech service immediately starts scaling up to fulfill the new load, but is unable to scale as needed within one second. Some of the requests will get response code 429 (too many requests).
  • Test different load increase patterns. For more information, see the workload pattern example.
  • Create additional Speech service resources in different regions, and distribute the workload among them. (Creating multiple Speech service resources in the same region doesn't improve performance, because all of those resources are served by the same backend cluster.)
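
The first technique in the list, retry logic, can be sketched as exponential backoff with jitter. This is an illustration, not the Speech SDK's own behavior: `ThrottledError`, `call_with_retry`, and the delay values are all hypothetical, and your `send_request` function is assumed to raise `ThrottledError` when the service returns response code 429.

```python
import random
import time


class ThrottledError(Exception):
    """Raised by the caller's request function on response code 429."""


def call_with_retry(send_request, max_attempts=5, base_delay=1.0,
                    max_delay=30.0, sleep=time.sleep):
    """Call send_request(), retrying with exponential backoff on throttling.

    The delay doubles after each throttled attempt (1 s, 2 s, 4 s, ...),
    is capped at max_delay, and gets up to 50% random jitter so that many
    clients don't retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle it
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists only so that the waiting can be replaced in tests. In production code the default `time.sleep` is used.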

The next sections describe specific cases of adjusting quotas.

Speech-to-text: increase online transcription concurrent request limit

By default, the number of concurrent requests is limited to 100 per resource in the base model, and 100 per custom endpoint in the custom model. For the standard pricing tier, you can increase this amount. Before submitting the request, ensure that you're familiar with the material discussed earlier in this article, such as the best practices to mitigate throttling.

Note

If you use custom models, be aware that one Speech service resource might be associated with many custom endpoints hosting many custom model deployments. Each custom endpoint has the default limit of concurrent requests (100) set at creation. If you need to adjust it, you must adjust each custom endpoint separately. Note also that the concurrent request limit for the base model of a resource has no effect on the custom endpoints associated with this resource.

Increasing the limit of concurrent requests doesn't directly affect your costs. Speech service uses a payment model in which you pay only for what you use. The limit defines how high the service can scale before it starts to throttle your requests.

Concurrent request limits for base and custom models need to be adjusted separately.

You aren't able to see the existing value of the concurrent request limit parameter in the Azure portal, the command-line tools, or API requests. To verify the existing value, create an Azure support request.

Note

Speech containers don't require increases of the concurrent request limit, because containers are constrained only by the CPUs of the hardware they are hosted on. Speech containers do, however, have their own capacity limitations that should be taken into account. For more information, see the Speech containers FAQ.

Have the required information ready

  • For the base model:
    • Speech resource ID
    • Region
  • For the custom model:
    • Region
    • Custom endpoint ID

How to get information for the base model:

  1. Go to the Azure portal.
  2. Select the Speech service resource for which you would like to increase the concurrency request limit.
  3. From the Resource Management group, select Properties.
  4. Copy and save the values of the following fields:
    • Resource ID
    • Location (your endpoint region)

How to get information for the custom model:

  1. Go to the Speech Studio portal.
  2. Sign in if necessary, and go to Custom Speech.
  3. Select your project, and go to Deployment.
  4. Select the required endpoint.
  5. Copy and save the values of the following fields:
    • Service Region (your endpoint region)
    • Endpoint ID

Create and submit a support request

Initiate the increase of the limit for concurrent requests for your resource, or if necessary check the current limit, by submitting a support request. Here's how:

  1. Ensure you have the required information listed in the previous section.
  2. Go to the Azure portal.
  3. Select the Speech service resource for which you would like to increase (or to check) the concurrency request limit.
  4. In the Support + troubleshooting group, select New support request. A new window will appear, with auto-populated information about your Azure subscription and Azure resource.
  5. In Summary, describe what you want (for example, "Increase speech-to-text concurrency request limit").
  6. In Problem type, select Quota or Subscription issues.
  7. In Problem subtype, select either:
    • Quota or concurrent requests increase for an increase request.
    • Quota or usage validation to check the existing limit.
  8. Select Next: Solutions. Proceed further with the request creation.
  9. On the Details tab, in the Description field, enter the following:
    • A note that the request is about the speech-to-text quota.
    • Whether the request is for the base model or a custom model.
    • The Azure resource information you collected previously.
    • Any other required information.
  10. On the Review + create tab, select Create.
  11. Note the support request number in Azure portal notifications. You'll be contacted shortly about your request.

Example of a workload pattern best practice

Here's a general example of a good approach to take. It's meant only as a template that you can adjust as necessary for your own use.

Suppose that a Speech service resource has the concurrent request limit set to 300. Start the workload from 20 concurrent connections, and increase the load by 20 concurrent connections every 90-120 seconds. Monitor the service responses, and implement logic that falls back (reduces the load) if you receive response code 429 (too many requests). Then, retry the load increase in one minute, and if it still doesn't work, try again in two minutes. Use a pattern of 1-2-4-4 minutes for the intervals.
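
The ramp-up pattern above can be sketched as a small control loop. This is one possible reading of the pattern, not a prescribed implementation. The names `ramp_up` and `is_throttled` are hypothetical: `is_throttled(level)` stands for your own code that runs the workload at `level` concurrent connections and reports whether response code 429 was seen. Giving up after the 1-2-4-4 intervals are exhausted is an assumption, because the pattern doesn't say what to do next.

```python
import time

BACKOFF_MINUTES = (1, 2, 4, 4)  # retry intervals from the pattern above


def ramp_up(is_throttled, start=20, step=20, target=300,
            hold_seconds=90, sleep=time.sleep):
    """Raise concurrency from start to target in steps; return the level reached."""
    level = start
    failures = 0  # consecutive throttled attempts at the next level
    while level < target:
        candidate = min(target, level + step)
        if is_throttled(candidate):
            # Fall back: stay at the last good level and wait before retrying.
            if failures == len(BACKOFF_MINUTES):
                return level  # assumption: stop once the intervals are used up
            sleep(BACKOFF_MINUTES[failures] * 60)
            failures += 1
        else:
            level = candidate
            failures = 0
            sleep(hold_seconds)  # hold each new level for 90-120 seconds
    return level
```

With the defaults, a run that is never throttled moves from 20 to 300 connections in 14 steps, which takes about 21 minutes.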

Generally, it's a very good idea to test the workload and the workload patterns before going to production.

Text-to-speech: increase concurrent request limit

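By default, the real-time text-to-speech API is limited to 200 transactions per second (TPS) per Speech resource in the standard pricing tier. For the standard pricing tier, you can increase this limit, up to 1000 TPS. Before submitting the request, ensure that you're familiar with the material discussed earlier in this article, such as the best practices to mitigate throttling.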

Increasing the limit of concurrent requests doesn't directly affect your costs. Speech service uses a payment model in which you pay only for what you use. The limit defines how high the service can scale before it starts to throttle your requests.

You aren't able to see the existing value of the concurrent request limit parameter in the Azure portal, the command-line tools, or API requests. To verify the existing value, create an Azure support request.

Note

Speech containers don't require increases of the concurrent request limit, because containers are constrained only by the CPUs of the hardware they are hosted on.

Prepare the required information

To create an increase request, you need to provide the following information.

  • For the prebuilt voice:
    • Speech resource ID
    • Region
  • For the custom voice:
    • Deployment region
    • Custom endpoint ID

How to get information for the prebuilt voice:

  1. Go to the Azure portal.
  2. Select the Speech service resource for which you would like to increase the concurrency request limit.
  3. From the Resource Management group, select Properties.
  4. Copy and save the values of the following fields:
    • Resource ID
    • Location (your endpoint region)

How to get information for the custom voice:

  1. Go to the Speech Studio portal.
  2. Sign in if necessary, and go to Custom Voice.
  3. Select your project, and go to Deploy model.
  4. Select the required endpoint.
  5. Copy and save the values of the following fields:
    • Service Region (your endpoint region)
    • Endpoint ID

Create and submit a support request

Initiate the increase of the limit for concurrent requests for your resource, or if necessary check the current limit, by submitting a support request. Here's how:

  1. Ensure you have the required information listed in the previous section.
  2. Go to the Azure portal.
  3. Select the Speech service resource for which you would like to increase (or to check) the concurrency request limit.
  4. In the Support + troubleshooting group, select New support request. A new window will appear, with auto-populated information about your Azure subscription and Azure resource.
  5. In Summary, describe what you want (for example, "Increase text-to-speech concurrency request limit").
  6. In Problem type, select Quota or Subscription issues.
  7. In Problem subtype, select either:
    • Quota or concurrent requests increase for an increase request.
    • Quota or usage validation to check the existing limit.
  8. On the Recommended solution tab, select Next.
  9. On the Additional details tab, fill in all the required items. In the Details field, enter the following:
    • A note that the request is about the text-to-speech quota.
    • Whether the request is for the prebuilt voice or a custom voice.
    • The Azure resource information you collected previously.
    • Any other required information.
  10. On the Review + create tab, select Create.
  11. Note the support request number in Azure portal notifications. You'll be contacted shortly about your request.

Article information

Author: Tyson Zemlak

Last Updated: 12/08/2022
