TTS interface
This document describes the use of the Readit TTS interface (Text-to-Speech).
For synthesis, there are two different ways to use the interface:
- Regular synthesis
  - Intended for producing shorter audio samples. Max 1000 characters.
  - The audio data is returned directly in the response.
- Batch synthesis
  - Intended for producing longer audio samples.
  - Batch synthesis is done with two different requests. The first request provides the text to be synthesized, which starts the synthesis in the background. Depending on the input length, this can take several minutes. The second request retrieves the audio data when it's ready.
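As a rough illustration of the split above, the following Python sketch picks an endpoint from the length of the raw text. The endpoint URLs and the 1000-character limit are taken from this document; the function name `choose_endpoint` is hypothetical, and the sketch assumes that Python's `len()` counts characters the same way the service does. SSML input, which has its own limit, is not covered.

```python
# Endpoint URLs as documented in the Requests section.
REGULAR_URL = "https://api.aimater.com/tts/v1/synthesize"
BATCH_START_URL = "https://api.aimater.com/tts/v1/batch/synthesize"

# Documented maximum length of the raw text field in regular synthesis.
MAX_REGULAR_TEXT_CHARS = 1000


def choose_endpoint(text: str) -> str:
    """Return the synthesis endpoint suited to raw text of this length."""
    if len(text) <= MAX_REGULAR_TEXT_CHARS:
        return REGULAR_URL
    return BATCH_START_URL
```

Short texts can also be sent to batch synthesis; the sketch simply prefers the endpoint that returns audio in a single request.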
Requests
Regular Synthesis
HTTP Request
POST https://api.aimater.com/tts/v1/synthesize
Request Body
Contains the input text to be synthesized in a JSON object.
{
  "input": {
    "text": string,
    "ssml": string
  },
  "inputConfig": {
    "phonemized": boolean
  },
  "voice": {
    "languageCode": string,
    "name": string
  },
  "audioConfig": {
    "audioEncoding": string,
    "speakingRate": number,
    "volumeGainDb": number
  },
  "responseConfig": {
    "responseType": string,
    "includePhonemizedText": boolean
  },
  "auth": {
    "key": string
  }
}
Request fields
- input
  - Input can be provided either as raw text or in SSML format.
  - text: string
    - Raw text to be synthesized. Maximum 1000 characters.
  - ssml: string
    - Input in SSML format. Maximum 1500 characters. See SSML.
- inputConfig
  - phonemized: boolean
    - Whether the input text is already in phonemized form. Default is `false`.
    - Note! Not supported for English language.
- voice
  - languageCode: string
    - Language code for the material to be synthesized. Options are `fi`, `sv-fi` and `en`. Default is `fi`.
  - name: string
    - If you have a custom voice, you can switch to it by providing the voice name here.
- audioConfig
  - audioEncoding: string
    - Encoding of the synthesized audio. Options are `WAV`/`LINEAR16`, `MP3` and `RAW` (WAV without header). Default is `WAV`.
  - speakingRate: number
    - Speed multiplier for synthesized audio between [`0.5`, `2.0`]. Default is `1.0`.
  - volumeGainDb: number
    - Volume gain in decibels for synthesized audio. Default is `0.0`.
- responseConfig
  - responseType: string
    - Response type. Options are `JSON` and `BINARY`. Default is `JSON`.
  - includePhonemizedText: boolean
    - Whether to return the phonemized form of the text along with the audio. Works only when `responseType` is `JSON`. Default is `false`.
    - Note! Not supported for English language.
- auth
  - key: string
    - API authentication key. If missing or incorrect, HTTP code `401` is returned.
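The field constraints above can be checked on the client before a request is sent. The following Python sketch builds a request body for raw text input and mirrors the documented limits; the function name `build_synthesis_body` and its parameter names are hypothetical, the volume gain range is taken from the error table below, and the server remains the authority on what is accepted.

```python
def build_synthesis_body(text, api_key, language_code="fi",
                         audio_encoding="WAV", speaking_rate=1.0,
                         volume_gain_db=0.0, response_type="JSON"):
    """Build the JSON-serializable body for a regular synthesis request."""
    if not text:
        raise ValueError("text or ssml must be specified")
    # Assumption: len() counts characters the same way the service does.
    if len(text) > 1000:
        raise ValueError("text field maximum length is 1000 characters")
    if language_code not in ("fi", "sv-fi", "en"):
        raise ValueError("invalid language given")
    if not 0.5 <= speaking_rate <= 2.0:
        raise ValueError("speakingRate has to be between 0.5 and 2.0")
    if not -100 <= volume_gain_db <= 100:
        raise ValueError("volumeGainDb must be between -100 and 100")
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "speakingRate": speaking_rate,
            "volumeGainDb": volume_gain_db,
        },
        "responseConfig": {"responseType": response_type},
        "auth": {"key": api_key},
    }
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of the POST request.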
Response Body
The response body depends on the requested response type. When the response type is binary, the body is the synthesized audio data itself.
If the type is JSON, the body looks like this:
{
  "audioContent": string,
  "phonemizedText": string // Only if `includePhonemizedText` parameter was used.
}
Response fields
- audioContent: string
  - Audio data base64 encoded.
- phonemizedText: string
  - Input text in phonemized form from which the audio was generated.
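With the JSON response type, the audio has to be base64-decoded before it can be played or saved. A minimal Python sketch, assuming the response body has already been parsed into a dictionary; the helper names `decode_audio` and `save_audio` are hypothetical.

```python
import base64


def decode_audio(response_body: dict) -> bytes:
    """Return the raw audio bytes from a parsed JSON synthesis response."""
    return base64.b64decode(response_body["audioContent"])


def save_audio(response_body: dict, path: str) -> None:
    """Decode the audio and write it to a file, e.g. 'speech.wav'."""
    with open(path, "wb") as audio_file:
        audio_file.write(decode_audio(response_body))
```

For the default `WAV` encoding, the decoded bytes start with the `RIFF` file signature, which is a quick sanity check.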
Error Situations
In error situations, the response body is always JSON, even if the requested type was binary. It contains an error response:
{
  "error": string,
  "errorCode": string,
  "status": number
}
Error response fields
- error: string
  - Error message explaining what is not allowed in the given parameters.
- errorCode: string
  - Error code corresponding to the error message.
- status: number
  - HTTP status of the error response.
Possible Errors
| errorCode | status | error |
|---|---|---|
| empty-request-body | 400 | Invalid argument: empty request body. |
| invalid-audio-encoding | 400 | Invalid argument: AudioEncoding. |
| invalid-input | 400 | Invalid argument: only one of text or ssml allowed. |
| invalid-language | 400 | Invalid argument: invalid language given. |
| invalid-request-body | 400 | Invalid argument: invalid JSON. |
| invalid-response-type | 400 | Invalid argument: responseType must be json or binary. |
| invalid-speaking-rate | 400 | Invalid argument: speakingRate has to be between 0.5 and 2.0. |
| invalid-ssml | 400 | Invalid argument: ssml field maximum length is 1500 characters. |
| invalid-text | 400 | Invalid argument: text field maximum length is 1000 characters. |
| invalid-voice | 400 | Invalid argument: The given voice is not allowed. |
| invalid-volume-gain-db | 400 | Invalid argument: volumeGainDb must be between -100 and 100. |
| missing-input | 400 | Invalid argument: text or ssml must be specified. |
| invalid-api-key | 401 | Invalid API key. |
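Because error responses are always JSON, a client can turn them into exceptions in one place. The sketch below is one possible approach in Python; the class `TtsApiError` and the function `parse_error` are hypothetical names, and the function is meant to be called only when the HTTP status signals an error.

```python
import json


class TtsApiError(Exception):
    """Carries the fields of a documented error response."""

    def __init__(self, error, error_code, status):
        super().__init__(f"{status} {error_code}: {error}")
        self.error = error
        self.error_code = error_code
        self.status = status


def parse_error(raw_body: bytes) -> TtsApiError:
    """Build an exception from the raw JSON body of an error response."""
    data = json.loads(raw_body)
    return TtsApiError(data["error"], data["errorCode"], data["status"])
```

Branching on `error_code` rather than on the message text keeps client code independent of the exact wording of the messages.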
Batch Synthesis Start
HTTP Request
POST https://api.aimater.com/tts/v1/batch/synthesize
Request Body
Contains the input text to be synthesized in a JSON object.
{
  "input": {
    "text": string,
    "ssml": string
  },
  "voice": {
    "languageCode": string,
    "name": string
  },
  "audioConfig": {
    "audioEncoding": string,
    "speakingRate": number,
    "volumeGainDb": number
  },
  "responseConfig": {
    "responseType": string
  },
  "auth": {
    "key": string
  }
}
Request fields
- input
  - Input can be provided either as raw text or in SSML format.
  - text: string
    - Raw text to be synthesized.
  - ssml: string
    - Input in SSML format. See SSML.
- voice
  - languageCode: string
    - Language code for the material to be synthesized. Options are `fi`, `sv-fi` and `en`. Default is `fi`.
  - name: string
    - If you have a custom voice, you can switch to it by providing the voice name here.
- audioConfig
  - audioEncoding: string
    - Encoding of the synthesized audio. Options are `WAV`/`LINEAR16`, `MP3` and `RAW` (WAV without header). Default is `WAV`.
  - speakingRate: number
    - Speed multiplier for synthesized audio between [`0.5`, `2.0`]. Default is `1.0`.
  - volumeGainDb: number
    - Volume gain in decibels for synthesized audio. Default is `0.0`.
- responseConfig
  - responseType: string
    - Response type. Options are `JSON` and `BINARY`. Default is `JSON`.
- auth
  - key: string
    - API authentication key. If missing or incorrect, HTTP code `401` is returned.
Response Body
If the request is successful, the response body contains JSON data formatted as follows:
{
  "audioId": string
}
Response fields
- audioId: string
  - UUID identifier used to retrieve the audio data.
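Starting a batch synthesis could look roughly like the following Python sketch, which uses only the standard library. The function names are hypothetical, and the `Content-Type` header is an assumption, since this document does not state which request headers the service expects.

```python
import json
import urllib.request

BATCH_START_URL = "https://api.aimater.com/tts/v1/batch/synthesize"


def build_batch_start_request(text, api_key, language_code="fi"):
    """Build (but do not send) the POST request that starts batch synthesis."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "auth": {"key": api_key},
    }
    return urllib.request.Request(
        BATCH_START_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )


def start_batch(text, api_key, language_code="fi"):
    """Send the request and return the audioId used to fetch the audio later."""
    request = build_batch_start_request(text, api_key, language_code)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["audioId"]
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx responses, so the documented error body has to be read from the exception in that case.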
Error Situations
In error situations, the response body contains an error message:
{
  "error": string,
  "errorCode": string,
  "status": number
}
Error response fields
- error: string
  - Error message explaining what is not allowed in the given parameters.
- errorCode: string
  - Error code corresponding to the error message.
- status: number
  - HTTP status of the error response.
Possible Errors
| errorCode | status | error |
|---|---|---|
| empty-request-body | 400 | Invalid argument: empty request body. |
| invalid-audio-encoding | 400 | Invalid argument: AudioEncoding. |
| invalid-input | 400 | Invalid argument: only one of text or ssml allowed. |
| invalid-language | 400 | Invalid argument: invalid language given. |
| invalid-request-body | 400 | Invalid argument: invalid JSON. |
| invalid-response-type | 400 | Invalid argument: responseType must be json or binary. |
| invalid-speaking-rate | 400 | Invalid argument: speakingRate has to be between 0.5 and 2.0. |
| invalid-voice | 400 | Invalid argument: The given voice is not allowed. |
| invalid-volume-gain-db | 400 | Invalid argument: volumeGainDb must be between -100 and 100. |
| missing-input | 400 | Invalid argument: text or ssml must be specified. |
| invalid-api-key | 401 | Invalid API key. |
Retrieving Completed Batch Synthesis Audio
HTTP Request
POST https://api.aimater.com/tts/v1/batch/fetch
Request Body
Contains the UUID identifier corresponding to the synthesized audio data in a JSON object.
{
  "input": {
    "audioId": string
  },
  "auth": {
    "key": string
  }
}
Request fields
- input
  - audioId: string
    - UUID identifier received from the batch synthesis start request.
- auth
  - key: string
    - API authentication key. If missing or incorrect, HTTP code `401` is returned.
Response Body
The response body depends on the response type requested when the batch synthesis was started. When the response type is binary, the body is the synthesized audio data itself.
If the type is JSON, the body looks like this:
{
  "audioContent": string
}
Response fields
- audioContent: string
  - Audio data base64 encoded.
Error Situations
In error situations, the response body is always JSON, even if the requested type was binary. If synthesis is still in progress or has failed, the response body contains an error message:
{
  "error": string,
  "errorCode": string,
  "status": number,
  "details": string | null
}
Error response fields
- error: string
  - Error message. This changes depending on whether synthesis is still in progress or has failed.
- errorCode: string
  - Error code corresponding to the error message.
- status: number
  - HTTP status of the error response.
- details: string | null
  - May contain a more detailed error description if available.
The response status codes directly indicate the synthesis status:
- Complete: `200`
- In Progress: `202`
- Failed: `400`
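The status codes above lend themselves to a simple polling loop. The Python sketch below is one way to write it, assuming the JSON response type was requested. All function names, the polling interval and the attempt limit are hypothetical choices, and the `Content-Type` header is an assumption. The transport is passed in as a callable so the loop can be tried without network access.

```python
import json
import time
import urllib.error
import urllib.request

FETCH_URL = "https://api.aimater.com/tts/v1/batch/fetch"


def http_fetch_once(audio_id, api_key):
    """POST one fetch request and return (http_status, parsed_json_body)."""
    payload = {"input": {"audioId": audio_id}, "auth": {"key": api_key}}
    request = urllib.request.Request(
        FETCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not documented
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)  # 200 or 202
    except urllib.error.HTTPError as err:
        return err.code, json.load(err)  # 400 or 401 with an error body


def wait_for_audio(fetch_once, audio_id, interval_s=5.0, max_attempts=60,
                   sleep=time.sleep):
    """Poll until the audio is ready and return the base64 audioContent."""
    for _ in range(max_attempts):
        status, body = fetch_once(audio_id)
        if status == 200:  # complete
            return body["audioContent"]
        if status == 202:  # still in progress, try again later
            sleep(interval_s)
            continue
        # Any other status (e.g. 400 or 401) is treated as a failure.
        raise RuntimeError(f"{status} {body.get('errorCode')}: {body.get('error')}")
    raise TimeoutError(f"audio {audio_id} not ready after {max_attempts} attempts")
```

A call such as `wait_for_audio(lambda audio_id: http_fetch_once(audio_id, api_key), audio_id)` ties the two functions together.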
Possible Errors
| errorCode | status | error |
|---|---|---|
| audio-not-ready | 202 | Audio generation for the given UUID is not ready yet. |
| audio-generation-failed | 400 | Audio generation for the given UUID has failed. |
| empty-request-body | 400 | Invalid argument: empty request body. |
| invalid-request-body | 400 | Invalid argument: invalid JSON. |
| invalid-uuid | 400 | Invalid UUID or the UUID is not allowed for the given API key. |
| invalid-api-key | 401 | Invalid API key. |
Fetching Available Voices
You can fetch all available voices for your project by calling this endpoint with your API key.
HTTP Request
GET https://api.aimater.com/tts/v2/voices/<api_key>
Replace <api_key> with your API key.
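Since the API key is part of the URL path here, it is safest to percent-encode it when building the URL. A small Python sketch; the function name `voices_url` is hypothetical.

```python
from urllib.parse import quote

VOICES_BASE_URL = "https://api.aimater.com/tts/v2/voices/"


def voices_url(api_key: str) -> str:
    """Return the voices endpoint URL for the given API key."""
    # safe="" also encodes "/" so the key cannot alter the path structure.
    return VOICES_BASE_URL + quote(api_key, safe="")
```

A UUID-style key contains only letters, digits and hyphens, so it passes through the encoding unchanged.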
Response Body
{
  "voices": [
    {
      "background": string, // Deprecated
      "color": string,
      "languageCode": string,
      "name": string
    },
    ...
  ]
}
Response fields
- voices
  - List of available voices.
  - These objects can be used in the `voice` field of synthesis requests.
  - The objects have the following fields:
    - background: string | null
      - Voice background as the CSS `background` attribute.
      - Not used actively anymore. May be removed in future versions.
    - color: string
      - Voice color.
      - Used in Readit's products to distinguish voices from each other.
    - languageCode: string
      - Voice language code.
    - name: string
      - Voice name.
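To illustrate how a voice from this response can be reused in a synthesis request, the Python sketch below filters the parsed response by language and keeps only the two fields that the `voice` object of a synthesis request defines. The function name is hypothetical, and dropping `background` and `color` is a cautious choice rather than a documented requirement.

```python
def voice_fields_for_language(voices_response: dict, language_code: str) -> list:
    """Return `voice` objects for synthesis requests in the given language."""
    return [
        {"languageCode": voice["languageCode"], "name": voice["name"]}
        for voice in voices_response["voices"]
        if voice["languageCode"] == language_code
    ]
```

Each returned dictionary can be placed directly under the `voice` key of a regular or batch synthesis request body.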
Other
General Notes
- A sentence can never be longer than 1000 characters.
SSML input
SSML input must be enclosed within <speak> tags.
<speak>Test.</speak>
Supported SSML Tags
Break
<break>
Attributes:
- time
  - Defines the pause duration either in milliseconds (ms) or seconds (s).
Example:
<speak>One second pause.<break time="1s" />Half second pause.<break time="500ms" /></speak>
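SSML strings like the one above can be assembled programmatically. The following Python sketch is one way to do it; the function name `build_ssml` and the `(text, pause_ms)` input shape are hypothetical, XML-escaping the text is an assumption based on SSML being XML, and the 1500-character check applies to regular synthesis only.

```python
from xml.sax.saxutils import escape

MAX_SSML_CHARS = 1500  # documented limit for the ssml field in regular synthesis


def build_ssml(parts):
    """Build an SSML string from (text, pause_ms) pairs.

    pause_ms is the pause after the text in milliseconds, or None for no pause.
    """
    chunks = []
    for text, pause_ms in parts:
        chunks.append(escape(text))  # escapes &, < and > in the text
        if pause_ms is not None:
            chunks.append(f'<break time="{pause_ms}ms" />')
    ssml = "<speak>" + "".join(chunks) + "</speak>"
    if len(ssml) > MAX_SSML_CHARS:
        raise ValueError("ssml field maximum length is 1500 characters")
    return ssml
```

Calling `build_ssml([("One second pause.", 1000), ("Half second pause.", 500)])` produces the same structure as the example above, with the first pause written as `1000ms` instead of `1s`.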
Examples
Regular Synthesis
Input
{
  "input": {
    "text": "Tämä on testi."
  },
  "voice": {
    "languageCode": "fi"
  },
  "responseConfig": {
    "includePhonemizedText": true
  },
  "auth": {
    "key": "12345678-9abc-def1-2345-6789abcdef12"
  }
}
Output
{
  "audioContent": "UklGRlYHAQBXQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0YTIHAQAAAAAAAAA...",
  "phonemizedText": "Tämä on testi."
}
Regular Synthesis with SSML input
Input
{
  "input": {
    "ssml": "<speak>Tämä on testi. <break time=\"1s\" /></speak>"
  },
  "voice": {
    "languageCode": "fi"
  },
  "auth": {
    "key": "12345678-9abc-def1-2345-6789abcdef12"
  }
}
Output
{
  "audioContent": "UklGRlYHAQBXQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0YTIHAQAAAAAAAAA..."
}
Batch Synthesis Start
Input
{
  "input": {
    "text": "This is a long text that is more than 1000 characters..."
  },
  "voice": {
    "languageCode": "en"
  },
  "auth": {
    "key": "12345678-9abc-def1-2345-6789abcdef12"
  }
}
Output
{
  "audioId": "e8cb3912-543b-43a2-89c5-48b0867c54dd"
}
Retrieving Completed Batch Synthesis Audio
Input
{
  "input": {
    "audioId": "e8cb3912-543b-43a2-89c5-48b0867c54dd"
  },
  "auth": {
    "key": "12345678-9abc-def1-2345-6789abcdef12"
  }
}
Output
{
  "audioContent": "UklGRlYHAQBXQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0YTIHAQAAAAAAAAA..."
}