TTS interface

This document describes how to use the Readit TTS (text-to-speech) interface.

For synthesis, there are two different ways to use the interface:

Regular synthesis
- Intended for producing shorter audio samples. Maximum 1000 characters of raw text, or 1500 characters of SSML.
- The audio data is returned directly in the response.
Batch synthesis
- Intended for producing longer audio samples.
- Batch synthesis is done with two different requests. The first request provides the text to be synthesized, which starts the synthesis in the background. Depending on the input length, this can take several minutes. The second request retrieves the audio data when it's ready.
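
As a rough illustration of choosing between the two modes, the sketch below picks an endpoint from the input length. The function name `pick_endpoint` is hypothetical, and it assumes that `len()` (Unicode code points) matches how the service counts characters, which this document does not confirm.

```python
REGULAR_URL = "https://api.aimater.com/tts/v1/synthesize"
BATCH_START_URL = "https://api.aimater.com/tts/v1/batch/synthesize"

# Documented limits for regular synthesis.
MAX_TEXT_CHARS = 1000
MAX_SSML_CHARS = 1500


def pick_endpoint(content: str, is_ssml: bool = False) -> str:
    """Return the regular endpoint if the input fits its limit, else the batch one.

    Assumption: the service counts characters the same way len() does.
    """
    limit = MAX_SSML_CHARS if is_ssml else MAX_TEXT_CHARS
    return REGULAR_URL if len(content) <= limit else BATCH_START_URL
```

Inputs that exceed the regular limit are routed to batch synthesis, which then requires the separate fetch request described later.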

Requests

Regular Synthesis

HTTP Request

POST https://api.aimater.com/tts/v1/synthesize

Request Body

Contains the input text to be synthesized in a JSON object.

{
  "input": {
    "text": string,
    "ssml": string
  },
  "inputConfig": {
    "phonemized": boolean
  },
  "voice": {
    "languageCode": string,
    "name": string
  },
  "audioConfig": {
    "audioEncoding": string,
    "speakingRate": number,
    "volumeGainDb": number
  },
  "responseConfig": {
    "responseType": string,
    "includePhonemizedText": boolean
  },
  "auth": {
    "key": string
  }
}

Request fields

input: Input can be provided either as raw text or in SSML format, but not both.
- text: string
  - Raw text to be synthesized. Maximum 1000 characters.
- ssml: string
  - Input in SSML format. Maximum 1500 characters. See SSML.
inputConfig
- phonemized: boolean
  - Whether the input text is already in phonemized form. Default is false.
  - Note! Not supported for English language.
voice
- languageCode: string
  - Language code for the material to be synthesized. Options are fi, sv-fi and en. Default is fi.
- name: string
  - If you have a custom voice, you can switch to it by providing the voice name here.
audioConfig
- audioEncoding: string
  - Encoding of the synthesized audio. Options are WAV/LINEAR16, MP3 and RAW (WAV without header). Default is WAV.
- speakingRate: number
  - Speed multiplier for the synthesized audio, in the range [0.5, 2.0]. Default is 1.0.
- volumeGainDb: number
  - Volume gain in decibels for the synthesized audio. Must be between -100 and 100. Default is 0.0.
responseConfig
- responseType: string
  - Response type. Options are JSON and BINARY. Default is JSON.
- includePhonemizedText: boolean
  - Whether to return the phonemized form of the text along with the audio. Works only when responseType is JSON. Default is false.
  - Note! Not supported for English language.
auth
- key: string
  - API authentication key. If missing or incorrect, HTTP code 401 is returned.
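
The field rules above can be checked on the client before sending anything. The following is a minimal sketch, not an official client: the function name `build_synthesize_request` and its parameters are hypothetical, and the checks only mirror the limits stated in this document.

```python
from typing import Optional


def build_synthesize_request(
    api_key: str,
    text: Optional[str] = None,
    ssml: Optional[str] = None,
    language_code: str = "fi",
    audio_encoding: str = "WAV",
    speaking_rate: float = 1.0,
    response_type: str = "JSON",
) -> dict:
    """Build a request body for regular synthesis, validating documented limits."""
    # Exactly one of text or ssml may be given.
    if (text is None) == (ssml is None):
        raise ValueError("provide exactly one of text or ssml")
    if text is not None and len(text) > 1000:
        raise ValueError("text field maximum length is 1000 characters")
    if ssml is not None and len(ssml) > 1500:
        raise ValueError("ssml field maximum length is 1500 characters")
    if not 0.5 <= speaking_rate <= 2.0:
        raise ValueError("speakingRate has to be between 0.5 and 2.0")

    return {
        "input": {"text": text} if text is not None else {"ssml": ssml},
        "voice": {"languageCode": language_code},
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "speakingRate": speaking_rate,
        },
        "responseConfig": {"responseType": response_type},
        "auth": {"key": api_key},
    }
```

Validating locally avoids a round trip for errors such as `invalid-input` or `invalid-speaking-rate`, although the service remains the final authority on what it accepts.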

Response Body

The requested response type determines the response body. When the response type is BINARY, the body is the raw synthesized audio data.

If the type is JSON, the body looks like this:

{
  "audioContent": string,
  "phonemizedText": string // Only if `includePhonemizedText` parameter was used.
}

Response fields

audioContent: string
- Base64-encoded audio data.
phonemizedText: string
- Input text in phonemized form from which the audio was generated.
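
To turn a JSON response into playable audio, the `audioContent` field has to be base64-decoded. A minimal sketch, where the helper name `decode_synthesis_response` is hypothetical:

```python
import base64
import json
from typing import Optional, Tuple


def decode_synthesis_response(body: str) -> Tuple[bytes, Optional[str]]:
    """Return (audio bytes, phonemized text or None) from a JSON response body."""
    data = json.loads(body)
    audio = base64.b64decode(data["audioContent"])
    # phonemizedText is present only if includePhonemizedText was requested.
    return audio, data.get("phonemizedText")
```

The returned bytes can be written to a file as-is; with the default WAV encoding they already include the WAV header.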

Error Situations

In error situations, the response body is always JSON, even if the requested type was binary. It contains an error response:

{
  "error": string,
  "errorCode": string,
  "status": number
}

Error response fields

error: string
- Error message explaining what is not allowed in the given parameters.
errorCode: string
- Error code corresponding to the error message.
status: number
- HTTP status of the error response.

Possible Errors

`errorCode`	`status`	`error`
`empty-request-body`	`400`	`Invalid argument: empty request body.`
`invalid-audio-encoding`	`400`	`Invalid argument: AudioEncoding.`
`invalid-input`	`400`	`Invalid argument: only one of text or ssml allowed.`
`invalid-language`	`400`	`Invalid argument: invalid language given.`
`invalid-request-body`	`400`	`Invalid argument: invalid JSON.`
`invalid-response-type`	`400`	`Invalid argument: responseType must be json or binary.`
`invalid-speaking-rate`	`400`	`Invalid argument: speakingRate has to be between 0.5 and 2.0.`
`invalid-ssml`	`400`	`Invalid argument: ssml field maximum length is 1500 characters.`
`invalid-text`	`400`	`Invalid argument: text field maximum length is 1000 characters.`
`invalid-voice`	`400`	`Invalid argument: The given voice is not allowed.`
`invalid-volume-gain-db`	`400`	`Invalid argument: volumeGainDb must be between -100 and 100.`
`missing-input`	`400`	`Invalid argument: text or ssml must be specified.`
`invalid-api-key`	`401`	`Invalid API key.`
`insufficient-permissions`	`403`	`Insufficient permissions.`
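
Since every error carries the same three fields, a client can map them onto a single exception type. The sketch below is one possible approach; `TtsApiError` and `raise_for_error` are hypothetical names, not part of the interface.

```python
import json


class TtsApiError(Exception):
    """Carries the errorCode, status and error fields of an error response."""

    def __init__(self, error_code: str, status: int, message: str):
        super().__init__(f"{status} {error_code}: {message}")
        self.error_code = error_code
        self.status = status
        self.message = message


def raise_for_error(http_status: int, body: bytes) -> None:
    """Raise TtsApiError for any non-200 response; do nothing on success."""
    if http_status == 200:
        return
    # Error bodies are always JSON, even if a binary response was requested.
    payload = json.loads(body)
    raise TtsApiError(payload["errorCode"], payload["status"], payload["error"])
```

Callers can then branch on `error_code`, for example treating `invalid-api-key` differently from `invalid-text`.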

Batch Synthesis Start

HTTP Request

POST https://api.aimater.com/tts/v1/batch/synthesize

Request Body

Contains the input text to be synthesized in a JSON object.

{
  "input": {
    "text": string,
    "ssml": string
  },
  "voice": {
    "languageCode": string,
    "name": string
  },
  "audioConfig": {
    "audioEncoding": string,
    "speakingRate": number,
    "volumeGainDb": number
  },
  "responseConfig": {
    "responseType": string
  },
  "auth": {
    "key": string
  }
}

Request fields

input: Input can be provided either as raw text or in SSML format, but not both.
- text: string
  - Raw text to be synthesized.
- ssml: string
  - Input in SSML format. See SSML.
voice
- languageCode: string
  - Language code for the material to be synthesized. Options are fi, sv-fi and en. Default is fi.
- name: string
  - If you have a custom voice, you can switch to it by providing the voice name here.
audioConfig
- audioEncoding: string
  - Encoding of the synthesized audio. Options are WAV/LINEAR16, MP3 and RAW (WAV without header). Default is WAV.
- speakingRate: number
  - Speed multiplier for the synthesized audio, in the range [0.5, 2.0]. Default is 1.0.
- volumeGainDb: number
  - Volume gain in decibels for the synthesized audio. Must be between -100 and 100. Default is 0.0.
responseConfig
- responseType: string
  - Response type. Options are JSON and BINARY. Default is JSON.
auth
- key: string
  - API authentication key. If missing or incorrect, HTTP code 401 is returned.

Response Body

If the request is successful, the response body contains JSON data formatted as follows:

{
  "audioId": string
}

Response fields

audioId: string
- UUID identifier used to retrieve the audio data.
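
The `audioId` is the only link between the two batch requests, so it may be worth checking before storing it. A small sketch, with the hypothetical helper name `parse_audio_id`:

```python
import json
import uuid


def parse_audio_id(body: str) -> str:
    """Extract audioId from the batch start response and check it parses as a UUID."""
    audio_id = json.loads(body)["audioId"]
    uuid.UUID(audio_id)  # raises ValueError if the value is not a valid UUID
    return audio_id
```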

Error Situations

In error situations, the response body contains an error message:

{
  "error": string,
  "errorCode": string,
  "status": number
}

Error response fields

error: string
- Error message explaining what is not allowed in the given parameters.
errorCode: string
- Error code corresponding to the error message.
status: number
- HTTP status of the error response.

Possible Errors

`errorCode`	`status`	`error`
`empty-request-body`	`400`	`Invalid argument: empty request body.`
`invalid-audio-encoding`	`400`	`Invalid argument: AudioEncoding.`
`invalid-input`	`400`	`Invalid argument: only one of text or ssml allowed.`
`invalid-language`	`400`	`Invalid argument: invalid language given.`
`invalid-request-body`	`400`	`Invalid argument: invalid JSON.`
`invalid-response-type`	`400`	`Invalid argument: responseType must be json or binary.`
`invalid-speaking-rate`	`400`	`Invalid argument: speakingRate has to be between 0.5 and 2.0.`
`invalid-voice`	`400`	`Invalid argument: The given voice is not allowed.`
`invalid-volume-gain-db`	`400`	`Invalid argument: volumeGainDb must be between -100 and 100.`
`missing-input`	`400`	`Invalid argument: text or ssml must be specified.`
`invalid-api-key`	`401`	`Invalid API key.`
`insufficient-permissions`	`403`	`Insufficient permissions.`

Retrieving Completed Batch Synthesis Audio

HTTP Request

POST https://api.aimater.com/tts/v1/batch/fetch

Request Body

Contains the UUID identifier corresponding to the synthesized audio data in a JSON object.

{
  "input": {
    "audioId": string
  },
  "auth": {
    "key": string
  }
}

Request fields

input
- audioId: string
  - The UUID identifier (audioId) returned by the batch synthesis start request.
auth
- key: string
  - API authentication key. If missing or incorrect, HTTP code 401 is returned.

Response Body

The response type requested when the batch synthesis was started determines the response body. When the response type is BINARY, the body is the raw synthesized audio data.

If the type is JSON, the body looks like this:

{
  "audioContent": string
}

Response fields

audioContent: string
- Base64-encoded audio data.

Error Situations

In error situations, the response body is always JSON, even if the requested type was binary. If synthesis is still in progress or has failed, the response body contains an error message:

{
  "error": string,
  "errorCode": string,
  "status": number,
  "details": string | null
}

Error response fields

error: string
- Error message. This changes depending on whether synthesis is still in progress or has failed.
errorCode: string
- Error code corresponding to the error message.
status: number
- HTTP status of the error response.
details: string | null
- May contain a more detailed error description if available.

The response status codes directly indicate the synthesis status:

Complete: 200
In Progress: 202
Failed: 400

Possible Errors

`errorCode`	`status`	`error`
`audio-not-ready`	`202`	`Audio generation for the given UUID is not ready yet.`
`audio-generation-failed`	`400`	`Audio generation for the given UUID has failed.`
`empty-request-body`	`400`	`Invalid argument: empty request body.`
`invalid-request-body`	`400`	`Invalid argument: invalid JSON.`
`invalid-uuid`	`400`	`Invalid UUID or the UUID is not allowed for the given API key.`
`invalid-api-key`	`401`	`Invalid API key.`
`insufficient-permissions`	`403`	`Insufficient permissions.`
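
Because status 202 means "not ready yet", retrieving batch audio is naturally a polling loop. The sketch below assumes the batch was started with the JSON response type. The transport is injected as a `fetch` callable returning `(status, body)` so the loop stays independent of any HTTP library; `poll_batch_audio`, the polling interval and the attempt limit are all hypothetical choices, not values prescribed by the interface.

```python
import base64
import json
import time
from typing import Callable, Tuple


def poll_batch_audio(
    fetch: Callable[[dict], Tuple[int, bytes]],
    audio_id: str,
    api_key: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Poll the fetch endpoint until audio is ready (200) or an error occurs."""
    request_body = {"input": {"audioId": audio_id}, "auth": {"key": api_key}}
    for _ in range(max_attempts):
        status, body = fetch(request_body)
        if status == 200:
            # Complete: decode the base64 audio from the JSON body.
            return base64.b64decode(json.loads(body)["audioContent"])
        if status != 202:
            # Failed or rejected: surface the documented error fields.
            err = json.loads(body)
            raise RuntimeError(f"{err.get('errorCode')}: {err.get('error')}")
        # In progress: wait before asking again.
        sleep(interval_s)
    raise TimeoutError("audio not ready after max_attempts polls")
```

In real use, `fetch` would POST `request_body` as JSON to https://api.aimater.com/tts/v1/batch/fetch and return the HTTP status together with the raw response body.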

Fetching Available Voices

You can fetch all available voices for your project by calling this endpoint with your API key.

HTTP Request

GET https://api.aimater.com/tts/v2/voices/<api_key>

Replace <api_key> with your API key.

Response Body

{
    "voices": [
        {
            "background": string, // Deprecated
            "color": string,
            "languageCode": string,
            "name": string
        },
        ...
    ]
}

Response fields

voices
- List of available voices.
- These objects can be used in the voice field of synthesis requests.
- The objects have the following fields:
  - background: string | null
    - Voice background as the CSS background attribute.
    - Not used actively anymore. May be removed in future versions.
  - color: string
    - Voice color.
    - Used in Readit's products to distinguish voices from each other.
  - languageCode: string
    - Voice language code.
  - name: string
    - Voice name.
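
As an illustration of feeding this response into a synthesis request, the sketch below builds the endpoint URL and reduces each voice to the two fields the `voice` object of a synthesis request documents. The helper names `voices_url` and `to_voice_fields` are hypothetical, and percent-encoding the key in the path is a precaution rather than a stated requirement.

```python
import json
from typing import Dict, List
from urllib.parse import quote

VOICES_BASE_URL = "https://api.aimater.com/tts/v2/voices/"


def voices_url(api_key: str) -> str:
    """Build the voices URL, percent-encoding the key for use in a path."""
    return VOICES_BASE_URL + quote(api_key, safe="")


def to_voice_fields(body: str) -> List[Dict[str, str]]:
    """Keep only languageCode and name, the fields a synthesis request uses."""
    voices = json.loads(body)["voices"]
    return [{"languageCode": v["languageCode"], "name": v["name"]} for v in voices]
```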

Other

SSML input

SSML input must be enclosed within <speak> tags.

<speak>Test.</speak>

Supported SSML Tags

Break

<break>

Attributes:
- time
  - Defines the pause duration either in milliseconds (ms) or seconds (s).

Example:

<speak>One second pause.<break time="1s" />Half second pause.<break time="500ms" /></speak>
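
SSML strings like the one above can also be assembled programmatically. This is a minimal sketch with hypothetical helper names. It assumes that plain text should be XML-escaped and that `time` takes a whole number followed by `ms` or `s`; neither detail is spelled out in this document.

```python
import re
from xml.sax.saxutils import escape

# Assumption: a whole number followed by "ms" or "s", as in the example above.
_BREAK_TIME = re.compile(r"\d+(ms|s)")


def ssml_text(text: str) -> str:
    """Escape &, < and > so plain text cannot break the SSML markup."""
    return escape(text)


def ssml_break(time: str) -> str:
    """Return a <break> tag for the given duration, e.g. "1s" or "500ms"."""
    if not _BREAK_TIME.fullmatch(time):
        raise ValueError(f"unsupported break time: {time!r}")
    return f'<break time="{time}" />'


def build_ssml(*parts: str) -> str:
    """Wrap already-prepared parts in the required <speak> tags."""
    return "<speak>" + "".join(parts) + "</speak>"
```

Calling `build_ssml(ssml_text("One second pause."), ssml_break("1s"), ssml_text("Half second pause."), ssml_break("500ms"))` reproduces the example string above.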

Examples

Regular Synthesis

Input

{
  "input": {
    "text": "Tämä on testi."
  },
  "voice": {
    "languageCode": "fi"
  },
  "responseConfig": {
    "includePhonemizedText": true
  },
  "auth": {
    "key": "12345678-9abc-def1-2345-6789abcdef12"
  }
}

Output

{
  "audioContent": "UklGRlYHAQBXQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0YTIHAQAAAAAAAAA...",
  "phonemizedText": "Tämä on testi."
}
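
For reference, the input above could be sent with Python's standard library roughly as follows. This is a sketch under assumptions: the `Content-Type: application/json` header is not specified in this document, and `make_synthesize_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request

SYNTHESIZE_URL = "https://api.aimater.com/tts/v1/synthesize"


def make_synthesize_request(payload: dict) -> urllib.request.Request:
    """Prepare a POST request carrying the payload as a JSON body."""
    return urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


payload = {
    "input": {"text": "Tämä on testi."},
    "voice": {"languageCode": "fi"},
    "responseConfig": {"includePhonemizedText": True},
    "auth": {"key": "12345678-9abc-def1-2345-6789abcdef12"},
}
request = make_synthesize_request(payload)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     body = response.read()
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx responses, so the JSON error body has to be read from the exception in that case.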

Regular Synthesis with SSML input

Input

{
  "input": {
    "ssml": "<speak>Tämä on testi. <break time=\"1s\" /></speak>"
  },
  "voice": {
    "languageCode": "fi"
  },
  "auth": {
    "key": "12345678-9abc-def1-2345-6789abcdef12"
  }
}

Output

{
  "audioContent": "UklGRlYHAQBXQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0YTIHAQAAAAAAAAA..."
}

Batch Synthesis Start

Input

{
  "input": {
    "text": "This is a long text that is more than 1000 characters..."
  },
  "voice": {
    "languageCode": "en"
  },
  "auth": {
    "key": "12345678-9abc-def1-2345-6789abcdef12"
  }
}

Output

{
  "audioId": "e8cb3912-543b-43a2-89c5-48b0867c54dd"
}

Retrieving Completed Batch Synthesis Audio

Input

{
  "input": {
    "audioId": "e8cb3912-543b-43a2-89c5-48b0867c54dd"
  },
  "auth": {
    "key": "12345678-9abc-def1-2345-6789abcdef12"
  }
}

Output

{
  "audioContent": "UklGRlYHAQBXQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0YTIHAQAAAAAAAAA..."
}
