I am calling the Google Text-to-Speech REST API from PHP.

I make the call in this way:

    $url = "https://texttospeech.googleapis.com/v1/text:synthesize?key=".$key;

    $data = array(
                    'input' => array(
                        'ssml' => $testo
                    ),
                    'voice' => array(
                        "languageCode" => "en-us",
                        "name" => "en-US-Wavenet-I",
                        "ssmlGender" => "MALE"
                    ),
                    'audioConfig' => array(
                        'audioEncoding' => "LINEAR16",
                        "effectsProfileId" =>  [
                          "small-bluetooth-speaker-class-device"
                        ],
                        "speakingRate" => 1,
                        "pitch" => 4,
                     )
              );

    $options = array(
        'http' => array(
            'header' => "Content-Type: application/json\r\n",
            'method' => 'POST',
            'content' => json_encode($data)
        )
    );

    $context = stream_context_create($options);
    $response = file_get_contents($url, false, $context);

    if ($response === false) {
        // The request failed; nothing is handled here yet.
    } else {
        $response_data = json_decode($response, true);
        ......
    }

The response seems to disregard the pitch value. I have tried both passing it as an integer and as a string but the result does not change.
What am I doing wrong?

Thank you.

2 Answers


  1. Chosen as BEST ANSWER

    In case anyone has the same problem, I have found a possible solution:

    I had $testo = '<speak> text I need audio </speak>';

    I wrapped the text in a prosody tag inside speak:

    $testo = '<speak><prosody pitch="'.$pitchValue.'st"> text I need audio </prosody></speak>';

    This gives the desired result.
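
    The same idea can be wrapped in a small helper. This is only a sketch: the function name `buildProsodySsml` is hypothetical and not part of the original code, and it assumes the pitch is a whole number of semitones. It escapes the text so that characters such as `&` or `<` cannot break the SSML, and it always writes an explicit sign, since SSML expresses a relative pitch change as a signed number followed by `st`.

    ```php
    <?php
    // Hypothetical helper (not from the original post): wraps plain text in
    // <speak><prosody pitch="..."> with a relative pitch given in semitones.
    function buildProsodySsml(string $text, int $semitones): string
    {
        // Escape &, <, > and quotes so the text cannot break the SSML markup.
        $escaped = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');

        // %+d always prints the sign, e.g. 4 becomes "+4" and -2 stays "-2".
        $pitch = sprintf('%+dst', $semitones);

        return '<speak><prosody pitch="' . $pitch . '">' . $escaped . '</prosody></speak>';
    }
    ```

    The result can then be used as the `ssml` value of the request, for example `$testo = buildProsodySsml('text I need audio', 4);`.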


  2. If you set the pitch through Google SSML, you can use the named values x-low, low, medium, high, and x-high. This example works:

    <speak>
    <voice name="en-US-Wavenet-E">
    <prosody pitch="high">
    Hello world
    </prosody>
    </voice>
    </speak>
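
    If the SSML is built in PHP, as in the question, a sketch along these lines could generate the same markup. The function name `buildNamedPitchSsml` is hypothetical, and the list of accepted levels is simply the five values named above.

    ```php
    <?php
    // Hypothetical helper (not from the original answer): builds SSML that
    // selects a voice and applies one of the named pitch levels.
    function buildNamedPitchSsml(string $text, string $pitch, string $voiceName): string
    {
        // Only the five named levels mentioned in this answer are accepted.
        $allowed = ['x-low', 'low', 'medium', 'high', 'x-high'];
        if (!in_array($pitch, $allowed, true)) {
            throw new InvalidArgumentException("Unsupported pitch level: $pitch");
        }

        // Escape both the attribute value and the text content for XML.
        $voice = htmlspecialchars($voiceName, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $body = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');

        return '<speak><voice name="' . $voice . '">'
            . '<prosody pitch="' . $pitch . '">' . $body . '</prosody>'
            . '</voice></speak>';
    }
    ```

    For example, `buildNamedPitchSsml('Hello world', 'high', 'en-US-Wavenet-E')` yields the markup shown above on a single line.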
    