[Python] Using Google Cloud Text-To-Speech

SKJun 2023. 8. 10. 12:22

Let's try out Google Cloud Text-To-Speech!

The monthly free allowance is generous, so it works well for personal use.

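Text-to-Speech is priced monthly based on the number of characters (counted on an English-character basis) sent to the service to be synthesized into audio. For WaveNet voices, the first 1 million characters each month are free. For Standard (non-WaveNet) voices, the first 4 million characters each month are free. Once the free-tier quota is reached, Text-to-Speech is priced per 1 million characters of text processed.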
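To get a feel for how far the free tier goes, here is a small sketch that works out how many characters in a month would be billable. The free allowances come from the pricing text above; the `price_per_million` argument and the proration per character are assumptions of this sketch, so check the current pricing page for the real rate.

```python
# Free monthly allowances quoted in the pricing text above (characters).
FREE_CHARS = {"wavenet": 1_000_000, "standard": 4_000_000}


def billable_characters(voice_type: str, chars_used: int) -> int:
    """Return how many characters fall outside the free monthly allowance."""
    free = FREE_CHARS[voice_type.lower()]
    return max(0, chars_used - free)


def estimate_cost(voice_type: str, chars_used: int, price_per_million: float) -> float:
    """Estimate the monthly charge.

    price_per_million is supplied by the caller (an assumption, not a quoted
    price), and the charge is assumed to be prorated per character.
    """
    return billable_characters(voice_type, chars_used) / 1_000_000 * price_per_million


if __name__ == "__main__":
    # 16.0 is a placeholder rate used only for illustration.
    print(billable_characters("standard", 4_500_000))
    print(estimate_cost("wavenet", 1_250_000, price_per_million=16.0))
```

With these helpers, usage below the allowance simply reports 0 billable characters.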

This post assumes you have already been issued a Google Cloud Text-To-Speech API key!
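If you would rather not paste the key into the source file, one option is to read it from environment variables. This is only a sketch: the variable names `GOOGLE_TTS_API_KEY` and `GOOGLE_TTS_PROJECT_ID` and the helper `load_tts_settings` are made up for this example and are not something the Google library defines. The function returns a dict in the same shape that the code in this post passes as `client_options`.

```python
import os


def load_tts_settings(env=os.environ):
    """Build a client_options dict from environment variables.

    GOOGLE_TTS_API_KEY and GOOGLE_TTS_PROJECT_ID are hypothetical names
    chosen for this sketch.
    """
    api_key = env.get("GOOGLE_TTS_API_KEY")
    project_id = env.get("GOOGLE_TTS_PROJECT_ID")
    if not api_key:
        raise RuntimeError("Set GOOGLE_TTS_API_KEY before running the examples.")
    options = {"api_key": api_key}
    # The project ID is treated as optional here; add it only when present.
    if project_id:
        options["quota_project_id"] = project_id
    return options
```

The result could then be passed along as `tts.TextToSpeechClient(client_options=load_tts_settings())`.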

1. Install the Python packages

pip install google-cloud-texttospeech sounddevice numpy

2. Check the available Korean voices

import google.cloud.texttospeech as tts

API_KEY_STRING = "enter your issued API key"
PROJECT_ID = "enter your project ID"

def list_voices(language_code=None):
    client = tts.TextToSpeechClient(client_options={"api_key": API_KEY_STRING,"quota_project_id": PROJECT_ID})
    response = client.list_voices(language_code=language_code)
    voices = sorted(response.voices, key=lambda voice: voice.name)

    print(f" Voices: {len(voices)} ".center(60, "-"))
    for voice in voices:
        languages = ", ".join(voice.language_codes)
        name = voice.name
        gender = tts.SsmlVoiceGender(voice.ssml_gender).name
        rate = voice.natural_sample_rate_hertz
        print(f"{languages:<8} | {name:<24} | {gender:<8} | {rate:,} Hz")
        
        
list_voices("ko")

Calling list_voices("ko") prints the list of available Korean voices, like this!

------------------------ Voices: 15 ------------------------
ko-KR | ko-KR-Neural2-A | FEMALE | 24,000 Hz
ko-KR | ko-KR-Neural2-B | FEMALE | 24,000 Hz
ko-KR | ko-KR-Neural2-C | MALE | 24,000 Hz
ko-KR | ko-KR-Standard-A | FEMALE | 24,000 Hz
ko-KR | ko-KR-Standard-A | FEMALE | 24,000 Hz
ko-KR | ko-KR-Standard-B | FEMALE | 24,000 Hz
ko-KR | ko-KR-Standard-B | FEMALE | 24,000 Hz
ko-KR | ko-KR-Standard-C | MALE | 24,000 Hz
ko-KR | ko-KR-Standard-C | MALE | 24,000 Hz
ko-KR | ko-KR-Standard-D | MALE | 24,000 Hz
ko-KR | ko-KR-Standard-D | MALE | 24,000 Hz
ko-KR | ko-KR-Wavenet-A | FEMALE | 24,000 Hz
ko-KR | ko-KR-Wavenet-B | FEMALE | 24,000 Hz
ko-KR | ko-KR-Wavenet-C | MALE | 24,000 Hz
ko-KR | ko-KR-Wavenet-D | MALE | 24,000 Hz
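The listing above repeats some names and mixes several voice families, so it can help to group the names before picking one. The sketch below assumes every name follows the `language-REGION-Family-Variant` pattern seen in the output (for example `ko-KR-Wavenet-A`); `parse_voice_name` and `group_by_family` are helper names invented for this example.

```python
from collections import defaultdict


def parse_voice_name(voice_name: str):
    """Split a name such as 'ko-KR-Wavenet-A' into (language_code, family, variant)."""
    parts = voice_name.split("-")
    language_code = "-".join(parts[:2])
    family = parts[2]
    variant = "-".join(parts[3:])
    return language_code, family, variant


def group_by_family(voice_names):
    """Group unique voice names by family, keeping first-seen order."""
    groups = defaultdict(list)
    for name in dict.fromkeys(voice_names):  # drops repeated names, keeps order
        _, family, _ = parse_voice_name(name)
        groups[family].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = ["ko-KR-Neural2-A", "ko-KR-Standard-A", "ko-KR-Standard-A", "ko-KR-Wavenet-A"]
    for family, members in group_by_family(names).items():
        print(family, members)
```

In practice the input would be `[voice.name for voice in response.voices]` from the `list_voices` call.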

3. Convert text to speech and play the audio

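import io
import wave

import google.cloud.texttospeech as tts
import numpy as np
import sounddevice as sd

# API_KEY_STRING and PROJECT_ID are the values defined in step 2.

def text_to_speech_with_api_key(voice_name, text):
    try:
        # Create the client
        client = tts.TextToSpeechClient(client_options={"api_key": API_KEY_STRING, "quota_project_id": PROJECT_ID})

        # Build the voice parameters ("ko-KR-Standard-A" gives language code "ko-KR")
        language_code = "-".join(voice_name.split("-")[:2])
        text_input = tts.SynthesisInput(text=text)
        voice_params = tts.VoiceSelectionParams(
            language_code=language_code, name=voice_name
        )
        audio_config = tts.AudioConfig(audio_encoding=tts.AudioEncoding.LINEAR16)

        # Synthesize the speech
        response = client.synthesize_speech(
            input=text_input,
            voice=voice_params,
            audio_config=audio_config,
        )
        audio_content = response.audio_content

        # Play the audio. LINEAR16 responses include a WAV header, so parse it
        # rather than treating the header bytes as samples, and take the
        # sample rate from the header instead of hard-coding 24000.
        with wave.open(io.BytesIO(audio_content), "rb") as wav_file:
            sample_rate = wav_file.getframerate()
            frames = wav_file.readframes(wav_file.getnframes())
        audio_array = np.frombuffer(frames, dtype=np.int16)
        sd.play(audio_array, samplerate=sample_rate)
        sd.wait()

    except Exception as e:
        print("Google TTS Error: ", e)


text_to_speech_with_api_key("ko-KR-Standard-A", "안녕하세요! 구글 티티에스 입니다!")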

Running the code above plays "안녕하세요! 구글 티티에스 입니다!" ("Hello! This is Google TTS!") in Korean.
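Since LINEAR16 audio comes back with a WAV header attached, the bytes can also be inspected or written straight to a .wav file instead of being played. Here is a rough sketch using only the standard library. `describe_wav`, `save_wav`, and `make_silence` are names made up for this example, and `make_silence` just builds stand-in audio so the sketch runs without calling the API.

```python
import io
import wave


def describe_wav(audio_content: bytes) -> dict:
    """Read the WAV header from in-memory bytes and summarize the audio."""
    with wave.open(io.BytesIO(audio_content), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }


def save_wav(audio_content: bytes, path: str) -> None:
    """Write the bytes unchanged; the header already makes the file playable."""
    with open(path, "wb") as out:
        out.write(audio_content)


def make_silence(seconds: float = 0.5, rate: int = 24000) -> bytes:
    """Build a stand-in WAV (16-bit mono silence) so the sketch runs offline."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = make_silence()
    print(describe_wav(sample))
    save_wav(sample, "output.wav")
```

With a real response, `response.audio_content` would take the place of `make_silence()`.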
