[Java & Android] Implementing the Android SpeechRecognizer API (Converting Speech to Text, STT)

Overview

This post uses the SpeechRecognizer API provided by Android
to convert recognized speech into text.

1. Building the layout

<activity_main.xml>

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity"
    android:orientation="vertical"
    android:gravity="center">

    <TextView
        android:id="@+id/tv_result"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Result"
        android:textSize="20sp"/>


    <Button
        android:id="@+id/btn_start"
        android:layout_width="150dp"
        android:layout_height="wrap_content"
        android:text="Start"/>

</LinearLayout>

The layout contains a TextView that displays the result and a Button that starts speech recognition.

2. Adding permissions to the manifest

<AndroidManifest.xml>

<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.RECORD_AUDIO"/>

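Because the default SpeechRecognizer implementation typically streams audio to Google's servers for recognition, the INTERNET permission is required.
The RECORD_AUDIO permission is required to capture speech from the microphone.

<AndroidManifest.xml>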

<queries>
    <intent>
        <action android:name="android.speech.RecognitionService" />
    </intent>
</queries>

From API level 30 (Android 11), the <queries> element must be declared so that the app can see the recognition service.

3. Requesting permissions

<MainActivity.java>

// Request permissions at runtime
if(Build.VERSION.SDK_INT >= 23){
    ActivityCompat.requestPermissions(this, new String[] {Manifest.permission.INTERNET,
    Manifest.permission.RECORD_AUDIO}, 1);
}

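This requests the permissions at runtime. RECORD_AUDIO is a dangerous permission, so on API 23 and above the user must grant it explicitly. INTERNET is a normal permission that is granted at install time, so including it in the request is harmless but not required.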
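
The request above is asynchronous: the system reports the user's choice through the activity's onRequestPermissionsResult(int requestCode, String[] permissions, int[] grantResults) callback. The following is a minimal sketch of how that result could be checked, written as plain Java so it can be read without the Android SDK. The class and method names (PermissionResults, isGranted) are hypothetical and not from the original post, and the constant 0 mirrors PackageManager.PERMISSION_GRANTED; inside an activity, the real constant should be used instead.

```java
// Hypothetical helper, not part of the original post.
final class PermissionResults {
    // Mirrors PackageManager.PERMISSION_GRANTED (0) so the sketch compiles without the Android SDK.
    static final int PERMISSION_GRANTED = 0;

    private PermissionResults() {}

    /**
     * Returns true if the given permission appears in the permissions array and
     * the entry at the same index in grantResults is PERMISSION_GRANTED.
     * Empty arrays (for example, a cancelled request) yield false.
     */
    static boolean isGranted(String permission, String[] permissions, int[] grantResults) {
        if (permissions == null || grantResults == null) {
            return false;
        }
        int n = Math.min(permissions.length, grantResults.length);
        for (int i = 0; i < n; i++) {
            if (permission.equals(permissions[i])) {
                return grantResults[i] == PERMISSION_GRANTED;
            }
        }
        return false;
    }
}
```

In MainActivity, this could be called from onRequestPermissionsResult when requestCode is 1 (the code passed to requestPermissions above) with Manifest.permission.RECORD_AUDIO, for example to disable btn_start when the microphone permission is denied.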

4. Creating the listener

<MainActivity.java>

private RecognitionListener listener = new RecognitionListener() {
    @Override
    public void onReadyForSpeech(Bundle bundle) {
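        // Called when the recognizer is ready for the user to start speaking
        Toast.makeText(MainActivity.this, "Speech recognition started", Toast.LENGTH_SHORT).show();
    }

    @Override
    public void onBeginningOfSpeech() {
        // Called when the user starts speaking
    }

    @Override
    public void onRmsChanged(float v) {
        // Called when the sound level of the input audio changes
    }

    @Override
    public void onBufferReceived(byte[] bytes) {
        // Called when more audio has been received; the buffer holds raw audio data
    }

    @Override
    public void onEndOfSpeech() {
        // Called after the user stops speaking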
    }

    @Override
    public void onError(int i) {
        // Called when a network or recognition error occurs
        String message;

        switch (i) {
            case SpeechRecognizer.ERROR_AUDIO:
                message = "Audio error";
                break;
            case SpeechRecognizer.ERROR_CLIENT:
                message = "Client error";
                break;
            case SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS:
                message = "Insufficient permissions";
                break;
            case SpeechRecognizer.ERROR_NETWORK:
                message = "Network error";
                break;
            case SpeechRecognizer.ERROR_NETWORK_TIMEOUT:
                message = "Network timeout";
                break;
            case SpeechRecognizer.ERROR_NO_MATCH:
                message = "No match found";
                break;
            case SpeechRecognizer.ERROR_RECOGNIZER_BUSY:
                message = "Recognizer is busy";
                break;
            case SpeechRecognizer.ERROR_SERVER:
                message = "Server error";
                break;
            case SpeechRecognizer.ERROR_SPEECH_TIMEOUT:
                message = "Speech input timed out";
                break;
            default:
                message = "Unknown error";
                break;
        }

        Toast.makeText(MainActivity.this, "Error : " + message,Toast.LENGTH_SHORT).show();
    }

    @Override
    public void onResults(Bundle results) {
        ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        if (matches == null || matches.isEmpty()) {
            return; // no recognition candidates were returned
        }

        // The first element is the most likely candidate, so display it
        tv_result.setText(matches.get(0));

        Toast.makeText(MainActivity.this, matches.get(0), Toast.LENGTH_SHORT).show();
    }

    @Override
    public void onPartialResults(Bundle bundle) {
        // Partial recognition results
    }

    @Override
    public void onEvent(int i, Bundle bundle) {
        // Reserved for adding future events
    }
};

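onReadyForSpeech(): called when the recognizer is ready for the user to start speaking.
onBeginningOfSpeech(): called when the user has started to speak.
onError(): called when a network or recognition error occurs.
onResults(): called when the recognition results are ready.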
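
The callbacks above also indicate when a recognition session begins and ends: a session normally finishes with either onResults() or onError(). A small state tracker driven by these callbacks could be used to ignore button clicks while a session is still running. The following is a minimal sketch in plain Java. The class name RecognitionSession and its methods are hypothetical and not from the original post, and the sketch assumes all calls happen on the main thread, so no synchronization is used.

```java
// Hypothetical helper, not part of the original post.
// Assumes it is only touched from the main thread.
final class RecognitionSession {
    private boolean active = false;

    /**
     * Call before startListening(). Returns true if a new session may start,
     * or false if one is already running (the click should then be ignored).
     */
    boolean tryStart() {
        if (active) {
            return false;
        }
        active = true;
        return true;
    }

    /** Call from onResults() and onError() to mark the session as finished. */
    void finish() {
        active = false;
    }

    /** Returns true while a session is considered to be running. */
    boolean isActive() {
        return active;
    }
}
```

With this in place, the click handler would call startListening() only when tryStart() returns true, and onResults() and onError() would each call finish().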

5. Creating the RecognizerIntent

<MainActivity.java>

// Create the RecognizerIntent
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, getPackageName());
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "ko-KR");

btn_start = findViewById(R.id.btn_start);
btn_start.setOnClickListener(new View.OnClickListener() {
    @Override
    public void onClick(View view) {
        SpeechRecognizer mRecognizer = SpeechRecognizer.createSpeechRecognizer(MainActivity.this);
        mRecognizer.setRecognitionListener(listener);
        mRecognizer.startListening(intent);
    }
});

After the intent is created, clicking the button creates a SpeechRecognizer, attaches the listener, and starts speech recognition.
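
EXTRA_LANGUAGE expects an IETF BCP 47 language tag such as "ko-KR". Instead of hard-coding the string, the tag can be derived from a java.util.Locale. The following is a minimal sketch in plain Java. The LanguageTags class is a hypothetical helper that is not from the original post, and it does not check whether the installed recognition service actually supports the language.

```java
import java.util.Locale;

// Hypothetical helper, not part of the original post.
final class LanguageTags {
    private LanguageTags() {}

    /** Returns the BCP 47 tag for the given locale, e.g. "ko-KR" for Locale.KOREA. */
    static String forLocale(Locale locale) {
        return locale.toLanguageTag();
    }

    /** Returns the BCP 47 tag for the device's default locale. */
    static String forDefault() {
        return forLocale(Locale.getDefault());
    }
}
```

The result could then be passed as the value of RecognizerIntent.EXTRA_LANGUAGE, for example LanguageTags.forLocale(Locale.KOREA) in place of the literal "ko-KR".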

6. Result

7. Full code

<activity_main.xml>

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity"
    android:orientation="vertical"
    android:gravity="center">

    <TextView
        android:id="@+id/tv_result"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Result"
        android:textSize="20sp"/>

    <Button
        android:id="@+id/btn_start"
        android:layout_width="150dp"
        android:layout_height="wrap_content"
        android:text="Start"/>

</LinearLayout>

<AndroidManifest.xml>

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.speechrecognizerexample">

    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.RECORD_AUDIO"/>

    <queries>
        <intent>
            <action android:name="android.speech.RecognitionService" />
        </intent>
    </queries>

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/Theme.SpeechRecognizerExample">
        <activity android:name=".MainActivity"
            android:exported="true">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>

<MainActivity.java>

package com.example.speechrecognizerexample;

import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;

import android.Manifest;
import android.content.Intent;
import android.os.Build;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.view.View;
import android.widget.Button;
import android.widget.TextView;
import android.widget.Toast;

import java.util.ArrayList;

public class MainActivity extends AppCompatActivity {

    //Button
    Button btn_start;

    //TextView
    TextView tv_result;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        // Request permissions at runtime
        if(Build.VERSION.SDK_INT >= 23){
            ActivityCompat.requestPermissions(this, new String[] {Manifest.permission.INTERNET,
            Manifest.permission.RECORD_AUDIO}, 1);
        }

        // Create the RecognizerIntent
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, getPackageName());
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "ko-KR");

        tv_result = findViewById(R.id.tv_result);

        btn_start = findViewById(R.id.btn_start);
        btn_start.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                SpeechRecognizer mRecognizer = SpeechRecognizer.createSpeechRecognizer(MainActivity.this);
                mRecognizer.setRecognitionListener(listener);
                mRecognizer.startListening(intent);
            }
        });
    }

    private RecognitionListener listener = new RecognitionListener() {
        @Override
        public void onReadyForSpeech(Bundle bundle) {
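            // Called when the recognizer is ready for the user to start speaking
            Toast.makeText(MainActivity.this, "Speech recognition started", Toast.LENGTH_SHORT).show();
        }

        @Override
        public void onBeginningOfSpeech() {
            // Called when the user starts speaking
        }

        @Override
        public void onRmsChanged(float v) {
            // Called when the sound level of the input audio changes
        }

        @Override
        public void onBufferReceived(byte[] bytes) {
            // Called when more audio has been received; the buffer holds raw audio data
        }

        @Override
        public void onEndOfSpeech() {
            // Called after the user stops speaking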
        }

        @Override
        public void onError(int i) {
            // Called when a network or recognition error occurs
            String message;

            switch (i) {
                case SpeechRecognizer.ERROR_AUDIO:
                    message = "Audio error";
                    break;
                case SpeechRecognizer.ERROR_CLIENT:
                    message = "Client error";
                    break;
                case SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS:
                    message = "Insufficient permissions";
                    break;
                case SpeechRecognizer.ERROR_NETWORK:
                    message = "Network error";
                    break;
                case SpeechRecognizer.ERROR_NETWORK_TIMEOUT:
                    message = "Network timeout";
                    break;
                case SpeechRecognizer.ERROR_NO_MATCH:
                    message = "No match found";
                    break;
                case SpeechRecognizer.ERROR_RECOGNIZER_BUSY:
                    message = "Recognizer is busy";
                    break;
                case SpeechRecognizer.ERROR_SERVER:
                    message = "Server error";
                    break;
                case SpeechRecognizer.ERROR_SPEECH_TIMEOUT:
                    message = "Speech input timed out";
                    break;
                default:
                    message = "Unknown error";
                    break;
            }

            Toast.makeText(MainActivity.this, "Error : " + message,Toast.LENGTH_SHORT).show();
        }

        @Override
        public void onResults(Bundle results) {
            ArrayList<String> matches = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
            if (matches == null || matches.isEmpty()) {
                return; // no recognition candidates were returned
            }

            // The first element is the most likely candidate, so display it
            tv_result.setText(matches.get(0));

            Toast.makeText(MainActivity.this, matches.get(0), Toast.LENGTH_SHORT).show();
        }

        @Override
        public void onPartialResults(Bundle bundle) {
            // Partial recognition results
        }

        @Override
        public void onEvent(int i, Bundle bundle) {
            // Reserved for adding future events
        }
    };
}

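Wrap-up

This post used the SpeechRecognizer API to convert recognized speech into text.
The next post will cover continuous speech recognition that keeps running automatically,
rather than recognition that is triggered each time the button is pressed.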

Github

GitHub - jaemin-Yoo/continuous-voice-recognition: Implementation of continuous speech recognition using SpeechRecognizer api

