I bought a ReSpeaker 2-Mics Pi HAT, a sound card for the Raspberry Pi.
↓ReSpeaker expansion board / sound card for Raspberry Pi 0/3B/3B+, with two microphones (2-Mics Pi HAT)
With this sound card and a Raspberry Pi, you can apparently build your own smart speaker along the lines of Alexa or Google Assistant.
↓Raspberry Pi Zero WH, GPIO pin header pre-soldered, with Wi-Fi & Bluetooth & heatsink
What is the ReSpeaker 2-Mics Pi HAT?
The ReSpeaker 2-Mics Pi HAT is a dual-microphone expansion board for the Raspberry Pi, developed for AI and voice applications.
It lets you use AI-speaker services such as Amazon Alexa Voice Service and Google Assistant.
The board itself comes with no Japanese documentation.
As shown here, it plugs directly onto a Raspberry Pi Zero.
Its features are as follows.
- Works with the Raspberry Pi (Zero, 1 B+, 2 B, 3 B)
- Dual microphones
- Two Grove interfaces: GPIO and I2C available
- One programmable user button and three LEDs
- On-board audio codec
- Two kinds of audio output sockets: 3.5 mm audio jack and JST 2.0 speaker connector
- Sound pickup range of up to 3 m
ReSpeaker official site: https://respeaker.io/
・https://respeaker.io/2_mic_array/
- Getting started: 1. Google Assistant: https://youtu.be/Z3gIbxnDCtI 2. How to trigger Google Assistant with the button: https://youtu.be/bZuMimwXtII
- The ReSpeaker 2-Mics Pi HAT is a dual-microphone expansion board for the Raspberry Pi designed for AI and voice applications. With it you can build more powerful and flexible voice products that integrate Amazon Alexa Voice Service, Google Assistant, and so on. The board is built around the WM8960, a low-power stereo codec.
- Features: compatible with the Raspberry Pi (Zero, 1 B+, 2 B, 3 B); dual microphones; two Grove interfaces supporting GPIO and I2C; one programmable button and three LEDs; on-board audio codec; two kinds of audio output sockets (3.5 mm audio jack and JST 2.0 speaker connector); far-field pickup up to 3 m; dimensions 65 mm x 30 mm x 15 mm.
- Hardware overview: BUTTON: a user-defined button connected to GPIO17. MIC_L / MIC_R: microphones on the left and right edges of the board. RGB LEDs: three APA102 RGB LEDs connected to the Raspberry Pi's SPI interface. WM8960: low-power stereo codec. Raspberry Pi 40-pin header: supports the Pi Zero, Pi 1 B+, Pi 2 B and Pi 3 B. POWER: micro-USB port for powering the ReSpeaker 2-Mics Pi HAT; power the board through this port while using speakers, so that enough current is available.
- I2C: Grove I2C port connected to I2C-1. GPIO12: Grove digital port connected to GPIO12 and GPIO13. JST 2.0 speaker output: JST 2.0 connector. 3.5 mm audio jack: for connecting headphones or speakers. Note: this product does not include a Pi Zero board or a battery.
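The hardware overview above mentions three APA102 RGB LEDs driven over SPI. The APA102 wire protocol is simple: a 32-bit zero start frame, one 4-byte frame per LED (three marker bits, 5-bit brightness, then blue/green/red), and an end frame of 1-bits. As a rough illustration (the function name is mine, and this sketch only builds the bytes; it is not from the Seeed docs):

```python
def apa102_frame(pixels, brightness=1):
    """Build the raw SPI byte stream for a chain of APA102 LEDs.

    pixels     -- list of (r, g, b) tuples, one per LED
    brightness -- global 5-bit brightness, 0..31
    """
    frame = bytearray(4)  # start frame: 32 zero bits
    for r, g, b in pixels:
        # LED frame: 0b111 marker + 5-bit brightness, then B, G, R
        frame += bytes([0xE0 | (brightness & 0x1F), b, g, r])
    frame += b"\xff" * 4  # end frame: enough clock edges for a short chain
    return bytes(frame)

# red, green, blue for the HAT's three LEDs:
data = apa102_frame([(255, 0, 0), (0, 255, 0), (0, 0, 255)])
```

On the Pi itself the resulting bytes would then be pushed out over SPI, e.g. with the `spidev` module's `xfer2()`; that part needs the real hardware, so it is omitted here.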
Installing and configuring the ReSpeaker 2-Mics Pi HAT driver
The steps for installing the ReSpeaker 2-Mics Pi HAT driver are as follows.
$ sudo apt-get update
$ sudo apt-get upgrade
$ git clone https://github.com/respeaker/seeed-voicecard.git
$ cd seeed-voicecard
$ sudo ./install.sh
$ sudo reboot
After rebooting, check the card with the following command.
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: seeed2micvoicec [seeed-2mic-voicecard], device 0: bcm2835-i2s-wm8960-hifi wm8960-hifi-0 []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
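If you script this setup, the card number can be extracted from the `aplay -l` / `arecord -l` output instead of being hard-coded. A small, hypothetical helper (the function name is mine; it also accepts the Japanese-localized "カード" prefix that appears on localized systems):

```python
import re

def find_card_number(listing, name="seeed2micvoicec"):
    """Return the ALSA card number for the given card name, or None.

    `listing` is the text printed by `aplay -l` or `arecord -l`.
    """
    for line in listing.splitlines():
        # lines look like: card 0: seeed2micvoicec [seeed-2mic-voicecard], ...
        m = re.match(r"(?:card|カード)\s+(\d+):\s+(\S+)", line)
        if m and m.group(2).startswith(name):
            return int(m.group(1))
    return None
```

The result plugs straight into an ALSA device string such as `"plughw:%d,0" % card`.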
With the following command, input from the microphones is played back as-is through the earphone jack.
$ arecord -f cd -Dhw:0 | aplay -Dhw:0
Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
Playing WAVE 'stdin' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
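To check the microphones without judging by ear, you can also record a short clip (e.g. `arecord -f cd -Dhw:0 -d 3 test.wav`) and inspect its level. A standard-library-only sketch (the function name is mine) that computes the RMS level of a 16-bit PCM WAV file:

```python
import struct
import wave

def wav_rms(path):
    """Return the RMS level of a 16-bit PCM WAV file.

    A near-zero value suggests the microphone captured nothing.
    """
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "expects 16-bit PCM"
        frames = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5
```

Anything well above the level of a silent-room recording means the mics are live.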
Installing and configuring Julius
Julius is open-source software from Japan for developing and researching speech recognition systems.
↓Julius official site
The latest source code is published on GitHub.
↓Julius: Open-Source Large Vocabulary Continuous Speech Recognition Engine
・https://github.com/julius-speech/julius
Installing the libraries
First, install the required libraries.
$ sudo apt install -y libasound2-dev libesd0-dev libsndfile1-dev
Downloading and building Julius
Download and build Julius.
$ git clone https://github.com/julius-speech/julius.git
$ cd julius
$ ./configure --with-mictype=alsa
$ make
$ sudo make install
Downloading and extracting the dictation kit and the grammar recognition kit
Download and extract the following packages.
Julius dictation kit
This is the kit you need to try Julius out. It contains the minimum set of models required for Japanese dictation plus the Julius executable binaries, so with this alone you can get Julius running.
・https://osdn.net/dl/julius/dictation-kit-4.5.zip
Julius grammar-based recognition kit
A kit for trying grammar-based speech recognition with Julius. It contains a speaker-independent Japanese acoustic model and a sample grammar.
・https://github.com/julius-speech/grammar-kit/archive/v4.3.1.zip
$ cd ~
$ wget https://osdn.net/dl/julius/dictation-kit-4.5.zip
$ wget https://github.com/julius-speech/grammar-kit/archive/v4.3.1.zip
$ mkdir -p ~/lib/julius
$ cd ~/lib/julius
$ unzip ~/dictation-kit-4.5.zip
$ unzip ~/v4.3.1.zip
Running Julius from the command line
Let's try running Julius from the command line.
$ julius -C ~/lib/julius/dictation-kit-4.5/main.jconf -C ~/lib/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -demo -input mic
STAT: include config: /home/pi/lib/julius/dictation-kit-4.5/main.jconf
STAT: include config: /home/pi/lib/julius/dictation-kit-4.5/am-gmm.jconf
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
(acoustic-model loading messages omitted)
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
(n-gram loading messages omitted)
STAT: *** LM00 _default loaded
STAT: All models are ready, go for final fusion
(recognizer setup messages omitted)
STAT: All init successfully done
STAT: ###### initialize input device
----------------------- System Information begin ---------------------
JuliusLib rev.4.5 (fast)
(engine, model and search-parameter dump omitted)
----------------------- System Information end -----------------------
Notice for feature extraction (01),
*************************************************************
* Cepstral mean normalization for real-time decoding:       *
* NOTICE: The first input may not be recognized, since      *
* no initial mean is available on startup.                  *
*************************************************************
Error: adin_alsa: cannot set PCM channel to 1 (Invalid argument)
failed to begin input stream
An error???
Error: adin_alsa: cannot set PCM channel to 1 (Invalid argument)
failed to begin input stream
Set the microphone device via an environment variable.
export ALSADEV="plughw:0,0"
The 0,0 here is the "card number, subdevice number" respectively.
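If you launch Julius from a script rather than an interactive shell, the same setting can be passed through the child process environment. A sketch (the function name is mine; the jconf paths follow this article's layout, so adjust them as needed):

```python
import os

def julius_command(card=0, subdevice=0):
    """Build argv and environment for launching Julius on a given ALSA device."""
    env = dict(os.environ, ALSADEV="plughw:%d,%d" % (card, subdevice))
    kit = os.path.expanduser("~/lib/julius/dictation-kit-4.5")
    cmd = ["julius",
           "-C", os.path.join(kit, "main.jconf"),
           "-C", os.path.join(kit, "am-gmm.jconf"),
           "-nostrip", "-demo", "-input", "mic"]
    return cmd, env
```

Then `subprocess.Popen(cmd, env=env)` starts the recognizer with the microphone already selected.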
Run Julius again.
$ julius -C ~/lib/julius/dictation-kit-4.5/main.jconf -C ~/lib/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -demo -input mic
(the startup log is the same as before, up through the cepstral mean normalization notice, and then:)
Stat: adin_alsa: device name from ALSADEV: "plughw:0,0"
Stat: capture audio at 16000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 512 bytes)
Stat: "plughw:0,0": seeed2micvoicec [seeed-2mic-voicecard] device bcm2835-i2s-wm8960-hifi wm8960-hifi-0 [] subdevice #0
STAT: AD-in thread created
<<< please speak >>>
<<< please speak >>>
Once this prompt appears, everything is OK.
Julius is now waiting for input from the microphone, so try speaking into it.
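To use the results programmatically rather than reading the terminal, the final hypotheses can be scraped from Julius's console output, where each final result appears on a line starting with `sentence1:`. A hypothetical filter for that (for serious integration, Julius's module mode, `-module`, is the cleaner interface):

```python
def extract_sentences(julius_stdout):
    """Collect final recognition results from Julius console output.

    Looks for lines like: sentence1: <s> 今日 は 晴れ </s>
    """
    results = []
    for line in julius_stdout.splitlines():
        if line.startswith("sentence1:"):
            text = line.split(":", 1)[1]
            # drop sentence-boundary markers and inter-word spaces
            text = text.replace("<s>", "").replace("</s>", "")
            results.append("".join(text.split()))
    return results
```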
There are a lot of misrecognitions, though...
Related Raspberry Pi articles
Here is a roundup of Raspberry Pi-related articles.
↓What is the Raspberry Pi?
・https://urashita.com/archives/25631
↓Installing Raspbian Stretch with desktop
・https://urashita.com/archives/26460
↓Installing Anaconda on the Raspberry Pi
・https://urashita.com/archives/26622
↓Python development on the Raspberry Pi (Berryconda)
・https://urashita.com/archives/26664