I bought a ReSpeaker 2-Mics Pi HAT, a sound card for the Raspberry Pi.
↓ReSpeaker expansion board / sound card for Raspberry Pi 0/3B/3B+, with two microphones (2-Mics Pi HAT)
With this sound card and a Raspberry Pi, you can apparently build your own smart speaker along the lines of Alexa or Google Assistant.
↓Raspberry Pi Zero WH, GPIO pin header pre-soldered, with Wi-Fi & Bluetooth & heatsink
What is the ReSpeaker 2-Mics Pi HAT?
The ReSpeaker 2-Mics Pi HAT is a dual-microphone expansion board for the Raspberry Pi, developed for AI and voice applications.
It lets you use AI-speaker services such as Amazon Alexa Voice Service and Google Assistant.
The board itself comes with no Japanese documentation.
As shown here, it plugs directly onto a Raspberry Pi Zero.
Its features are as follows.
- Works with the Raspberry Pi (Zero, 1 B+, 2 B, 3 B)
- Dual microphones
- Two Grove interfaces: GPIO and I2C available
- One programmable user button and three LEDs
- On-board audio codec
- Two kinds of audio output sockets: 3.5 mm audio jack and JST 2.0 speaker connector
- Sound pickup range of up to 3 m
ReSpeaker official site: https://respeaker.io/
・https://respeaker.io/2_mic_array/
- Getting started: 1. Google Assistant: https://youtu.be/Z3gIbxnDCtI 2. How to trigger Google Assistant with the button: https://youtu.be/bZuMimwXtII
- The ReSpeaker 2-Mics Pi HAT is a dual-microphone expansion board for the Raspberry Pi designed for AI and voice applications. With it you can build more powerful and flexible voice products that integrate Amazon Alexa Voice Service, Google Assistant, and so on. The board is built around the WM8960, a low-power stereo codec.
- Features: compatible with the Raspberry Pi (Zero, 1 B+, 2 B, 3 B); dual microphones; two Grove interfaces supporting GPIO and I2C; one programmable button and three LEDs; on-board audio codec; two kinds of audio output sockets (3.5 mm audio jack and JST 2.0 speaker connector); far-field pickup up to 3 m; dimensions 65 mm x 30 mm x 15 mm.
- Hardware overview: BUTTON: a user-defined button connected to GPIO17. MIC_L / MIC_R: microphones on the left and right edges of the board. RGB LEDs: three APA102 RGB LEDs connected to the Raspberry Pi's SPI interface. WM8960: low-power stereo codec. Raspberry Pi 40-pin header: supports the Pi Zero, Pi 1 B+, Pi 2 B and Pi 3 B. POWER: micro-USB port for powering the ReSpeaker 2-Mics Pi HAT; power the board through this port while using speakers, so that enough current is available.
- I2C: Grove I2C port connected to I2C-1. GPIO12: Grove digital port connected to GPIO12 and GPIO13. JST 2.0 speaker output: JST 2.0 connector. 3.5 mm audio jack: for connecting headphones or speakers. Note: this product does not include a Pi Zero board or a battery.
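The hardware overview above mentions three APA102 RGB LEDs driven over SPI. The APA102 wire protocol is simple: a 32-bit zero start frame, one 4-byte frame per LED (three marker bits, 5-bit brightness, then blue/green/red), and an end frame of 1-bits. As a rough illustration (the function name is mine, and this sketch only builds the bytes; it is not from the Seeed docs):

```python
def apa102_frame(pixels, brightness=1):
    """Build the raw SPI byte stream for a chain of APA102 LEDs.

    pixels     -- list of (r, g, b) tuples, one per LED
    brightness -- global 5-bit brightness, 0..31
    """
    frame = bytearray(4)  # start frame: 32 zero bits
    for r, g, b in pixels:
        # LED frame: 0b111 marker + 5-bit brightness, then B, G, R
        frame += bytes([0xE0 | (brightness & 0x1F), b, g, r])
    frame += b"\xff" * 4  # end frame: enough clock edges for a short chain
    return bytes(frame)

# red, green, blue for the HAT's three LEDs:
data = apa102_frame([(255, 0, 0), (0, 255, 0), (0, 0, 255)])
```

On the Pi itself the resulting bytes would then be pushed out over SPI, e.g. with the `spidev` module's `xfer2()`; that part needs the real hardware, so it is omitted here.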
Installing and configuring the ReSpeaker 2-Mics Pi HAT driver
The steps for installing the ReSpeaker 2-Mics Pi HAT driver are as follows.
$ sudo apt-get update
$ sudo apt-get upgrade
$ git clone https://github.com/respeaker/seeed-voicecard.git
$ cd seeed-voicecard
$ sudo ./install.sh
$ sudo reboot
After rebooting, check the card with the following command.
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: seeed2micvoicec [seeed-2mic-voicecard], device 0: bcm2835-i2s-wm8960-hifi wm8960-hifi-0 []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
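If you script this setup, the card number can be extracted from the `aplay -l` / `arecord -l` output instead of being hard-coded. A small, hypothetical helper (the function name is mine; it also accepts the Japanese-localized "カード" prefix that appears on localized systems):

```python
import re

def find_card_number(listing, name="seeed2micvoicec"):
    """Return the ALSA card number for the given card name, or None.

    `listing` is the text printed by `aplay -l` or `arecord -l`.
    """
    for line in listing.splitlines():
        # lines look like: card 0: seeed2micvoicec [seeed-2mic-voicecard], ...
        m = re.match(r"(?:card|カード)\s+(\d+):\s+(\S+)", line)
        if m and m.group(2).startswith(name):
            return int(m.group(1))
    return None
```

The result plugs straight into an ALSA device string such as `"plughw:%d,0" % card`.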
With the following command, input from the microphones is played back as-is through the earphone jack.
$ arecord -f cd -Dhw:0 | aplay -Dhw:0
Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
Playing WAVE 'stdin' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
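To check the microphones without judging by ear, you can also record a short clip (e.g. `arecord -f cd -Dhw:0 -d 3 test.wav`) and inspect its level. A standard-library-only sketch (the function name is mine) that computes the RMS level of a 16-bit PCM WAV file:

```python
import struct
import wave

def wav_rms(path):
    """Return the RMS level of a 16-bit PCM WAV file.

    A near-zero value suggests the microphone captured nothing.
    """
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "expects 16-bit PCM"
        frames = w.readframes(w.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5
```

Anything well above the level of a silent-room recording means the mics are live.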
Installing and configuring Julius
Julius is open-source software from Japan for developing and researching speech recognition systems.
↓Julius official site
The latest source code is published on GitHub.
↓Julius: Open-Source Large Vocabulary Continuous Speech Recognition Engine
・https://github.com/julius-speech/julius
Installing the libraries
First, install the required libraries.
$ sudo apt install -y libasound2-dev libesd0-dev libsndfile1-dev
Downloading and building Julius
Download and build Julius.
$ git clone https://github.com/julius-speech/julius.git
$ cd julius
$ ./configure --with-mictype=alsa
$ make
$ sudo make install
Downloading and extracting the dictation kit and the grammar recognition kit
Download and extract the following packages.
Julius dictation kit
This is the kit you need to try Julius out. It contains the minimum set of models required for Japanese dictation plus the Julius executable binaries, so with this alone you can get Julius running.
・https://osdn.net/dl/julius/dictation-kit-4.5.zip
Julius grammar-based recognition kit
A kit for trying grammar-based speech recognition with Julius. It contains a speaker-independent Japanese acoustic model and a sample grammar.
・https://github.com/julius-speech/grammar-kit/archive/v4.3.1.zip
$ cd ~
$ wget https://osdn.net/dl/julius/dictation-kit-4.5.zip
$ wget https://github.com/julius-speech/grammar-kit/archive/v4.3.1.zip
$ mkdir -p ~/lib/julius
$ cd ~/lib/julius
$ unzip ~/dictation-kit-4.5.zip
$ unzip ~/v4.3.1.zip
Running Julius from the command line
Let's try running Julius from the command line.
$ julius -C ~/lib/julius/dictation-kit-4.5/main.jconf -C ~/lib/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -demo -input mic
STAT: include config: /home/pi/lib/julius/dictation-kit-4.5/main.jconf
STAT: include config: /home/pi/lib/julius/dictation-kit-4.5/am-gmm.jconf
STAT: jconf successfully finalized
STAT: *** loading AM00 _default
(acoustic-model loading messages omitted)
STAT: *** AM00 _default loaded
STAT: *** loading LM00 _default
(n-gram loading messages omitted)
STAT: *** LM00 _default loaded
STAT: All models are ready, go for final fusion
(recognizer setup messages omitted)
STAT: All init successfully done
STAT: ###### initialize input device
----------------------- System Information begin ---------------------
JuliusLib rev.4.5 (fast)
(engine, model and search-parameter dump omitted)
----------------------- System Information end -----------------------
Notice for feature extraction (01),
*************************************************************
* Cepstral mean normalization for real-time decoding:       *
* NOTICE: The first input may not be recognized, since      *
* no initial mean is available on startup.                  *
*************************************************************
Error: adin_alsa: cannot set PCM channel to 1 (Invalid argument)
failed to begin input stream
An error???
Error: adin_alsa: cannot set PCM channel to 1 (Invalid argument)
failed to begin input stream
Set the microphone device via an environment variable.
export ALSADEV="plughw:0,0"
The 0,0 here is the "card number, subdevice number" respectively.
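If you launch Julius from a script rather than an interactive shell, the same setting can be passed through the child process environment. A sketch (the function name is mine; the jconf paths follow this article's layout, so adjust them as needed):

```python
import os

def julius_command(card=0, subdevice=0):
    """Build argv and environment for launching Julius on a given ALSA device."""
    env = dict(os.environ, ALSADEV="plughw:%d,%d" % (card, subdevice))
    kit = os.path.expanduser("~/lib/julius/dictation-kit-4.5")
    cmd = ["julius",
           "-C", os.path.join(kit, "main.jconf"),
           "-C", os.path.join(kit, "am-gmm.jconf"),
           "-nostrip", "-demo", "-input", "mic"]
    return cmd, env
```

Then `subprocess.Popen(cmd, env=env)` starts the recognizer with the microphone already selected.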
Run Julius again.
$ julius -C ~/lib/julius/dictation-kit-4.5/main.jconf -C ~/lib/julius/dictation-kit-4.5/am-gmm.jconf -nostrip -demo -input mic
(the startup log is the same as before, up through the cepstral mean normalization notice, and then:)
Stat: adin_alsa: device name from ALSADEV: "plughw:0,0"
Stat: capture audio at 16000Hz
Stat: adin_alsa: latency set to 32 msec (chunk = 512 bytes)
Stat: "plughw:0,0": seeed2micvoicec [seeed-2mic-voicecard] device bcm2835-i2s-wm8960-hifi wm8960-hifi-0 [] subdevice #0
STAT: AD-in thread created
<<< please speak >>>
<<< please speak >>>
Once this prompt appears, everything is OK.
Julius is now waiting for input from the microphone, so try speaking into it.
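To use the results programmatically rather than reading the terminal, the final hypotheses can be scraped from Julius's console output, where each final result appears on a line starting with `sentence1:`. A hypothetical filter for that (for serious integration, Julius's module mode, `-module`, is the cleaner interface):

```python
def extract_sentences(julius_stdout):
    """Collect final recognition results from Julius console output.

    Looks for lines like: sentence1: <s> 今日 は 晴れ </s>
    """
    results = []
    for line in julius_stdout.splitlines():
        if line.startswith("sentence1:"):
            text = line.split(":", 1)[1]
            # drop sentence-boundary markers and inter-word spaces
            text = text.replace("<s>", "").replace("</s>", "")
            results.append("".join(text.split()))
    return results
```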
There are a lot of misrecognitions, though...
Related Raspberry Pi articles
Here is a roundup of Raspberry Pi-related articles.
↓What is the Raspberry Pi?
・https://urashita.com/archives/25631
↓Installing Raspbian Stretch with desktop
・https://urashita.com/archives/26460
↓Installing Anaconda on the Raspberry Pi
・https://urashita.com/archives/26622
↓Python development on the Raspberry Pi (Berryconda)
・https://urashita.com/archives/26664