DmDX help, VoiceKeyKeyword

Legacy Voice Key Keyword

<VoiceKey [N]>
<vox [N]>

variants:
Voice Key Commands

<VoiceKey monitor ...>
<vox monitor ...>
<VoiceKey parameters ...>
<vox parameters ...>

The legacy versions of this keyword exist to interface with a physical electronic voice key. I don't think anyone will ever build another one, so the keyword has been appropriated to operate the built-in software VOX (although the original forms of the keyword still work, of course). The original keyword is the MIP bit 20 modifier, the bit that enables the VOX input from the PIO12. The bit is reset if N = 0 and set otherwise; if N is missing, the bit is set and the mode is active. All MIP modifiers are both parameters and switches. The method of input is now restricted to the PIO12 inputs, see Input. The legacy VOX input was designed as an alternative Positive Response coming from a VOX taking its input from a microphone for naming tasks.

Voice Key Commands

<VoiceKey monitor designator₁, designator₂>
<vox monitor designator₁, designator₂>
<VoiceKey parameters N₁, N₂, N₃, N₄, N₅>
<vox parameters N₁, N₂, N₃, N₄, N₅>

counter designators:
counter.name.
c.name.
counterN
cN
N

The VOX commands, added in version 6.1.4.0 of DMDX, allow one to programmatically set the parameters of the software digital VOX built into DMDX for occasions where the experimenter is not present to use the VOX calibration dialog. The main case is remote testing, where the subject is sure to have a laptop or something else with a built-in microphone that has a good chance of working straight away without any fiddling.

The monitor command allows one to nominate two existing counters to monitor the maximum sample value the VOX encounters and the maximum sibilant count. When not using the enhanced VOX, the sample range is zero to 32768 and the sibilant counter is not used (you still need to specify it, as I doubt anyone will ever not use the enhanced VOX). When using the enhanced VOX, the counters are sums of the samples over the window duration. With the default 30 millisecond window and the 22 kHz rate the VOX runs at, that is roughly 22 * 30 = 660 samples per window, so values can be in the hundreds of thousands even for silence.
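To get a feel for the scale of the enhanced VOX values, the arithmetic above can be worked through in a few lines of Python. This is only a sketch of the numbers in the preceding paragraph, not anything DMDX itself runs; the function names and the mean level of 200 per sample are hypothetical, chosen purely for illustration.

```python
# Rough scale of the values the enhanced VOX reports, using the figures above.
SAMPLES_PER_MS = 22   # the VOX runs at roughly 22 kHz, so about 22 samples per millisecond
MAX_SAMPLE = 32768    # top of the unenhanced sample range


def samples_per_window(window_ms):
    """Approximate number of samples summed in one sliding window."""
    return SAMPLES_PER_MS * window_ms


def window_sum(mean_level, window_ms):
    """Sum over one window if every sample sat at mean_level (an assumption)."""
    return mean_level * samples_per_window(window_ms)


print(samples_per_window(30))      # 660 samples in the default 30 ms window
print(window_sum(200, 30))         # 132000 for a hypothetical quiet level of 200
print(window_sum(MAX_SAMPLE, 30))  # 21626880, the ceiling for a 30 ms window
```

So even a modest per-sample level lands in the hundreds of thousands once it is summed over a 30 millisecond window.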

The parameters command stores the following parameter list in the registry, overwriting whatever settings might have been set with the VOX calibration dialog, and tells the VOX to reload its parameters from those registry keys. This means that once set, all subsequent runs of DMDX will use those parameters for the VOX. Parameter N₁ is the sample threshold that triggers the VOX (so 0..32768 for unenhanced, and up to 32768 * 22 * N₃ for enhanced, with N₃ being the window duration in milliseconds). Parameter N₂ is a boolean switch that turns the enhanced VOX on (so 0 or 1). Parameter N₃ is the sliding window duration in milliseconds when the enhanced VOX is in use. Parameter N₄ is the sibilant threshold. Parameter N₅ is another boolean switch that turns the high frequency 11 kHz filter on (it affects the sibilant count significantly).

N₁ sample threshold
N₂ enhanced VOX
N₃ sliding window duration
N₄ sibilant threshold
N₅ high frequency filter
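
As a sanity check on the ranges described above, here is a small Python sketch that assembles a parameters command and reports whether a given sample threshold can actually be reached. The helper names are hypothetical and this is not part of DMDX; it only restates the N₁ to N₅ rules in code.

```python
# Sketch of the N1..N5 rules above; helper names are illustrative only.
MAX_SAMPLE = 32768
SAMPLES_PER_MS = 22  # the VOX runs at roughly 22 kHz


def threshold_ceiling(enhanced, window_ms):
    """Largest sample level the VOX can report for these settings."""
    if enhanced:
        return MAX_SAMPLE * SAMPLES_PER_MS * window_ms
    return MAX_SAMPLE


def vox_parameters(threshold, enhanced, window_ms, sibilant, hf_filter):
    """Format the command text; N2 and N5 must be 0 or 1."""
    if enhanced not in (0, 1) or hf_filter not in (0, 1):
        raise ValueError("N2 and N5 are boolean switches: use 0 or 1")
    return (f"<vox parameters {threshold}, {enhanced}, "
            f"{window_ms}, {sibilant}, {hf_filter}>")


print(vox_parameters(1000000, 1, 30, 200, 1))
print(1000000 <= threshold_ceiling(1, 30))    # True: this threshold can be reached
print(999999999 <= threshold_ceiling(1, 30))  # False: this one can never trigger
```

A threshold above the ceiling is not an error. It is a handy way to keep the VOX from triggering at all while the levels are being monitored.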

So, for example, here's the script I used to test the VOX commands; it monitors the energy the VOX is encountering. First it sets the VOX parameters high enough that the enhanced VOX can't trigger and end the task, then it loops back on itself until the user gives a positive keyboard response, displaying the highest values seen in the previous half second's recording. Then it switches to unenhanced operation and does the same thing, and finally it sets the parameters back to something more reasonable.

<ep> <VideoMode desktop> <t 500> f15 <id Keyboard>
<id recordvocal 500,77> <id digitalvox> <nfb> <cr>
</ep>
0 <vox parameters 999999999, 1, 30, 999999999, 1>
<dfm 2 stat> "VOX command test", "response to terminate" @2
<set c1=0> <set c2=0> <vox monitor c1,c2>;
+1 d2 " sam %-6d sib %-6d" <sprintf c1,c2> <set c1=0> <set c2=0> /
! "recording" <ln 2> * <binr -1>;

0 <vox parameters 32768, 0, 0, 0, 0>
<dfm 2 stat> "VOX non enhanced", "response to terminate" @2;
+1 d2 " sam %-6d sib %-6d" <sprintf c1,c2> <set c1=0> <set c2=0> /! "recording" <ln 2> * <binr -1>;

0 <vox parameters 1000000, 1, 30, 200, 1>
"Thank you, that's the end.";

And here is the heart of what was going to be the self-titrating VOX calibration task that's in the remote testing section. It wound up not being titrating because it gets to actually sample the energy the VOX is seeing, as opposed to having to guess wildly and refine its guess over multiple trials, which is what I initially thought I'd be doing. Basically it determines what the background noise is, then samples some actual syllables and sets the thresholds appropriately. It might be a little aggressive in borderline cases, using only 1.3 times the noise in poor situations; time will tell, I suppose:

<ep>
<id #keyboard> <mip multisignal>
<VideoMode desktop> <t 500> f15
<id digitalvox> <nfb> <cr>
</ep>

0 <vox parameters 999999999, 1, 30, 999999999, 0>
"When you're ready to be quiet",
"for a few seconds please hit the space bar." @1
<set c1=0> <set c2=0> <vox monitor c1,c2>;
+1 <ntl 0> <set c1=0> <set c2=0> %60 "++" / "recording" *;
+2 "Ok, we see a sample threshold of" @-2,
@-1 "%d with a sibilant count of %d." <sprintf c1,c2> ,
"If there happened to be noise during that interval" @1 ,
"please press Y and we'll try again." <ln 2>
* <mpr +#21> <mnr +#57> <ntl> <mnr +#49> <bic -1>;

0 <ntl 0> <set c3=c1> <set c4=c2> <t 1500>
"Ok, next up are some syllable samples" @-2,
"and a long SSSSSS at the end." @-1,
"Say the syllables as they appear after the fixation point." @1,
"Press the space bar when ready." @3;

+3 <ntl 0> %60 "++" / "GA" *;
+4 d2 %30 / %60 "++" / "PA" *;
+5 d2 %30 /%60 "++" / "SSSSSS" *;

+2 <set c5=c1-c3> <set c6=c2-c4>
"Ok, we see a sample threshold difference of" @-2,
@-1 "%d with a sibilant difference of %d." <sprintf c5,c6> ,
"If you want to do that over" @1 ,
"please press Y and we'll try again." <ln 2>
* <ntl> <bic -3>;

~11 <ntl 0> <bi 12, c6 .gt. (c4/3)> <macro set .sib. = c4 + (c6 / 3)>;
0 "Hmm, looks like the sibilant signal is less" @-1,
"than a third of noise level so we'll just",
"go with a sibilant threshold 1.3 times the noise level" @1
<macro set .sib. = c4 + (c4 / 3)> <emit going with sib 1.3x noise>;

~12 <bi 13, c5 .gt. (c3/3)> <macro set .sam. = c3 + (c5 / 3)>;
0 "Hmm, looks like the vocalization energy is less" @-1,
"than a third of the noise level so I'm guessing",
"the VOX won't be working too well and we'll just" @1,
"go with a sample threshold 1.3 times the noise level." @2
<macro set .sam. = c3 + (c3 / 3)> <emit going with sam 1.3x noise>;

~13 <nfb 0> <vox parameters ~.sam., 1, 30, ~.sib., 1>;
0 "Ok, let's try those values (~.sam. and ~.sib.)" @-1,
"and see if they work well or not." ,
"Press the space bar when ready." @3
<emit 1> <emit 2> <emit 3> <emit 4> <emit 5> <emit 6>
<emit VOX parameters are ~.sam. and ~.sib.>;

+15 "++" / "CAT" *;
+16 "++" / "BAT" *;
+17 "++" / "SAT" *;

0 "Done";
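
The threshold rule that items 11 to 13 of the calibration script apply can be sketched in Python as follows. This is an illustration of the logic only: the function name is hypothetical, the example levels are made up, and it assumes the counter arithmetic behaves like integer division.

```python
# Sketch of the rule applied to both the sample and sibilant thresholds.
def pick_threshold(noise, peak):
    """noise: level seen during the quiet interval (c3 or c4 in the script).
    peak: highest level seen after the syllables were spoken (c1 or c2)."""
    diff = peak - noise  # c5 or c6 in the script
    if diff > noise // 3:
        # Clear signal: set the threshold a third of the way up from the noise.
        return noise + diff // 3
    # Weak signal: fall back to roughly 1.3 times the noise level.
    return noise + noise // 3


print(pick_threshold(120000, 600000))  # 280000, speech well above the noise
print(pick_threshold(120000, 130000))  # 160000, fallback for a weak signal
```

The fallback branch is the borderline case mentioned earlier, where the threshold sits only about a third above the background noise.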
