View Full Version : speex_preprocess()


Trivian
14-05-2004, 01:30
Any chance you could use the Speex preprocessor to filter noise, do automatic gain control and more sophisticated voice detection? The preprocessor really makes a world of difference, especially for people with substandard microphones or noisy environments.

Peter
14-05-2004, 05:51
Hum,

currently we have our own gain control, noise filter and voice detection algorithms, which don't do such a bad job in my opinion. What makes you think that the speex_preprocessor does a better job at this than what we currently use?

Trivian
14-05-2004, 13:11
In the version of TS I'm using (TS2 RC2) there's still noticeable noise and problems with user-set volume ("No, user X now sounds much louder than you, user Y, you need to increase the recording volume a bit again.. No, that was too much, your signal is peaking.")

I'm using Speex as the voice codec in my own videoconferencing tool (which is Linux only, so useless for gaming), and a few weeks back I integrated the Speex preprocessor. I used to have the same amount of "static background noise" that's common on TS today, but now it's GONE. Talking is the same as sitting in the room with people, the noise isn't there anymore. In addition, as long as the microphone settings for the users are "somewhat sane", the volume is automatically scaled so that everyone sounds just as loud.

There's an additional bonus: by querying the preprocessor_state->loudness2 variable, you can measure how well the user's mic is set up. Assuming you have the AGC setting set to 8000 (which is the recommended default), as long as the user is speaking normally loudness2 should be around 8000. If it's lower, the user should increase the mic volume, and if it's higher the user really should turn it down. While the AGC will make everyone sound identical, having the input volume close to the ideal tuned volume means (slightly) higher precision in the recording :)
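
A minimal sketch of that mic check, written in Python rather than against the Speex API. The function name mic_volume_hint and the 25% tolerance band are assumptions made for illustration; only the 8000 target comes from the post. The loudness argument stands in for whatever value the application reads from the preprocessor.

```python
def mic_volume_hint(loudness, agc_target=8000.0, tolerance=0.25):
    """Suggest a mic adjustment from a measured loudness value.

    `loudness` is a stand-in for the value read from the preprocessor;
    `tolerance` (hypothetical) is how far from the target still counts as fine.
    """
    if agc_target <= 0:
        raise ValueError("agc_target must be positive")
    low = agc_target * (1.0 - tolerance)
    high = agc_target * (1.0 + tolerance)
    if loudness < low:
        return "raise"   # input is too quiet: increase the mic volume
    if loudness > high:
        return "lower"   # input is too hot: turn the mic volume down
    return "ok"
```

Calling it once per second or so while the user is speaking, rather than on every frame, would keep the hint from flickering.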

Note that the preprocessor is a bit "quirky" at times and it takes a long time to adjust to drastic changes in the environment (like changing the mic volume suddenly), so you should include a button to reset it. (Just reinitialize it).

As for the voice activity, it's based on "recognising voice" by frequencies, not by amplitude, which means it'll go off whenever there is something that is distinguishable from white noise. It can sometimes be a bit quick to "stop" activity though, so I recommend a logic where you stop transmission if the last 5 or 10 frames returned 0 for VAD. (I have good experience with 10). Incidentally, while it's not documented, as long as you've turned VAD on in the preprocessor (not in speex itself), the return value from speex_preprocess() will be the VAD indicator.
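
The "stop only after N silent frames" logic can be sketched like this. It is plain Python and does not call Speex; the class name VadHangover is made up for the example, and the vad argument stands in for the per-frame VAD indicator described above.

```python
class VadHangover:
    """Keep transmitting until `hangover_frames` consecutive frames report no voice."""

    def __init__(self, hangover_frames=10):
        if hangover_frames < 1:
            raise ValueError("hangover_frames must be at least 1")
        self.hangover_frames = hangover_frames
        # Start in the idle state: nothing is sent until voice is first detected.
        self._silent_run = hangover_frames

    def update(self, vad):
        """Feed one VAD result (truthy = voice); return True if the frame should be sent."""
        if vad:
            self._silent_run = 0
        elif self._silent_run < self.hangover_frames:
            self._silent_run += 1
        return self._silent_run < self.hangover_frames
```

With hangover_frames=10 and 20 ms frames, transmission stops on the tenth consecutive silent frame, roughly 200 ms after the last frame that reported voice, which is enough to bridge short pauses between words.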

The good thing about the preprocessor is that it runs all the time, and so uses the frames where there is no voice activity to "learn" noise patterns. The downside is, of course, that it also uses a bit of CPU all the time, so it should probably be a user option.

As it's very easy to set up (just initialize a preprocessor context and call speex_preprocess for every frame), I recommend you give it a try :)

As an added bonus, speex 1.1.5 now has functions that take shorts instead of floats for sample buffers, which means no conversion has to be done.

Peter
14-05-2004, 17:59
Hmmm,

well I will definitely give the preprocessor a shot, just a few notes and questions... are you recording in narrow band mode? Because (naturally) the wide band will sound quite a bit "clearer" and less noisy (just to make sure this is not the reason why you experience TeamSpeak to be "noisier" than your app). Also, the automatic gain control works very well, but it is a tad too sensitive to low volume, but once you have your mic over the magic value, you *should* be the same volume as the others (roughly) - this has proven to work quite well, dunno where your problem is there...
Also of course you should always remember that TeamSpeak is VERY CPU critical... we can't run overly expensive routines just to check for voice activation or other stuff...
Anyhow I will look into the features of the Speex preprocessor more closely due to your post, I wanted to do so anyhow, since I would love to try the echo cancellation :).

Trivian
14-05-2004, 20:10
I've tried both narrow and wideband, and the preprocessor gives a very noticeable improvement to both. A very good way to test it is to say "ssssssssssssssssssssssssssssssss". After about 4 or 5 seconds, the filter will conclude it's constant white noise and remove it. Catch your breath and say "sssss" again and it will be just as it was, because the filter will have found a new (and lower) noise level while you caught your breath.

As for the CPU intensive part, I agree. Personally I'd be willing to sacrifice 10% of my CPU (including taking a 20% hit in framerate) just to be sure the voice communication was "closer to perfect", but many people will disagree with me here which is why it's perfect to have as an option :)

As for echo cancellation, that is easy when it's just one person speaking at a time, but it gets much more tricky when you have multiple incoming streams, since you need to filter against all of them.

However, if I've understood the Speex source code correctly, the preprocessor maintains a "noise level" for each frequency band in the incoming frames and subtracts this minimum from the actual frequencies sampled (provided you enable noise removal). Which means that as long as the echo isn't louder than the background noise it'll get filtered out anyway. My experience seems to support this, but I have to turn off VAD to test it, as the echoes aren't loud enough to trigger VAD.
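
To make the idea concrete, here is a toy version of per-band noise-floor subtraction in Python. It is not the Speex algorithm, only a simplified illustration of the principle described above: the function name subtract_noise_floor is hypothetical, the "noise level" is approximated by a plain running minimum per band, and each frame is assumed to arrive already split into per-band magnitudes.

```python
def subtract_noise_floor(frames, noise_floor=None):
    """Subtract a running per-band minimum from each frame of band magnitudes.

    `frames` is a list of equal-length lists (one magnitude per frequency band).
    Returns the cleaned frames and the final noise floor estimate.
    """
    floor = list(noise_floor) if noise_floor is not None else None
    cleaned = []
    for bands in frames:
        if floor is None:
            floor = list(bands)              # first frame seeds the estimate
        elif len(bands) != len(floor):
            raise ValueError("all frames must have the same number of bands")
        else:
            # The floor only ever moves down here; a real filter also adapts upward.
            floor = [min(f, b) for f, b in zip(floor, bands)]
        # Clamp at zero so a band never goes negative after subtraction.
        cleaned.append([max(b - f, 0.0) for b, f in zip(bands, floor)])
    return cleaned, floor
```

Anything that stays at or below the floor in a band, a quiet echo included, comes out as zero, which matches the observation that echoes quieter than the background noise disappear.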

SatanClaus
15-05-2004, 00:50
sidenote: make an automatic detection of total cpu-time used, so TS will use more complex algorithms to reduce noise / echos while there's a lot of unused cpu !?!
perhaps also a good idea in combination with profiles... gaming / talking

Trivian
22-05-2005, 15:32
It's been a while since I used TS last, but this noise thing still annoys me ;)

I did a quick hack to inject speex_preprocess into TeamSpeak, called when the application receives its MM_WIM_DATA message (so it works only with the wave driver, not DirectSound). This solution is far from ideal, as it works with data at a different sample rate than what is ideal for Speex (22 vs 16 kHz), and it seems this cleaned-up input messes with TS' own postprocessing; while the noise is gone, it sounds like you're sitting in a large stone chamber. This "feature" only appears in TeamSpeak and not when using the preprocessor in other applications. I have a suspicion it comes from processing 2048 samples at once vs 320, which is the normal frame size for Speex, but not having the source for TS it's kinda hard for me to track down the exact reason. I might experiment a bit more with it to find out if there is sufficient interest.

For those interested, the application and hook library can be found at http://www.pvv.ntnu.no/~thorvan/tshook.zip (source included)

This also uses the speex voice activity detection, but it's bastardized to just zeroize the sound input buffer when the VAD returns 0. I still hope to see the next version of TS use this preprocessor.

Trivian
22-05-2005, 15:54
The problem was the frame size; the stone chamber effect is gone now. Zip file updated.
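
The post doesn't say how the fix was done, so the following is only one plausible approach, sketched in Python: cut each large capture buffer into fixed-size frames and carry the leftover samples into the next buffer. The class name FrameSplitter is made up; 320 and 2048 are the sizes mentioned earlier in the thread.

```python
class FrameSplitter:
    """Cut arbitrarily sized sample buffers into fixed-size frames."""

    def __init__(self, frame_size=320):
        if frame_size < 1:
            raise ValueError("frame_size must be at least 1")
        self.frame_size = frame_size
        self._pending = []   # samples left over from earlier buffers

    def push(self, samples):
        """Add a buffer of samples; return the list of complete frames now available."""
        self._pending.extend(samples)
        frames = []
        while len(self._pending) >= self.frame_size:
            frames.append(self._pending[:self.frame_size])
            del self._pending[:self.frame_size]
        return frames

    def pending_count(self):
        """Number of samples waiting for the next buffer."""
        return len(self._pending)
```

A 2048-sample buffer yields six 320-sample frames with 128 samples left over, so each frame can be handed to the preprocessor separately instead of passing the whole buffer at once.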

Nemesis02
26-05-2005, 22:00
So all we do is join the server, run the app and that's it? Does everyone in the server need to use it?

Just wondering, because I tested it myself and I didn't really hear a difference.

Trivian
26-05-2005, 23:15
You need to be using wave and not directsound, and it only filters the sound you send. Using the "local test mode" should let you hear the difference.

Note that the filtering is active only as long as the application is running. When you want to remove it, hit a key as specified. Don't break or abort the program, as that will leave the global DLL running.

Nemesis02
27-05-2005, 13:47
Very nice, works, notice a GREAT difference in the audio.

Trivian
27-05-2005, 14:24
Thank you. As can be seen from the source, the necessary changes are fairly minimal, so it's my hope this will be a part of the next version of TeamSpeak.

It does use some CPU for the preprocessing (~ 3% on my machine), but since people are talking about using "higher quality codecs" it seems more CPU is an acceptable tradeoff for quality. Incidentally, using this kind of preprocessing will allow you to use lower bitrate codecs: without filtering, the majority of the bandwidth is spent encoding the noise. Less noise => more of the bandwidth used for voice => less bandwidth (and CPU to encode) needed.

Peter
27-05-2005, 14:58
The current alpha version of the next generation TeamSpeak client is making use of the speex preprocessing capabilities.

Nemesis02
27-05-2005, 19:15
Peter wrote: "The current alpha version of the next generation TeamSpeak client is making use of the speex preprocessing capabilities."

Will there be a public alpha that will be able to connect to v2 servers? Or could there be an alpha where all the new features are disabled except for this preprocessor thing?

Bug in it btw: after doing that local echo thing, it crashes my TS lol. Not a big bug, since I don't use the local echo much.

Elfe
28-05-2005, 09:49
I just wonder if this preprocessing can be done with the incoming sound?

best would be a separate auto gain control for every user in the channel

Trivian
30-05-2005, 22:54
Only if all the users transmit continuously. The noise adaptation works by measuring background noise when there isn't any voice component, and then subtracts that base noise from the different bands during talking. Most of the time, the users won't transmit their "silent" periods, so there is no no-voice period in which to adapt to the background noise.

As for automatic gain control, that can be done, but it's much more efficient to do it at the transmitter's side, as that will result in a cleaner signal to encode, meaning less bandwidth or better quality.

Elfe
31-05-2005, 10:53
Trivian wrote: "Only if all the users transmit continuously. The noise adaptation works by measuring background noise when there isn't any voice component, and then subtracts that base noise from the different bands during talking. Most of the time, the users won't transmit their "silent" periods, so there is no no-voice period in which to adapt to the background noise.

As for automatic gain control, that can be done, but it's much more efficient to do it at the transmitter's side, as that will result in a cleaner signal to encode, meaning less bandwidth or better quality."

the noise reduction isn't needed for incoming but agc would be very helpful

yeah it would be great if everyone would use it - but hardly anyone does :mad:

Trivian
31-05-2005, 12:48
Elfe wrote: "yeah it would be great if everyone would use it - but hardly anyone does :mad:"

Peter wrote: "The current alpha version of the next generation TeamSpeak client is making use of the speex preprocessing capabilities."


As soon as that version is released, everyone will have these preprocessing features :)

If they have implemented it, we might also have sampled echo cancellation, so we can use speakers and have the output filtered out. I'll run some local experiments with this and post my findings here later on. Speaking to someone using loudspeakers produces a much more natural environment though, so I hope it's doable.

Label2021
01-06-2005, 04:46
Says it's installed, however is this something to run in the background of TeamSpeak, or just installed? It also says press any key to abort, except none of the keys abort and print the Uninstall procedure. I press Enter or Space and it goes away though.




-LabeL-

Trivian
20-06-2005, 12:12
Label2021 wrote: "Says it's installed, however is this something to run in the background of TeamSpeak, or just installed? It also says press any key to abort, except none of the keys abort and print the Uninstall procedure. I press Enter or Space and it goes away though."


You need to run it in the background while TS runs.