03 March 2016

(SnailTrumpeter#0) Voice communication between a human and a PC - NEW NAME, let's do some research

New name

I have found a name for my project. I will call it SnailTrumpeter, because my first thought was an ear trumpet (see Wikipedia), a hearing aid people used in the 18th century. It looks like a snail, so I thought: "how about a snail playing a trumpet shaped like a snail?" :) That gave me a snail who is a trumpeter: the SnailTrumpeter ;)

Source: Wikipedia

Research

Anyway, let's do some research.
First of all, we need some information about what has already been invented ;) I have used Google, Wikipedia and a few other sources (listed below). I have read that Windows has its own speech API, but I won't use it because I want to make this program multiplatform later (still fighting with procrastination).


We want to know how a PC can tell the difference between, e.g., speaking the vowel "ah" and the vowel "oh". From [1] we see this can be achieved through spectrum analysis. What is a spectrum? Put simply, a spectrum [2] tells us how loud each frequency is at a specific moment in a sound sample. A graphical representation of how the acoustic spectrum changes over time is called a spectrogram [3]. So we need a tool that analyzes the spectrum for us and produces something like a spectrogram :) Let's Google it! That was super effective [4][5]. We will run a Fast Fourier Transform (FFT) on our sound samples using the FFTW library [5]. I have also found an interesting project on the Web [6] where someone achieved real-time spectrum analysis. Good to see that what we want to do is likely possible :)
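To get a feel for what FFTW will compute for us, here is a small pure-Python sketch. It is my own illustration, not FFTW's API (FFTW is a C library and much faster). It implements a textbook recursive radix-2 FFT, turns one frame of samples into a magnitude spectrum (how loud each frequency is), and builds a simple spectrogram by taking the spectrum of overlapping Hann-windowed frames. The function names, the Hann window and the frame/hop sizes are assumptions chosen only for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("FFT length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # FFT of the even-indexed samples
    odd = fft(x[1::2])   # FFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # The "twiddle factor" rotates the odd half before the halves are combined
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame, sample_rate):
    """Return (frequency_hz, magnitude) pairs for bins 0..N/2 of one frame."""
    n = len(frame)
    spectrum = fft(frame)
    return [(k * sample_rate / n, abs(spectrum[k])) for k in range(n // 2 + 1)]


def spectrogram(samples, frame_size, hop):
    """Magnitude spectra of successive overlapping frames (one list per frame)."""
    # Periodic Hann window: softens frame edges to reduce spectral leakage
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = samples[start:start + frame_size]
        windowed = [s * w for s, w in zip(chunk, window)]
        frames.append([abs(c) for c in fft(windowed)[: frame_size // 2 + 1]])
    return frames


if __name__ == "__main__":
    sample_rate = 8000
    # One second would be 8000 samples; 1024 is enough for a single FFT frame
    tone = [math.sin(2 * math.pi * 500 * i / sample_rate) for i in range(1024)]
    freq, mag = max(magnitude_spectrum(tone, sample_rate), key=lambda p: p[1])
    print(f"dominant frequency: {freq} Hz")  # prints "dominant frequency: 500.0 Hz"
```

A pure 500 Hz sine gives a single loud bin at 500 Hz. Real speech produces several peaks at once, and the differences between those peak patterns are what should let us tell vowels apart.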

We also need a way of getting samples from our mic. The project mentioned earlier [6] uses the Windows API to do that. I have found [7] a cross-platform audio I/O library, PortAudio [8], which is used by Audacity [9], my favorite free, multiplatform audio editing tool :)
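Whatever library captures the audio, the samples arrive as raw numbers in some sample format. PortAudio, for example, can deliver 16-bit signed integers (its `paInt16` format) or 32-bit floats (`paFloat32`). Assuming we record mono audio as little-endian 16-bit PCM bytes (as in a typical WAV file, or a capture buffer on x86), the hypothetical helper `pcm16_to_floats` below sketches how such a buffer could be turned into floats in [-1.0, 1.0) ready for the FFT step. It is not part of PortAudio.

```python
import struct
from typing import List


def pcm16_to_floats(buffer: bytes) -> List[float]:
    """Convert little-endian, mono, 16-bit signed PCM bytes to floats in [-1.0, 1.0).

    Hypothetical helper: assumes little-endian byte order and a single channel.
    """
    if len(buffer) % 2 != 0:
        raise ValueError("buffer length must be a multiple of 2 bytes")
    count = len(buffer) // 2
    # '<' = little-endian, 'h' = signed 16-bit integer
    samples = struct.unpack(f"<{count}h", buffer)
    # Divide by 32768 so that -32768 maps to exactly -1.0
    return [s / 32768.0 for s in samples]
```

If we ask PortAudio for `paFloat32` samples instead, this conversion step isn't needed at all; the 16-bit version is shown because 16-bit PCM is a very common format for recorded WAV files.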

That's all we need for now. Further research on other aspects will be done later, when needed (damn procrastination...) :)

A Wikipedia article I found worth reading: https://en.wikipedia.org/wiki/Speech_recognition

Sources:

[1] http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
[2] https://en.wikipedia.org/wiki/Spectrum#Spectrogram
[3] https://en.wikipedia.org/wiki/Spectrogram
[4] http://stackoverflow.com/questions/2886146/real-time-spectrum-analyzer-with-api
[5] http://www.fftw.org
[6] http://www.techmind.org/audio/specanaly.html
[7] http://stackoverflow.com/questions/5148365/recording-mic-to-a-wav-or-mp3-file-on-linux
[8] http://www.portaudio.com/
[9] https://en.wikipedia.org/wiki/PortAudio
