The sections below show some testing I did with vosk-api.

The test.wav example given in the repository contains three sentences, said in a perfect American English accent with perfect sound quality. I transcribe the first one as:

    one zero zero zero one

The "nine oh two one oh" is said very fast, but still clearly. The "z" of the second-to-last "zero" sounds a bit like an "s". The output shows that several mistakes were made. This is presumably in part because we, as listeners, understand that all the words are numbers, and that helps us.

Next I also tried the vosk-model-en-us-aspire-0.2 model, which was a 1.4GB download compared to 36MB for vosk-model-small-en-us-0.3. I first moved the small model out of the way:

    mv model model.vosk-model-small-en-us-0.3

I then tested a recording from IBM, Think_Thomas_J_Watson_Sr.ogg (public domain in the USA), converted with:

    ffmpeg -i Think_Thomas_J_Watson_Sr.ogg -ar 16000 -ac 1 think.wav

The sound quality is not great. There is a lot of microphone hissing noise due to the technology of the time, but the speech is very clear and paused. The recording is 28 seconds long, and the wav file is 900KB. Conversion took 32 seconds.

I compared the sample output for the first three sentences against the Wikipedia transcription of the same segment, which includes the line:

    Through reading, listening, discussing, observing, and thinking.

Finally, I tried an excerpt of the "We choose to go to the moon" speech. I downloaded it with wget as moon.ogv and trimmed it to 09:12-09:29 while extracting the audio:

    ffmpeg -i moon.ogv -ss 09:12 -to 09:29 -q:a 0 -map a -ar 16000 -ac 1 moon.wav

This audio has good sound quality, with occasional approving screams from the crowd and a slight echo from the venue. Audio duration: 17s, wav file size: 532K, conversion time: 22s. The output of:

    python3 test_srt.py moon.wav > moon.srt

was:

    We choose to go to the moon in this decade and do the other things not because they are easy but because they are hard because that goal will serve to organize and measure the best of our energies and skills

And the corresponding Wikipedia captions:

    89
    We choose to go to the moon in this decade and do the other things,
    Not because they are easy, but because they are hard,
    because that goal will serve to organize and measure the best of our energies and skills,
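The comparisons against the Wikipedia transcriptions above are informal. One common way to put a number on them is the word error rate (WER). WER is the word-level edit distance (substitutions, insertions and deletions) between a reference transcription and the recognizer output, divided by the number of reference words.

Below is a minimal standard-library sketch. The helper names `normalize`, `word_errors` and `wer` are hypothetical and not part of vosk. The normalization rule (lowercase, drop punctuation so caption commas are not counted as errors) is my own assumption.

```python
import re


def normalize(text):
    """Lowercase and keep only word tokens, so punctuation and case are ignored.

    Assumption: this simple rule is good enough for English captions.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between reference and hypothesis."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j] = cost of turning the first i reference words into the first j
    # hypothesis words; row i = 0 is j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute, or free if words match
            )
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by the reference length."""
    ref_len = len(normalize(reference))
    return word_errors(reference, hypothesis) / ref_len if ref_len else 0.0
```

Take the moon segment above. After normalization, the vosk output and the Wikipedia captions match word for word, so `wer(captions, vosk_output)` returns `0.0`. A single misheard digit in "one zero zero zero one" would give 1 / 5 = 0.2.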
I am also aware of this attempt at tracking the state of the art and recent results (bibliography) in speech recognition, as well as this benchmark of existing speech recognition APIs.

I am aware of Aenea, which allows speech recognition via Dragonfly on one computer to send events to another, but it has some latency cost.

I am also aware of these two talks exploring Linux options for speech recognition:

2014 - PyCon: Using Python to Code by Voice (Tavis Rudd)
2016 - The Eleventh HOPE: Coding by Voice with Open Source Speech Recognition (David Williams-King)

To try vosk-api yourself, install it with pip:

    pip3 install vosk

First convert the file to the required format (16 kHz, mono WAV), then recognize it:

    ffmpeg -i file.mp3 -ar 16000 -ac 1 file.wav

The same example directory also contains an SRT subtitle output example, test_srt.py. SRT output is more human readable and can be directly useful to people with that use case. It needs the srt package:

    python3 -m pip install srt
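The conversion and recognition steps above can also be driven from Python. Below is a minimal sketch, assuming vosk-api is installed and a model is unpacked in a directory named `model`. It uses the vosk calls `Model`, `KaldiRecognizer`, `SetWords`, `AcceptWaveform`, `Result` and `FinalResult`.

The helpers `ffmpeg_args`, `transcribe`, `srt_timestamp` and `words_to_srt` are hypothetical names for illustration. This is not the repository's test_srt.py. In particular, it formats SRT by hand instead of using the srt package. The grouping of 7 words per subtitle is an arbitrary assumption.

```python
import json
import wave


def ffmpeg_args(src, dst, start=None, end=None):
    """Build an ffmpeg command converting `src` to 16 kHz mono WAV.

    Optional `start`/`end` map to -ss/-to (placed after -i, as above).
    """
    args = ["ffmpeg", "-i", src]
    if start is not None:
        args += ["-ss", start]
    if end is not None:
        args += ["-to", end]
    args += ["-ar", "16000", "-ac", "1", dst]
    return args


def transcribe(wav_path, model_dir="model"):
    """Return vosk word dicts ({"word", "start", "end", "conf"}) for a WAV file."""
    # Imported lazily so the pure helpers work even without vosk installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono WAV; convert with ffmpeg first")
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        rec.SetWords(True)  # ask for per-word start/end times
        words = []
        while True:
            data = wf.readframes(4000)
            if len(data) == 0:
                break
            if rec.AcceptWaveform(data):  # True when an utterance is finished
                words += json.loads(rec.Result()).get("result", [])
        words += json.loads(rec.FinalResult()).get("result", [])
    return words


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, words_per_line=7):
    """Group word dicts into numbered SRT cues separated by blank lines."""
    blocks = []
    for i in range(0, len(words), words_per_line):
        chunk = words[i:i + words_per_line]
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        text = " ".join(w["word"] for w in chunk)
        blocks.append(f"{len(blocks) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

A possible end-to-end use:

- Run `subprocess.run(ffmpeg_args("file.mp3", "file.wav"), check=True)`.
- Then run `print(words_to_srt(transcribe("file.wav")))`.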