Speech to text software

1/6/2024

The short version of the question: I am looking for speech recognition software that runs on Linux and has decent accuracy and usability. It should not be restricted to voice commands, as I want to be able to dictate text.

I have tried the following, without satisfaction:

- IBM ViaVoice (used to run on Linux but was discontinued years ago)
- silvius (built on the Kaldi speech recognition toolkit)
- Wine + Dragon NaturallySpeaking + NatLink + dragonfly + damselfly

All the above-mentioned native Linux solutions have both poor accuracy and poor usability (and some allow only voice commands, not free-text dictation). By poor accuracy, I mean accuracy significantly below that of the speech recognition software I mention below for other platforms. As for Wine + Dragon NaturallySpeaking, in my experience it keeps crashing, and unfortunately I don't seem to be the only one with such issues.

On Microsoft Windows I use Dragon NaturallySpeaking, on Apple Mac OS X I use Apple Dictation and DragonDictate, on Android I use Google speech recognition, and on iOS I use the built-in Apple speech recognition.

Baidu Research released yesterday the code for its speech recognition library using Connectionist Temporal Classification implemented with Torch. Benchmarks from Gigaom are encouraging as shown in the table below, but I am not aware of any good wrapper around it to make it usable without quite some coding (and a large training data set):

Table 4: Results (%WER) for 3 systems evaluated on the original audio. All systems are scored only on the utterances with predictions given by all systems. The number in the parentheses next to each dataset, e.g. Clean (94), is the number of utterances scored.

There exist some very alpha open-source projects:

- a project that is part of Mozilla's Vaani project
- Vox, a system to control a Linux system using Dragon NaturallySpeaking
- a project to be released by Google, mentioned at Interspeech 2018

I am also aware of this attempt at tracking the state of the art and recent results (bibliography) on speech recognition, as well as this benchmark of existing speech recognition APIs.

I am aware of Aenea, which allows speech recognition via Dragonfly on one computer to send events to another, but it has some latency cost.

I am also aware of these two talks exploring Linux options for speech recognition:

- 2016 - The Eleventh HOPE: Coding by Voice with Open Source Speech Recognition (David Williams-King)
- 2014 - Pycon: Using Python to Code by Voice (Tavis Rudd)

With vosk-api, you first convert the file to the required format and then recognize it:

    ffmpeg -i file.mp3 -ar 16000 -ac 1 file.wav

Then install vosk-api with pip:

    pip3 install vosk

The repository's example directory also contains an SRT subtitle output example, test_srt.py, which is more human readable and can be directly useful to people with that use case:

    python3 -m pip install srt

The sections below show some testing I did with it.

The test.wav example given in the repository says, in a perfect American English accent and with perfect sound quality, three sentences which I transcribe as:

    one zero zero zero one
    nine oh two one oh
    [...]

The "nine oh two one oh" is said very fast, but is still clear. The "z" of the second-to-last "zero" sounds a bit like an "s".

In the output, several mistakes were made, presumably in part because we, unlike the model, know that all the words are numbers.

Next I also tried the vosk-model-en-us-aspire-0.2 model, which was a 1.4GB download compared to the 36MB of vosk-model-small-en-us-0.3, after moving the small model out of the way:

    mv model model.vosk-model-small-en-us-0.3

The next test is the "Think" speech by Thomas J. Watson Sr., from IBM (public domain in the USA):

    wget
    ffmpeg -i Think_Thomas_J_Watson_Sr.ogg -ar 16000 -ac 1 think.wav

The sound quality is not great, with a lot of microphone hiss due to the technology of the time. The speech is, however, very clear and well paused. The recording is 28 seconds long, and the wav file is 900KB. Conversion took 32 seconds.

Sample output of the first three sentences: [...]

And the Wikipedia transcription for the same segment reads: [...] through reading, listening, discussing, observing, and thinking.

The last test is the "We choose to go to the moon" speech:

    wget -O moon.ogv
    ffmpeg -i moon.ogv -ss 09:12 -to 09:29 -q:a 0 -map a -ar 16000 -ac 1 moon.wav
    test_srt.py moon.wav > moon.srt

This audio has good sound quality, with occasional approving screams from the crowd and a slight echo from the venue.

Audio duration: 17s, wav file size 532K, conversion time 22s, output:

    We choose to go to the moon in this decade and do the other things not because they are easy but because they are hard because that goal will serve to organize and measure the best of our energies and skills

And the corresponding Wikipedia captions (starting at caption 89):

    We choose to go to the moon in this decade and do the other things,
    not because they are easy, but because they are hard,
    because that goal will serve to organize and measure the best of our energies and skills,
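To make the SRT step above more concrete, here is a minimal sketch of how word-level recognition results could be turned into SRT subtitle text using only the Python standard library. It is not the repository's test_srt.py. It assumes the recognizer returns JSON with a "result" list of entries carrying "word", "start" and "end" (in seconds), which is the shape Vosk produces when word output is enabled. The function names, the `words_per_line` parameter and the timings in `SAMPLE` are made up for illustration.

```python
import json


def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(result_json: str, words_per_line: int = 7) -> str:
    """Group word-level results into numbered SRT blocks.

    Assumes `result_json` has a "result" list of dicts with
    "word", "start" and "end" keys (hypothetical input for this sketch).
    """
    words = json.loads(result_json).get("result", [])
    blocks = []
    for index, first in enumerate(range(0, len(words), words_per_line), 1):
        chunk = words[first:first + words_per_line]
        text = " ".join(w["word"] for w in chunk)
        span = f"{to_timestamp(chunk[0]['start'])} --> {to_timestamp(chunk[-1]['end'])}"
        blocks.append(f"{index}\n{span}\n{text}\n")
    # Blocks already end with a newline, so joining adds the blank separator line.
    return "\n".join(blocks)


# Illustrative input only: the timings below are invented, not measured.
SAMPLE = json.dumps({
    "result": [
        {"word": "one", "start": 0.84, "end": 1.11},
        {"word": "zero", "start": 1.11, "end": 1.53},
        {"word": "zero", "start": 1.53, "end": 1.95},
        {"word": "zero", "start": 1.95, "end": 2.31},
        {"word": "one", "start": 2.31, "end": 2.61},
    ],
    "text": "one zero zero zero one",
})

if __name__ == "__main__":
    print(words_to_srt(SAMPLE, words_per_line=5))
```

Grouping by a fixed number of words keeps the sketch short. A real subtitle tool would more likely split on pauses or line length.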
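Since the benchmark above is reported in %WER and the tests compare recognizer output against reference transcripts, a small word error rate calculation may help. This is a sketch under my own assumptions, not the scoring used by any of the benchmarks mentioned: it lowercases the text, strips punctuation, and divides the word-level edit distance by the number of reference words. The function names are hypothetical. The moon strings are the two transcripts quoted above, and the numeric example uses an invented hypothesis.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and split into words."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# The Wikipedia captions and the recognizer output for the moon segment.
MOON_REF = (
    "We choose to go to the moon in this decade and do the other things, "
    "not because they are easy, but because they are hard, "
    "because that goal will serve to organize and measure "
    "the best of our energies and skills,"
)
MOON_HYP = (
    "We choose to go to the moon in this decade and do the other things "
    "not because they are easy but because they are hard "
    "because that goal will serve to organize and measure "
    "the best of our energies and skills"
)

if __name__ == "__main__":
    print(f"moon segment WER: {word_error_rate(MOON_REF, MOON_HYP):.1%}")
    # Invented hypothesis that drops one "zero" from the first test sentence.
    print(f"example WER: {word_error_rate('one zero zero zero one', 'one zero zero one'):.1%}")
```

With punctuation and case ignored, the two moon transcripts are identical word for word, so that segment scores a WER of 0 under this definition.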
