
Software to Detect Words in Audio File?


#1 RhetoricalRuvim


    JavaScript Programmer

  • Expert Member
  • 1311 posts
  • Location:C:\Countries\US
  • Programming Language:C, Java, C++, PHP, Python, JavaScript

Posted 19 October 2013 - 04:12 PM

Hello everyone.

I want to write a program that can talk (more precisely, read a sentence aloud). For that I am using word-timestamp descriptors, so the program can find the word it is trying to read and seek to the position in an audio file, given by the descriptor, where that word is pronounced.

For example, say the file contains the following message:
Today is Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday.
Also say that the word "today" starts 1 second into the file, "Sunday" starts 2.5 seconds into the file, and the other words start at other times.

The word-timestamp descriptors would have, more or less, the following information:
"today is": { from: 1, to: 2 }, 
"Sunday": { from: 2.5, to: 3 }, 
"Monday": { from: 3.5, to: 4 }, 
"Tuesday": { from: 5, to: 5.8 }, 
"Wednesday": { from: 6.4, to: 7.3 }, 
... and so on ... 
That way the program knows exactly where each word starts and ends in the audio file, so it can "talk" by looking up the timestamp data and playing the matching sections of the file. For example, to read the sentence "Today is Tuesday", it would find the entry "today is" (from 1 to 2 seconds into the file) and the entry "Tuesday" (from 5 to 5.8 seconds into the file), and would therefore:
- seek to 1s
- play until 2s
- seek to 5s
- play until 5.8s
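The lookup described above could look roughly like this in JavaScript. This is a minimal sketch, not the poster's actual code: it assumes descriptor keys are matched case-insensitively and that, at each position in the sentence, the longest matching multi-word key wins (so "today is" is used as a single entry). The `descriptors` table and the `planSegments` function are hypothetical names for illustration.

```javascript
// Hypothetical descriptor table in the format above (times in seconds).
const descriptors = {
  "today is": { from: 1, to: 2 },
  "Sunday": { from: 2.5, to: 3 },
  "Monday": { from: 3.5, to: 4 },
  "Tuesday": { from: 5, to: 5.8 },
  "Wednesday": { from: 6.4, to: 7.3 },
};

// Turn a sentence into the ordered list of { word, from, to } segments to play.
// Assumptions: matching is case-insensitive, and at each position the longest
// key (counted in words) wins, so multi-word entries like "today is" work.
function planSegments(sentence, descriptorTable) {
  // A Map avoids accidental hits on Object.prototype names like "constructor".
  const table = new Map(
    Object.entries(descriptorTable).map(([key, span]) => [key.toLowerCase(), span])
  );
  const maxKeyWords = Math.max(...[...table.keys()].map(k => k.split(" ").length));
  const words = sentence.toLowerCase().match(/[a-z']+/g) || [];
  const plan = [];
  let i = 0;
  while (i < words.length) {
    // Try the longest candidate phrase first, then progressively shorter ones.
    let n = Math.min(maxKeyWords, words.length - i);
    while (n > 0 && !table.has(words.slice(i, i + n).join(" "))) n--;
    if (n <= 0) throw new Error(`No recording for "${words[i]}"`);
    const key = words.slice(i, i + n).join(" ");
    plan.push({ word: key, ...table.get(key) });
    i += n;
  }
  return plan;
}

// Usage: print the seek/play steps for one sentence.
for (const seg of planSegments("Today is Tuesday", descriptors)) {
  console.log(`seek to ${seg.from}s, play until ${seg.to}s`);
}
// seek to 1s, play until 2s
// seek to 5s, play until 5.8s
```

Throwing on an unknown word is one design choice; skipping it or falling back to a shorter key would work too.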

I have the seeking, playing, and reading parts working, but right now I have to open each sound file in an audio editor and work out by hand the start and end timestamps of every word I want the program to recognize.

This works, but I think it would make things a lot easier if some software could scan an audio file, detect the timestamps where words start and end, and dump that information into a text file for my code to process further.
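Once start and end times are known (whether from an audio editor, a recognizer, or a silence detector), writing them out in the descriptor format above is straightforward if the words in the recording are known in advance and appear in order. A hypothetical helper, assuming exactly one `{ from, to }` segment per word:

```javascript
// Pair the words of a known transcript with { from, to } segments, in order,
// and emit one line per word in the descriptor format used above.
// Assumption: the recording contains exactly one segment per listed word.
function toDescriptorText(words, segments, decimals = 2) {
  if (words.length !== segments.length) {
    throw new Error(`${words.length} words but ${segments.length} segments`);
  }
  // Round for readability; Number() drops trailing zeros ("2.50" -> 2.5).
  const round = x => Number(x.toFixed(decimals));
  return words
    .map((w, i) => `${JSON.stringify(w)}: { from: ${round(segments[i].from)}, to: ${round(segments[i].to)} },`)
    .join("\n");
}

console.log(toDescriptorText(["today is", "Sunday"], [{ from: 1, to: 2 }, { from: 2.5, to: 3 }]));
// "today is": { from: 1, to: 2 },
// "Sunday": { from: 2.5, to: 3 },
```

If strict JSON is easier for the rest of the code to parse, calling `JSON.stringify` on an object keyed by word would produce that instead.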

Is there any software like that or similar to that?

I think detecting where a new word starts shouldn't be too hard, because the audio files I am dealing with have short, fairly quiet pauses between words.
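Because the recordings have quiet gaps between words, a simple energy-based silence detector may get most of the way there without any speech recognition. The sketch below is an illustration under stated assumptions, not an existing tool: it takes mono samples already decoded to numbers in [-1, 1] (reading the audio file is left out), and the frame length, RMS threshold, and minimum-gap values are hypothetical defaults that would need tuning for each set of recordings. `detectWordSegments` is a hypothetical name.

```javascript
// Split mono audio into word segments by finding quiet gaps between words.
// samples: array of numbers in [-1, 1]; sampleRate: samples per second.
// The default thresholds are illustrative guesses and need tuning per recording.
function detectWordSegments(samples, sampleRate, opts = {}) {
  const { frameMs = 10, threshold = 0.02, minSilenceMs = 80 } = opts;
  const frameLen = Math.max(1, Math.round((sampleRate * frameMs) / 1000));
  const minSilenceFrames = Math.ceil(minSilenceMs / frameMs);

  // 1. Mark each frame voiced or silent by its RMS energy.
  const voiced = [];
  for (let start = 0; start < samples.length; start += frameLen) {
    const end = Math.min(start + frameLen, samples.length);
    let sum = 0;
    for (let i = start; i < end; i++) sum += samples[i] * samples[i];
    voiced.push(Math.sqrt(sum / (end - start)) >= threshold);
  }

  // 2. Group voiced frames into segments; a silent run shorter than
  //    minSilenceFrames is treated as part of the current word.
  const segments = [];
  let segStart = -1;   // first frame of the open segment, or -1 if none
  let lastVoiced = -1; // most recent voiced frame
  for (let f = 0; f < voiced.length; f++) {
    if (!voiced[f]) continue;
    if (segStart >= 0 && f - lastVoiced - 1 >= minSilenceFrames) {
      segments.push([segStart, lastVoiced]); // gap was long enough: close the word
      segStart = -1;
    }
    if (segStart < 0) segStart = f;
    lastVoiced = f;
  }
  if (segStart >= 0) segments.push([segStart, lastVoiced]);

  // 3. Convert frame indices to seconds (end is exclusive, clamped to the data).
  const toSec = sample => Math.min(sample, samples.length) / sampleRate;
  return segments.map(([a, b]) => ({ from: toSec(a * frameLen), to: toSec((b + 1) * frameLen) }));
}
```

Gaps shorter than `minSilenceMs` are kept inside a word, so brief dips within a word (for example, the closure before a "t" or "k") do not split it. The returned `{ from, to }` objects have the same shape as the descriptors, but the detector cannot tell which word each segment contains; that pairing still has to come from the known text. For a 16-bit PCM file, each sample would typically be divided by 32768 to get values in this range.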

I searched online but could not find anything promising. I also tried writing code to do something similar in JavaScript (the language I am most experienced with), but JavaScript is not well suited to this kind of work; the code was bulky, and I am not sure it would work.

Thanks in advance.

#2 0xFACEB004


    CC Devotee

  • Senior Member
  • 625 posts
  • Location:Chicago
  • Programming Language:C, Java, C++, PHP, (Visual) Basic, JavaScript, Visual Basic .NET, Others
  • Learning:Assembly, Others

Posted 19 October 2013 - 04:45 PM

Is the audio file just voice? If so, you may want to look into something like the Dragon NaturallySpeaking SDK. For example:



Step two: transcribing the Dictation.
The transcription component transcribes the recorded text to .txt and .idx, where the .idx (index file) is the concordance containing recognized words and timestamps.



Also, I have heard that CMUSphinx, which is open source, does the same thing. I found this example on Stack Overflow; it sounds like it does exactly what you are looking for.



#3 BlackRabbit


    CodeCall Legend

  • Expert Member
  • 3871 posts
  • Location:Argentina
  • Programming Language:C, C++, C#, PHP, JavaScript, Transact-SQL, Bash, Others
  • Learning:Java, Others

Posted 19 October 2013 - 10:58 PM

It's not a subject I have mastered, but I know the terms: speech recognition and, more importantly for this case, phonetic indexing.


I think you will find what you need here. But since that software requires a lot of reading and learning its scripting language... I would rather wait for you to tell me how it went :D

