Hi,
I have to create an application which can recognize some important keywords from a text. For example (by Calais Viewer):
Input:
"Computer programming (often shortened to programming or coding) is the process of designing, writing, testing, debugging / troubleshooting, and maintaining the source code of computer programs. This source code is written in a programming language. The code may be a modification of an existing source or something completely new. The purpose of programming is to create a program that exhibits a certain desired behaviour (customization). The process of writing source code often requires expertise in many different subjects, including knowledge of the application domain, specialized algorithms and formal logic."
Output:
"Computer programming
Mathematics
Technology Internet
Computing
Programming in the large and programming in the small
Algorithm
Programming paradigms
Programming language
Source code
Debugging
C
Software engineering
"
I don't want to use an existing system, because this will be integrated into my website and also used for other languages, so I need to be able to edit it myself. I'm therefore looking for tips on a universal solution, especially descriptions of algorithms or methods that could be useful.
Thanks!
Text analytics
Started by pilsner001, Jul 21 2010 08:40 AM
2 replies to this topic
#1
Posted 21 July 2010 - 08:40 AM
#2
Posted 22 July 2010 - 12:57 PM
I recommend statistical/machine learning. The part of the field that deals with language is still experimental, but if you know how to program, this simple task shouldn't be too hard.
The rough idea is to train a model with [labelled] examples so that it statistically learns to "extract" the common keywords from a text.
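To make that a bit more concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions, not a finished solution: it assumes you have texts whose keywords were labelled by hand, and the names train, keyword_probability and extract as well as the 0.6 threshold are made up for this example. The "model" is just a pair of counters that record how often each word was seen and how often it was labelled as a keyword.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def train(examples):
    """Count, per word, how many texts contain it and in how many of
    those texts it was labelled as a keyword.

    examples: list of (text, set_of_keywords) pairs (hypothetical data).
    """
    seen = Counter()
    labelled = Counter()
    for text, keywords in examples:
        keywords = {k.lower() for k in keywords}
        for word in set(tokenize(text)):
            seen[word] += 1
            if word in keywords:
                labelled[word] += 1
    return seen, labelled

def keyword_probability(word, model):
    """Smoothed estimate of P(keyword | word); unseen words get 0.5."""
    seen, labelled = model
    return (labelled[word] + 1) / (seen[word] + 2)

def extract(text, model, threshold=0.6):
    """Return the words of `text` whose estimate exceeds the threshold."""
    words = set(tokenize(text))
    return sorted(w for w in words if keyword_probability(w, model) > threshold)

examples = [
    ("The compiler translates source code", {"compiler", "code"}),
    ("The debugger steps through the code", {"debugger", "code"}),
    ("The cat sat on the mat", {"cat"}),
]
model = train(examples)
print(extract("the code of the compiler", model))
```

With these three toy examples the call prints ['code', 'compiler']. A real system would need far more labelled texts, and a word-level table like this says nothing useful about words it has never seen.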
#3
Posted 23 July 2010 - 10:28 AM
manux said:
I recommend statistical/machine learning. The part of the field that deals with language is still experimental, but if you know how to program, this simple task shouldn't be too hard.
The rough idea is to train a model with [labelled] examples so that it statistically learns to "extract" the common keywords from a text.
I'm not an expert on this, but I think that will only work if the input always uses the same kind of words. Otherwise the program would need to understand the syntax of human language in general, which seems far too complex to ask for on a forum. The exception would be a program that goes the other way: instead of figuring out what is important, it simply assumes that everything that appears all the time (like "and") is unimportant. Alternatively, you could write an additional program that automatically scans the internet for lots of input and always considers the type of the input when analyzing it (scientific, comedic, etc., plus all kinds of subtypes). But that also seems like a lot of work and very processing-intensive, not to mention the difficulty of compiling a sufficiently large list of genres and the many contingencies you'd have to consider.
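The "goes the other way" idea can be sketched roughly like this in Python. This is only my guess at how it might look, using the common TF-IDF weighting as a stand-in: a word scores high if it is frequent in the given text but rare in a background collection of texts. The names document_frequencies and top_keywords, the toy corpus, and the particular smoothing are assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def document_frequencies(corpus):
    """For each word, count how many background texts contain it."""
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return df

def top_keywords(text, corpus, k=5):
    """Rank the words of `text` by term frequency times a smoothed
    inverse document frequency. A word that occurs in every background
    text gets log(1) = 0, so words like "and" drop to the bottom."""
    df = document_frequencies(corpus)
    n = len(corpus)
    tf = Counter(tokenize(text))
    scores = {w: count * math.log((n + 1) / (df[w] + 1))
              for w, count in tf.items()}
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

corpus = [
    "the cat and the dog",
    "the sun and the moon",
    "the code and the bug",
]
print(top_keywords("the compiler and the code", corpus, k=2))
```

With this toy corpus the call prints ['compiler', 'code'], while "the" and "and" score exactly zero. No grammar is involved, so the same code should work for other languages as long as you have background texts in that language.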











