Does anyone have experience building a basic search engine with web crawling abilities? I'd like to discuss this and hear how you would get started on something like that, probably on a small scale.
My current thinking is this:
I basically need a search form and a search action form that runs the query against an index. We would start by building the index from scratch just to test, and later use a web crawler to find more information to add to it. Eventually this would be done dynamically.
What I know so far is that I need the following:
1. Search form
2. Search action form (including the design of the SQL query)
3. Database design
4. Database (the index, am I right?)
5. Retrieve and show results (organization and order of links)
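For anyone reading along, the five pieces above can be sketched in a few lines. This is only a minimal sketch, written in Python with SQLite rather than PHP and MySQL to keep it self-contained. The table names (`pages`, `postings`), the function names, and the ranking rule (sum of word hits, all query words required) are assumptions made for illustration, not anything proposed in the thread.

```python
import re
import sqlite3

def create_index(conn):
    # Hypothetical schema: one row per page, one row per (word, page) pair.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS pages (
            id INTEGER PRIMARY KEY,
            url TEXT UNIQUE NOT NULL,
            title TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS postings (
            word TEXT NOT NULL,
            page_id INTEGER NOT NULL REFERENCES pages(id),
            hits INTEGER NOT NULL,
            PRIMARY KEY (word, page_id)
        );
    """)

def tokenize(text):
    # Lowercase and split into runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def add_page(conn, url, title, body):
    # Store the page, then count how often each word occurs in it.
    cur = conn.execute("INSERT INTO pages (url, title) VALUES (?, ?)", (url, title))
    page_id = cur.lastrowid
    counts = {}
    for word in tokenize(title + " " + body):
        counts[word] = counts.get(word, 0) + 1
    conn.executemany(
        "INSERT INTO postings (word, page_id, hits) VALUES (?, ?, ?)",
        [(word, page_id, n) for word, n in counts.items()],
    )

def search(conn, query, limit=10):
    # Return (url, title, score) for pages containing every query word,
    # ordered by total hits. User input only ever reaches SQL as parameters.
    words = sorted(set(tokenize(query)))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    sql = f"""
        SELECT p.url, p.title, SUM(po.hits) AS score
        FROM postings po JOIN pages p ON p.id = po.page_id
        WHERE po.word IN ({placeholders})
        GROUP BY p.id
        HAVING COUNT(DISTINCT po.word) = ?
        ORDER BY score DESC, p.url
        LIMIT ?
    """
    return conn.execute(sql, (*words, len(words), limit)).fetchall()
```

The search form would only need to pass its text field to something like `search()`, and the results page would loop over the returned rows. Summing hits is a crude ordering; it is just a placeholder for whatever ranking is chosen later.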
I'm still quite lost on certain parts, such as how the web crawler would dynamically pick up new websites.
I understand that this would be really tedious and that my programming knowledge might not be enough right now, but I would like to hear how you guys would start on such a project. I would appreciate any help.
Search Engine development
Started by solidlink, Aug 19 2010 11:09 PM
3 replies to this topic
#1
Posted 19 August 2010 - 11:09 PM
#2
Guest_johnny.dacu_*
Posted 20 August 2010 - 08:24 AM
I'm not a PHP expert, but I imagine the crawling part as cURL plus a cron job, and perhaps DOMDocument to read the HTML structure. Index the primary page, then navigate to the other pages on the domain and index those as well, and so on.
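The "read the HTML structure" step might look roughly like the sketch below. It uses Python's standard `html.parser` in place of PHP's DOMDocument, and it assumes the HTML has already been downloaded (the part cURL would do). The names `PageParser` and `parse_page` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class PageParser(HTMLParser):
    """Collects the page title and the absolute http(s) links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL and drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, href))
                is_web = absolute.startswith(("http://", "https://"))
                if is_web and absolute not in self.links:
                    self.links.append(absolute)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def parse_page(base_url, html):
    # Returns (title, links) for one downloaded page.
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links
```

The title and text would go into the index, and the links would become the list of pages to visit next. A cron job would simply run the crawl script on a schedule so that new pages get picked up.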
#3
Posted 21 August 2010 - 04:50 AM
I think I've got a rough idea of it, but I'm still pretty lost as I'm not very proficient in PHP. Navigating to other pages on the same domain is a little tough, though. Does anyone have any idea how to do that?
PHPCrawl - Webcrawler Class
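On the same-domain question, one common approach is a breadth-first walk that keeps a queue of pages to visit and a set of pages already seen, and skips any link whose host differs from the starting host. The sketch below is in Python and is not how PHPCrawl itself works. `crawl_same_domain` and the `get_links` callback are hypothetical names; `get_links(url)` is assumed to download a page and return its absolute links.

```python
from collections import deque
from urllib.parse import urlparse

def crawl_same_domain(start_url, get_links, max_pages=50):
    """Visit pages breadth-first, staying on the host of start_url.

    get_links is a caller-supplied function: url -> list of absolute URLs.
    Returns the URLs in the order they were visited.
    """
    domain = urlparse(start_url).netloc.lower()
    seen = {start_url}            # everything ever queued, to avoid loops
    queue = deque([start_url])    # pages waiting to be visited
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if urlparse(link).netloc.lower() != domain:
                continue          # different host: do not follow
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The `max_pages` cap is there so a test run cannot wander forever. A real crawler would also want to respect robots.txt and pause between requests, which this sketch leaves out.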
#4
Posted 24 August 2010 - 05:40 AM
Is anyone keen to jointly develop a basic search engine, or keen to help? Please PM me. I know it's still a far-fetched dream at the moment, but I don't really want a good domain to go to waste.











