Search Engine development

This topic has been archived. This means that you cannot reply to this topic.
3 replies to this topic

#1
solidlink

    Newbie

  • Members
  • 3 posts
Does anyone have experience building a basic search engine with web crawling abilities? I would like to discuss this and hear your views on how you would get started on something like that, probably on a small scale.

My current thinking is this:

Basically, I need a search form and a search-action script that runs the query against an index. We would of course start by building the index by hand, just to test, and later use a web crawler to find more pages to add to it. Eventually the index would be updated dynamically.

What I know so far is that I need the following:
1. Search form
2. Search action script (including the design of the SQL query)
3. Database design
4. Database (this is the index, am I right?)
5. Retrieving and showing results (organization and order of links)
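
Items 2 to 5 above can be sketched as a small inverted index: one table of pages and one table that maps each word to the pages containing it. This is only a rough sketch under stated assumptions, written in Python with SQLite rather than PHP to keep it short. The table names (pages, postings), the function names, and the scoring rule (sum of word counts) are all made up for illustration; the same queries should carry over to PHP with PDO.

```python
import re
import sqlite3

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def create_index(conn):
    # pages: one row per indexed URL.
    # postings: one row per (word, page) pair, with how often the word occurs.
    conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT)")
    conn.execute("CREATE TABLE postings (word TEXT, page_id INTEGER, freq INTEGER)")
    conn.execute("CREATE INDEX idx_word ON postings (word)")

def add_page(conn, url, title, body):
    """Store a page and record the frequency of every word in it."""
    cur = conn.execute("INSERT INTO pages (url, title) VALUES (?, ?)", (url, title))
    page_id = cur.lastrowid
    counts = {}
    for word in tokenize(title + " " + body):
        counts[word] = counts.get(word, 0) + 1
    conn.executemany(
        "INSERT INTO postings (word, page_id, freq) VALUES (?, ?, ?)",
        [(w, page_id, n) for w, n in counts.items()],
    )

def search(conn, query):
    """Return (url, score) for pages containing every query word, best first."""
    words = sorted(set(tokenize(query)))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    sql = (
        "SELECT p.url, SUM(t.freq) AS score "
        "FROM postings t JOIN pages p ON p.id = t.page_id "
        f"WHERE t.word IN ({placeholders}) "
        "GROUP BY p.id HAVING COUNT(DISTINCT t.word) = ? "
        "ORDER BY score DESC, p.url"
    )
    return conn.execute(sql, (*words, len(words))).fetchall()
```

The search form would pass its text field to something like search(), and the results page would loop over the returned rows. Ordering by raw word count is the crudest possible ranking, but it is enough to test the form and the database design before a crawler exists.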

I'm still quite lost on certain parts, such as how the web crawler would dynamically add new websites to the index.

I understand that this would be really tedious and that my programming knowledge might not be enough right now, but I would like to hear how you would start on such a project. I would appreciate any help.

#2
Guest_johnny.dacu_*
  • Guests
I'm not a PHP expert, but I imagine the crawling part as cURL plus a cron job, and perhaps DOMDocument to read the HTML structure. Index the primary page, then follow its links to the other pages on the same domain and index those as well, and so on.
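
The "read the HTML structure" step might look roughly like the sketch below. It is written in Python using the standard library's html.parser as a stand-in for DOMDocument, and it assumes the page has already been downloaded (the cURL part is left out). The names LinkExtractor and extract_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop
                # any #fragment so the same page is not queued twice.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)

def extract_links(base_url, html):
    """Return the absolute URLs linked from the given HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

A cron job would then only need to run the crawl script on a schedule; each run downloads a page, indexes its text, and feeds the extracted links back into the list of pages to visit.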

#3
solidlink
I think I've got a rough idea of it, but I'm still pretty lost as I'm not very proficient in PHP. Navigating to the other pages on the same domain is the tough part. Does anyone have any ideas on that?

PHPCrawl - Webcrawler Class
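
For the same-domain navigation, one common approach is a breadth-first crawl: keep a queue of URLs still to visit and a set of URLs already seen, and only queue links whose host matches the start page. The sketch below is in Python and is not how PHPCrawl works internally. It assumes a caller-supplied get_links(url) function (hypothetical) that downloads a page and returns the absolute URLs it links to; crawl_site and max_pages are likewise illustrative names.

```python
from collections import deque
from urllib.parse import urlparse

def crawl_site(start_url, get_links, max_pages=50):
    """Visit pages breadth-first, staying on the start URL's host.

    get_links(url) must return the absolute URLs linked from that page.
    Returns the list of URLs visited, in visiting order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}          # everything ever queued, to avoid loops
    queue = deque([start_url])  # pages waiting to be visited
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if urlparse(link).netloc != host:
                continue  # skip links that lead to other domains
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

The seen set is what stops the crawler from going in circles when pages link back to each other, and max_pages keeps a test run from wandering through an entire large site.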

#4
solidlink
Is anyone keen to jointly develop a basic search engine, or keen to help? Please PM me. I know it's still a far-fetched dream at the moment, but I don't really want a good domain to go to waste.