
Programming challenge - Related Links

Posted by Roger, 04 September 2010 · 914 views

I have a PHP programming challenge.

I could probably write this myself in a few hours, but I wanted to see if there is any interest from the members here. It's not that difficult, but I am looking for an elegant solution.

Background: I was doing some research on Google Labs and saw that they have a cool tool (Google Related Links - Overview) which displays multiple "related links" to a particular page. I tried to sign up for it (for use with CodeCall and my other site), but have not heard back from the Google folks. It appears from the message board that it doesn't get too much support. On my way to work today, I realized that it's actually not that hard to implement...

What I know already:
- If I am trying to find related links on Google for a page such as:
http://forum.codecall.net/c-c/31885-do-while-loop-got-problem.html

- I can make a Google request like:
related:http://forum.codecall.net/c-c/31885-do-while-loop-got-problem.html

- I can then narrow it down to CodeCall URLs:
site:codecall.net related:http://forum.codecall.net/c-c/31885-do-while-loop-got-problem.html

- I need to call this from within PHP. Here are a few links that show how it can be done (Google API Example with PHP, SOAP, and Web Services; Developer's Guide - Google AJAX Search API - Google Code)

- I also need the responses to be stored in a DB, indexed by the URL or an index key.

- The sites have a large number of URLs, so the DB code will need to be optimized.

- It would be nice if it can show just the links, or the links with a brief description.

- It would be nice if it can also show the "related searches" (see Google Related Links - Overview). But that's not required.

- This program should be independent of the site. It should be plug and play on any site (1. install DB, 2. install code on appropriate pages and 3. see related links)

- I haven't given much thought to formatting the results, but I am open to suggestions and ideas...
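A minimal sketch of the fetch step described above, using the Google AJAX Search API that the linked Developer's Guide covers. The endpoint and response field names follow that (now-retired) API; treat them as assumptions, and the function names here are invented for illustration:

```php
<?php
// Build the "related:" query shown above and fetch it through the
// Google AJAX Search API (historical endpoint; verify against the
// linked Developer's Guide before relying on it).

function build_related_query($page_url, $site = null) {
    $q = 'related:' . $page_url;
    if ($site !== null) {
        $q = 'site:' . $site . ' ' . $q;   // narrow results to one site
    }
    return 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=' . urlencode($q);
}

function fetch_related_links($page_url, $site = null) {
    $json = @file_get_contents(build_related_query($page_url, $site));
    if ($json === false) {
        return array(); // network or quota failure: degrade gracefully
    }
    $data = json_decode($json, true);
    if (!isset($data['responseData']['results'])) {
        return array();
    }
    $links = array();
    foreach ($data['responseData']['results'] as $r) {
        // keep just the link, its title, and the brief description
        $links[] = array(
            'url'   => $r['unescapedUrl'],
            'title' => $r['titleNoFormatting'],
            'desc'  => $r['content'],
        );
    }
    return $links;
}
```

This keeps the "just the links, or links with a brief description" choice a simple matter of which array fields the page template prints.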

Please let me know what you think.


Question: What would be the use of a database? It would defeat the dynamic purpose of the related posts/addresses, since the cached results will be static.
Good question - the DB is needed for sites that have a large number of (1) pages and (2) pageviews per day.

(1) For sites with a lot of pages (100K+), this DB would serve as a cache for previous queries. This improves performance at the cost of freshness. In my case, the pages don't change much, so caching the results in a DB would be helpful. (Nice-to-have feature) It would be nice to be able to set some sort of refresh timer (e.g. 1D, 1W, 1M, 1Y).

(2) For sites that get a lot of traffic, the API will be limited by Google's query limits per day (last I read, it was 5,000 queries per day). Since the results are cached - see (1) above - the code can query the DB first before asking Google for the latest results. Again, this is a trade-off that is acceptable for my use. (Nice-to-have feature) Since the API does have a query limit, the code should handle errors (and error messages) gracefully. For example, once the API starts to error out, we can set a timer or counter to throttle the query frequency until the next successful query. This will keep Google from banning the IP of a very high-traffic site too quickly.
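The throttle idea in (2) could be sketched like this: after each failed API call, back off for a window that doubles on consecutive failures. The class name, window sizes, and cap are all illustrative assumptions, not part of any real API:

```php
<?php
// Sketch: exponential back-off after Google API errors, so a busy site
// stops hammering the API (and risking a ban) until queries succeed again.

class QueryThrottle {
    private $retry_at = 0;   // unix time before which we skip Google
    private $failures = 0;   // consecutive failures seen
    private $base     = 60;  // first back-off window, in seconds

    public function may_query($now) {
        return $now >= $this->retry_at;
    }

    public function record_failure($now) {
        $this->failures++;
        // 60s, 120s, 240s, ... capped at one hour (illustrative values)
        $wait = min($this->base * pow(2, $this->failures - 1), 3600);
        $this->retry_at = $now + $wait;
    }

    public function record_success() {
        $this->failures = 0;
        $this->retry_at = 0;
    }
}
```

While `may_query()` is false, the page would serve whatever is already cached in the DB instead of calling Google.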

About the indexing options, here is some more color... Some sites embed an index in the URL, which can be extracted and used as the DB key. Some don't (like some CodeCall URLs), so the DB will need to store the whole URL (or a hash of it). I'm open on implementation (combined tables, separate tables, etc.)...
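The two indexing cases above could be combined into one key function: use the embedded numeric ID when the URL has one, and fall back to a CRC32 hash of the full URL otherwise. The regex is only a guess at one URL shape (the CodeCall-style `/31885-...` pattern); adjust it per site:

```php
<?php
// Sketch: derive a numeric cache key from a URL, per the indexing
// discussion above. The URL pattern matched here is an assumption.

function cache_key($url) {
    // e.g. http://forum.codecall.net/c-c/31885-do-while-loop-... -> 31885
    if (preg_match('#/(\d+)-#', $url, $m)) {
        return (int)$m[1];
    }
    // no embedded index: hash the whole URL into a 4-byte unsigned int
    return crc32($url);
}
```

One design caveat: mixing extracted IDs and CRC32 values in a single column can collide, so separate tables (or a type flag) per source may be the safer of the layouts mentioned.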

Any other questions?
Found a good site for this: PHP Tip: Add Custom Google Search Results to Your Site with PHP | Dev Tips | Become a Better Developer, One Tip at a Time.

It doesn't include the DB part, but it's pretty close to what I'm looking for.
I'll see what I can do in my spare time; this project will be good practice in the client-side stuff I seem to lack skill at.
Most of the processing can happen on the server side. I implemented the code on one of my sites yesterday - without the DB portion yet. My next step is to save the output in the DB so that subsequent calls won't query Google repeatedly.
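That next step - checking the DB before querying Google again - could look like the read-through cache below. It uses PDO (SQLite here only to keep the sketch self-contained; MySQL works the same way) and gzcompresses the JSON body before storing it. Table and column names are invented for illustration:

```php
<?php
// Sketch: DB-first lookup so repeat page views don't re-query Google.
// cache_get() returns null on a miss or stale entry; the caller then
// fetches from Google and stores the result with cache_put().

function cache_get(PDO $pdo, $key, $ttl) {
    $stmt = $pdo->prepare('SELECT body, fetched_at FROM related_cache WHERE cache_key = ?');
    $stmt->execute(array($key));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false || time() - $row['fetched_at'] > $ttl) {
        return null; // miss or stale: caller queries Google, then cache_put()
    }
    return gzuncompress($row['body']);
}

function cache_put(PDO $pdo, $key, $json) {
    $stmt = $pdo->prepare('REPLACE INTO related_cache (cache_key, body, fetched_at) VALUES (?, ?, ?)');
    $stmt->bindValue(1, $key, PDO::PARAM_INT);
    $stmt->bindValue(2, gzcompress($json), PDO::PARAM_LOB); // binary-safe blob
    $stmt->bindValue(3, time(), PDO::PARAM_INT);
    $stmt->execute();
}
```

The `$ttl` argument is where the refresh timer from the earlier comment (1D, 1W, 1M, 1Y) would plug in.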
Alright I finished up what I could, I will dump the description portion of my readme:
Notable features:
- Caching IDs are speedy 4-byte (single unsigned int) CRC32 checksums of the object title
- All caching aspects contain 64-bit compatible code
- Cache hit invalidation timeout defined in seconds
- Uses PHP database abstraction object (PDO) for performance and security
- Complies with all Google API terms of service (Referrer set, search version defined, etc)

Cache results are stored:
- In MySQL Zlib GZ format (1/2-1/8 size), no PHP<->MySQL GZip overhead
  - 10000 320KB cache requests = 3.2MB total row size
- In VARBINARY fields for compressed data, faster average seek rate than VARCHAR/BINARY

Other features:
- Cleanup: Delete all LRU (least recently used, or expiry * 20) cache objects over threshold
- Friendly setup with sanity checks and option to overwrite prior install
- Handles API lookup limit and lookup errors gracefully
- Fully profiled code and essentials optimization (although code is simple)
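The LRU cleanup feature above might be sketched as a single delete pass: once the cache grows past a row-count threshold, drop the least recently used rows first. This is my simplified reading of that feature (the readme's `expiry * 20` variant is not shown), and the table/column names are invented:

```php
<?php
// Sketch: trim the cache back to $max_rows by deleting the rows with
// the oldest last_used timestamps (least recently used first).

function cache_cleanup(PDO $pdo, $max_rows) {
    $count = (int)$pdo->query('SELECT COUNT(*) FROM related_cache')->fetchColumn();
    if ($count <= $max_rows) {
        return 0; // under threshold, nothing to do
    }
    $excess = $count - $max_rows;
    $stmt = $pdo->prepare(
        'DELETE FROM related_cache WHERE cache_key IN (
            SELECT cache_key FROM related_cache ORDER BY last_used ASC LIMIT ?)');
    $stmt->bindValue(1, $excess, PDO::PARAM_INT);
    $stmt->execute();
    return $excess;
}
```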
The project files (excuse the shorturl) are located here: http://bit.ly/alcPd5

I wasn't so comfortable doing the related-searches part, as there are just too many odd variables to take into account.
@Nullw0rm: Sorry for not getting back to you... vB didn't send me an email notification that something was posted here.
I have downloaded the code (and it looks really nice) and will be reviewing it tomorrow... After that I'll try to install it over the weekend and provide some feedback.
Got the code installed. Thanks for the effort. I also sent you the hosting information for the free account (as promised). I hope you enjoyed this exercise and enjoy the free account.