This leads to my question: what's the purpose of this application? From a human perspective, it's very unlikely that `http://codecall.net/forum/xxxxxx` uses the same template as `http://codecall.net/topic/xxxx`. So, as the author of the crawler, the first time your machine encounters a path of the form `/topic/xxxx`, put it in the analysis queue and learn it. Then, whenever the machine finds another path of exactly that form, don't send it to the analyzer; instead, just mark it "walk-through only", that is, scan it for links only.
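Here's a rough Python sketch of that idea, just to make it concrete. It assumes a very crude notion of "form": keep the first path segment (`topic`, `forum`, ...) and treat every later segment as a wildcard. The names `path_template` and `Crawler` are made up for illustration; a real site will probably need a smarter rule for deciding which segments are variable.

```python
from urllib.parse import urlparse


def path_template(url: str) -> str:
    """Hypothetical heuristic: keep the first path segment and replace
    every later segment with '*', so /topic/123 and /topic/456 match."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return "/"
    return "/" + "/".join([segments[0]] + ["*"] * (len(segments) - 1))


class Crawler:
    """Hypothetical crawler skeleton with two queues."""

    def __init__(self):
        self.seen_templates = set()  # path forms already sent for analysis
        self.analyze_queue = []      # first URL of each new form: learn its template
        self.walk_queue = []         # "walk-through only": just harvest links

    def enqueue(self, url: str) -> None:
        template = path_template(url)
        if template in self.seen_templates:
            # Form already learned: don't analyze again, only look for links.
            self.walk_queue.append(url)
        else:
            # First time we see this form: remember it and analyze it.
            self.seen_templates.add(template)
            self.analyze_queue.append(url)


crawler = Crawler()
for u in ["http://codecall.net/topic/123",
          "http://codecall.net/topic/456",
          "http://codecall.net/forum/7"]:
    crawler.enqueue(u)
print(crawler.analyze_queue)  # topic/123 and forum/7
print(crawler.walk_queue)     # topic/456
```

The point is that the expensive template analysis runs once per path form, while every other page of that form is only walked for links.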
That's the way I see it. I also wonder whether compressing the two HTML files down to their bare-bones structure would help at all (names and tags, maybe, but definitely not any content strings). That might be the first step toward comparing them.
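One way to try the "bare-bones structure" idea is with Python's built-in `html.parser`: record only start and end tags (plus, as an assumption, the `id` and `class` attributes) and throw all text away, then compare the two tag sequences with `difflib.SequenceMatcher`. The helper names and the choice of attributes are hypothetical; this is only a sketch of a first-pass comparison, not a robust template detector.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class SkeletonParser(HTMLParser):
    """Collects tag tokens only; text nodes are ignored because
    handle_data is not overridden."""

    def __init__(self, keep_attrs=("id", "class")):  # attribute choice is an assumption
        super().__init__()
        self.keep_attrs = keep_attrs
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        kept = [f'{k}="{v}"' for k, v in attrs if k in self.keep_attrs and v is not None]
        self.tokens.append("<" + " ".join([tag] + kept) + ">")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")


def skeleton(html: str) -> list:
    """Hypothetical helper: reduce a page to its tag sequence."""
    parser = SkeletonParser()
    parser.feed(html)
    parser.close()
    return parser.tokens


def structural_similarity(html_a: str, html_b: str) -> float:
    """Hypothetical helper: 1.0 means identical skeletons."""
    return SequenceMatcher(None, skeleton(html_a), skeleton(html_b)).ratio()


page_a = '<div class="post"><p>Hello</p></div>'
page_b = '<div class="post"><p>Completely different text</p></div>'
print(skeleton(page_a))
print(structural_similarity(page_a, page_b))  # same structure, different content
```

Since content strings never make it into the skeleton, two pages built from the same template should score close to 1.0 even when their text differs completely.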