Alright, here's my problem. The Library of Congress has an atrocious site. Navigating the UI is painful enough, but finding pictures is even harder. As of now I've resorted to going directly through the Apache index pages to look at each photo. For each photo, however, there is a .gif copy and a .tif copy along with the standard .jpg copy. For some reason Apache doesn't have an option to arrange the files by file type, and sorting through these pictures is almost as annoying as using the main site. So what I'm hoping to do is write a program to scrape the HTML of the page and arrange the links by file type.
And this is what I'm talking about when I say index page...
http://memory.loc.gov/service/pnp/cp...41000/3b41500/
So here's my question:
What language would be best to accomplish this?
Or is this not a simple task, and would I be better off just dealing with it?
If that is the case, does anyone know of software that can do this?
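For anyone landing here with the same question, one possible answer as a minimal sketch in Python, using only the standard library. It assumes the index page is ordinary Apache autoindex HTML where each file is an `<a href="...">` link. The class and function names and the sample file names are made up for illustration, and this is not the program shared later in the thread.

```python
from collections import defaultdict
from html.parser import HTMLParser
from posixpath import splitext
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def group_links_by_extension(html):
    """Return {extension: [hrefs]} for the file links in an index page."""
    collector = LinkCollector()
    collector.feed(html)
    groups = defaultdict(list)
    for href in collector.links:
        # Look only at the path, so column-sort links like "?C=N;O=D"
        # and directory links (no extension) are skipped.
        ext = splitext(urlsplit(href).path)[1].lower()
        if ext:
            groups[ext].append(href)
    return dict(groups)


# Hypothetical sample of what an index listing might contain.
SAMPLE_HTML = """
<a href="?C=N;O=D">Name</a>
<a href="/service/pnp/">Parent Directory</a>
<a href="3b41500r.jpg">3b41500r.jpg</a>
<a href="3b41500t.gif">3b41500t.gif</a>
<a href="3b41500u.tif">3b41500u.tif</a>
<a href="3b41501r.jpg">3b41501r.jpg</a>
"""

if __name__ == "__main__":
    for ext, hrefs in sorted(group_links_by_extension(SAMPLE_HTML).items()):
        print(ext, hrefs)
```

To run it against a real page, the HTML could be fetched with `urllib.request.urlopen(url).read().decode()` and passed to `group_links_by_extension`. Relative links would still need to be joined to the page URL (for example with `urllib.parse.urljoin`) before downloading anything.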
Funny you should ask. Last year I actually did something incredibly similar, parsing the HTML of the LOC site for ISBN information. I can give you the code and you can modify it, if you like. What operating system do you want this for?
sudo rm -rf /
That'd be awesome. When you say operating system, I'm assuming you are asking what OS I'm using. I am using Windows....
Perfect. Here you go. Mind you, it's multithreaded, so be careful how many threads you use. I used 64 and crashed the LOC server. Once I stopped my program the site was back up.
I am not responsible if things go wrong.
Haha, you really crashed it? How would it do that? Simply too many requests? And thanks for this, I really appreciate it.
Apparently it can't handle 64 requests at the same time. Basically an unintentional DoS attack. I scaled it back to 32 and I think that worked.
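A rough sketch of how the number of simultaneous requests can be capped in Python, in case it helps anyone avoid the same accident. This is not the program from this thread. `fetch_all`, `FakeFetcher`, and the cap of 4 are assumptions for illustration, and the fake fetcher only sleeps instead of touching the network so the limit can be checked safely.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # assumed conservative cap, well below the 64 mentioned above


def fetch_all(urls, fetch, max_workers=MAX_WORKERS):
    """Call fetch(url) for every URL with at most max_workers calls in flight.

    Results come back in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


class FakeFetcher:
    """Stand-in for a real download function; records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, url):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # pretend to wait on the server
        with self._lock:
            self.active -= 1
        return url.upper()


if __name__ == "__main__":
    fetcher = FakeFetcher()
    results = fetch_all([f"page{i}" for i in range(20)], fetcher)
    print(len(results), "fetched, peak concurrency:", fetcher.peak)
```

The pool size is the only knob here: the executor never runs more than `max_workers` calls at once, so the server sees at most that many requests from this program at a time. Adding a short delay inside the real fetch function would be gentler still.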
I remember the first time I heard about that. Somehow, I can actually believe that. A DDoS from a single computer in a campus dorm.
Amazing what people can do with their free time. Just to be an idiot I sent two friends an email from another friend (we're all close, so it's okay). It had...some rather...um... I'll just leave it at "there were goats involved." Anyway, I spoofed the headers to make it seem like it came from my victim friend. The others eventually figured it out and came to my room at three in the morning. When I answered the door, they attacked me with a large vibrating dildo. Apparently they found it at a frat house.
Pretty much any computer connected to a network with a program like sendmail installed can pull it off.
Last edited by dargueta; 03-10-2010 at 12:30 AM. Reason: Added link