
downloading images


21 replies to this topic

#1 Hot_Milo23

    CC Addict

  • Just Joined
  • 104 posts

Posted 11 June 2009 - 04:55 AM

Hey all,
I've got a small problem and I'm not really sure where to start with it (I haven't used the HTML libraries before). I quite enjoy the web comic Ansem Retort (some of you may know of it) and I'd like to be able to view the strips offline. I started going through each one methodically, copying and pasting, but got bored pretty quickly, so I'm looking for an easier way.

In case it helps, each strip is on a page with an address like this:
ansemretort.org/ansemretort/index.html?comic=x
where x is the comic number (up to 529 at the moment).
Each strip image is also named "Comicx.png" (x being the number again).

If possible, could someone show me a way to do this?
At first I just wanted it for convenience, but now I'd like to use it as a learning opportunity.

Thanks in advance!
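To make the address pattern described in the post concrete, here is a minimal Python 3 sketch that builds the page address and image file name for each comic number. The `http://` scheme, the assumption that numbering starts at 1, and the helper names `page_url`, `image_filename` and `all_comics` are illustrative only; where the image itself is hosted is not stated in the post, so this does not produce image URLs.

```python
# Hypothetical helpers: the "http://" scheme, numbering from 1, and the default
# upper bound of 529 are assumptions based on the post, not confirmed by the site.
PAGE_TEMPLATE = "http://ansemretort.org/ansemretort/index.html?comic={}"


def page_url(x):
    """Address of the page that shows comic number x."""
    return PAGE_TEMPLATE.format(x)


def image_filename(x):
    """File name the strip is stored under, e.g. Comic12.png."""
    return "Comic{}.png".format(x)


def all_comics(last=529):
    """Yield (number, page URL, image file name) for comics 1..last inclusive."""
    for x in range(1, last + 1):
        yield x, page_url(x), image_filename(x)


for x, url, fname in all_comics(3):
    print(x, url, fname)
```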

#2 Aereshaa

    CC Devotee

  • Just Joined
  • 638 posts

Posted 11 June 2009 - 07:17 AM

Well, I'm not too familiar with Python, but here's how I would do it in pseudocode:
-while there are more comics to fetch:
--fetch url "blah.com/comic=" + num.to_s into string
--write string to file "comic" + num.to_s
--increment num
-end while.
If someone proficient in Python could translate this into Python, it should work.
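A rough Python 3 translation of the pseudocode above, offered as a sketch only. It assumes the server answers a missing comic with an HTTP error such as 404 (urllib raises `HTTPError` for those), which is what ends the loop; the function names and the `fetch` parameter are hypothetical, and `fetch` exists so the loop can be tried without a network. Like the pseudocode, it saves whatever `base_url + num` returns, so pointing it at the `index.html?comic=` page would save HTML rather than the image.

```python
import os
import urllib.error
import urllib.request


def fetch_bytes(url):
    """Download url and return the response body as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def fetch_all(base_url, start=1, out_dir=".", fetch=fetch_bytes):
    """Fetch base_url + num for num = start, start+1, ... until a request fails.

    Each response is written to out_dir/comic<num>; returns the paths written.
    """
    written = []
    num = start
    while True:  # "while there are more comics to fetch"
        try:
            data = fetch(base_url + str(num))
        except urllib.error.HTTPError:
            break  # assumption: an HTTP error (e.g. 404) means no more comics
        path = os.path.join(out_dir, "comic" + str(num))
        with open(path, "wb") as f:  # binary mode, since images are not text
            f.write(data)
        written.append(path)
        num += 1
    return written
```

Writing in binary mode matters: opening the output file in text mode (`'w'`) can corrupt image data on Windows.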
Watches: Nanoha, Haruhi, AzuDai. Listens to: E-Type, Dj Melodie, Nightcore.
"When people are wrong they need to be corrected. And then when they can't accept it, an argument ensues." - MeTh0Dz

#3 psam

    CC Regular

  • New Member
  • 35 posts

Posted 21 June 2009 - 12:27 PM

I wrote a Python script that downloads the comics up to number 531 (which I believe is the latest one right now) into the folder you save the script in. I can't post the code because it contains the download link and my post count is below 10, so I've attached it to this message.

Attached Files

  • Attached File  pic.zip   279bytes   78 downloads


#4 Hot_Milo23

    CC Addict

  • Just Joined
  • 104 posts

Posted 29 June 2009 - 11:09 PM

haha, wow
thx psam
you're a legend! :D
preciate it

#5 psam

    CC Regular

  • New Member
  • 35 posts

Posted 30 June 2009 - 07:44 AM

No problem.
Any time you need help ;)

#6 Hot_Milo23

    CC Addict

  • Just Joined
  • 104 posts

Posted 27 September 2009 - 05:20 AM

I'm sure this thread is long dead by now, but if you're still around, psam, I'd like your help again with a similar problem.

Looking at the code you used last time, I see you didn't use the page address at all; you used the address where the pictures are stored. I was just wondering how you knew where that was, and whether you know how to do it again (for the "Pokemon x" comics).

So if you still frequent this site, psam, I'd appreciate your help.
Or could anyone else have a look at this for me?

thx in advance guys :D
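psam's attached script isn't reproduced in the thread, so this is not necessarily how he found the address. One programmatic way to see where a page keeps its pictures is to read the `src` attribute of its `<img>` tags and resolve it against the page URL. Below is a minimal sketch using only Python 3's standard library (`html.parser` and `urllib.parse.urljoin`); the class and function names and the `must_contain` filter are hypothetical, and images inserted by JavaScript would not be found this way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def find_image_urls(page_html, page_url, must_contain=""):
    """Return absolute URLs of <img> tags whose src contains must_contain."""
    parser = ImgSrcParser()
    parser.feed(page_html)
    parser.close()
    return [urljoin(page_url, src) for src in parser.srcs if must_contain in src]
```

On a live page you would first download the HTML, for example with `urllib.request.urlopen(page_url).read().decode("utf-8", "replace")`, then pass it in with a filter such as `"Comic5"`.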

#7 Davison

    CC Lurker

  • Just Joined
  • 9 posts

Posted 28 September 2009 - 03:13 PM

I slightly modified the previous poster's code so that you can define which comic to start downloading from and which to finish with. For example, if you know you haven't read 25 or 26, you simply run it from the command prompt in Windows with "spam and eggs.py" 25 26 (with 'spam and eggs.py' being the script name).


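import sys
from urllib.request import urlretrieve

# Usage: "spam and eggs.py" START FINISH  (downloads START..FINISH inclusive)
start = int(sys.argv[1])
finish = int(sys.argv[2])
for n in range(start, finish + 1):
    url = 'urlurlurlurlurlurlurl' + str(n) + '.png'  # placeholder: image URL prefix
    name = 'comictitlehere' + str(n) + '.png'        # local file name
    print('downloading %s NOW' % url)
    urlretrieve(url, name)  # urlretrieve creates and writes the file itself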


*It says urlurl... because I can't post links yet.

To find the image URL, the simplest method is:
  • Right-click the picture
  • Copy the image URL
  • Paste it into your address bar or an empty .txt file

If the URL is along the lines of 'comic/500.png' or similar, you're fine.
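If the address does follow that 'comic/500.png' shape, the part before the number is what goes in place of the urlurl... placeholder in the script. A small sketch that splits such a URL with a regular expression; it assumes the number sits directly before the file extension and that file names are not zero-padded (so 25 is `25.png`, not `025.png`). The function names are hypothetical.

```python
import re

# Non-greedy prefix, then the trailing number, then the extension: ".../comic/500.png"
_NUMBERED = re.compile(r"(.*?)(\d+)(\.\w+)")


def split_numbered_url(image_url):
    """Split '.../comic/500.png' into ('.../comic/', 500, '.png').

    Returns None if the URL does not end in <number>.<extension>.
    """
    m = _NUMBERED.fullmatch(image_url)
    if m is None:
        return None
    prefix, digits, ext = m.groups()
    return prefix, int(digits), ext


def numbered_url(prefix, n, ext):
    """Rebuild the URL for comic n (assumes no zero-padding in file names)."""
    return prefix + str(n) + ext
```

The prefix is matched non-greedily so that the whole number, not just its last digit, ends up in the number group.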

However, my code will not work if the comic uses the date it was posted as the file name, à la Ctrl-Alt-Del.

I could probably find a solution to this, but it's midnight and I have university tomorrow; I'll try to post an update tomorrow night with a solution to the date problem.

Hope i've helped.

#8 Davison

    CC Lurker

  • Just Joined
  • 9 posts

Posted 29 September 2009 - 09:01 AM

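import os
import sys
from urllib.error import HTTPError
from urllib.request import urlretrieve

MIN_SIZE = 10000  # bytes; smaller files are assumed not to be real comics


def main():
    n = int(sys.argv[1])  # first comic number to fetch
    while True:
        url = 'urlurlurlurlurl' + str(n) + '.png'  # placeholder: image URL prefix
        name = 'comic-' + str(n) + '.png'
        try:
            urlretrieve(url, name)
        except HTTPError:
            # Python 3's urlretrieve raises on a 404 instead of saving the error page
            print('Updated to Comic', n - 1)
            break
        if os.stat(name).st_size < MIN_SIZE:
            os.remove(name)  # too small to be a comic: delete it and stop
            print('Updated to Comic', n - 1)
            break
        print('downloaded %s' % url)
        n += 1


main()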


This is a mildly updated script.

I added a check using the os.stat function.

This means you now enter your starting comic number on the command line, and the program fetches every subsequent comic until it saves an image smaller than 10000 bytes (the threshold is just an example and can be changed to any value you like). At that point it deletes the undersized image and exits.
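A possible alternative to the size threshold (a different technique, not what the script above does) is to look at the first bytes of the saved file: real PNG files start with the 8-byte signature `\x89PNG\r\n\x1a\n` and JPEG files with `\xff\xd8\xff`, while an HTML error page saved under a .png name starts with text. This sketch assumes the site returns an error page rather than a small placeholder image; a tiny placeholder PNG would still pass, so the size check may still be worth keeping alongside it. The function name is hypothetical.

```python
# Magic numbers at the start of real PNG and JPEG files.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def looks_like_image(path):
    """Return True if the file at path starts with a PNG or JPEG signature."""
    with open(path, "rb") as f:
        head = f.read(8)  # the longest signature we check is 8 bytes
    return head.startswith(PNG_SIGNATURE) or head.startswith(JPEG_SIGNATURE)
```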

I'm brushing up on regular expressions now to handle comics whose URLs are dates.

#9 Davison

    CC Lurker

  • Just Joined
  • 9 posts

Posted 29 September 2009 - 01:20 PM

I'm an idiot, regular expressions aren't needed.

I love Python's included libraries, by the way. This code variant is for sites with the pattern

yyyymmdd.jpg
and you can change the URL and the extension to fit.

import datetime
import os
import sys
from urllib.error import HTTPError
from urllib.request import urlretrieve

# Usage: script.py YYYYMMDD YYYYMMDD  (start and end dates, inclusive)
start = datetime.datetime.strptime(sys.argv[1], '%Y%m%d').date()
end = datetime.datetime.strptime(sys.argv[2], '%Y%m%d').date()

print('Start: ', start, ' End: ', end)

site = 'urlurlurlurlurl'  # placeholder: image URL prefix
while start <= end:
    stamp = start.strftime('%Y%m%d')  # zero-padded yyyymmdd, e.g. 20090105
    url = site + stamp + '.jpg'
    name = 'ctrlaltdel - ' + stamp + '.jpg'
    try:
        urlretrieve(url, name)
    except HTTPError:
        pass  # no comic for this date (Python 3 raises on a 404)
    else:
        if os.stat(name).st_size < 20000:
            os.remove(name)  # too small to be a real comic
        else:
            print('downloaded', url)
    start += datetime.timedelta(days=1)



Process:
  • Get the start date and end date from the command line
  • Fetch the image for the start date and check whether it is more than 20000 bytes
  • If it is less than 20000 bytes, delete it
  • Add one day and repeat until the end date has been processed

Edit: changed the condition to start <= end, so that if there is a comic on the end date, it is downloaded too.
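The date-stepping part of the process can be checked on its own without downloading anything. A minimal sketch assuming the same yyyymmdd naming; `site` is a placeholder and `date_urls` is a hypothetical name. `strptime` parses the command-line string, `strftime('%Y%m%d')` zero-pads month and day, and `timedelta(days=1)` takes care of month and year rollover.

```python
import datetime


def date_urls(site, first, last, ext=".jpg"):
    """Yield site + yyyymmdd + ext for every day from first to last inclusive.

    first and last are 'yyyymmdd' strings, as typed on the command line.
    """
    day = datetime.datetime.strptime(first, "%Y%m%d").date()
    end = datetime.datetime.strptime(last, "%Y%m%d").date()
    while day <= end:
        yield site + day.strftime("%Y%m%d") + ext
        day += datetime.timedelta(days=1)


for url in date_urls("http://example.com/comics/", "20091230", "20100102"):
    print(url)
```

Run as-is, it prints four URLs, ending in 20091230.jpg, 20091231.jpg, 20100101.jpg and 20100102.jpg.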

#10 Hot_Milo23

    CC Addict

  • Just Joined
  • 104 posts

Posted 05 October 2009 - 02:08 AM

Davison,
you have been an awesome help, but for some reason it still won't work.
I've used urlretrieve to download the Google logo, and I've used it to download every file type (including PNG, which is what the comic is saved as). Both worked, but when I try the exact same thing with the comic, it doesn't.

I'm stumped.

thx for the help tho :D
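The thread doesn't establish why the comic fails when other images download fine. One common cause (an assumption here, not confirmed for this site) is a server that refuses requests without a browser-like `User-Agent` or a `Referer` pointing at its own page; by default urllib identifies itself as Python-urllib and sends no Referer. A sketch that adds both headers via `urllib.request.Request`; the header values and function names are illustrative.

```python
import urllib.request

# Illustrative browser-like User-Agent string (an assumption, any common one should do).
USER_AGENT = "Mozilla/5.0 (compatible; comic-downloader)"


def build_request(image_url, page_url):
    """Build a request that looks like it came from viewing page_url in a browser."""
    return urllib.request.Request(
        image_url,
        headers={"User-Agent": USER_AGENT, "Referer": page_url},
    )


def download_with_headers(image_url, page_url, filename):
    """Fetch image_url with the extra headers, save it, and return its size in bytes."""
    with urllib.request.urlopen(build_request(image_url, page_url)) as resp:
        data = resp.read()
    with open(filename, "wb") as f:  # binary mode for image data
        f.write(data)
    return len(data)
```

If the server still refuses, `urlopen` raises `urllib.error.HTTPError`, and its `code` attribute (e.g. 403 or 404) is a useful clue to post back in the thread.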

#11 debtboy

    CC Devotee

  • Just Joined
  • 499 posts

Posted 05 October 2009 - 08:18 AM

Good to see some Python :thumbup1:

#12 Davison

    CC Lurker

  • Just Joined
  • 9 posts

Posted 06 October 2009 - 12:20 PM

Can you post the code the way you're using it, and also the site if possible, so I can see the URL format?



