downloading images
Started by Hot_Milo23, Jun 11 2009 04:55 AM
21 replies to this topic
#1
Posted 11 June 2009 - 04:55 AM
hey all,
I've got a small problem and I'm not really sure where to start with it (I haven't used the HTML libraries before). I quite enjoy the internet comic Ansem Retort (some of you may know of it?), and I'd like a way to view the strips offline. So I started going through each one methodically, copying and pasting, but got bored pretty quickly. So I'm looking for an easier way to do it.
If this helps, each picture is held on a page that follows this pattern:
ansemretort.org/ansemretort/index.html?comic=x
with x being the comic number (up to 529 at the moment).
Also, each strip is named "Comicx.png" (x being the number again).
If possible, could someone show me a way to accomplish this?
At first I just wanted it for convenience, but now I'd like to use it as a learning opportunity.
thanks in advance!
#2
Posted 11 June 2009 - 07:17 AM
Well, I'm not too familiar with Python, but here's the way I would do it in pseudocode:
-num = 1
-while there's more comics to fetch:
--fetch url "blah.com/comic=" + num.to_s into string
--write string to file "comic" + num.to_s
--num = num + 1
-end while.
If someone proficient in Python could translate this into Python, it should work.
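A minimal Python 3 translation of the pseudocode might look like the sketch below. Unlike the pseudocode, it fetches the image file rather than the HTML page, assuming the "Comicx.png" naming from the first post. IMAGE_BASE is a placeholder because the real image directory isn't given in this thread, and comic_url and download_range are hypothetical names. The file is opened in binary mode ("wb"), which matters for images.

```python
import urllib.request

# Placeholder, NOT the real address: replace with the directory the strips
# are actually stored in (right-click a strip and copy its image URL).
IMAGE_BASE = "http://example.com/comics/"


def comic_url(num):
    """Image URL for strip `num`, assuming the "Comicx.png" naming."""
    return IMAGE_BASE + "Comic" + str(num) + ".png"


def download_range(first, last):
    """Fetch strips first..last (inclusive) and save each one to disk."""
    num = first
    while num <= last:  # "while there's more comics to fetch"
        with urllib.request.urlopen(comic_url(num)) as response:
            data = response.read()  # fetch the URL into a (byte) string
        # Binary mode, so the PNG bytes are written unchanged
        with open("comic" + str(num) + ".png", "wb") as out:
            out.write(data)
        num += 1  # move on to the next strip
```

Once IMAGE_BASE points at the real folder, calling download_range(1, 529) would fetch every strip up to 529.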
#3
Posted 21 June 2009 - 12:27 PM
I wrote a Python script that downloads the comics up to number 531 (which I believe is the latest one right now) into the folder you save the script in. I can't post the code because it contains the download link and my post count is below 10, so I attached it to this message.
#4
Posted 29 June 2009 - 11:09 PM
haha, wow
thx psam
you're a legend! :D
preciate it
#5
Posted 30 June 2009 - 07:44 AM
No problem.
Any time you need ;).
#6
Posted 27 September 2009 - 05:20 AM
I'm sure this thread is pretty dead by now, but if you're still around, psam, I'd like your help again with a similar problem.
Looking at the code you used last time, I see you didn't use the page's web address at all; you used the address where the pictures themselves are stored. I was just wondering how you knew where that was, and whether you know how to do it again (for the "Pokemon x" comics).
So if you still frequent this site, psam, I'd appreciate your help.
Or if anyone else could have a look into this for me?
thx in advance guys :D
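One way to find where the pictures are stored, without clicking through a browser, is to download a comic's page and list the src of every img tag on it. Below is a minimal sketch using only Python 3's standard library (html.parser and urllib.parse). ImgSrcCollector and find_image_urls are hypothetical names, and the example HTML in the test is invented for illustration. Relative src values are resolved against the page's URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased from HTMLParser
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def find_image_urls(page_html, page_url):
    """Return absolute URLs for all <img> tags in page_html."""
    parser = ImgSrcCollector()
    parser.feed(page_html)
    parser.close()
    # Resolve relative paths such as "comics/Comic5.png" against the page URL
    return [urljoin(page_url, src) for src in parser.srcs]
```

On a live site you would first fetch the HTML, for example with urllib.request.urlopen(page_url).read().decode("utf-8", "replace"), and then pick out the URL that looks like the strip (for instance, one containing "Comic"). That URL's directory is the "address where the pictures are stored".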
#7
Posted 28 September 2009 - 03:13 PM
I slightly modified the previous poster's code so you can define which comic to start downloading from and which to finish with. For example, if you know you haven't read 25 or 26, you simply run it from the Windows command prompt as "spam and eggs.py" 25 26 (with 'spam and eggs.py' being the script name).
from urllib.request import urlretrieve
import sys

n = int(sys.argv[1])       # first comic to download
finish = int(sys.argv[2])  # last comic to download (inclusive)
while n <= finish:
    url = 'urlurlurlurlurlurlurl' + str(n) + '.png'
    name = 'comictitlehere' + str(n) + '.PNG'
    print('downloading %s NOW' % url)
    urlretrieve(url, name)  # urlretrieve creates and writes the file itself
    n += 1
*It says urlurl... because I can't post links yet.
To retrieve the image URL, the simplest method is:
- Right click the picture
- Copy the image url
- Paste to your address bar/empty txt file
If the url is along the lines of 'comic/500.png' or similar, you are fine.
However, my code won't work if the comic names its images after the date they were posted, à la Ctrl-Alt-Del.
I can probably find a solution to this, but it's midnight and I have university tomorrow; I'll try to post an update tomorrow night with a fix for the date problem.
Hope I've helped.
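Here is a sketch of the same start/finish interface using argparse instead of reading sys.argv directly. This is a swapped-in technique, not the poster's code. It treats finish as inclusive, so 25 26 yields both comics, and it rejects a start that comes after the finish. BASE_URL is a placeholder standing in for the hidden 'urlurl...' address, and parse_range and urls_for_range are hypothetical names.

```python
import argparse

# Placeholder for the image directory (hidden as 'urlurl...' in the post)
BASE_URL = "http://example.com/comics/"


def parse_range(argv=None):
    """Read the start and finish comic numbers; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Download a range of numbered comics.")
    parser.add_argument("start", type=int, help="first comic number")
    parser.add_argument("finish", type=int, help="last comic number (inclusive)")
    args = parser.parse_args(argv)
    if args.start > args.finish:
        # parser.error prints a usage message and exits with status 2
        parser.error("start must not be greater than finish")
    return args.start, args.finish


def urls_for_range(start, finish):
    """Image URLs for comics start..finish, including finish itself."""
    return [BASE_URL + str(n) + ".png" for n in range(start, finish + 1)]
```

In a script you would call start, finish = parse_range() and hand each URL from urls_for_range(start, finish) to urlretrieve. A bonus is that argparse prints a usage message if the numbers are missing or not integers.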
#8
Posted 29 September 2009 - 09:01 AM
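from urllib.request import urlretrieve
from urllib.error import HTTPError
import sys, os

def main():
    n = int(sys.argv[1])  # first comic to fetch
    while True:
        url = 'urlurlurlurlurl' + str(n) + '.png'
        name = 'comic-' + str(n) + '.PNG'
        try:
            urlretrieve(url, name)
        except HTTPError:  # Python 3 raises on a 404 instead of saving the error page
            print('Updated to Comic', n - 1)
            break
        # A tiny file is assumed to be an error page, not a comic
        if os.stat(name).st_size < 10000:
            print('Updated to Comic', n - 1)
            os.remove(name)
            break
        print('downloaded %s' % url)
        n += 1

main()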
This is a mildly updated script.
I added an os.stat check on the downloaded file's size.
This means you now enter your starting comic number on the command line, and the program fetches every subsequent comic until it stores an image smaller than 10000 bytes (the threshold can be changed to any value you like; this is just an example). At that point it deletes the sub-10000-byte image and exits.
I'm brushing up on regular expressions now to handle comics with date-based URLs.
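An alternative to the 10000-byte threshold, which has to be tuned for each site, is to check the first few bytes of the download. Real PNG files always begin with the 8-byte PNG signature, and JPEG files begin with the bytes FF D8 FF, whereas an HTML error page begins with text. Here is a minimal sketch of that check. The helper names are hypothetical, and only PNG and JPEG are recognised.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header of every PNG file
JPEG_SIGNATURE = b"\xff\xd8\xff"      # start-of-image marker plus the next marker byte


def looks_like_image(data):
    """True if the bytes start like a PNG or JPEG file."""
    return data.startswith(PNG_SIGNATURE) or data.startswith(JPEG_SIGNATURE)


def is_image_file(path):
    """Read just the first 8 bytes of a saved file and check its signature."""
    with open(path, "rb") as f:
        head = f.read(8)
    return looks_like_image(head)
```

In the loop above, the size test could become "if not is_image_file(name):", which works no matter how large the site's error page happens to be.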
#9
Posted 29 September 2009 - 01:20 PM
I'm an idiot; regular expressions aren't needed.
I love Python's included libraries, by the way. This code variant is for sites whose images follow the pattern
yyyymmdd.jpg
You can change the URL and the extension to fit.
Process:
- Get the start date and end date from the command line
- Download the image for the start date and check whether it is more than 20000 bytes
- If it is less than 20000 bytes, delete it
- Add 1 day and repeat
edit: Changed the loop condition to start <= end, so that if there is a comic on the end date, it is downloaded too.
from urllib.request import urlretrieve
from urllib.error import HTTPError
import sys, os, datetime

# Dates are given on the command line as yyyymmdd, e.g. 20090601 20090630
start = datetime.datetime.strptime(sys.argv[1], '%Y%m%d').date()
end = datetime.datetime.strptime(sys.argv[2], '%Y%m%d').date()
print('Start: ', start, ' End: ', end)

site = 'urlurlurlurlurl'
while start <= end:
    stamp = start.strftime('%Y%m%d')  # zero-padded yyyymmdd
    url = site + stamp + '.jpg'
    name = 'ctrlaltdel - ' + stamp + '.jpg'
    try:
        urlretrieve(url, name)
    except HTTPError:  # no comic for this date (Python 3 raises on a 404)
        pass
    else:
        # A small file is assumed to be an error page, not a comic
        if os.stat(name).st_size < 20000:
            os.remove(name)
        else:
            print('downloaded', url)
    start += datetime.timedelta(days=1)
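The date stepping can also be pulled out into a small generator, which makes the inclusive end date easy to check on its own. strftime("%Y%m%d") zero-pads the month and day, so no manual padding is needed, and timedelta handles month and year boundaries. Here is a sketch. The helper names daterange and date_stamps are hypothetical, not from the thread.

```python
import datetime


def daterange(start, end):
    """Yield every date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += datetime.timedelta(days=1)


def date_stamps(first, last):
    """Turn two yyyymmdd strings into the yyyymmdd stamps between them (inclusive)."""
    start = datetime.datetime.strptime(first, "%Y%m%d").date()
    end = datetime.datetime.strptime(last, "%Y%m%d").date()
    return [d.strftime("%Y%m%d") for d in daterange(start, end)]
```

Each stamp can then be glued onto the site URL and the extension, as in the loop above.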
#10
Posted 05 October 2009 - 02:08 AM
Davidson,
you've been an awesome help, but for some reason it still won't work.
I've used urlretrieve to download the Google logo, and I've used it to download every file type (including PNG, which is what the comic is saved as). All of those worked, but when I try the exact same thing with the comic, it won't.
I'm stumped.
thx for the help though :D
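The thread never pins down why urlretrieve fails on this particular site, so what follows is only a guess. Some servers refuse requests that carry Python's default User-Agent, or refuse image requests whose Referer isn't one of their own pages (hotlink protection). Here is a sketch that sends browser-like headers and prints the HTTP status if the server still refuses. BROWSER_UA, build_request and download are hypothetical names, and the header values are assumptions.

```python
import urllib.error
import urllib.request

# Assumed browser-like User-Agent; Python's default is "Python-urllib/x.y"
BROWSER_UA = "Mozilla/5.0 (compatible; comic-downloader)"


def build_request(image_url, page_url):
    """Request for image_url that claims to come from page_url in a browser."""
    return urllib.request.Request(
        image_url,
        headers={"User-Agent": BROWSER_UA, "Referer": page_url},
    )


def download(image_url, page_url, filename):
    """Save image_url to filename; return False and report the status on an HTTP error."""
    req = build_request(image_url, page_url)
    try:
        with urllib.request.urlopen(req) as response:
            data = response.read()
    except urllib.error.HTTPError as err:
        # err.code is the HTTP status, e.g. 403 (forbidden) or 404 (not found)
        print("server refused", image_url, "with status", err.code)
        return False
    with open(filename, "wb") as f:
        f.write(data)
    return True
```

If the printed status is 403, the header theory is plausible. If it is 404, the image URL itself is probably wrong.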
#11
Posted 05 October 2009 - 08:18 AM
#12
Posted 06 October 2009 - 12:20 PM
Can you post the code exactly as you're using it, and also the site, if possible, so I can see the URL format?

