downloading images


21 replies to this topic

#13 Hot_Milo23

Hot_Milo23

    CC Addict

  • Just Joined
  • PipPipPipPipPip
  • 104 posts

Posted 06 October 2009 - 05:38 PM

ok, I'll just test to see if I can post URLs yet:

Pokémon-X - Comic Archives - Monday, June 2, 2003

Edited by Hot_Milo23, 06 October 2009 - 05:41 PM.
grammar


#14 Hot_Milo23

Hot_Milo23

    CC Addict

  • Just Joined
  • PipPipPipPipPip
  • 104 posts

Posted 06 October 2009 - 05:48 PM

ok, so here goes:
I tried to download the Google image with this:

from urllib import urlretrieve

url = "http://www.google.com.au/logos/barcode09.gif"
name = "img1.gif"
# urlretrieve creates and writes the file itself, so no open()/close() is needed
urlretrieve(url, name)

and it works.
I did the same thing for the comic I wanted:

from urllib import urlretrieve

url = "http://pokemonx.comicgenesis.com/comics/20030602.png"
name = "img1.png"
urlretrieve(url, name)

and this one won't work.
Any ideas?
Oh, and I was using your code without the sys.argv because I prefer to use IDLE:
from urllib import urlretrieve
import os, datetime

# Read both dates as yyyymmdd strings
one = raw_input("Start date?(yyyymmdd):")
two = raw_input("End date?(yyyymmdd):")

start = datetime.date(int(one[:4]), int(one[4:6]), int(one[6:]))
end = datetime.date(int(two[:4]), int(two[4:6]), int(two[6:]))

print 'Start: ', start, ' End: ', end

site = 'http://pokemonx.comicgenesis.com/comics/'

while start <= end:
    # strftime zero-pads the month and day, e.g. 20030602
    stamp = start.strftime("%Y%m%d")
    url = site + stamp + '.png'
    name = 'Pokemon X - ' + stamp + '.png'

    # urlretrieve creates and writes the file itself, so no open()/close() is needed
    urlretrieve(url, name)

    # Files under 10000 bytes are treated as failed downloads and deleted
    if os.path.getsize(name) < 10000:
        os.remove(name)
    else:
        print 'downloaded', url
    start = start + datetime.timedelta(1)

Edited by Hot_Milo23, 06 October 2009 - 05:51 PM.
added more code
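
The date loop above can be condensed by letting strftime do the zero-padding. Here is a minimal sketch in Python 3 (the thread's code is Python 2). The helper name `comic_urls` is hypothetical; the base URL and file-name prefix are copied from the post.

```python
import datetime

SITE = "http://pokemonx.comicgenesis.com/comics/"  # base URL taken from the post


def comic_urls(start, end, site=SITE):
    """Yield (url, filename) pairs for every date from start to end inclusive."""
    day = start
    while day <= end:
        stamp = day.strftime("%Y%m%d")  # zero-padded date stamp, e.g. 20030602
        yield site + stamp + ".png", "Pokemon X - " + stamp + ".png"
        day += datetime.timedelta(days=1)
```

Each pair can then be handed to whatever download call is in use, which keeps the URL building separate from the network code and easier to check on its own.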


#15 Davison

Davison

    CC Lurker

  • Just Joined
  • Pip
  • 9 posts

Posted 12 October 2009 - 05:27 AM

Hey, can you post the website you are trying to use? I will see if I can solve the issue. It may just be a slight formatting error for this specific website.

#16 Davison

Davison

    CC Lurker

  • Just Joined
  • Pip
  • 9 posts

Posted 12 October 2009 - 10:43 PM

Hmm... I'll have a look at the code and see if I can spot the error.
The code looks like it should be fine, but I'll debug it anyway.

#17 Hot_Milo23

Hot_Milo23

    CC Addict

  • Just Joined
  • PipPipPipPipPip
  • 104 posts

Posted 13 October 2009 - 05:46 AM

Posted via CodeCall Mobile

I have posted the site. It's above my code: "ok, I'll just test...."

#18 Davison

Davison

    CC Lurker

  • Just Joined
  • Pip
  • 9 posts

Posted 14 October 2009 - 10:00 AM

Ah, I've found out what's wrong with the code, I believe.
I ran a simplified version from the command line, using f.open() to print the data to the screen rather than saving it to a file.

It gives an error about anti-hotlinking, so I believe the picture may be embedded in some way, and the URL we are requesting simply gets pointed to a redirector, which the script cannot handle.
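
One way to confirm this kind of diagnosis is to look at what the server actually sent back instead of judging by file size. Here is a rough sketch in Python 3, assuming a blocked request returns an HTML or text page in place of the image. The names `sniff_image_type` and `describe_response` are hypothetical helpers.

```python
import urllib.request

# Leading "magic" bytes of common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_image_type(data):
    """Return 'png', 'gif' or 'jpeg' if data starts like that format, else None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


def describe_response(url):
    """Fetch url and return (Content-Type header, sniffed image type)."""
    with urllib.request.urlopen(url) as response:
        content_type = response.headers.get("Content-Type")
        head = response.read(16)  # the first few bytes are enough to sniff the format
    return content_type, sniff_image_type(head)
```

If `describe_response` reports a text content type and `None`, the saved file is an error page rather than a comic. Note that `urlopen` raises `urllib.error.HTTPError` when the server answers with a 4xx or 5xx status.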

#19 Davison

Davison

    CC Lurker

  • Just Joined
  • Pip
  • 9 posts

Posted 14 October 2009 - 10:02 AM

P.S. As of yet, I'm not quite sure how to solve it, but I believe this explains the problems we are having.

#20 manux

manux

    CC Addict

  • Just Joined
  • PipPipPipPipPip
  • 211 posts

Posted 16 October 2009 - 11:00 AM

What you can do is parse the page for the image and issue a GET request for it right after parsing, so the request looks legitimate rather than like hotlinking.
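
A sketch of that idea in Python 3, assuming the site's hotlink check looks at the HTTP Referer header (the thread does not confirm this). The names `build_image_request` and `save_image` are hypothetical, and `page_url` stands for whichever archive page embeds the image.

```python
import urllib.request


def build_image_request(image_url, page_url):
    """Build a GET request for image_url that names page_url as the referring page."""
    return urllib.request.Request(image_url, headers={"Referer": page_url})


def save_image(image_url, page_url, filename):
    """Download image_url and write the bytes to filename in binary mode."""
    request = build_image_request(image_url, page_url)
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
        out.write(response.read())
```

If the server checks something other than the referrer, such as cookies or the user agent, this will not be enough on its own.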

#21 Hot_Milo23

Hot_Milo23

    CC Addict

  • Just Joined
  • PipPipPipPipPip
  • 104 posts

Posted 19 October 2009 - 02:57 AM

Can you maybe show me an example of how to do this? That would be great ^-^

#22 manux

manux

    CC Addict

  • Just Joined
  • PipPipPipPipPip
  • 211 posts

Posted 24 October 2009 - 10:57 AM

Well, using a regex such as
<img[^>]*>
(might not work, I haven't tested it) you could scan for images. I guess you can then check whether each image is the one you're looking for, e.g., whether its src starts with comicxxx.
You could just parse all the pages.
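
Regexes tend to be brittle on real HTML, so here is an alternative sketch that uses Python 3's standard `html.parser` module instead (a swapped-in technique, not the regex from the post). `ImageCollector`, `find_images` and the `/comics/` filter are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_images(html, page_url, must_contain="/comics/"):
    """Return absolute image URLs from html whose address contains must_contain."""
    collector = ImageCollector()
    collector.feed(html)
    # Resolve relative src values against the page they came from.
    absolute = [urljoin(page_url, src) for src in collector.sources]
    return [url for url in absolute if must_contain in url]
```

The returned URLs are the candidates to download; the filter string would need adjusting to whatever path the target site really uses for its comic images.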



