I want to pull text strings off a website, and write them to a .txt file or even a spreadsheet.
Does anyone have an idea of how to go about this?
I will be very grateful :)
-----------
Damsel in distress
Does anyone know how to pull text strings off of a website?
Started by roxygirl123, Dec 08 2011 09:16 PM
4 replies to this topic
#1
Posted 08 December 2011 - 09:16 PM
#2
Posted 09 December 2011 - 02:19 AM
You can do it super easily with the urllib and urllib2 modules. To parse the text you can do it by hand or use regexes (regular expressions).
    import urllib
    import re

    URL = "http://www.xkcd.com"

    # By hand: find the line holding the title and slice the tags off
    for line in urllib.urlopen(URL).readlines():
        if "<title>" in line:
            print line[9:-9]  # print without html tags
            break

    # With a regex: capture whatever sits between the title tags
    regex = re.compile(r"\<title\>(.*?)\</title\>")
    for line in regex.findall(urllib.urlopen(URL).read()):
        print line

The above code gets the latest comic title from xkcd.
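For anyone reading this on Python 3: urllib2 is gone there and urlopen lives in urllib.request. Here is a rough sketch of the same regex approach, assuming Python 3. The names fetch_html and extract_titles are made up for illustration, and a regex like this is only good for simple cases such as a lone title tag.

```python
import re
import urllib.request

# Non-greedy match for whatever sits between <title> and </title>.
# DOTALL lets the title span several lines; IGNORECASE accepts <TITLE>.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def fetch_html(url):
    """Download a page and return it as text (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # Assumption: fall back to UTF-8 when the server names no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_titles(html):
    """Return the stripped text of every <title> element in the markup."""
    return [match.strip() for match in TITLE_RE.findall(html)]


# Offline demonstration on a literal string, so no network is needed.
sample = "<html><head><title> Example page </title></head></html>"
print(extract_titles(sample))  # prints ['Example page']
```

To try it on a live page, something like extract_titles(fetch_html("http://www.xkcd.com")) should work, though what comes back depends on the page's markup at the time.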
A conclusion is where you got tired of thinking.
#define class struct // All is public.
#3
Posted 09 December 2011 - 01:23 PM
I'd use urllib2 to fetch the page, and you can use HTMLParser to parse the HTML.
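Roughly what that could look like, as a sketch rather than a finished tool. It assumes Python 3, where the parser lives in html.parser (on Python 2 it is the HTMLParser module). The names TextCollector, extract_strings, save_as_text and save_as_csv are invented for this example. It collects the text strings from the markup and writes them to a .txt file or to a .csv file, which spreadsheet programs can open.

```python
import csv
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text strings, skipping <script> and <style> bodies."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.strings = []
        self._skip_depth = 0  # greater than 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.strings.append(text)


def extract_strings(html):
    """Return the non-empty text strings found in the markup, in order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.strings


def save_as_text(strings, path):
    """Write one string per line to a plain .txt file."""
    with open(path, "w", encoding="utf-8") as handle:
        for text in strings:
            handle.write(text + "\n")


def save_as_csv(strings, path):
    """Write the strings as a one-column CSV with a header row."""
    # newline="" is what the csv module expects for files it writes.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text"])
        writer.writerows([text] for text in strings)


# Offline demonstration on a literal string, so no network is needed.
sample = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><h1>Hello</h1><p>First <b>bold</b> line</p></body></html>"
)
print(extract_strings(sample))  # prints ['Demo', 'Hello', 'First', 'bold', 'line']
```

Calling save_as_text(extract_strings(sample), "strings.txt") or save_as_csv(extract_strings(sample), "strings.csv") would then write the files; the file names here are just placeholders.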
#4
Posted 09 December 2011 - 02:40 PM
Forgot to mention earlier: there's also the BeautifulSoup module for parsing HTML.