Edited by HapHazard, 19 April 2011 - 01:02 PM.
4 replies to this topic
#1
Posted 18 April 2011 - 04:37 PM
I have been searching for what feels like forever. For a class I need a data set of anything (other than Wikipedia or Twitter) that is at least 10 GB uncompressed. The data also needs to be in plain text or some other format that is easy to parse. I found a 20 GB WikiLeaks dump, but it had too little plain text; most of it was PDFs of scanned images. Any help would be awesome. Thanks in advance.
-HapHazard
#2
Posted 18 April 2011 - 06:15 PM
You can try some government websites, but 10GB of TEXT is a massive amount of data.
#3
Posted 18 April 2011 - 07:54 PM
What kind of data? Is it supposed to be random or non-random? Could you just generate this data?
Wow I changed my sig!
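If generated data is acceptable for the assignment, a minimal sketch like the one below could produce a plain-text file of a chosen size. It assumes that random tab-separated records (an id, a random lowercase word, and a random integer) are good enough; the function name `generate_text_file` and the record layout are hypothetical, chosen only for illustration. Whether random data satisfies the class requirement is exactly the question asked above.

```python
import random
import string


def generate_text_file(path, target_bytes, seed=0):
    """Write random tab-separated records to `path` until the file holds at
    least `target_bytes` bytes. Returns the number of bytes written.

    Hypothetical record layout: <id>\t<random word>\t<random integer>\n
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    written = 0
    record_id = 0
    # newline="\n" stops Windows from turning "\n" into "\r\n",
    # so the byte count below stays accurate.
    with open(path, "w", encoding="ascii", newline="\n") as out:
        while written < target_bytes:
            word = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 12)))
            line = f"{record_id}\t{word}\t{rng.randint(0, 10**6)}\n"
            out.write(line)
            written += len(line)  # ASCII only, so characters == bytes
            record_id += 1
    return written
```

For example, `generate_text_file("synthetic.tsv", 10 * 1024**3)` would aim for 10 GiB. Writing line by line in pure Python is simple but slow at that scale, so expect it to take a while.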
#4
Posted 18 April 2011 - 08:35 PM
I looked earlier but couldn't find anything easy to get hold of. There are always NASA, biomedical, and other scientific data sets around, often 10-500 MB zipped and many times larger once extracted.
Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.
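Since the requirement is 10 GB *uncompressed*, it can help to check a zipped data set's extracted size before unpacking it. A minimal sketch, assuming the download is a ZIP archive: Python's standard `zipfile` module exposes each member's uncompressed size as `ZipInfo.file_size`, read from the archive's directory without extracting anything. The helper names are hypothetical, and the 10 GB threshold is taken as 10**9-byte gigabytes (use `1024**3` if the class means GiB).

```python
import zipfile


def uncompressed_size(zip_path):
    """Sum the uncompressed sizes recorded in a ZIP archive's directory."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def meets_size_requirement(zip_path, min_bytes=10 * 10**9):
    # Hypothetical threshold: 10 GB as 10 * 10**9 bytes.
    return uncompressed_size(zip_path) >= min_bytes
```

This only works for ZIP files. A `.gz` file stores its original size modulo 2**32, so for files above 4 GiB that field is unreliable and the size has to be measured by decompressing the stream.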
#5
Posted 19 April 2011 - 01:02 PM
So I found some large data sets on the Infochimps website; they're song information. Thanks for the help. Now on to changing the format, wish me luck.
-HapHazard