
Need help finding a large data set

4 replies to this topic

#1
HapHazard

    Newbie

  • Members
  • 10 posts
I have been searching for what feels like forever. For a class, I need a data set of anything (other than Wikipedia or Twitter) that is at least 10 GB uncompressed. The data also needs to be in plain text or some other format that is easy to parse. I found a 20 GB dump from WikiLeaks, but it didn't have enough plain text, just lots of PDF (scanned image) files. Any help would be awesome. Thanks in advance.

Edited by HapHazard, 19 April 2011 - 01:02 PM.

-HapHazard

#2
WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others
You can try some government websites, but 10GB of TEXT is a massive amount of data.
Programming is a branch of mathematics.

#3
ZekeDragon

    Writes binary right handed and hex left handed

  • Moderators
  • 2,103 posts
What kind of data? Is it supposed to be random or non-random? Could you just generate this data?
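If generated data turns out to be acceptable for the class, something like the following could work. This is only a rough sketch: the function name generate_text_file, the three-column CSV-like layout, and the fixed seed are all made up for illustration, not anything the assignment specifies.

```python
import random
import string


def generate_text_file(path, target_bytes, seed=0):
    """Write pseudo-random CSV-like lines to `path` until at least
    `target_bytes` bytes have been written. Returns the byte count."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    written = 0
    row_id = 0
    # newline="\n" prevents newline translation, so the byte count stays exact
    with open(path, "w", encoding="ascii", newline="\n") as f:
        while written < target_bytes:
            word = "".join(rng.choices(string.ascii_lowercase, k=8))
            line = f"{row_id},{word},{rng.randint(0, 999999)}\n"
            f.write(line)
            written += len(line)  # ASCII only: one character is one byte
            row_id += 1
    return written
```

Calling generate_text_file("data.csv", 10 * 1024**3) would aim for roughly 10 GB. Expect that to take a while in pure Python, so try a much smaller target first.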
Wow I changed my sig!

#4
Alexander

    It's Science!

  • Moderators
  • 4,118 posts
  • Location:Vancouver, Eh! Cleverness: 200
I looked earlier but could not find anything readily available. There are always NASA, biomedical, and other scientific data sets around, often 10-500 MB zipped and many times larger once unpacked.
Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.

#5
HapHazard

    Newbie

  • Members
  • 10 posts
So I found some large data on the infochimps website; it's about song information. Thanks for the help. Now for changing the format, wish me luck.
-HapHazard



