Datasets

We are excited about analyzing human behavior! Here, we list freely available datasets covering every dimension of human behavior (plus any other fascinating datasets we came across). Let us know if we are missing something!

Go-to pages for datasets

Kaggle

Kaggle offers an impressive range of datasets. Credit card fraud, mobile phone apps, football results or crime rates in Chicago: Kaggle has it all. The site offers more than 500 datasets, challenging data competitions and many other features.

Open Data Network

The Open Data Network by Socrata offers a vast collection of datasets, neatly categorized by topic. Many of them come from governmental and international organizations.

data.world

data.world collects various datasets and lets you upload your own. It also aims to serve as a social network for data scientists.

Datasets for specific topics

Deep Learning

  • A list of Deep Learning datasets that can be used for benchmarking algorithms
  • A Wikipedia list of datasets for training image, text and sound recognition algorithms
  • Handwriting, house numbers or facial expressions... Christos Christofidis collected them all
  • A rich collection of datasets for face, image, speech and text recognition on deeplearning4j

Economics

  • World Bank Microdata database: mainly survey data on individuals, households and enterprises from more than 140 countries
  • Our World in Data: countless fascinating time series showing how living conditions around the world are changing
  • FRED (Federal Reserve Bank of St. Louis): a great portal for macroeconomic data, mainly focused on the US

Finance

Microlevel datasets

Financial markets

  • Kenneth French's data library contains everything you need on risk factors for assets
  • Quandl: a free alternative to Bloomberg or Datastream
  • Stock price data from Yahoo Finance or Google Finance
  • Interested in programming your own investment algorithm? Quantopian offers you the data, including minute-by-minute data, as well as the infrastructure.

 

Geodata & Climate Science

  • Finely gridded satellite data on nighttime light activity from NOAA
  • Nighttime light activity data, 1992-2013
  • NASA satellite data on air quality (huge datasets!)
  • NASA's TRMM satellite provides finely gridded temperature and rainfall data for tropical regions (available for 1997-2015, when the mission ended)
  • Google Earth Engine: a great list of geodatasets (satellite images, climate, population, malaria...)
  • The European Space Agency (ESA) has an open data policy and provides numerous datasets from several satellites

Germany 

  • Federal Statistical Office: all kinds of statistics on Germany, usually at an aggregate level
  • Datasets on the German transportation system (e.g., Deutsche Bahn)
  • A Kaggle dataset for estimating credit risk among German customers

History 

Psychology & Social Sciences

Sports

  • Interested in sports predictions? Check out FiveThirtyEight's datasets on GitHub
  • football.db: freely available football datasets, the perfect preparation for your next World Cup office pool

Other

  • Enron email dataset: Always wanted to know the secrets of Enron's managers before the bankruptcy? Check out 1,227,255 emails
  • Million Song Dataset on Amazon
  • Interested in what people search for on Google? Check Google Trends
  • Analyze what people are currently interested in and how that interest develops over time: Wikipedia visitor traffic is a great resource. A handy community tool lets you download specific time series. To download the data directly into R, use the package wikipediatrend.

 
