Blog

Curious about neural networks and deep learning? This post will inspire you to get started with deep learning. Why are neural networks attracting so much attention? Because of their impressive applications, which include image classification, face recognition, pattern recognition, and automatic machine translation. So, let’s get started. Machine learning is a field of computer science that...
The open-source project R is among the leading tools for data science and machine learning. Because it is open source, it receives continuous contributions, and new packages with new features appear frequently. The CRAN package repository currently lists 12,525 available packages. This post looks at the most popular and useful packages, which have set the standard for data manipulation, visualization, and...
GBM is a highly popular prediction model among data scientists, or, as top Kaggler Owen Zhang puts it: "My confession: I (over)use GBM. When in doubt, use GBM." GradientBoostingClassifier from sklearn is a popular and user-friendly implementation of gradient boosting in Python (another good and even faster tool is xgboost). Apart from setting up the feature space and fitting the model, parameter tuning is a crucial task in...
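As a minimal sketch of what such tuning can look like, assuming scikit-learn is installed, the snippet below runs a cross-validated grid search over three GradientBoostingClassifier parameters on synthetic data from make_classification. The grid values and data are illustrative choices, not settings taken from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data (illustrative stand-in for a real feature space).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Illustrative grid: values picked for a quick demo, not tuned recommendations.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# 3-fold cross-validated search over all 2 * 2 * 2 = 8 parameter combinations.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy: %.3f" % search.best_score_)
```

Grid search is exhaustive, so its cost grows multiplicatively with every parameter added to the grid; scikit-learn's RandomizedSearchCV is a common alternative when the grid gets large.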
AI Took My Job! Ken Jennings’ name sounds vaguely familiar to many people, but why? Because his profound knowledge of all things trivial made him the unbeatable champion of the TV game show Jeopardy! It also put him in IBM’s gunsights. IBM spent thousands of hours and invested millions of dollars, all to build a machine named Watson that could beat him at that game. See how Ken deals with the...
Demand for professionals in data science and analytics is expected to rise significantly over the coming years (cf. this study by IBM). To keep track of future job trends, we started the DataCareer Job Market Index (DJMI) in July 2017. We track job openings on the biggest online job board, Indeed, in the fields of data science and analytics, data engineering, business intelligence, artificial intelligence and...
Currently, Python and R are the dominant data science tools, and Python will probably even take the lead (at least according to the latest KDnuggets survey). When did these two open-source players become the leading platforms for analytics, data science, and machine learning, leaving behind established players such as Matlab and SAS? Here are some insights from Google Trends. Looking at the years 2009 to 2013 in the...
Big Data, AI and Machine Learning are today's buzzwords. Data nerds, business executives and politicians alike are talking about data-related opportunities and potential risks. But since when has this been the case, and how have data-related interests developed over time? We've looked into this question using Google Trends data. Google searches reveal people's interests. Google search queries have become a powerful tool to...
For individuals, businesses and research institutes working with emerging technologies, it is important to follow and shape societal debates revolving around their field. Sooner or later, societal debates are likely to translate into political action, which may greatly impact work on emerging technologies – for better or worse. Also, if research institutes and businesses aim for more than research results and profit, they’re...
Image recognition has been a major challenge in machine learning, and working with large labelled datasets to train your algorithms can be time-consuming. One efficient way to obtain such data is to outsource the work to a large crowd of users. Google uses this approach with the game “Quick, Draw!” to create the world’s largest doodling dataset, which has recently been made publicly available. In this...
Much has been written about the most popular software and programming languages for data science (recall, for instance, the infamous “Python vs R battle”). We approached this question by scraping job ads from Indeed and counting how often each software tool is mentioned, as a measure of current employer demand. In a recent blog post, we analyzed which data science software German employers want job applicants to know...
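The counting step can be sketched in a few lines of Python, assuming the ad texts have already been scraped into a list of strings. The function name count_ads_mentioning, the term list, and the sample ads are hypothetical; the original analysis may have matched terms differently (for example, with case-insensitive matching or aliases for multi-word tool names).

```python
import re


def count_ads_mentioning(ads, terms):
    """Return, for each term, the number of ads that mention it at least once."""
    counts = {term: 0 for term in terms}
    for ad in ads:
        for term in terms:
            # Case-sensitive whole-word match, so "R" does not match inside
            # words such as "Research" or lowercase "required".
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, ad):
                counts[term] += 1
    return counts


# Hypothetical job ad snippets; real input would be the scraped ad texts.
ads = [
    "Data Scientist: Python and SQL required, R a plus.",
    "Analyst with Excel and SAS experience.",
    "Machine learning engineer (Python, TensorFlow).",
]
terms = ["Python", "R", "SQL", "SAS", "Excel"]

print(count_ads_mentioning(ads, terms))
# {'Python': 2, 'R': 1, 'SQL': 1, 'SAS': 1, 'Excel': 1}
```

Counting ads rather than raw occurrences keeps one ad that repeats "Python" five times from outweighing five ads that each mention it once. Note that the `\b` boundary trick breaks for names ending in symbols, such as "C++", which would need a custom pattern.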
Social science researchers collect much of their data through online surveys. In many cases, they offer incentives to participants. These incentives can take the form of lotteries for more valuable prizes or individual gift card codes. We use the latter in our studies here at CEPA Labs at Stanford. Specifically, our survey participants receive a gift card code from Amazon. However, sending these gift card...
Learning new programming languages is an investment in human capital, so figuring out the return on that investment can be very informative. Requirements differ by industry and by job, which makes a generalizable answer quite difficult to find. One approach is to analyze the software skills required in job postings, which reflect current demand and may therefore indicate the general return on investment....