Learning a new programming language is an investment in human capital, so it helps to know the likely return. Requirements differ by industry and by job, which makes a generalizable answer hard to find. One approach is to analyze the software skills required in job postings: they reflect current demand and may therefore indicate the general return on investment. We downloaded all data-science-related job postings for Germany on Indeed to get a rough idea of how popular each tool is on the German labor market.
We scraped all job listings in Germany that matched data-science-related keywords (Data Scientist, Data Analyst, Big Data, Machine Learning) in June 2017. Although this covers only the postings that were live at the time, job ads usually stay up for several weeks, so we assume the sample is a reasonable reflection of current market demand. The search covers 2807 job postings, but only about 70% of them specifically name a software tool or a programming language. Why do so many postings come without specific requirements? Employers often state only general requirements (e.g. knowledge of machine learning or data analytics), and some postings merely overlap with data science (and thus mention it) without explicitly looking for fully fledged data scientists.
We searched the 2807 job postings for the 25 most popular data science tools. 1971 postings mention at least one of them, with many listing several. The figure below shows the number of job postings that mention each tool. With about 1000 postings, SQL is the most popular, followed by Python with around 900 mentions and Java with 670.
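The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the tool list is a small hypothetical subset of the 25 names, the function names are made up, and the matching rule (whole-word match, case-sensitive only for one-letter names such as R) is an assumption about how one might avoid obvious false positives.

```python
import re
from collections import Counter

# Hypothetical subset of the tool names; the analysis searched for 25.
TOOLS = ["SQL", "Python", "Java", "R", "SAS", "SAP"]


def compile_patterns(tools):
    """Build one whole-word regex per tool name."""
    patterns = {}
    for tool in tools:
        # Assumption: one-letter names like "R" are matched case-sensitively,
        # everything else case-insensitively.
        flags = 0 if len(tool) == 1 else re.IGNORECASE
        # Lookarounds keep "Java" from matching inside "JavaScript".
        patterns[tool] = re.compile(
            r"(?<!\w)" + re.escape(tool) + r"(?!\w)", flags
        )
    return patterns


def count_mentions(postings, tools=TOOLS):
    """Return (postings per tool, postings naming at least one tool)."""
    patterns = compile_patterns(tools)
    counts = Counter()
    with_any = 0
    for text in postings:
        # Each posting counts at most once per tool.
        found = [tool for tool, pat in patterns.items() if pat.search(text)]
        counts.update(found)
        if found:
            with_any += 1
    return counts, with_any


if __name__ == "__main__":
    # Made-up example texts, not real postings.
    sample = [
        "Experience with Python and SQL required.",
        "Java developer; JavaScript a plus. sql knowledge.",
        "Strong analytical skills, knowledge in machine learning.",
        "We use R and Python.",
    ]
    counts, with_any = count_mentions(sample)
    for tool, n in counts.most_common():
        print(tool, n)
    print("share naming at least one tool:", with_any / len(sample))
```

A simple whole-word match like this still has blind spots (a standalone "R" in "R&D" would count as a mention, for instance), so real counts would need some manual checking.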
Is the above distribution unique to Germany, or does it reflect worldwide trends? Robert Muenchen performed the same analysis for the US market. The top three places turn out to be identical: SQL (18,000 jobs), Python (13,000 jobs) and Java (13,000 jobs) dominate the market. Some differences exist further down. For example, SAP is more popular on the German market (6th) than on the US market (12th). But the two graphs are very similar overall, which suggests that software trends are global and that demand is shaped by the technological frontier.
If you are new to data science or thinking about moving into the field, the analysis gives you a decent idea of which programming skills are likely to be particularly valuable in the near future. That SQL is (still) in high demand could be a signal that many companies expect not just skills in data analysis but also smooth interaction with databases. Python seems to be on the rise. Robert Muenchen shows that Python’s popularity has increased greatly over the past three years, with its growth outpacing that of the other big open-source player, R. Traditional software such as SAS is stagnating. Overall, a combination of strong analytical skills in Python and R with solid knowledge of SQL looks like a great foundation for a career in the growing field of data science.