Blog

I’m sure everybody who has worked with Python and a PostgreSQL database is familiar with, or has at least heard of, the psycopg2 library. It is the most popular PostgreSQL database adapter for the Python programming language. In my work, I use this library every day to execute hundreds of automated statements. As always, I try to improve my code and its execution speed because, in the cloud, time is money. The longer your code runs...
Overview: Lately, I have worked a lot with the Azure Cloud. Overall, I have to say that Azure offers a lot but is still not on the same level as its strongest competitors (AWS, Google). One thing that caught my eye is how well certain programming languages are supported. Azure supports several languages (C#, JavaScript, Java, Python, etc.), but the supported features differ a lot between them. I think Azure Cloud is really great for...
The first step to landing a job as a data scientist is the same as in any other profession: create a compelling CV! Although there are more open positions in the field of data science than ever before, it is important to have a strong and suitable CV in order to land the job you want. This post gives you nine tips to improve your chances of finding the data science job you want. 1. Include achievements, not just jobs “I...
Use your extra time at home (and your data skills) for a good cause: Check out the Kaggle COVID-19 Open Research Dataset Challenge. In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). This dataset is a resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2, and...
Judging by public perception and the general interest in the topic, data science is still on the rise. Google Trends offers an excellent fever curve of public interest. The screenshot below shows that, after years of strongly rising interest, the curve has flattened but is still increasing. With the increasing use of data in companies and organizations, the demand for data skills has also grown rapidly, and a new field...
The first step to getting a job as a data scientist is the same as in any other profession: create a compelling CV! Although there are more open positions in the field of data science than ever before, it is important to have a strong and suitable CV in order to get the job you want. This blog post gives you nine tips to improve your chances of getting the...
For many machine learning problems with a large number of features or a low number of observations, a linear model tends to overfit, and variable selection is tricky. Models that use shrinkage, such as Lasso and Ridge, can improve the prediction accuracy as they reduce the estimation variance while providing an interpretable final model. In this tutorial, we will examine Ridge and Lasso...
A common and very challenging problem in machine learning is overfitting, and it comes in many different forms. Dealing with it is one of the major aspects of training a model. Overfitting occurs when the model captures too much noise in the training data set, which leads to poor prediction accuracy when the model is applied to new data. One way to avoid overfitting is regularization. In this tutorial, we...
Today, data science specialists are among the most sought-after in the labor market. Because they can find significant insights in huge amounts of information, they help companies and organizations optimize their operations. The field of data science is developing rapidly and the demand for talent is changing. We use job offerings to analyze the current demand for talent. After performing a first analysis in 2017, this...
Random Forest is a powerful ensemble learning method that can be applied to various prediction tasks, in particular classification and regression. The method uses an ensemble of decision trees as a basis and therefore has all the advantages of decision trees, such as high accuracy, ease of use, and no need to scale the data. Moreover, it also has a very important additional benefit, namely resistance to overfitting (unlike...
Over the past few years, tasks involving text and speech processing have become very popular. Among the various research areas in Natural Language Processing and Machine Learning, sentiment analysis ranks really high. Sentiment analysis makes it possible to identify and extract subjective information from source data using data analysis and visualization, ML models for classification, and text mining and analysis. This helps...
What other people think about a product or service has a big impact on our everyday decision-making. In the past, people relied on the opinions of their friends and relatives, or on product and service reports, but the era of the Internet has brought significant changes. Today, opinions are collected from different people around the world via reviews on e-commerce sites as well as blogs and social networks. To transform gathered...