Blog

Making binary annotations less boring

14/08/2021

For a university project, I’m developing a music recommendation classifier based on the Spotify API. The idea is to recommend new music to users based on songs they personally like or dislike and on the musical components of each song (speed, tonality, instrumentality and many more). Preparing the dataset is usually the most time-consuming part of any machine learning project. This usually consists of gathering...

COVID-19 Dataset Challenge

30/03/2020

Use your extra time at home (and your data skills) for a good cause: Check out the Kaggle COVID-19 Open Research Dataset Challenge. In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). This dataset is a resource of over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2, and related...

Ridge and Lasso in R

18/02/2020

Overfitting is a common and very challenging problem in machine learning, and it comes in many forms. Dealing with it is one of the major aspects of training a model. Overfitting occurs when the model captures too much noise in the training data set, which leads to poor prediction accuracy when the model is applied to new data. One way to avoid overfitting is regularization. In this tutorial, we will examine...
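To make the idea of regularization concrete, here is a minimal sketch in plain Python (not the R code from the tutorial itself). It assumes the simplest possible setting: a single feature, no intercept, and a squared-error loss with a penalty of strength `lam` on the slope. The function names `ridge_slope` and `lasso_slope` are made up for this illustration.

```python
import math


def ridge_slope(xs, ys, lam):
    """Slope w minimising sum((y - w*x)^2) + lam * w^2 (closed form)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_slope(xs, ys, lam):
    """Slope w minimising sum((y - w*x)^2) + lam * |w| (soft-thresholding)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Shrink |sxy| by lam / 2 and clip at zero, keeping the original sign.
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    return math.copysign(shrunk, sxy) / sxx


# Toy data lying exactly on y = 2x, so the unpenalised slope is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
unpenalised = ridge_slope(xs, ys, 0.0)
ridge = ridge_slope(xs, ys, 14.0)
lasso_small = lasso_slope(xs, ys, 28.0)
lasso_large = lasso_slope(xs, ys, 100.0)
```

In this toy setting both penalties pull the slope towards zero, but only the lasso penalty sets it to exactly zero once `lam` is large enough. The ridge penalty shrinks the slope without ever eliminating it.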

Data Science & Analytics Job Market in Switzerland in 2019

26/11/2019

Today, data science specialists are among the most sought-after professionals in the labor market. Because they can find significant insights in huge amounts of information, they help companies and organizations optimize how they work. The field of data science is developing rapidly and the demand for talent is changing. We use job offerings to analyze the current demand for talent. After performing a first analysis in 2017, this article...

Random Forest in R: An Example

19/09/2019

Random Forest is a powerful ensemble learning method that can be applied to various prediction tasks, in particular classification and regression. The method uses an ensemble of decision trees as a basis and therefore has all the advantages of decision trees, such as high accuracy, ease of use, and no need to scale the data. Moreover, it has a very important additional benefit, namely resistance to overfitting (unlike simple...

Sentiment Analysis of Trump Tweets in Python

08/07/2019

What other people think about a product or service has a big impact on our everyday decisions. In the past, people relied on the opinions of friends and relatives or on product and service reports, but the Internet era has changed this significantly. Today, opinions are collected from people around the world through reviews on e-commerce sites as well as blogs and social networks. To transform gathered...

Random Forest in Python with scikit-learn

19/12/2018

A random forest is a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. It can be applied to different machine learning tasks, in particular classification and regression. Random Forest uses an ensemble of decision trees as a basis and therefore has all the advantages of decision trees, such as high accuracy,...
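The "random vector per tree" idea can be illustrated with a heavily simplified stand-in: bagged decision stumps in plain Python. This is not the scikit-learn implementation covered in the post. It assumes one-dimensional data with 0/1 labels, so there is no random feature selection, only bootstrap sampling and majority voting. All function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Return the threshold t with the fewest errors for the rule 'predict 1 if x > t'."""
    best_errors, best_t = None, None
    for t, _ in sample:
        errors = sum(int(x > t) != y for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_errors, best_t = errors, t
    return best_t


def fit_forest(data, n_trees, seed=0):
    """Fit each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]


def predict(forest, x):
    """Majority vote over the stumps' individual predictions."""
    votes = Counter(int(x > t) for t in forest)
    return votes.most_common(1)[0][0]


# Toy data: label 0 for small x, label 1 for large x.
data = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
forest = fit_forest(data, n_trees=25)
```

Each stump sees a different resampled version of the data, so individual thresholds vary, and the vote averages that variation out. An odd `n_trees` avoids tied votes.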

Top 10 R Packages for Data Science (German edition)

11/12/2018

The open-source project R is among the leading tools for data science and machine learning tasks. Owing to its open-source framework, there are continuous contributions, and package libraries with new features appear frequently. The CRAN package repository currently has 12,525 available packages. This post takes a look at the most popular and useful packages that have set the standards for solving...

Neural Network: How does it work?

18/05/2018

Curious about neural networks and deep learning? This post will inspire you to get started in deep learning. Why is there so much hype around neural networks? It is because of their amazing applications, which include image classification, face recognition, pattern recognition, automatic machine translation, and so on. So, let’s get started. Machine learning is a field of computer science that...

Support Vector Machines (SVM) in Python

08/05/2018

Support Vector Machine (SVM) is a widely used supervised learning algorithm for classification and regression tasks, although it is mostly used for classification problems. The points of different classes are separated by a hyperplane, and this hyperplane is chosen so that its distance to the nearest data points on each side is maximal. Support Vector Machine has some advantages. The first one is that SVM works...
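The "distance to the nearest data points" is the geometric margin of a hyperplane. The sketch below, in plain Python, only computes that margin for a given hyperplane `w . x + b = 0` and compares two candidates. It does not solve the SVM optimization problem, and the function name `geometric_margin` and the toy points are assumptions made for this illustration.

```python
import math


def geometric_margin(w, b, points):
    """Smallest signed distance from the hyperplane w . x + b = 0 to any point.

    Each entry of points is (x, label) with label in {-1, +1}. The result is
    positive only if every point lies on the correct side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, label in points
    )


# Two points, one per class, lying on the horizontal axis.
points = [((0.0, 0.0), -1), ((2.0, 0.0), 1)]

centered = geometric_margin((1.0, 0.0), -1.0, points)    # hyperplane x1 = 1
off_center = geometric_margin((1.0, 0.0), -0.5, points)  # hyperplane x1 = 0.5
```

Both hyperplanes separate the two points, but the one halfway between them has the larger margin (1.0 versus 0.5), which is the one an SVM would prefer.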

Top 10 R Packages for Data Science

15/04/2018

The open-source project R is among the leading tools for data science and machine learning tasks. Given its open-source framework, there are continuous contributions, and package libraries with new features pop up frequently. Currently, the CRAN package repository features 12,525 available packages. This post takes a look at the most popular and useful packages that have set the standards for solving data manipulation, visualization, and...

Parameter Tuning in Gradient Boosting (GBM) with Python

26/03/2018

GBM is a highly popular prediction model among data scientists. As top Kaggler Owen Zhang describes it: "My confession: I (over)use GBM. When in doubt, use GBM." GradientBoostingClassifier from sklearn is a popular and user-friendly implementation of gradient boosting in Python (another nice and even faster tool is xgboost). Apart from setting up the feature space and fitting the model, parameter tuning is a crucial task in finding...
