For the past few years, text and speech processing tasks have become very popular. Among the many research areas in Natural Language Processing and Machine Learning, sentiment analysis ranks among the most prominent. Sentiment analysis identifies and extracts subjective information from source data using data analysis and visualization, machine learning classification models, and text mining. This helps us understand public opinion on a subject, which is why sentiment analysis is widely used in business and politics and is often applied to social network data.

Social networks as the main resource for sentiment analysis

Nowadays, social networks and forums are the main stage where people share their opinions. That is why researchers find them so useful for figuring out people's attitudes toward one subject or another. Sentiment analysis tackles the problem of analyzing text that users create on social networks, microblogging platforms, forums, and business platforms, regarding their opinions about a product, service, person, idea, and so on.

In the most common setup, text is classified into two classes, positive and negative (binary sentiment classification), but sentiment analysis can also involve more classes, which turns it into a multi-class problem. Sentiment analysis can process hundreds or thousands of texts in a short period of time. This is another reason for its popularity: work that takes people many hours can be finished in a few seconds.

Common approaches for classifying sentiments

Sentiment analysis of text data is commonly done with one of three methods: machine learning, dictionaries (lexicons), or a hybrid of the two.

Learning-based approach

The machine learning approach is one of the most popular today. Using ML techniques, users can train a classifier on labeled examples so that it identifies different sentiment expressions in the text.
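To make the learning-based approach concrete, here is a minimal sketch of one classic choice, a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in base R. The four training sentences and the helper names (tokenize, train_nb, predict_nb) are hypothetical and exist only for illustration; a real classifier would be trained on thousands of labeled texts and is not necessarily built this way.

```r
# Minimal multinomial Naive Bayes sentiment classifier in base R.
# The training data below is a hypothetical toy example.
train <- data.frame(
  text = c("great job wonderful day", "love this great product",
           "terrible mistake bad day", "awful service bad product"),
  label = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Lowercase the text and split it on anything that is not a letter
tokenize <- function(s) {
  words <- strsplit(tolower(s), "[^a-z]+")[[1]]
  words[words != ""]
}

train_nb <- function(texts, labels) {
  tokens <- lapply(texts, tokenize)
  vocab <- unique(unlist(tokens))
  classes <- unique(labels)
  # Class priors: share of training texts in each class
  priors <- sapply(classes, function(cl) mean(labels == cl))
  # P(word | class) with add-one smoothing; rows = words, columns = classes
  likelihoods <- sapply(classes, function(cl) {
    words <- unlist(tokens[labels == cl])
    counts <- as.numeric(table(factor(words, levels = vocab)))
    (counts + 1) / (length(words) + length(vocab))
  })
  rownames(likelihoods) <- vocab
  list(priors = priors, likelihoods = likelihoods, vocab = vocab)
}

predict_nb <- function(model, text) {
  words <- tokenize(text)
  words <- words[words %in% model$vocab]  # words never seen in training are ignored
  # Summing log-probabilities avoids numerical underflow on long texts
  scores <- log(model$priors) +
    colSums(log(model$likelihoods[words, , drop = FALSE]))
  names(which.max(scores))
}

model <- train_nb(train$text, train$label)
predict_nb(model, "what a great day")    # "positive"
predict_nb(model, "such a bad mistake")  # "negative"
```

Add-one smoothing keeps a word that never appeared in one class from driving that class's probability to zero, which matters a lot with small training sets like this one.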
Dictionary-based approach

The main idea of this approach is to use a bag of words with polarity scores that tells whether a word has a positive, negative, or neutral connotation. Such an approach doesn't require a training set, so it can classify even a small amount of data. However, many words and expressions are still not included in sentiment dictionaries.

Hybrid approach

As the name suggests, this approach combines machine learning and lexicon-based techniques. Although it is not widely used, the hybrid approach can give more promising results than either approach used on its own.

In this article, we will implement a dictionary-based approach, so let's take a closer look at it. Dictionary-based (or lexicon-based) sentiment analysis relies on special dictionaries, or lexicons, many of which are available for calculating sentiment in text. The main ones are afinn, which scores words on an integer scale from -5 to 5; bing, which labels words as positive or negative; and nrc, which labels words as positive or negative and also assigns emotion categories such as anger, fear, and joy. All three help evaluate the valence of text by searching for words that express emotion or opinion.

Things to do before sentiment analysis

Before building a sentiment analyzer, a few steps must be taken. First of all, we need to state the problem we are going to explore and understand its objective. Since we will use data from Donald Trump's Twitter account, let's define our objective as analyzing what connotation his latest tweets have.

Once the problem is outlined, we need to prepare our data for analysis. Data preprocessing is the initial step in text and sentiment classification. Depending on the input data, various techniques can be applied to make the data easier to work with and to improve the effectiveness of the analysis. The most common preprocessing steps are removing numbers, removing stop words, and removing punctuation.
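The preprocessing steps and the dictionary-based scoring idea described above fit in a few lines of base R. The sketch below assumes a tiny hand-made AFINN-style lexicon (lexicon) and stop-word list (small_stop_words), both hypothetical; the walkthrough that follows uses tidytext's stop_words table and the real bing lexicon instead. It lowercases the text, removes numbers, punctuation, and stop words, and then sums the scores of the words it finds in the lexicon.

```r
# Hypothetical AFINN-style lexicon: each word maps to an integer polarity score.
# Real lexicons (afinn, bing, nrc) are far larger; these entries are illustrative.
lexicon <- c(great = 3, good = 3, happy = 3,
             bad = -3, terrible = -3, mistake = -2)

# A tiny hypothetical stop-word list; tidytext ships a much fuller one (stop_words)
small_stop_words <- c("a", "an", "the", "is", "was", "very", "and", "of", "to")

preprocess <- function(text) {
  text <- tolower(text)
  text <- gsub("[0-9]+", " ", text)        # remove numbers
  text <- gsub("[[:punct:]]+", " ", text)  # remove punctuation
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  words[words != "" & !(words %in% small_stop_words)]  # remove stop words
}

# Sum the polarity of every word found in the lexicon; unknown words count as 0
score_text <- function(text) {
  words <- preprocess(text)
  sum(lexicon[words], na.rm = TRUE)
}

preprocess("Iran made a very big mistake!")           # "iran" "made" "big" "mistake"
score_text("Iran made a very big mistake!")           # -2
score_text("The 2 products were great, very good!")   # 6
```

Summing word scores is the simplest aggregation rule. It ignores negation ("not good") and context, which is one reason the coverage of the dictionary matters so much for this approach.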
Building sentiment classifier

The first step in building our classifier is installing the needed packages. We will need twitteR, dplyr, splitstackshape, tidytext, and purrr, which can be installed with install.packages() directly from the development environment. As soon as all the packages are installed, let's load them.

```r
# install.packages(c("twitteR", "dplyr", "splitstackshape", "tidytext", "purrr"))
library(twitteR)
library(dplyr)
library(splitstackshape)
library(tidytext)
library(purrr)
```

We're going to get tweets from Donald Trump's account directly from Twitter, which is why we need to provide Twitter API credentials.

```r
# Replace the placeholders with your own Twitter API credentials
api_key <- "----"
api_secret <- "----"
access_token <- "----"
access_token_secret <- "----"

setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)
# Prints: "Using direct authentication"
```

And now it's time to get tweets from Donald Trump's account and convert the data into a dataframe.

```r
# Request up to 3,200 recent tweets, the maximum the timeline API returns
TrumpTweets <- userTimeline("realDonaldTrump", n = 3200)
# Convert the list of status objects into a single dataframe (tibble)
TrumpTweets <- as_tibble(map_df(TrumpTweets, as.data.frame))
```

Here is how our initial dataframe looks:

```r
head(TrumpTweets)
```

The dataframe has 16 columns: text, favorited, favoriteCount, replyToSN, created, truncated, replyToSID, id, replyToUID, statusSource, screenName, retweetCount, isRetweet, retweeted, longitude, and latitude. In these first six rows, favorited, isRetweet, and retweeted are always FALSE; replyToSN, replyToSID, replyToUID, longitude, and latitude are always NA; statusSource is always <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>; and screenName is always realDonaldTrump. The columns that vary are:

```
  created              favoriteCount  retweetCount  truncated  id                   text
1 2019-06-20 17:49:52          17424          3540  FALSE      1141765119929700353  It was my great honor to host Canadian Prime Minister @JustinTrudeau at the @WhiteHouse today!🇺🇸🇨🇦 https://t.co/orlejZ9FFs
2 2019-06-20 14:15:04         127069         38351  FALSE      1141711064305983488  Iran made a very big mistake!
3 2019-06-20 14:14:13          36218          8753  TRUE       1141710851617034240  “The President has a really good story to tell. We have unemployment lower than we’ve seen in decades. We have peop… https://t.co/Pl2HsZbiRK
4 2019-06-20 13:58:53          43995          9037  FALSE      1141706991464849408  S&P opens at Record High!
5 2019-06-20 00:12:31          62468         16296  TRUE       1141499029727121408  Since Election Day 2016, Stocks up almost 50%, Stocks gained 9.2 Trillion Dollars in value, and more than 5,000,000… https://t.co/nOj2hCnU11
6 2019-06-19 23:01:59          85219         20039  FALSE      1141481280653209600  Congratulations to President Lopez Obrador — Mexico voted to ratify the USMCA today by a huge margin. Time for Congress to do the same here!
```

To prepare our data for classification, let's drop the tweets that contain links and reshape the dataframe so that each row holds a single word.

```r
# Drop tweets containing a t.co link; fixed = TRUE matches "t.co" literally,
# and grepl() keeps all rows if nothing matches (unlike x[-grep(...), ])
TrumpTweets <- TrumpTweets[!grepl("t.co", TrumpTweets$text, fixed = TRUE), ]
# Keep the text plus a constant label column
TrumpTweets$tweet <- "tweet"
TrumpTweets <- TrumpTweets[, c("text", "tweet")]
# One lowercase word per row; punctuation is stripped
TrumpTweets <- unnest_tokens(TrumpTweets, words, text)
```

And this is how our dataframe looks now:

```r
tail(TrumpTweets)
```

```
tweet  words
tweet  most
tweet  of
tweet  their
tweet  people
tweet  from
tweet  venezuela
```

```r
head(TrumpTweets)
```

```
tweet  words
tweet  iran
tweet  made
tweet  a
tweet  very
tweet  big
tweet  mistake
```

It's obvious that the dataframe also contains many words without useful content, such as a, of, and from. So it's a good idea to get rid of them.
```r
# Remove stop words listed in tidytext's stop_words table (its word column)
TrumpTweets <- anti_join(TrumpTweets, stop_words, by = c("words" = "word"))
```

And here's the result:

```r
tail(TrumpTweets)
```

```
tweet  words
tweet  harassment
tweet  russia
tweet  informed
tweet  removed
tweet  people
tweet  venezuela
```

```r
head(TrumpTweets)
```

```
tweet  words
tweet  iran
tweet  mistake
tweet  amp
tweet  record
tweet  congratulations
tweet  president
```

Much better, isn't it? Let's see how many times each word appears in Donald Trump's tweets.

```r
word_count <- dplyr::count(TrumpTweets, words, sort = TRUE)
head(word_count)
```

```
words      n
day        2
democrats  2
enjoy      2
florida    2
iran       2
live       2
```

Now it's time to create a dataframe of sentiments that will be used to classify the tweets. We will use the bing dictionary, although you can easily use any other source.

```r
# bing labels each word as positive or negative
sentiments <- get_sentiments("bing")
sentiments <- dplyr::select(sentiments, word, sentiment)

# Keep only the tweet words that appear in the lexicon (an inner join)
TrumpTweets_sentiments <- merge(word_count, sentiments, by.x = "words", by.y = "word")
```

Above we did a simple classification of the words in Trump's tweets using our sentiment bag of words. And this is how the result looks:

```r
TrumpTweets_sentiments
```

```
words            n  sentiment
beautiful        1  positive
burning          1  negative
congratulations  1  positive
defy             1  negative
enjoy            2  positive
harassment       1  negative
hell             1  negative
limits           1  negative
mistake          1  negative
scandal          1  negative
strong           1  positive
trump            1  positive
```

Note that bing labels trump itself as a positive word, so every mention of the name counts as positive: a good example of how much the result depends on the match between the lexicon and the data.

Let's count how many of the matched words carry each sentiment.

```r
# count() counts rows, i.e. distinct matched words per sentiment
# (enjoy counts once); add wt = n to count total occurrences instead
sentiments_count <- dplyr::count(TrumpTweets_sentiments, sentiment, sort = TRUE)
sentiments_count
```

```
sentiment  n
negative   7
positive   5
```

We may also want to know the total count and the percentage of each sentiment.

```r
sentiments_sum <- sum(sentiments_count$n)
sentiments_count$percentage <- sentiments_count$n / sentiments_sum
```

Let's now create an ordered dataframe for plotting counts of sentiments.
```r
# Copy the counts and sort the rows alphabetically by sentiment
sentiment_count <- sentiments_count
sentiment_count <- sentiment_count[order(sentiment_count$sentiment), ]
sentiment_count
```

```
sentiment  n  percentage
negative   7  0.5833333
positive   5  0.4166667
```

And now it's time for the visualization. We will plot the results of our classifier as bar charts of counts and of percentages.

```r
# Colour 4 of the default palette (blue) for all bars
sentiment_count$colour <- 4L
barplot(sentiment_count$n, names.arg = sentiment_count$sentiment,
        col = sentiment_count$colour, cex.names = 0.5)
barplot(sentiment_count$percentage, names.arg = sentiment_count$sentiment,
        col = sentiment_count$colour, cex.names = 0.5)
```

Conclusion

Sentiment analysis is a great way to explore people's emotions and opinions. Today we explored one of the most common and simplest approaches to sentiment analysis, which, despite its simplicity, gives quite an informative result. Keep in mind, however, that which methods and lexicons work best depends on the problem and the text corpus. The result of a dictionary-based approach also depends heavily on how well the dictionary matches the text being classified. A custom dictionary tailored to your data can be a good solution. Despite these limitations, dictionary-based methods are simple, need no training data, and can hold their own against more complex techniques on many tasks.
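Building your own dictionary, as suggested above, needs no extra packages. The following is a minimal base R sketch under stated assumptions: my_lexicon is a hypothetical hand-made lexicon in the same word/sentiment layout as bing, and tokens is a hypothetical vector of already-cleaned words, not the tweets used above. It repeats the merge-and-count steps from the walkthrough and shows the difference between counting distinct matched words per sentiment (what dplyr::count(TrumpTweets_sentiments, sentiment) computes) and counting total occurrences weighted by n.

```r
# Hypothetical custom lexicon in the same word/sentiment layout as bing
my_lexicon <- data.frame(
  word = c("record", "congratulations", "enjoy",
           "mistake", "harassment", "scandal"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical cleaned tokens (one word per element, stop words already removed)
tokens <- c("iran", "mistake", "record", "congratulations",
            "enjoy", "enjoy", "scandal")

# Word frequencies as a data frame with columns words and n
my_word_count <- as.data.frame(table(words = tokens), responseName = "n",
                               stringsAsFactors = FALSE)

# Inner join: words missing from the lexicon (here "iran") are dropped
matched <- merge(my_word_count, my_lexicon, by.x = "words", by.y = "word")

# Distinct matched words per sentiment
distinct_counts <- table(matched$sentiment)

# Total occurrences per sentiment, weighting each word by its frequency n
occurrences <- tapply(matched$n, matched$sentiment, sum)
percentage <- occurrences / sum(occurrences)

distinct_counts       # negative 2, positive 3
occurrences           # negative 2, positive 4
round(percentage, 3)  # negative 0.333, positive 0.667
```

Here the two counting rules disagree because enjoy appears twice: it adds 1 to the distinct count but 2 to the occurrence count, which shifts the positive share from 3/5 to 4/6.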