Analyzing Google Trends with R: Retrieve and plot with gtrendsR

 

Google has become the main starting point for our online activities. Processing more than 40,000 search queries every second, Google captures much of what we are thinking and worrying about. For inspiration on the potential of Google Trends data, see the work of Seth Stephens-Davidowitz, who has used it to study topics such as hidden racism, sexual orientation and ad returns.

While the Google Trends website offers a user-friendly tool to compare the popularity of keywords over time, it often makes sense to pull the data into R, especially when you want to link it with other data sets. This tutorial shows how to retrieve Google Trends data directly in R using the gtrendsR package. The package is available on CRAN; the version on GitHub contains the latest adjustments.

Downloading the gtrendsR package

The gtrendsR package can be installed from CRAN and then loaded as usual.

In [1]:
# install once from CRAN, then load the package
install.packages('gtrendsR')
library(gtrendsR)
 
 

Setting the keywords, country and time window

The package offers the same selection options as the Google Trends interface on the website. First, you select the keywords. Keep in mind that the returned values are not absolute search volumes: they are scaled relative to the highest volume that any keyword in the query reaches in any period. Thus, if the query contains a highly popular keyword, less popular keywords will have values close to 0 and it will be hard to analyze their variation over time.
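The effect of this relative scaling can be illustrated with a small sketch. The counts below are made up for illustration, and the rescaling is a simplification of what Google does (the real procedure also involves sampling and normalization by total search volume); the point is only that scaling jointly with a popular keyword flattens a less popular one.

```r
# Made-up absolute monthly search counts (illustration only, not real data)
raw <- data.frame(
  month   = c("Jan", "Feb", "Mar"),
  popular = c(40000, 80000, 60000),
  niche   = c(200, 800, 600)
)

# Joint scaling: the largest value across both keywords becomes 100
joint_max <- max(raw$popular, raw$niche)
joint <- data.frame(
  month   = raw$month,
  popular = round(100 * raw$popular / joint_max),
  niche   = round(100 * raw$niche / joint_max)
)

# Scaling the niche keyword on its own keeps its variation visible
niche_alone <- round(100 * raw$niche / max(raw$niche))

print(joint)
print(niche_alone)
```

In the joint version the niche keyword is squeezed into values of 0 or 1, while on its own it spans the full range up to 100. Querying less popular keywords separately is therefore often the better choice.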

Region: Set the region of the query with the geo argument. The default is 'all'. For a specific country, use its country code (for example 'DE' for Germany). The countrycode package might be useful to find the desired codes.

Time window: Set the time window with the time argument: "today+5-y" for the last five years (default), "all" for everything since 2004, or a specific time span in the form "Y-m-d Y-m-d".

Further, the gprop argument allows you to specify the channel: "web" (default), "news", "images", "froogle" (shopping) or "youtube".
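The "Y-m-d Y-m-d" time span is just two dates separated by a space, so it can be assembled from Date objects instead of being typed by hand. The helper below, make_time_window, is a hypothetical convenience function written for this tutorial and is not part of gtrendsR.

```r
# Hypothetical helper (not part of gtrendsR): build a "Y-m-d Y-m-d" string
make_time_window <- function(start, end) {
  start <- as.Date(start)
  end   <- as.Date(end)
  # guard against a reversed time span
  stopifnot(start < end)
  paste(format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))
}

time_window <- make_time_window("2010-01-01", "2018-08-27")
print(time_window)
```

The resulting string can be passed as the time argument of the query.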

Let us analyze the search volume in Germany for three big cities and major tourist destinations:

In [2]:
# define the keywords
keywords <- c("Paris", "New York", "Barcelona")
# set the geographic area: DE = Germany
country <- c('DE')
# set the time window
time <- "2010-01-01 2018-08-27"
# set the channel
channel <- 'web'
 

Now we are ready to run the query. For popular queries, it returns not only the trend over time but also the interest by city and further details. We select the interest over time and obtain a data frame that contains our selected settings and the column 'hits', the relative search volume for each month.

In [3]:
trends <- gtrends(keywords, gprop = channel, geo = country, time = time)
# select only interest over time
time_trend <- trends$interest_over_time
head(time_trend)
 
date        hits  keyword  geo  gprop  category
2010-01-01  22    Paris    DE   web    0
2010-02-01  22    Paris    DE   web    0
2010-03-01  24    Paris    DE   web    0
2010-04-01  25    Paris    DE   web    0
2010-05-01  24    Paris    DE   web    0
2010-06-01  22    Paris    DE   web    0
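Before plotting or filtering, it is worth checking the type of the hits column with str(). The sketch below assumes a case in which hits comes back as text, with very low values reported as "<1"; whether this happens may depend on the package version and the query, so treat it as an assumption to verify. The rows are made up and only mimic the structure shown above.

```r
# Made-up rows mimicking the structure of interest_over_time
example_trend <- data.frame(
  date    = as.Date(c("2010-01-01", "2010-02-01", "2010-03-01")),
  hits    = c("22", "<1", "24"),
  keyword = "Paris",
  stringsAsFactors = FALSE
)

# Treat "<1" as 0 (one possible convention) and convert to numeric
example_trend$hits <- as.numeric(sub("^<1$", "0", example_trend$hits))

str(example_trend)
```

With numeric hits, comparisons such as hits < 45 behave as numeric comparisons rather than string comparisons.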
 

The data frame contains one value (hits) for each month and keyword. We plot the result over time to get an overview.

In [4]:
library(ggplot2)
 
# one line per keyword; stored as p to avoid masking base::plot
p <- ggplot(data = time_trend, aes(x = date, y = hits, group = keyword, col = keyword)) +
        geom_line() + xlab('Time') + ylab('Relative Interest') + theme_bw() +
        theme(legend.title = element_blank(), legend.position = "bottom",
              legend.text = element_text(size = 12)) +
        ggtitle("Google Search Volume")
p
 
 
 

We can see that one event dominates the figure: the November 2015 Paris attacks caused a spike in search volume. This example demonstrates that outliers can dominate the analysis because the hits are displayed relative to the highest search volume.

Let’s remove November 2015 to get a better idea of the overall trend. The filter below does so by keeping only observations with fewer than 45 hits.

In [5]:
# keep only observations with fewer than 45 hits (drops the spike)
time_trend2 <- time_trend[time_trend$hits < 45, ]
p <- ggplot(data = time_trend2, aes(x = date, y = hits, group = keyword, col = keyword)) +
        geom_line() + xlab('Time') + ylab('Relative Interest') + theme_bw() +
        theme(legend.title = element_blank(), legend.position = "bottom",
              legend.text = element_text(size = 12)) +
        ggtitle("Google Search Volume")
p
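A threshold on hits removes every observation above the cut-off, whichever keyword or month it belongs to. An alternative, sketched here on made-up rows rather than the real query result, is to drop the outlier explicitly by keyword and month. The names example_trend and trend_no_outlier are illustrative only.

```r
# Made-up rows around the spike (illustration only, not real data)
example_trend <- data.frame(
  date    = rep(as.Date(c("2015-10-01", "2015-11-01", "2015-12-01")), times = 2),
  hits    = c(30, 100, 35, 20, 21, 19),
  keyword = rep(c("Paris", "Barcelona"), each = 3)
)

# Flag only the Paris observation for November 2015
is_outlier <- example_trend$keyword == "Paris" &
  format(example_trend$date, "%Y-%m") == "2015-11"

trend_no_outlier <- example_trend[!is_outlier, ]
print(trend_no_outlier)
```

This keeps the November 2015 values of the other keywords and does not depend on choosing a suitable threshold.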
 
 
 

Apart from the spike for Paris, we see that Google searches for Barcelona are highly seasonal: the peaks always occur in summer, which is not surprising for a summer destination in Spain. There seems to be a decline in 'New York' searches, but the seasonal fluctuations make it hard to see.

We now apply some smoothing to remove the seasonality:

In [6]:
p <- ggplot(data = time_trend2, aes(x = date, y = hits, group = keyword, col = keyword)) +
        geom_smooth(span = 0.5, se = FALSE) + xlab('Time') + ylab('Relative Interest') +
        theme_bw() + theme(legend.title = element_blank(), legend.position = "bottom",
        legend.text = element_text(size = 12)) + ggtitle("Google Search Volume")
p
 
`geom_smooth()` using method = 'loess'
 
 
 

The smoothed lines show clearly that the volume for 'New York' has decreased, while the volume for 'Barcelona' has increased since 2015. 'Paris' had a lower volume after the attacks but recovered within two years.
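The loess smoother used above is one way to suppress seasonality. A different and simpler technique for monthly data is a centered 12-month moving average, sketched here with stats::filter on a synthetic series (a linear trend plus a yearly cycle). The series is made up so that the result can be checked against the known trend; this is not the method used for the plots above.

```r
# Synthetic monthly series: linear trend plus a 12-month seasonal cycle
months   <- 1:48
trend    <- 20 + 0.25 * months
seasonal <- 5 * sin(2 * pi * months / 12)
series   <- ts(trend + seasonal, frequency = 12, start = c(2010, 1))

# Centered moving average over 13 points with half weights at both ends,
# so that every calendar month contributes equally
weights  <- c(0.5, rep(1, 11), 0.5) / 12
smoothed <- stats::filter(series, filter = weights, sides = 2)

# The first and last six months are NA because the window is incomplete
print(round(smoothed, 2))
```

On this synthetic series the moving average recovers the linear trend for all months with a complete window. On real Google Trends data the seasonal pattern is less regular, so some fluctuation would remain.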

 

Advanced: Plotting the data in one step

The package also allows you to plot the data directly, which is useful if you just want to test some keywords.

In [7]:
plot(gtrendsR::gtrends(keyword = c("New York","Paris","Barcelona"), geo = "DE", time = "2010-01-01 2018-08-27"))
plot(gtrendsR::gtrends(keyword = c("Berlin","München","Frankfurt","Hamburg","Köln"), geo = "DE", time = "2010-01-01 2018-08-27"))