Accessing the News API in Python

Accessing and analyzing media content is a fascinating part of data analytics. It lets you follow trends in public interest over time or see how stories evolve (e.g. newslens). While many media outlets offer their own APIs, collecting articles from each of them individually is cumbersome. News API closes that gap: it lets you search and retrieve live articles from all over the web.

In this tutorial, we will retrieve the latest news on a topic and visualize the headlines in a word cloud using Python 3.

NewsAPI.org is an easy-to-use API that serves news from over 30,000 sources around the world. It is free for all non-commercial projects (including open-source ones) and for commercial projects that are still in development. You do need to register to get an API key, though; this takes only a few seconds at https://newsapi.org/register.

Let's start with importing the required packages for this tutorial.

In [1]:
import pprint
import requests     # version used in this tutorial: 2.19.1
 

After registering at NewsAPI.org, you can find your API key at https://newsapi.org/docs/authentication. The key below is a dummy, so please replace it with your own.

In [2]:
secret = '***'  # dummy value: replace with your own API key
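Hard-coding the key in a notebook makes it easy to leak when you share the file. A minimal alternative, assuming you have exported the key in an environment variable named NEWSAPI_KEY (a name chosen here for illustration, not something NewsAPI requires), is to read it at runtime:

```python
import os

def load_api_key(var_name='NEWSAPI_KEY'):
    """Read the NewsAPI key from an environment variable.

    NEWSAPI_KEY is a hypothetical variable name used for this sketch;
    use whatever name you exported your key under.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f'Set the {var_name} environment variable to your NewsAPI key')
    return key
```

With the variable set, secret = load_api_key() replaces the hard-coded dummy value.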
 

NewsAPI offers three endpoints:

  1. '/v2/top-headlines', for the most important headlines per country and category
  2. '/v2/everything', for all the news articles from over 30,000 sources
  3. '/v2/sources', for information on the various sources
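All three endpoints share the base URL https://newsapi.org/v2/ and take their options as query-string parameters. As a rough sketch (build_url is our own illustrative helper, not part of any library), this is the kind of URL that requests assembles from a base URL and a dictionary of parameters:

```python
from urllib.parse import urlencode

# Base URL shared by the three NewsAPI v2 endpoints listed above
BASE_URL = 'https://newsapi.org/v2/'
ENDPOINTS = ('top-headlines', 'everything', 'sources')

def build_url(endpoint, **params):
    """Build a request URL for one NewsAPI endpoint (illustrative helper)."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f'Unknown endpoint: {endpoint!r}')
    url = BASE_URL + endpoint
    if params:
        # urlencode escapes special characters, e.g. the space in 'big data' becomes '+'
        url += '?' + urlencode(params)
    return url

print(build_url('top-headlines', country='us', category='technology', apiKey='YOUR_KEY'))
# → https://newsapi.org/v2/top-headlines?country=us&category=technology&apiKey=YOUR_KEY
```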

We will use the 'everything' endpoint to get news about 'Big Data'.

In [3]:
# Define the endpoint
url = 'https://newsapi.org/v2/everything'  # requests appends the '?' and query string itself
In [4]:
# Specify the query and the number of results
parameters = {
    'q': 'big data', # query phrase
    'pageSize': 20,  # maximum is 100
    'apiKey': secret # your own API key
}
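The 'everything' endpoint accepts more parameters than the three used here; the NewsAPI documentation lists, among others, language, sortBy and from/to date filters. A sketch that restricts the search to English articles from the last seven days, newest first (the helper name and the seven-day default are our own choices):

```python
from datetime import date, timedelta

def recent_query_params(query, api_key, days=7, page_size=20, today=None):
    """Build parameters for /v2/everything limited to the last `days` days (illustrative helper)."""
    today = today or date.today()
    since = today - timedelta(days=days)
    return {
        'q': query,
        'language': 'en',           # only English-language articles
        'sortBy': 'publishedAt',    # newest articles first
        'from': since.isoformat(),  # ISO 8601 date, e.g. '2018-12-20'
        'pageSize': page_size,      # maximum is 100
        'apiKey': api_key,
    }
```

Passing parameters = recent_query_params('big data', secret) to requests.get works the same way as the dictionary above.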
 

Now we can retrieve the news with the requests package.

In [5]:
# Make the request
response = requests.get(url, params=parameters)

# Convert the response to JSON format and pretty print it
response_json = response.json()
pprint.pprint(response_json)
 
{'articles': [{'author': 'David Murphy',
               'content': 'In whats starting to feel like a weekly tradition, '
                          'another popular service Quora, this timehas '
                          'indicated it has been the victim of a security '
                          'breach that may have affected its users. As always, '
                          'some mixture of your personal details (or login '
                          'credentials) are … [+5305 chars]',
               'description': 'Illustration: Quora In what’s starting to feel '
                              'like a weekly tradition, another popular '
                              'service— Quora, this time—has indicated it has '
                              'been the victim of a security breach that may '
                              'have affected its users. As always, some '
                              'mixture of your personal details (or …',
               'publishedAt': '2018-12-04T19:30:00Z',
               'source': {'id': None, 'name': 'Lifehacker.com'},
               'title': "How to Protect Yourself After Quora's Recent Data "
                        'Breach',
               'url': 'https://lifehacker.com/how-to-protect-yourself-after-quoras-recent-data-breach-1830849388',
               'urlToImage': 'https://i.kinja-img.com/gawker-media/image/upload/s--3RCxO3KR--/c_fill,fl_progressive,g_center,h_900,q_80,w_1600/oxrmehzzseg5lq5xawvw.png'},
              {'author': 'Lydia DePillis, CNN Business',
               'content': None,
               'description': 'Among the many public services that Americans '
                              'will miss if the partial government shutdown '
                              "continues beyond this week, here's a big one: "
                              'data.',
               'publishedAt': '2018-12-26T18:40:30Z',
               'source': {'id': 'cnn', 'name': 'CNN'},
               'title': 'Government shutdown threatens access to key economic '
                        'data',
               'url': 'https://www.cnn.com/2018/12/26/economy/shutdown-census-data/index.html',
               'urlToImage': 'https://cdn.cnn.com/cnnnext/dam/assets/181226131730-01-data-shutdown-file-1226-restricted-super-tease.jpg'},
              {'author': 'Bryan Menegus',
               'content': '2018 has been an excruciating exercise is '
                          'achieving, and then discovering new definitions of, '
               [output truncated; the full response is 26381 characters long]
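The request above assumes everything went well. NewsAPI reports the outcome in the JSON body itself: a successful response carries 'status': 'ok' along with 'totalResults' and 'articles', while a failed one carries 'status': 'error' plus a 'code' and a 'message'. A small check, sketched here as a hypothetical helper, turns failures such as an invalid key into a readable error instead of a later KeyError on 'articles':

```python
def check_news_response(payload):
    """Return the list of articles, or raise if NewsAPI reported an error (illustrative helper)."""
    if payload.get('status') != 'ok':
        # Error bodies carry a machine-readable 'code' and a human-readable 'message'
        code = payload.get('code', 'unknown')
        message = payload.get('message', 'no message')
        raise RuntimeError(f'NewsAPI error ({code}): {message}')
    return payload['articles']
```

Usage with the response from above: articles = check_news_response(response_json).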

Let's loop through the articles and print just their headlines (the titles).

In [6]:
for article in response_json['articles']:
    print(article['title'])
 
How to Protect Yourself After Quora's Recent Data Breach
Government shutdown threatens access to key economic data
The Year Workers Stood Up to Big Tech
Fivetran announces $15M Series A to build automated data pipelines
Bios raises $4.5M to further develop its ‘neural interface’ and new ways to treat chronic medical conditions
Looker snags $103 million investment on $1.6 billion valuation
Facebook: Giving Other Companies Access to Your Private Messages Actually Wasn't a Big Deal
Twitter Alerts Some Users to 'Unusual' Data Leak
Groundbreaking Tips for Managing Big Data
The Pixel 3's Big Bug Fix Update Is Here
Opioid Makers Are the Big Winners in Lawsuit Settlements
With trust destroyed, Facebook is haunted by old data deals
U.S. Murder Rate for 2018 Is on Track for a Big Drop
AtScale lands $50 million investment led by Morgan Stanley
Google CEO Sundar Pichai thinks Android users know how much their phones are tracking them
Data breaches help crooks targeting you. Prepare to fight back - CNET
At a New York Privacy Pop-Up, Facebook Sells Itself
I Asked Apple for Everything It Knows About Me, and Here's What I Found
Ten big science stories of 2018
Physicists Create Incredible 'Quark Soup' Droplets That Expand Like Little Big Bangs
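Each article also carries its source and publication time (the 'source' and 'publishedAt' fields in the JSON output above). A variation of the loop, sketched here with a helper of our own, prints those next to the title. It uses .get() with fallbacks because some fields can be None, as the 'content' of the CNN article above shows:

```python
def format_headline(article):
    """Format one article as 'YYYY-MM-DD | source | title' (illustrative helper)."""
    # 'publishedAt' is an ISO 8601 timestamp such as '2018-12-04T19:30:00Z'; keep the date part
    published = (article.get('publishedAt') or '')[:10]
    source = (article.get('source') or {}).get('name') or 'unknown source'
    title = article.get('title') or '(no title)'
    return f'{published} | {source} | {title}'
```

To use it, replace print(article['title']) in the loop with print(format_headline(article)).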
 

Pretty easy, right? Feel free to try out other queries and the other endpoints. The documentation is at https://newsapi.org/docs, and if you're logged in, the example queries there already include your own API key.
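A single request returns at most pageSize articles (100 at most). To collect more, the 'everything' endpoint also takes a page parameter. The sketch below pages through the results; it assumes a fetch callable that takes a parameter dictionary and returns the parsed JSON (with requests, lambda p: requests.get(url, params=p).json()). The function name and the max_pages cap are our own, and depending on your plan NewsAPI may limit how many results you can page through:

```python
import math

def fetch_all_articles(fetch, params, max_pages=5):
    """Collect articles across several pages of /v2/everything (illustrative sketch).

    `fetch` takes a parameter dict and returns the parsed JSON response; it is
    passed in so the paging logic can be tried without network access.
    """
    articles = []
    page_size = params.get('pageSize', 20)  # NewsAPI's default page size is 20
    page = 1
    while page <= max_pages:
        payload = fetch({**params, 'page': page})
        if payload.get('status') != 'ok':
            break  # stop on any error response and return what was collected so far
        articles.extend(payload['articles'])
        # totalResults tells us how many pages exist in total
        total_pages = math.ceil(payload.get('totalResults', 0) / page_size)
        if page >= total_pages:
            break
        page += 1
    return articles
```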

Now that we have the headlines for 'Big Data', let's do something fun with them and visualize them with the wordcloud package. You can install the package via pip (pip install wordcloud) or conda-forge (conda install -c conda-forge wordcloud). Then import the wordcloud and matplotlib packages:

In [7]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
 

Now put all the headlines together in one string:

In [8]:
# Join all the headlines into one string; the separating space keeps the last word
# of one headline from being glued to the first word of the next
text_combined = ' '.join(article['title'] for article in response_json['articles'])
# Print the first 300 characters to screen for inspection
print(text_combined[:300])
 
How to Protect Yourself After Quora's Recent Data Breach Government shutdown threatens access to key economic data The Year Workers Stood Up to Big Tech Fivetran announces $15M Series A to build automated data pipelines Bios raises $4.5M to further develop its ‘neural interface’ and new ways to trea
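A word cloud sizes each word by how often it occurs in the text. To inspect such counts yourself, a rough sketch with collections.Counter follows. The tokenizing regex and the tiny stopword list are our own simplifications, not what the wordcloud package does internally (it ships its own STOPWORDS set and can also draw a cloud from a ready-made dictionary via WordCloud.generate_from_frequencies):

```python
import re
from collections import Counter

# A deliberately tiny stopword list for illustration; wordcloud ships a larger STOPWORDS set
STOPWORDS_SAMPLE = {'the', 'to', 'a', 'and', 'of', 'for', 'is', 'in', 'it', 'its',
                    'how', 'up', 'on', 'by', 'with'}

def word_frequencies(text, stopwords=STOPWORDS_SAMPLE):
    """Count lowercase words in `text`, skipping stopwords and single characters."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if len(w) > 1 and w not in stopwords)
```

For example, word_frequencies(text_combined).most_common(10) lists the ten most frequent words in the headlines.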
 

Now that all the headlines are in one variable, we can use it to generate the word cloud:

In [10]:
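# Generate the word cloud from the combined headlines; max_font_size caps the largest word
wordcloud = WordCloud(max_font_size=40).generate(text_combined)
plt.figure()
# Draw the image; bilinear interpolation makes the rendered words look smoother
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")  # hide the axis ticks and labels
plt.show()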
 
 

What other cool things can you do with NewsAPI and the wordcloud package? Let us know your thoughts in the comment section!

 

About the author: Joris H., Python & open source enthusiast. Entrepreneur @ Automation Wizards - https://www.automationwizards.nl