July 30, 2014

Create your own machine-learning-powered RSS reader in under 30 minutes

As developers, one of our biggest "problems" is our voracious appetite for news. From Twitter to Hacker News to the latest funding announcements on TechCrunch, it seems at times that we cannot escape the gravitational pull of our favorite news feeds. I know that, for me at least, this is ingrained in my routine: wake up, check Twitter, check TechCrunch, check The Verge, etc. I spend at least the first 30 minutes of every day skimming feeds by headline, and I repeat this a couple more times throughout the day.

Get the code sample for this project here.

I recently discovered SkimFeed, which I love and call my "dashboard into nerd-dom": basically, it is a single view of the major tech sites’ headlines. However, I wanted more information on each article before deciding whether to click on it, so I thought: why not use text analysis algorithms to consume my feeds more efficiently?

Building my very own text analysis API

There are a number of things a text analysis API can do, but I decided to concentrate on the four I believed I could build the fastest:

  • Strip documents of unnecessary elements
  • Advanced topic extraction
  • Automatically summarize stories
  • Sentiment analysis

As all of these algorithms are already available in the Algorithmia API, I could piece them together without having to worry about servers, scaling, or even implementing the actual algorithms:

  • ScrapeRSS – Retrieves all the necessary elements from an RSS feed
  • Html2Text – Strips HTML elements and keeps the important text
  • AutoTag – Looks for keywords that represent the topic of any text submitted to it (uses Latent Dirichlet Allocation).
  • SentimentAnalysis – Analyzes sentences for positive, neutral or negative connotation. Uses the Stanford NLP library.
  • Summarizer – Breaks content into sentences and extracts the key sentences that represent the content's topic. Uses classifier4j to ingest a URL and summarize its main contents.

Now it was just a matter of making them work together, which took about 200 lines of code.
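One way to structure that glue code is to treat each hosted algorithm as a plain function and push every feed item through them in sequence. In this minimal sketch the five stage callables in `Stages` are hypothetical placeholders for calls to ScrapeRSS, Html2Text, AutoTag, Summarizer and SentimentAnalysis; it does not show Algorithmia's actual client API, and the `Story` fields are assumptions about what the reader displays.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stages:
    """Hypothetical stand-ins for the hosted algorithms (not a real client API)."""
    scrape_rss: Callable[[str], List[Dict[str, str]]]  # feed URL -> items with "title", "link", "html"
    html_to_text: Callable[[str], str]                  # raw HTML -> clean text
    auto_tag: Callable[[str], List[str]]                # clean text -> topic tags
    summarize: Callable[[str], str]                     # clean text -> short summary
    sentiment: Callable[[str], str]                     # clean text -> "positive" / "neutral" / "negative"


@dataclass
class Story:
    title: str
    link: str
    tags: List[str] = field(default_factory=list)
    summary: str = ""
    sentiment: str = "neutral"


def process_feed(feed_url: str, stages: Stages) -> List[Story]:
    """Run every item of one feed through the strip -> tag -> summarize -> sentiment chain."""
    stories: List[Story] = []
    for item in stages.scrape_rss(feed_url):
        # Clean the HTML once and reuse the same text for every analysis stage.
        text = stages.html_to_text(item["html"])
        stories.append(
            Story(
                title=item["title"],
                link=item["link"],
                tags=stages.auto_tag(text),
                summary=stages.summarize(text),
                sentiment=stages.sentiment(text),
            )
        )
    return stories
```

Passing the stages in, rather than hard-coding them, keeps the orchestration testable with stubs and makes it easy to swap a hosted algorithm for a local one.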

Check it out:

Note: Wow, our poor servers have been Reddit’d. If you see an "unable to find worker" error, rest assured we are working to get things back to normal; check back in a couple of minutes.

The process:

The first step was to retrieve all the necessary elements from the RSS feeds (ScrapeRSS). Once I had located the main content, I could strip all the unnecessary HTML (Html2Text). That left a clean body of text to analyze.
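To make these first two steps concrete, here is a minimal sketch that uses only Python's standard library (`xml.etree.ElementTree` and `html.parser`) in place of ScrapeRSS and Html2Text. It assumes an RSS 2.0 feed whose `<item>` elements carry the article HTML in `<description>`; the names `parse_rss` and `html_to_text` are hypothetical, and a real scraper would also need to fetch the linked pages and cope with Atom feeds and malformed markup.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Join fragments and collapse all runs of whitespace to single spaces.
        return " ".join(" ".join(self._parts).split())


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


def parse_rss(xml_text):
    """Return title, link and cleaned description text for each RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "text": html_to_text(item.findtext("description") or ""),
        })
    return items
```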

Topic tags make it easy to understand an article at a glance, so I fed the clean body of text through AutoTag. With the RSS title and some topics in hand, the next step was to summarize each link in at most three sentences to complement the tags (Summarizer). Finally, and mostly for fun, I wanted to see whether it’s true that most news is negative, so I added SentimentAnalysis.
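For local experimentation, the three analysis steps can be approximated in a few lines. This sketch deliberately swaps in much simpler techniques than the hosted algorithms: word-frequency keywords instead of Latent Dirichlet Allocation, sentences scored by average word frequency and capped at three, and a tiny hand-written sentiment lexicon instead of the Stanford NLP library. The function names, stopword list and lexicons are all illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative word lists; real systems use far larger resources.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
             "are", "was", "it", "that", "this", "with", "as", "at", "by", "be", "its"}
POSITIVE = {"good", "great", "win", "growth", "love", "success", "improve"}
NEGATIVE = {"bad", "loss", "fail", "crash", "hate", "decline", "breach"}


def words(text):
    return re.findall(r"[a-z']+", text.lower())


def sentences(text):
    # Naive split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def top_tags(text, k=5):
    """Most frequent non-stopwords longer than two letters (a stand-in for topic tags)."""
    counts = Counter(w for w in words(text) if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(k)]


def summarize(text, max_sentences=3):
    """Keep the highest-scoring sentences, in their original order."""
    sents = sentences(text)
    freq = Counter(w for w in words(text) if w not in STOPWORDS)

    def score(sentence):
        toks = [w for w in words(sentence) if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    ranked = sorted(range(len(sents)), key=lambda i: score(sents[i]), reverse=True)
    return " ".join(sents[i] for i in sorted(ranked[:max_sentences]))


def sentiment(text):
    """Label text by counting lexicon hits."""
    ws = words(text)
    pos = sum(w in POSITIVE for w in ws)
    neg = sum(w in NEGATIVE for w in ws)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

Scoring by average rather than total word frequency keeps long sentences from winning simply because they contain more words.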

Done.

You can check out some of the code here (or just use View Source on our demo page in your browser).


That’s one way to use the Algorithmia library to build your own text analysis API and analyze almost any data stream in under 30 minutes.

Cheers, Diego

Matt Kiser

Product marketing manager at Algorithmia, helping developers give their apps super powers.


© 2016 Algorithmia, All Rights Reserved.
