Create your own machine learning powered RSS reader in under 30 minutes - Algorithmia
As developers, one of our biggest "problems" is our voracious appetite for news. From Twitter to HackerNews to the latest funding on TechCrunch, it seems at times that we cannot avoid the gravitational pull of our favorite news feeds. For me, at least, this is ingrained in my routine: wake up, check Twitter, check TechCrunch, check The Verge, and so on. I spend at least the first 30 minutes of every day scanning feeds by title alone, and I repeat this a couple more times through the day.
Get the code sample for this project here.
I recently discovered SkimFeed, which I love and call my "dashboard into nerd-dom." Basically, it is a single view of the major tech sites’ titles. However, I wanted more information on each article before deciding whether to click on it, so I thought: why not use text analysis algorithms as a more efficient way of consuming my feeds?
Building my very own text analysis API
There are a number of things that can be done as part of a text analysis API, but I decided to concentrate on the four elements I believed I could build the fastest:
- Strip documents of unnecessary elements
- Extract topics
- Summarize stories automatically
- Analyze sentiment
As all of these algorithms are already available in the Algorithmia API, I could piece them together without having to worry about servers, scaling, or even implementing the actual algorithms:
- ScrapeRSS – Retrieves all the necessary elements from an RSS feed
- Html2Text – Strips HTML elements and keeps the important text
- AutoTag – Looks for keywords that represent the topic of any text submitted to it. Uses Latent Dirichlet Allocation.
- SentimentAnalysis – Analyzes sentences for positive, neutral or negative connotation. Uses the Stanford NLP library.
- Summarizer – Breaks content into sentences and extracts the key sentences that represent the content’s topic. Uses classifier4j to ingest a URL and summarize its main contents.
Now, it was just a question of making them work together (and it took ~200 lines of code).
Check it out:
Note: wow, our poor servers have been Reddit’d. If you see an "unable to find worker" error, rest assured we are working to get things back to normal. Check back with us in a couple of minutes.
The process:
The first thing I needed to do was retrieve all the necessary elements from the RSS feeds (ScrapeRSS). Once I had located the main content, I could strip all the unnecessary HTML (Html2Text). Now I had a nice clean body of text to start analyzing.
Topic tags are an easy way of understanding an article at a very quick glance, so I fed the clean body of text through AutoTag. Now I had the RSS title and some topics. The next step was to summarize each link in three sentences at most to complement the tags (Summarizer). Finally, and mostly for fun, I wanted to see if it’s true that most news is negative, so I added SentimentAnalysis.
Done.
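The process above can be sketched roughly as follows. This is a minimal illustration, not the code behind the demo: the `call(name, payload)` function is a hypothetical stand-in for whatever client sends input to an algorithm, and the input and output shapes of each step (a list of items with `title` and `url`, a list of tags, a numeric sentiment score) are assumptions made for the example. `fake_call` returns canned data so the sketch runs offline.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: call(algorithm_name, payload) -> result.
AlgoCall = Callable[[str, Any], Any]


def analyze_feed(feed_url: str, call: AlgoCall, max_items: int = 10) -> List[Dict[str, Any]]:
    """Run each feed item through the steps described in the post."""
    results = []
    # Step 1: ScrapeRSS. Assumed to return a list of dicts with "title" and "url".
    items = call("ScrapeRSS", feed_url)
    for item in items[:max_items]:
        # Step 2: Html2Text. Assumed to take a URL and return plain text.
        text = call("Html2Text", item["url"])
        results.append({
            "title": item["title"],
            "url": item["url"],
            # Step 3: AutoTag. Assumed to return a list of topic keywords.
            "tags": call("AutoTag", text),
            # Step 4: Summarizer. Assumed to return a short summary string.
            "summary": call("Summarizer", text),
            # Step 5: SentimentAnalysis. Assumed to return a numeric score.
            "sentiment": call("SentimentAnalysis", text),
        })
    return results


def fake_call(name: str, payload: Any) -> Any:
    """Offline stand-in so the sketch runs without a network or API key."""
    canned = {
        "ScrapeRSS": [{"title": "Example story", "url": "http://example.com/a"}],
        "Html2Text": "Example body text. It has two sentences.",
        "AutoTag": ["example", "text"],
        "Summarizer": "Example body text.",
        "SentimentAnalysis": 2,
    }
    return canned[name]


if __name__ == "__main__":
    for entry in analyze_feed("http://example.com/rss", fake_call):
        print(entry["title"], entry["tags"], entry["sentiment"])
```

To try this against a real service, `fake_call` would be swapped for a function that forwards each name and payload to the API client. How these names map onto actual endpoints is not specified here.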
You can check out some of the code here (or just view-source on our demo page in the browser).
That’s one way to use the Algorithmia library to build your own text analysis API and analyze almost any data stream in under 30 minutes.
Cheers, Diego