Do You Feel Swamped by Email Subscriptions?
If you’re like me, your email is probably overflowing with newsletters and notifications from all the subscriptions you’ve signed up for. It feels like a constant battle to catch up with all the “useful” insights that fill your inbox.
Since I started actively reading on Medium and Substack, my inbox has been flooded with notifications. From topics I follow to suggestions based on articles I’ve spent time on, the emails just keep coming. I dive into them, hoping to consume all the great information, but I never seem to get through it all. Inevitably, I end up marking many emails as “read,” even though I haven’t touched them.
This got me thinking:
What if I used GenAI to create a Subscription Summarizer?
A tool that could sift through my inbox, extract the key information, and give me concise, actionable insights.
A little housekeeping:
If you’ve been following along with the earlier challenges, you might be wondering where I’ve been and why it took me so long to move on to the third one.
Long story short, I’ve been busy building those projects and turning them into fully functional apps. I now officially have three web apps and can confidently (and shamelessly) call myself an indie-preneur! Feel free to check them out here!
Project Goals
For this project, I set out to build a program that could:
Connect to my Gmail account to retrieve all unread subscription emails.
Parse the emails to extract recommended article links.
Use AI to read those articles and generate summaries for me.
Ultimately, the program should save me hours of reading by surfacing only the most important and relevant information in a digestible format.
Choice of Model and Tools
To achieve this, I chose the following tools and frameworks:
Model: BART, a transformer model from Facebook AI (available through Hugging Face) that is well-suited for text summarization tasks.
Programming Language: Barebones Python, keeping it lightweight and straightforward.
Frameworks: Hugging Face Transformers library for NLP, and Gmail API for email access.
To ensure I chose the most suitable model, I dedicated two earlier articles to exploring the strengths and weaknesses of various relevant models.
Help from Cursor (As Always)
Once again, Cursor played a pivotal role in getting the basics up and running. It:
Generated boilerplate code to connect with the Gmail API and fetch unread emails.
Helped parse email content, extract links, and handle authentication securely.
Integrated BART from Hugging Face, generating a simple pipeline for summarizing the articles.
Cursor streamlined a lot of the setup, letting me focus on refining the logic and solving edge cases.
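For readers curious what the Gmail side involves, here is a minimal sketch (not the Cursor-generated code from this project) of fetching unread messages and pulling out their HTML bodies. It assumes a `service` object already built with `googleapiclient.discovery.build("gmail", "v1", credentials=creds)` after the OAuth flow; the helper names are illustrative, and the sender address in the default search query is a hypothetical placeholder.

```python
import base64


def _decode_part(data: str) -> str:
    # Gmail returns body data as base64url; re-add padding defensively before decoding.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def extract_html_body(payload: dict) -> str:
    """Depth-first search of a Gmail message payload for the first text/html part."""
    data = payload.get("body", {}).get("data")
    if payload.get("mimeType") == "text/html" and data:
        return _decode_part(data)
    for part in payload.get("parts", []):
        html = extract_html_body(part)
        if html:
            return html
    return ""


def fetch_unread_html(service, query="is:unread from:noreply@medium.com"):
    """Yield (message_id, html) for every message matching a Gmail search query.

    `service` is assumed to come from googleapiclient.discovery.build("gmail", "v1", ...).
    The sender address in the default query is a hypothetical example.
    """
    messages = service.users().messages()
    request = messages.list(userId="me", q=query)
    while request is not None:
        response = request.execute()
        for ref in response.get("messages", []):
            msg = messages.get(userId="me", id=ref["id"], format="full").execute()
            yield ref["id"], extract_html_body(msg["payload"])
        # list_next returns None once there are no more result pages.
        request = messages.list_next(request, response)
```

Walking the MIME tree matters because newsletter emails are usually `multipart/alternative`, with the HTML version nested next to a plain-text copy.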
Rounds of Adjustments
As with any project, there were plenty of challenges along the way:
1. Extracting Correct Links
While Cursor handled most of the email parsing, I quickly realized that Medium emails are riddled with links, often several pointing to the same recommended article. To solve this, I manually analyzed the email structure and set filtering parameters to capture only the primary article links.
It took 20+ iterations to get the filtering right, but eventually, I was extracting clean article URLs.
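The project's actual filtering parameters aren't reproduced here, but the general idea can be sketched with Python's standard library: collect every `href`, keep only links that look like article URLs, strip tracking query strings, and de-duplicate. The "looks like an article" test below (a medium.com URL whose last path segment ends in a hyphen plus a hex post ID) is an assumption for illustration; publications on custom domains would slip past it.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit

# Assumed pattern: article slugs end with "-" plus a hex post ID, e.g. "my-post-1a2b3c4d5e6f".
ARTICLE_SLUG = re.compile(r"-[0-9a-f]{8,}$")


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_article_links(html: str) -> list:
    collector = _LinkCollector()
    collector.feed(html)
    seen, articles = set(), []
    for href in collector.hrefs:
        parts = urlsplit(href)
        host = parts.netloc.lower()
        if parts.scheme not in ("http", "https"):
            continue
        if host != "medium.com" and not host.endswith(".medium.com"):
            continue
        segments = [s for s in parts.path.split("/") if s]
        if not segments or not ARTICLE_SLUG.search(segments[-1]):
            continue  # settings, upgrade, profile and tag pages, etc.
        # Dropping the query string and fragment removes tracking parameters,
        # so several links to the same article collapse into one URL.
        clean = urlunsplit((parts.scheme, host, "/" + "/".join(segments), "", ""))
        if clean not in seen:
            seen.add(clean)
            articles.append(clean)
    return articles
```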
2. Improving Summary Length
The initial summaries generated by BART were too short: just a couple of sentences that didn’t provide enough context to fully understand the article. After tweaking the generation parameters (like max_length and min_length), I finally got summaries that were concise yet informative.
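One way to avoid one-size-fits-all summaries is to derive `min_length` and `max_length` from the article's size. This sketch assumes the `facebook/bart-large-cnn` checkpoint (the post doesn't name a specific one) and uses word counts as a rough stand-in for BART tokens; the ratios, floor and ceiling are hypothetical values to tune, not the project's settings. The `transformers` import sits inside a function so the length helper can be tried without installing it.

```python
from functools import lru_cache


def _clamp(value, low, high):
    return max(low, min(high, value))


def summary_length_bounds(text, min_ratio=0.10, max_ratio=0.25, floor=60, ceiling=250):
    """Scale (min_length, max_length) with the input size.

    Word counts are a rough proxy for BART tokens; all defaults are hypothetical.
    """
    n_words = len(text.split())
    max_len = _clamp(int(n_words * max_ratio), floor, ceiling)
    # Keep min_length well below max_length so generation has room to stop.
    min_len = _clamp(int(n_words * min_ratio), floor // 2, max_len // 2)
    return min_len, max_len


@lru_cache(maxsize=1)
def _summarizer(model_name="facebook/bart-large-cnn"):
    # Imported lazily; requires `pip install transformers torch`.
    from transformers import pipeline
    return pipeline("summarization", model=model_name)


def summarize(text):
    min_len, max_len = summary_length_bounds(text)
    result = _summarizer()(
        text,
        min_length=min_len,
        max_length=max_len,
        truncation=True,  # BART reads at most 1,024 tokens; longer input is cut
        do_sample=False,
    )
    return result[0]["summary_text"]
```

Caching the pipeline with `lru_cache` means the model loads once per run rather than once per article. Truncation simply drops the tail of long articles, so chunking them would be the natural next refinement.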
3. From Summarizer to Recommender
Even with improved summaries, I found it overwhelming to read through all of them to decide which articles to open. That’s when I decided to level up the project: turning the summarizer into a recommender system.
Building the Recommender
To prioritize the best articles, I worked with Cursor to build a scoring system based on multiple metrics. Here’s how it works:
Scoring Metrics
The program calculates a score for each article using these four metrics:
Content Quality: Assesses how coherent and informative the content is, using BART to compare the summary with the original.
Readability: Evaluates the article’s ease of reading using metrics like Flesch Reading Ease.
Relevance: Checks how well the content matches the title, ensuring the article delivers on its promise.
Length Score: Scores articles based on their length to balance short, actionable reads with in-depth insights.
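Three of the four metrics can be approximated without a model. The sketch below is an illustration under stated assumptions, not the project's scoring code: readability uses the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), with a crude vowel-group syllable counter (a library such as textstat offers a more careful `flesch_reading_ease`); relevance is simple keyword overlap between title and body; and the 800 to 2,500 word "ideal" length range is a hypothetical choice. The BART-based content-quality metric is left out because it needs the model.

```python
import re

# Small, hypothetical stop-word list for the keyword-overlap relevance check.
STOP_WORDS = {"a", "an", "and", "are", "for", "how", "i", "in", "is", "my",
              "of", "on", "or", "the", "to", "what", "why", "with", "you", "your"}


def _words(text):
    return re.findall(r"[A-Za-z']+", text)


def _syllables(word):
    # Crude heuristic: count vowel groups, drop a silent trailing "e", minimum one.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(1, count)


def flesch_reading_ease(text):
    words = _words(text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


def readability_score(text):
    # Map the roughly 0-100 Flesch scale onto [0, 1], clamping outliers.
    return min(1.0, max(0.0, flesch_reading_ease(text) / 100))


def relevance_score(title, content):
    """Share of the title's keywords that also appear in the article text."""
    keywords = {w.lower() for w in _words(title)} - STOP_WORDS
    if not keywords:
        return 0.0
    body = {w.lower() for w in _words(content)}
    return len(keywords & body) / len(keywords)


def length_score(text, ideal_min=800, ideal_max=2500):
    """1.0 inside a hypothetical ideal word range, tapering linearly outside it."""
    n = len(_words(text))
    if n < ideal_min:
        return n / ideal_min
    if n > ideal_max:
        return max(0.0, 1 - (n - ideal_max) / ideal_max)
    return 1.0
```

Each function returns a value in [0, 1], which keeps the metrics comparable when they are combined into one weighted score.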
Weighted Scoring
Each metric contributes to the overall score based on its importance:
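weights = {
    'content_quality': 0.4,  # coherence and informativeness (BART-based)
    'readability': 0.3,      # Flesch Reading Ease
    'relevance': 0.2,        # how well the content matches the title
    'length_score': 0.1,     # balance of short reads vs. in-depth pieces
}  # the weights sum to 1.0, so the weighted average stays on the metrics' scale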
The program calculates a weighted average score for every article. In my inbox, the scores ranged from 0.2 to 0.8. Articles with higher scores are more likely to be insightful and engaging.
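The weighted average itself takes only a few lines of Python. This sketch reuses the weights above and divides by their sum, so it still returns a value in [0, 1] if the weights are later changed and no longer add up to exactly 1. The `articles` structure (a title mapped to its metric scores) is an assumption for illustration.

```python
WEIGHTS = {
    "content_quality": 0.4,
    "readability": 0.3,
    "relevance": 0.2,
    "length_score": 0.1,
}


def weighted_score(metrics, weights=WEIGHTS):
    """Weighted average of per-metric scores, each assumed to lie in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[name] * metrics[name] for name in weights) / total


def rank_articles(articles, weights=WEIGHTS):
    """Return (score, title) pairs, best first. `articles` maps title -> metrics dict."""
    scored = [(weighted_score(m, weights), title) for title, m in articles.items()]
    return sorted(scored, reverse=True)
```

Sorting in descending order puts the highest-scoring articles first, so reading only the top few entries is enough to decide what to open.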
Result
Now, instead of skimming through every summary, I can sort articles by their overall score and focus on the top ones. This way, I only spend time on the content that’s truly worth reading.
Problems and Improvements
As satisfying as this project was, it’s far from perfect. Here are some areas for improvement:
Email Parsing Complexity
Subscription emails were particularly challenging to parse due to their inconsistent structure. While I’ve filtered the main article links, there’s room to automate this process further.
Summarization Quality
Although the summaries are decent, they could be improved by fine-tuning BART on a dataset of Medium articles or by experimenting with other models such as T5 or GPT-3.5.
Relevance Scoring
The relevance metric is currently based on simple keyword matching between the title and content. Incorporating a semantic similarity model could make this more robust.
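A semantic version of the relevance check could compare sentence embeddings of the title and the text instead of matching keywords. This sketch assumes the sentence-transformers package and its `all-MiniLM-L6-v2` model, neither of which the project currently uses; the cosine-similarity helper is plain Python so it can be checked without downloading a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_relevance(title, content, model_name="all-MiniLM-L6-v2"):
    # Assumes `pip install sentence-transformers`; imported lazily so the helper
    # above works without it.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    title_vec, content_vec = model.encode([title, content])
    return float(cosine_similarity(title_vec, content_vec))
```

Small embedding models truncate long inputs, so comparing the title with the BART summary rather than the full article is probably the more meaningful check.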
Scalability
The current system works well for a small number of articles, but as the volume grows, performance might degrade. Optimizing the pipeline and adding caching for frequently accessed content could help.
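As a sketch of the caching idea, assuming each article's summary and scores can be stored as a JSON-serializable dict: the hypothetical `SummaryCache` below writes one file per article URL, so re-running the program on an inbox with overlapping recommendations skips articles it has already processed. `compute` stands in for whatever fetch, summarize and score step the pipeline runs.

```python
import hashlib
import json
from pathlib import Path


class SummaryCache:
    """Tiny on-disk cache: one JSON file per URL, named by a SHA-256 of the URL."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, url):
        return self.directory / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")

    def get(self, url):
        path = self._path(url)
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        return None

    def put(self, url, record):
        self._path(url).write_text(json.dumps(record), encoding="utf-8")


def summarize_with_cache(url, cache, compute):
    """Return the cached record for `url`, or call compute(url) once and store it."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    record = compute(url)
    cache.put(url, record)
    return record
```

Hashing the URL keeps file names short and filesystem-safe; pairing this with the link de-duplication step means each article is summarized at most once across runs.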
Final Thoughts
This project turned out to be a great tool for managing my email overload. What started as a simple summarizer evolved into a powerful recommender system that prioritizes high-quality content. Not only does it save me time, but it also ensures I don’t miss out on the most valuable insights.
Working with BART and Hugging Face was a great learning experience, and Cursor made the process of writing and refining code much smoother. With a few more iterations, I think this tool could become an essential productivity booster — not just for me, but for anyone drowning in email subscriptions.
On to the next GenAI challenge soon! If you have ideas for improving this project or want to try building your own version, let me know — I’d love to hear about it.