
Deleting Duplicate Items in Fever RSS

My Fever RSS setup has a lot of feeds that often duplicate items. There are feeds from news sites such as The Times of India covering national, international and governmental news, as well as feeds from tech sites that often repeat each other. The end result is that I see the same title and the same post many times during the day. I found the following script to be an excellent way to remove duplicate items from Fever. For every title that appears more than once, it keeps the oldest item (the one with the lowest id) and deletes the rest; items with an empty title are left alone. It works directly on the MySQL database, so back up your data and be careful when using it, lest you delete everything because of some coding error on my part (though I’ve checked and this works for me).

DELETE fever_items FROM fever_items INNER JOIN ( SELECT title, MIN(id) AS min_id FROM fever_items GROUP BY title HAVING COUNT(*) > 1 ) AS good_rows ON good_rows.title = fever_items.title AND good_rows.min_id <> fever_items.id AND fever_items.title <> '';
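
If you would like to see what the query would remove before running it, the same join can be written as a SELECT. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory table as a stand-in for the real Fever database, the sample rows are made up, and it assumes only the id and title columns matter. The SELECT itself is plain SQL, so it should also run against MySQL.

```python
import sqlite3

# In-memory SQLite table standing in for Fever's fever_items table
# (assumption: only the id and title columns are relevant here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fever_items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO fever_items (id, title) VALUES (?, ?)",
    [
        (1, "Budget announced"),
        (2, "Budget announced"),  # duplicate of 1
        (3, ""),                  # empty titles are never touched
        (4, ""),
        (5, "Budget announced"),  # duplicate of 1
        (6, "New phone released"),
    ],
)

# Same join as the DELETE, written as a SELECT so nothing is removed.
PREVIEW_SQL = """
SELECT fever_items.id, fever_items.title
FROM fever_items
INNER JOIN (
    SELECT title, MIN(id) AS min_id
    FROM fever_items
    GROUP BY title
    HAVING COUNT(*) > 1
) AS good_rows
    ON good_rows.title = fever_items.title
   AND good_rows.min_id <> fever_items.id
   AND fever_items.title <> ''
ORDER BY fever_items.id
"""

# Rows that the DELETE would remove: every repeat except the lowest id.
doomed = conn.execute(PREVIEW_SQL).fetchall()
print(doomed)
```

With the sample rows above, only items 2 and 5 are listed: item 1 is kept as the oldest copy, and the two empty titles are skipped.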

On the first run, this code removed about 4000 entries from my feeds. That’s a lot of duplicates! What’s worse is that there are news items that are very similar (but not the same), and an exact title match does not catch those. It would be wonderful if someone could modify this script to delete items whose titles are 80-90% similar. Even more impressive would be if someone could (a) apply this to the items’ content and (b) integrate it into Fever itself as a button or link that one could click to remove duplicates.
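
As a starting point for the similar-title idea, here is a rough sketch in Python rather than SQL, since MySQL has no built-in fuzzy string comparison. It uses difflib.SequenceMatcher from the standard library to score how alike two titles are. The function names, the 0.85 threshold and the (id, title) input format are my own assumptions for illustration, not anything Fever provides, and I have not run it against a real Fever database.

```python
from difflib import SequenceMatcher


def normalize(title):
    # Lowercase and collapse whitespace so trivial differences do not count.
    return " ".join(title.lower().split())


def near_duplicate_ids(items, threshold=0.85):
    """Return ids of items whose title is at least `threshold` similar
    to the title of an earlier (lower-id) item that is being kept.

    `items` is an iterable of (id, title) pairs, for example the rows
    returned by: SELECT id, title FROM fever_items
    """
    kept = []       # normalized titles of the items we keep
    to_delete = []  # ids of items judged to be near-duplicates
    for item_id, title in sorted(items):
        norm = normalize(title)
        if not norm:
            continue  # leave empty titles alone, as the SQL above does
        if any(SequenceMatcher(None, norm, k).ratio() >= threshold for k in kept):
            to_delete.append(item_id)
        else:
            kept.append(norm)
    return to_delete
```

The returned ids could then be fed into a DELETE ... WHERE id IN (...) statement once you have looked them over. Every title is compared against every kept title, so this gets slow with many thousands of items; limiting it to recent items would be one way around that.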

Thanks for reading and I hope this improved your Fever experience.