So, I’m not the only one looking for a solution to this problem.
Basically, I want my RSS reader to fetch everything (images, for example) needed to display each entry during updates, so I can read them offline. Images in most feed entries are referenced remotely (http://) and are usually not downloaded until the entry is actually viewed. Some feeds use enclosures, but those work more like attachments than inline content.
I’ve tried quite a few RSS readers, and Straw seems to be the only one that does full automatic image fetching during updates. However, Straw’s development has stalled, and the latest version seems to be quite unstable.
Liferea has been my RSS reader for quite a while, so I’ve decided to do it myself in (hopefully) the simplest way possible: a Liferea conversion filter which parses a feed and fetches things for offline reading.
At the moment it works by looking for <img> tags, fetching each image using wget, and then rewriting the original image src to point to the local copy.
It’s a pretty simple Perl script. I have written it so that it can be extended to parse and fetch other things in the future, such as embedded videos. It currently downloads all images, one by one, and skips any file that has already been downloaded. You can change $SAVE_PATH in the script as needed.
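The steps described above can be sketched roughly as follows. This is not the actual script, which is written in Perl and calls wget; it is a minimal Python illustration of the same idea. The names localize_images, local_path_for, fetch and main, the default SAVE_PATH value, and the use of urllib in place of wget are all assumptions made for the example. It also assumes the <img> tags appear unescaped in the feed text (for instance inside CDATA).

```python
import hashlib
import os
import re
import sys
import urllib.request

# Hypothetical default; the real script has its own $SAVE_PATH setting.
SAVE_PATH = os.path.expanduser("~/.offline_filter")

# Matches <img ... src="http(s)://..."> and captures the prefix, quote and URL.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(https?://[^"\']+)\2',
    re.IGNORECASE,
)


def local_path_for(url, save_path=SAVE_PATH):
    """Derive a stable local file name from the remote URL."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    ext = os.path.splitext(url.split("?", 1)[0])[1]
    return os.path.join(save_path, digest + ext)


def fetch(url, dest):
    """Download one image (stand-in for the wget call)."""
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    urllib.request.urlretrieve(url, dest)


def localize_images(feed_text, save_path=SAVE_PATH, fetcher=fetch):
    """Fetch each remote image once and point its src at the local copy."""

    def replace(match):
        prefix, quote, url = match.groups()
        dest = local_path_for(url, save_path)
        if not os.path.exists(dest):  # skip files already downloaded
            try:
                fetcher(url, dest)
            except OSError:
                return match.group(0)  # leave the remote src untouched
        return f"{prefix}{quote}file://{dest}{quote}"

    return IMG_SRC.sub(replace, feed_text)


def main():
    """Act as a filter: feed XML in on stdin, rewritten feed out on stdout."""
    sys.stdout.write(localize_images(sys.stdin.read()))
```

Calling main() at the bottom of the file would make this usable as a stdin-to-stdout filter. Naming files by a hash of the URL is just one way to make the "already downloaded" check cheap; the real script may name files differently.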
You can git (yes, git) the script at git://pigeond.net/offline_filter.git. Or alternatively get the latest version here, or browse the repo at http://pigeond.net/git/?p=offline_filter.git.
To use it, set the script as the conversion filter for the feed you want to have things downloaded and it should just work.
Now I can read all the really important stuff on the train, like xkcd and failblog ;).
2010/03/20 19:23:38
Hi, can you give me the URL of the feed that gives you the error?
You can also edit the offline_filter.pl script and set $debug = 1 and watch the console to see if there’s anything interesting.
Thanks!
2010/03/21 18:05:47
I tried it on two feeds:
http://escapepod.org/podcast.xml
and
http://newsrack.in/rss/indiatogether/Environment/Biodiversity/Hotspots/rss.xml
The version is Liferea 1.6.0.
Lastly, if you happen to tweak it, can you make it download the entire article(s) in a feed? It would make my life much easier, as I need to archive and store articles and have very poor access to the net.
Thanks
ram
2010/03/21 18:13:02
I changed $debug from 0 to 1.
This is the error on both feeds:
The last update of this subscription failed!
HTTP error code 304: Feed not available: Server requested unsupported redirection!
There were errors while parsing this feed!
Details
Could not detect the type of this feed! Please check if the source really points to a resource provided in one of the supported syndication formats!
XML Parser Output:
The URL you want Liferea to subscribe to points to a webpage and the auto discovery found no feeds on this page. Maybe this webpage just does not support feed auto discovery. Could not determine the feed type.
You may want to validate the feed using FeedValidator
There were errors while filtering this feed!
Details
/home/ram/Linux2010/software_2010/offline_filter.pl exited with status 126
2010/03/21 18:27:13
Hi.
I just tested with your URL under Liferea 1.6.3, and it seems to be working.
Are you sure you are using the offline_filter.pl as the conversion filter for the feed?
Also, if you just run “echo | ./offline_filter.pl” at the shell, does it run? Any error messages?
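An exit status of 126 conventionally means the shell found the file but could not execute it, most often because the execute permission bit is missing. As a rough sketch of that check (not part of offline_filter.pl; the helper names ensure_executable and run_filter are made up for this example), something like the following could be used:

```python
import os
import stat
import subprocess


def ensure_executable(path):
    """Add the owner-execute bit if it is missing; return True if changed."""
    mode = os.stat(path).st_mode
    if mode & stat.S_IXUSR:
        return False
    os.chmod(path, mode | stat.S_IXUSR)
    return True


def run_filter(path, feed_text="\n"):
    """Run the filter with the given text on stdin, similar to `echo | ./filter`.

    Note: if the file is still not executable, subprocess raises
    PermissionError here; the 126 status is what a shell reports instead.
    """
    result = subprocess.run(
        [path], input=feed_text, capture_output=True, text=True
    )
    return result.returncode, result.stdout, result.stderr
```

For example, ensure_executable("./offline_filter.pl") followed by run_filter("./offline_filter.pl") would show the exit status and anything the script prints to stderr.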
Thanks again.
2010/03/22 17:22:22
Initially it gave an error saying I did not have permission to run it; I tried sudo and got the same error.
Then I right-clicked on the file and allowed it to run as a program, and now I don’t get an error. Thanks.
It is now working; I can view the entire page within Liferea.
But it does not automatically download and store the pages for offline use; even pages that I have already viewed are no longer available.
This is the error I get when offline:
Unable to load page
Problem occurred while loading the URL http://www.thehindu.com/2010/02/01/stories/2010020157100200.htm
Cannot connect to destination
I would appreciate your input.
Thanks
ram