RSS woes return
Apparently the RSS feed has stopped updating on some services (Feedly, NewsBlur), meaning some people are missing updates. This happened last year, and nothing I did or that anyone suggested remedied it. In the end, it got better all on its own. Sometimes these blog posts will appear on Feedly etc. when comic updates don’t, so if this one reaches you: updates are M-W-F unless I explicitly say otherwise. Sorry for the break in service. Nothing has changed at my end!
I just had a look at what NewsBlur does, since it’s open source. Going to get a bit technical here…
https://github.com/samuelclay/NewsBlur/blob/20e49ec586207e2f5efa2622461cd229156031bf/utils/archive/green.py#L17-L19
In the code, I can see that NewsBlur uses urllib to fetch the RSS feed and then parses it with feedparser. So I tried to do exactly what that code does and got a “406 Not Acceptable” response.
In other places NewsBlur uses ‘requests’ instead of ‘urllib’:
https://github.com/samuelclay/NewsBlur/blob/20e49ec586207e2f5efa2622461cd229156031bf/utils/feed_fetcher.py#L211
So it looked like it might be blocking based on what the client was? To test this, I tried curl:
curl -v https://badmachinery.com/feed/ > /dev/null
This worked: I got a 200 response. Then I pretended to be urllib, just like NewsBlur:
curl -A "python-urllib2" -v https://badmachinery.com/feed/ > /dev/null
This gives a 406. And repeating this pretending to be ‘requests’:
curl -A "python-requests/2.28.2" -v https://badmachinery.com/feed/ > /dev/null
also gives a 406.
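For anyone who would rather repeat the comparison from Python than from curl, here is a minimal sketch of the same test using only the standard library. The function names and the "curl/8.0.1" string are my own placeholders (curl sends its own version string by default); the other two agent strings are the ones used in the curl commands above.

```python
import urllib.error
import urllib.request

FEED_URL = "https://badmachinery.com/feed/"

# "curl/8.0.1" is an assumed stand-in for curl's default agent string;
# the other two are the strings used in the curl tests.
AGENTS = ["curl/8.0.1", "python-urllib2", "python-requests/2.28.2"]


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server gives for this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code itself is what we want.
        return err.code


def probe(url, user_agents, fetch=fetch_status):
    """Map each User-Agent string to the status code it receives."""
    return {agent: fetch(url, agent) for agent in user_agents}


# Example (performs real network requests):
#     for agent, status in probe(FEED_URL, AGENTS).items():
#         print(status, agent)
```

If the server really is filtering on the agent string, the same URL should come back with different status codes depending only on the User-Agent header.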
So it _may be_ that your WordPress is blocking those user agents? You’d want to check your server logs for 406 responses for /feed/ if you have those.
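If the logs are available, a rough way to do that check is sketched below. It assumes the access log is in the common Apache "combined" format (request line, status, size, referrer, user agent), which may not match this host; the function name and the log path in the example are hypothetical.

```python
import re
from collections import Counter

# Matches the quoted request line, status, size, referrer and user agent
# of an Apache "combined"-style access log entry (an assumed format).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def rejected_agents(lines, path_prefix="/feed", status=406):
    """Count user agents that received `status` for paths under `path_prefix`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a line in the assumed format
        if int(match["status"]) == status and match["path"].startswith(path_prefix):
            counts[match["agent"]] += 1
    return counts


# Example (the path is hypothetical; use wherever your host keeps its logs):
#     with open("/var/log/apache2/access.log", encoding="utf-8", errors="replace") as log:
#         for agent, hits in rejected_agents(log).most_common():
#             print(hits, agent)
```

A list dominated by feed-reader agents would support the blocking theory; an empty list would point elsewhere.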
This might not be the cause at all, of course. When you say it solved itself previously, it might be something in the content their feed parsers don’t like, and when that article leaves the feed it’ll work again.
Good stuff! Furthermore, the 406 error is supposed to indicate that “the server could not produce a response matching the list of acceptable values defined in the request’s proactive content negotiation headers and that the server was unwilling to supply a default representation.” (https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/406)
The only content negotiation header my curl request sent was “Accept: */*” and that doesn’t change based on the supplied user agent string. Rather than blocking certain agents, WordPress may have decided (unnecessarily) to assume certain agents require certain object types.
Also, given that some posts get through on Inoreader, and not others, it may be that the posts which don’t go through are the ones causing other clients’ requests to get a 406. Maybe WordPress’s “RSS renderer” chokes converting the failing posts into the format it thinks urllib2 is expecting.
I’m looking at the current RSS xml to see if there’s something obviously special about the missing posts. Hold my beer, I’m going in. 🙂
One more bit of info: the 406 is accompanied by this message:
Not Acceptable! An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.
Looks like Mod_Security might be a good place to take a look, though I can’t imagine why.
My last comment on this: fetching the feed works with agent string “python-urlli” but not “python-urllib”. My guess is mod_security is looking for scripted agents, which it should not do because it’s not safe to assume a python-urllib agent is necessarily from an adversary.
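The "python-urlli" versus "python-urllib" comparison can be automated by trying ever longer prefixes of an agent string until one is rejected. This is only a sketch: the helper names are mine, and treating a 406 as "blocked" is an assumption based on the responses seen in this thread.

```python
import urllib.error
import urllib.request


def blocked_by_server(url, user_agent, timeout=10):
    """Return True if the server answers 406 for this User-Agent (assumed signal)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False
    except urllib.error.HTTPError as err:
        return err.code == 406


def shortest_blocked_prefix(agent, is_blocked):
    """Return the shortest prefix of `agent` that is_blocked() rejects, or None."""
    for end in range(1, len(agent) + 1):
        prefix = agent[:end]
        if is_blocked(prefix):
            return prefix
    return None


# Example (one real request per prefix, so keep the agent string short):
#     url = "https://badmachinery.com/feed/"
#     print(shortest_blocked_prefix("python-urllib2",
#                                   lambda ua: blocked_by_server(url, ua)))
```

The prefix it returns is a reasonable guess at the pattern the filter is matching, which is the thing to quote to a hosting provider.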
John, if you have control of security filtering, you may be able to find the specific rule which is looking for this agent and disable it. Or, if you have control of the Apache/nginx module configuration, you can do it there. Otherwise you may have to contact your hosting provider to address this specific issue.
I don’t think this is the same as the Inoreader issue since it’s not post-specific.
Thanks for going in so deep. This is so far out of my wheelhouse. I have control of very little on the server, it’s a very standard corporate hosting package.
I suspect this comes down to the difference between the https:// and http:// versions of the feed URL, particularly if there is a mod_security module throwing a hissy fit.
Not so much “return” as “continue”. All along I’ve seen it happen sporadically, posts appearing on one of the RSS feeds but not the other, or on neither — and differently in Inoreader and Thunderbird. In fact the “manual feed” hasn’t seen an update in Thunderbird since last July, but it’s usually there in Inoreader. Meanwhile the other feed has seen the two posts previous to this one in Thunderbird but they’re missing on both feeds in Inoreader.
… In fact, Inoreader (both feeds) has missed all the comics updates after “Pretty Little Insect” BUT has gotten the non comics updates: “Hourly Comics Day 2025”, “Solver books now in stock…” and “RSS woes return”. Surely that’s a clue.
Wait, no, it’s even weirder than that. I don’t usually check Thunderbird, but I saw the two comics following “Insect” (“Foul! But restorative” and “He’s clean”) so I think those must have appeared in Inoreader… *but they’re not in Inoreader now*!
I had a look at the feed, and “Foul! But restorative” (the one right after “Pretty Little Insect”) is now the last item. If something about that content caused the hiccup, it’ll fix itself later this week. However, don’t get your hopes up: I couldn’t see anything odd about it.
I suspect the feed stops at “Foul!” because the feed is configured to only serve the 10 most recent posts.
On Inoreader, my feed has all the comics up *to* “Pretty Little Insect” but none of the comics from “Foul! But restorative” on. As mentioned above, it does have the recent non-comics posts.
When I say “my feed” I mean what I see in Inoreader. If I look at the feed XML it has “Foul” as the oldest and includes everything from that one on.
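To compare the raw feed with what a reader shows without eyeballing the XML, something like the following sketch could help. It assumes the feed is plain RSS 2.0 with un-namespaced item and title elements; the function names are mine, and the list of titles seen in the reader has to be typed in by hand.

```python
import xml.etree.ElementTree as ET


def item_titles(rss_xml):
    """Return the titles of all <item> elements, in feed order."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


def missing_from_reader(feed_titles, reader_titles):
    """Return feed titles the reader does not show, keeping feed order."""
    seen = set(reader_titles)
    return [title for title in feed_titles if title not in seen]


# Example: save the feed to a file first, then compare it against the
# titles visible in the reader (both names below are placeholders):
#     with open("feed.xml", "rb") as handle:
#         titles = item_titles(handle.read())
#     print(missing_from_reader(titles, ["RSS woes return"]))
```

Whatever comes back is the set of posts the reader never picked up, which makes it easier to look for something they have in common.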
Dear John (there, I said it).
I use Feedly, or it uses me. I stopped counting, but I appear to have missed 8-10 episodes. I always blame Feedly, but it may be RSS. I often miss episodes of other comics. Questionable Content is big, Dumbing of Age sometimes, xkcd I can’t tell. I Roved Out… I think many, Dresden Codak sometimes. I only got 7 in my inbox today, which seems very small.
Good luck.
Good night.
The Feedly rss feed is working for me. Everything from your site shows up twice though. I still consider it a success.
I use Feedly as well. This post came up dated October 11, 2011 (10 hours ago), which is rather odd in general.
I’m not affected this time, using Feedbro within Firefox
I use Feedly and have noticed many skipped comix. I hope someone figures out a fix soon.
I run my own instance of tinytinyrss and it has been catching all your comics and posts. Thank you for continuing to run the service!
I’ve had no missed comics with Feedly in ages.
I’m sure it’s no coincidence that RSS can be pronounced “arse.”
I use Desktop Ticker (a Windows app) for RSS feeds (mainly webcomics), and it has been working fine for almost a decade now. Sometimes a feed hangs, but badmachinery has been working fine.