Solving the perennially frustrating problem of link rot

Earlier this year, I plunged deep into the archives of this ain’t livin’ to work on a problem that’s basically perennial on the internet: Links aren’t perennial. Of some 20,000 unique links over 10 years of writing here, nearly half were broken — particularly on the daily link roundups I used to do. Obviously I didn’t scour old links by hand — I took advantage of Broken Link Checker, which is a very useful WordPress plugin. It took nearly two weeks for it to scan the entire site and then I had to sift through and decide how to deal with broken links — sometimes delinking them altogether, sometimes tracking down archives of the page involved, sometimes finding new sources when a link was being used to authenticate a fact.
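
For anyone curious what that scan-then-sift routine looks like in miniature, here is a rough Python sketch. It is not how Broken Link Checker works internally. The names `extract_links`, `find_broken`, and `fetch_status` are all made up for illustration, and `fetch_status` is assumed to be something you supply yourself that returns an HTTP status code for a URL (or 0 if the request fails outright).

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a chunk of HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return every link found in the HTML, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def find_broken(
    links: Iterable[str], fetch_status: Callable[[str], int]
) -> Dict[str, int]:
    """Return {url: status} for links that look broken.

    fetch_status is a hypothetical caller-supplied function. A status of 0
    (request failed) or anything from 400 up is treated as broken.
    """
    broken: Dict[str, int] = {}
    for url in dict.fromkeys(links):  # de-duplicate while keeping order
        status = fetch_status(url)
        if status == 0 or status >= 400:
            broken[url] = status
    return broken
```

The output is only a list of suspects. Deciding whether to delink, find an archived copy, or hunt down a new source is still a job for a human.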

The process was really laborious, and I was struck by how many websites don’t set up redirects so that this doesn’t happen. At the same time, I started poring over my 404 logs and finding that, unbeknownst to me, when WordPress changed my permalink structure (from melouhia.net/year/month/post.html to melouhia.net/year/month/post/), it wasn’t redirecting people when they followed links with the old structure. Which meant that scores of people were clicking links and hitting the dreaded 404 page.
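
The permalink change described above boils down to a single rewrite rule. Here is a minimal Python sketch of that mapping. It assumes a four-digit year and two-digit month in the path (the real structure may differ), and the function name `redirect_target` is invented for illustration rather than taken from WordPress or any plugin.

```python
import re
from typing import Optional

# Assumed shape of the old permalinks: /YYYY/MM/post-slug.html
OLD_PERMALINK = re.compile(r"^/(\d{4})/(\d{2})/([^/]+)\.html$")


def redirect_target(path: str) -> Optional[str]:
    """Map an old-style path to the new trailing-slash path.

    Returns None when the path does not match the old structure,
    so the caller can fall through to normal handling.
    """
    match = OLD_PERMALINK.match(path)
    if match is None:
        return None
    year, month, slug = match.groups()
    return f"/{year}/{month}/{slug}/"
```

A server would typically answer a matching request with a permanent (301) redirect to the returned path, so that old links keep working and search engines learn the new address.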

Thus began my long, intimate, and ongoing relationship with Redirection, which allows you to set up redirects so that people landing on bad links get shuffled to the right place. It was definitely a reminder that maybe I shouldn’t have been so hasty to cast stones when I was snarking at other media for not using redirects to manage their incoming traffic, because you never know when someone will want to access a story that’s three years old, or seven, or older.

The internet is this amazing web of links and references and rabbit holes, but it only works when those links stay valid. Especially in the journalism field, we need valid links because we’re using them to lend authority and weight to a story. I need to be able to link to scientific research and government statistics, for example, so that it’s clear I’m not pulling an idea from thin air. When I cite a piece written by another journalist, I need to be able to link to it so readers can explore it in its entirety, rather than relying on an excerpt or paraphrase to understand what my colleague said. When I think that something is interesting and I want readers to enjoy it as well, I don’t want them to hit a 404 when they land there.

The Wayback Machine is an incredible tool for grabbing snapshots of things that have disappeared, but even it falls short sometimes, and it still requires going offsite. Which is why I’m super excited about Amber, a WordPress plugin/Drupal module that integrates directly into a site. If a link is valid, the reader clicks it and moves right on through. If it isn’t, the reader has the option of seeing a snapshot, ensuring that the information is preserved even if the link has since moved or the content has been taken down (though some sites block it via robots.txt, which is super annoying but ultimately to their detriment).
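
The behaviour described above (pass the reader through when the link works, offer a snapshot when it doesn’t) can be sketched in a few lines of Python. This is a simplification for illustration and not Amber’s actual code. `reader_options`, `is_live`, and the `snapshots` mapping are all hypothetical names, and it assumes snapshots were saved somewhere back when the link still worked.

```python
from typing import Callable, Mapping, Optional, Tuple


def reader_options(
    url: str,
    is_live: Callable[[str], bool],
    snapshots: Mapping[str, str],
) -> Tuple[str, Optional[str]]:
    """Return (original_url, snapshot_to_offer).

    is_live is a hypothetical caller-supplied check. A snapshot is only
    offered when the original link is down and a saved copy exists.
    """
    if is_live(url):
        return url, None
    return url, snapshots.get(url)
```

If the link is dead and no snapshot was ever stored, the second value is `None` and the reader is no worse off than before.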

It’s exceedingly frustrating to think of all the links that go nowhere, following the vanishing staircases of the internet, and I was really surprised by some of the sites that had gone dark or changed their permalink structure when I started plowing through broken links. It was a bitter reminder that many websites don’t last very long in the fast-moving ecosystem of the internet — and one odd result of confronting and fixing my own 404s has been reminders of posts I wrote years ago on issues that people have only just started paying attention to. It’s also been really intriguing to see what kinds of old links people follow (I notice, for instance, that pop culture posts consistently attract tons of traffic even when they’re ancient), and where people follow them from.

The developers of Amber comment that almost 50 percent of links in Supreme Court decisions are dead, which is pretty damning. It’s a consequence of a highly mobile citation and communication system: once, a citation led to a physical piece of media, and now, it’s more ethereal, harder to pin down. As we know, physical media are subject to their own rot and disappearances, which is one reason people have been trying to digitise them, so this isn’t just a problem for the internet. But the sheer volume of media produced online magnifies the sense of the problem. The internet is an ocean of thinkpieces and quick hits and hot takes and investigative features and clickbait and listicles and commentaries and opinion pieces and scientific research and government websites and much, much more. It’s a firehose of information that’s dizzying to deal with, a library too vast to really contain, and when you lose a book, so to speak, you don’t just have a few floors of a sprawling building to search through.

Amber and tools like it help keep the internet relevant, and I hope that more people adopt them. For me as a user, they’re valuable tools because I can protect information that would otherwise die. For me as a reader, they allow me to see what a writer drew upon even after that thing is gone: the automated screenshot, cleaner, neater, and easier to use. This has huge potential to boost confidence in a writer’s authority while it changes the way we interact with media, and that’s undeniably a good thing.

Image: Link, 3Allawi, Flickr