Old Reddit Threads Are Becoming Harder to Access, Thanks to AI

With AI taking over more and more of Google's search results lately, I've been relying heavily on the one magic word that still makes the internet work: Reddit. Despite its issues, adding "Reddit" to a search is still the best way to get an honest opinion from a real person, which can't be said for many other platforms. Sadly, the "Reddit" trick is about to become less effective, and once again, AI is to blame.

The problem with any live forum is that information comes and goes: people delete old posts, and site redesigns break or bury older pages. There has long been a way around this, but that loophole is about to close.

Reddit is about to start blocking the Internet Archive, the nonprofit dedicated to preserving the open internet that runs the Wayback Machine. This popular tool lets users browse web pages that are no longer online or have changed significantly. Simply enter a URL, and you can view captures of how the page looked in the past, sometimes dating back to the 1990s.

It's a useful way to see how a site has changed or to access information that's long gone. For instance, you could use it to read a deleted hotel review on Reddit. Reading a post someone deliberately removed might feel a little awkward, but many users delete all their threads when they leave Reddit, so the Wayback Machine helps preserve useful content and keeps classic memes from becoming lost media.

Unfortunately, although Reddit isn't opposed to the Wayback Machine in general, it's stopping the Internet Archive from capturing anything other than the Reddit homepage. Future archives will only show what was popular on Reddit on a given day. Individual subreddits and posts will be blocked.

That's not entirely useless for internet researchers, but it makes every future Reddit thread more ephemeral and hampers casual web searches. If I post a hotel review now and later delete the thread, other users won't be able to find it easily a few months from now. On the bright side, existing archives won't be affected unless Reddit asks the Internet Archive to remove them. Still, the gap in Reddit archives will likely become a bigger problem over time.

So why is this happening? Basically, Reddit doesn't want AI companies scraping its content without paying for it.

“Internet Archive provides a service to the open web,” Reddit spokesperson Tim Rathschmidt told The Verge, “but we’ve been aware of AI companies violating platform policies and scraping data from the Wayback Machine.”

Reddit wants to control which AI companies it works with and has blocked most of them from crawling its site. Because some have turned to scraping the Reddit pages captured by the Internet Archive instead, the company is now cracking down on those captures too. The rest of us are paying the price for a few bad apples.

Rathschmidt told The Verge that the limits on the Internet Archive will start "ramping up" today, though he didn't give specifics. I've reached out to Reddit for details. For now, I can still access existing archives, so Reddit hasn't gone nuclear yet.

As for future posts, all may not be lost. The Verge also spoke to Wayback Machine director Mark Graham, who said there's a "longstanding relationship with Reddit" and "ongoing discussions about this matter."