Saving Favorites, Amber Edition

Ari Bader-Natal  15 February 2016  Filed under: Articles

Updated on 4/3/2016 to reflect recent improvements to the Amber WordPress plugin.

Amber is a powerful new anti-linkrot tool for WordPress and Drupal. It also happens to be the missing piece needed to turn a personal favelog into a true archive. What follows is a bit about the favelog, a bit about Amber, and notes from my first week of using the Amber plugin with my WordPress-based favelog. If you’re only here for a review of Amber for WordPress (as of Feb 2016), jump here. If you’re curious about the whole “favelog” thing, read on…

Last month, I wrote about how (and why) to set up a WordPress blog that automatically archives everything that you are collecting around the web: The Favelog Writes Itself. Since September, my own favelog (hosted at favorites.aribadernatal.com) has been dutifully documenting and linking to: (a.) each website that I favorite in Pocket, (b.) each article that I recommend on Medium, (c.) each code repository that I star on GitHub, (d.) each tweet that I like on Twitter, (e.) each project that I back on Kickstarter, (f.g.h.) each photo and video that I like on Vimeo, YouTube, and Flickr, and even (i.) every Processing sketch that I feature in the Sketchpad Gallery. Since I turned on my favelog five months ago, it has recorded 253 new items from my various collections, most of which are websites, tweets, and code repositories.

Part of the motivation for the favelog was to track my collections in a space that I fully control, where I could re-organize, search, and share what I collect however I see fit. Web services eventually shut down, APIs change, Terms of Service get worse, etc. Own your collections and this problem is solved, right? Well, not really.

Here’s the catch: The favelog records when I add a new article to my Articles collection, and the favelog entry includes a link back to the original article, but the favelog does not archive a copy of the article itself. It turns out that archiving things on the web is remarkably difficult, particularly in the age of fancy JavaScript-based personalized single-page applications. A few weeks ago, I heard Ilya Kreymer talk at a Meetup about some of the tech powering web archiving for the Internet Archive’s Wayback Machine and WebRecorder.io. The piece that I didn’t expect was the move towards archiving tools that are learnable/accessible to anyone, not just digital archivists.

About two weeks ago, I heard the announcement about Amber, a new tool developed by the Berkman Center for Internet & Society at Harvard promising almost exactly what I’d been looking for. According to their website, Amber “automatically preserves a snapshot of every page linked to on a website, giving visitors a fallback option if links become inaccessible. If one of the pages linked to on this website were to ever go down, Amber can provide visitors with access to an alternate version.” Amber is available as a WordPress plugin or a Drupal Module, allowing you to add archiving to existing sites with a few clicks. Given that I built my favelog on WordPress, I was able to very easily test it out.
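To make "every page linked to on a website" a bit more concrete, here is a minimal Python sketch of the link-discovery step. This is not Amber's code (the WordPress plugin is PHP); the names `LinkCollector` and `external_links` are hypothetical, and the choice to keep only off-site http(s) links is my own assumption about what is worth snapshotting.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags, in order, without duplicates."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the post's own URL.
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).scheme in ("http", "https") and absolute not in self.links:
            self.links.append(absolute)


def external_links(post_html, base_url):
    """Return the links in post_html that point away from the site hosting the post."""
    collector = LinkCollector(base_url)
    collector.feed(post_html)
    site = urlparse(base_url).netloc
    return [url for url in collector.links if urlparse(url).netloc != site]
```

Given the HTML of a favelog post and the post's URL, `external_links` returns the candidate URLs that a snapshotting tool would then try to fetch and store.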

Many strengths:

Adding Amber to your WordPress blog is easy. Just install the plugin.
Customizing settings is possible but not necessary. I’m running with the defaults, and
The development team is very responsive. I ran into two issues, and the team was quick to respond via GitHub. Their wiki includes lots of useful resources. Worth checking out.
Amber can be unobtrusive to your readers, only presenting itself when it can provide an alternative to a broken link. This screenshot shows the popover for a working link, but I’ll change that as soon as I’m convinced that everything is working properly.
Cached websites look good! With expected minor imperfections (e.g. ads not rendering), the archived copies that Amber generates maintained the content and most of the style of the originals. Here’s Mike Caulfield’s post on Connected Copies (Amber cache and WordPress original), a Medium-hosted article (Amber cache and Medium original), and EdSurge (Amber cache and EdSurge original).
Amber goes beyond WordPress. A Drupal Module is also available. Support for general-purpose web servers is also in progress, with an amber_apache module and an amber_nginx module in private beta testing.
Amber is aware of other archived copies of URLs. I haven’t looked into this, but it sounds very very cool so I left it enabled. By default, this makes use of mementoweb.org. If this sounds interesting to you, you should take a look at a long-running project at Stanford that the Amber website links to: LOCKSS (Lots of Copies Keep Stuff Safe).

Two noteworthy limitations:

Amber only archives some of the links that it finds. On my blog, fewer than 10% of the links were archived (39 of 515). There are a few things going on here. Many sites limit access or entirely ban web crawlers via the robots.txt protocol, and Amber appropriately respects this (following rules set for User-Agent: Amber). Other links aren’t archived because URL redirects aren’t being resolved (see Known Issue: Redirections). About a third of my favelog posts include the text of tweets that I favorited, and Twitter re-writes every link in tweets to pass through their t.co URL shortener/tracker service. Amber didn’t archive any of these links. Updated on 4/3/2016: (1.) Amber is now picking up links to tweets! Unfortunately, the resulting snapshots (example) capture content with no style. (2.) Also worth noting that Amber now displays a “Notes” column that indicates the reasons for some of the snapshotting failures.
Archiving links in new posts doesn’t happen automatically. You can either archive links in an individual post or page by clicking a button in the editor sidebar, or you can scan all posts on your blog from a central Amber Dashboard. For posts created through other means (e.g. from IFTTT recipes), you’ll need to log into WordPress and head to the Amber Dashboard to (1.) re-scan all posts for links and then (2.) generate snapshots for all new links. Doing this for a site with many links is, understandably, fairly slow (I’m about 20 minutes in right now). If there were a way for Amber to automatically attempt to preserve links when a new post is published (regardless of whether through the GUI or the API), there wouldn’t be a need to regularly log into the WordPress website to kick off Amber snapshotting. Updated on 4/3/2016: Posts that are created programmatically via IFTTT are now correctly being discovered and archived by Amber!
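If you are wondering whether robots.txt explains why a particular link was skipped, you can check it yourself. Here is a rough sketch using Python's standard `urllib.robotparser`. It is not how Amber itself does the check; the helper name `can_snapshot` and the example robots.txt are made up for illustration, and the only detail taken from above is the "Amber" user-agent name.

```python
from urllib.robotparser import RobotFileParser


def can_snapshot(robots_txt, url, user_agent="Amber"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that bans Amber but allows every other crawler.
EXAMPLE_ROBOTS = """\
User-agent: Amber
Disallow: /

User-agent: *
Disallow:
"""

print(can_snapshot(EXAMPLE_ROBOTS, "https://example.com/article"))  # False
print(can_snapshot(EXAMPLE_ROBOTS, "https://example.com/article", user_agent="OtherBot"))  # True
```

To test a real site, download its /robots.txt, pass the text in as `robots_txt`, and check the URL of the link that failed to archive.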

Even with these minor limitations, Amber is awesome. It’s powerful software that helps you fight linkrot, a real problem on the web, and you can use it on your own website with just a few clicks. How often can you say that?
