Research Team Blog Posts


August 1, 2018

Web Science and Digital Libraries Research Group Blog

With the death of Storify, I've been examining alternatives for summarizing web archive collections. Key to these summaries are surrogates. I have discovered that there exist services that provide users with embeds. These embeds allow an author to insert a surrogate into the HTML of their blog post or other web page. These containing pages often use the surrogate to further illustrate some concept from the surrounding content. Unfortunately, not all services generate good surrogates for mementos. After some reading, I came to the conclusion that we can fill in the gap with our own embeddable surrogate service: MementoEmbed.


July 15, 2018

Web Science and Digital Libraries Research Group Blog

There are two US government websites in danger: the National Guideline Clearinghouse (https://www.guideline.gov) and the National Quality Measures Clearinghouse (https://qualitymeasures.ahrq.gov). Both store medical guidelines. Both will "not be available after July 16, 2018". Seeing as these two sites will be shut down on July 16, 2018, how well are they archived?


July 3, 2018

Web Science and Digital Libraries Research Group Blog

At iPres 2018, I will be presenting "The Many Shapes of Archive-It", a paper that focuses on some structural features inherent in Archive-It collections. The paper is now available as a preprint on arXiv. As part of the data gathering for "The Many Shapes of Archive-It", and also as part of the development of the Off-Topic Memento Toolkit, I had to write code that extracts metadata and seeds from public Archive-It collections. This capability will be useful to several aspects of our storytelling and summarization work, so I used the knowledge gained from those projects and produced a standalone Python library named Archive-It Utilities (AIU).


July 2, 2018

Web Science and Digital Libraries Research Group Blog

Inspired by AlNoamany's work from "Detecting off-topic pages within TimeMaps in Web archives", I am pleased to announce an alpha release of the Off-Topic Memento Toolkit (OTMT). The results of testing with this software will be presented at iPres 2018, and those results are now available as a preprint.


June 8, 2018

Web Science and Digital Libraries Research Group Blog

On June 3, 2018, PhD students arrived in Fort Worth, Texas to attend the Joint Conference on Digital Libraries Doctoral Consortium. This is a pre-conference event associated with the ACM and IEEE-CS Joint Conference on Digital Libraries. This event gives PhD students a forum in which to discuss their dissertation work with others in the field. The Doctoral Consortium was well attended, not only by the presenting PhD students, their advisors/supervisors, and organizers, but also by those who were genuinely interested in emerging work.


April 24, 2018

Web Science and Digital Libraries Research Group Blog

Web resources can be represented in a variety of ways. In this blog post I go over work that has been done to create surrogates, or representations of web resources, for use on social media, search engine results, and more.


December 12, 2017

Web Science and Digital Libraries Research Group Blog

The Storify platform will be discontinued in May 2018. Here I outline some options for those trying to preserve their work before it disappears.


December 14, 2017

Web Science and Digital Libraries Research Group Blog

We engaged in discussions about a very important topic: the preservation of online news content. Brewster Kahle is well known in digital preservation and especially web archiving circles. I tried to cover elements of all presentations while live tweeting during the event, and wish I could go into more detail here, but, as usual, I will only cover a subset.


November 11, 2017

Web Science and Digital Libraries Research Group Blog

The crowds descended upon Arlington, Virginia for the 80th annual meeting of the Association for Information Science and Technology. I attended this meeting to learn more about ASIS&T, including its special interest groups. Also attending with me was former ODU Computer Science student and current Los Alamos National Laboratory librarian Valentina Neblitt-Jones. Here I cover the event.


August 11, 2017

Web Science and Digital Libraries Research Group Blog

This post is a re-examination of the landscape since AlNoamany's dissertation to see if there are tools other than Storify that the Dark and Stormy Archives project can use. It covers the tools living in the spaces of content curation, storytelling, and social media.


July 6, 2017

Web Science and Digital Libraries Research Group Blog

I was fortunate enough to have the opportunity to present Yasmin AlNoamany's work at Web Science 2017. Dr. Nelson offers an excellent class on Web Science, but it had been years since I had taken it, and I was still uncertain about the current state of the art. Web Science 2017 took place in Troy, a small city in upstate New York that is home to Rensselaer Polytechnic Institute (RPI). The RPI team had organized an excellent conference focused on a variety of Web Science topics, including cyberbullying, taxonomies, social media, and ethics.


April 26, 2017

Web Science and Digital Libraries Research Group Blog

Though scholars write articles and papers, they also post a lot of content on the web. Datasets, blog posts (like this one), presentations, and more are posted by scholars as part of scholarly communications. What if we could aggregate the content by scholar, instead of by web site?


April 24, 2017

Web Science and Digital Libraries Research Group Blog

Given a scholar's identity on a portal, how can we crawl the scholarly portal to ensure that we capture all of their content? In this post, I evaluate a number of scholarly portals to find their boundaries, the URI patterns that allow us to capture the content of a user.


April 20, 2017

Web Science and Digital Libraries Research Group Blog

In this post, I examine different trusted timestamping methods. I start with some of the more traditional methods before discussing OriginStamp, a solution by Gipp, Meuschke, and Gernandt that uses the Bitcoin blockchain for timestamping.


October 24, 2016

Web Science and Digital Libraries Research Group Blog

As we celebrate the 20th anniversary of the Internet Archive, I realize that using Memento and the Wayback Machine has become second nature when solving certain problems, not only in my research, but also in my life. Those who have read my Master's Thesis, Avoiding Spoilers on MediaWiki Fan Sites Using Memento, know that I am a fan of many fictional television shows and movies. URIs are discussed in these fictional worlds, and sometimes the people making the fiction actually register these URIs, seen in the example below, creating an additional vector for fans to find information on their favorite characters and worlds.


August 30, 2016

Web Science and Digital Libraries Research Group Blog

We are pleased to report that the W3C has embraced Memento for versioning its specifications and its wiki. Completing this effort required collaboration between the W3C and the Los Alamos National Laboratory (LANL) Research Library Prototyping Team. Here we inform others of the brief history of this effort and provide an overview of the technical aspects of the work done to make Memento at the W3C.


with Herbert Van de Sompel, Michael L. Nelson, Lyudmila Balakireva, Martin Klein, and Harihar Shankar

August 15, 2016

Web Science and Digital Libraries Research Group Blog

In a previous post, we discussed a way to use the existing Memento protocol combined with link headers to access unaltered (raw) archived web content. Interest in unaltered content has grown as more use cases arise for web archives. Ilya Kremer and David Rosenthal had previously suggested that a new dimension of content negotiation would be necessary to allow clients to access unaltered content. That idea was not originally pursued, because it would have required the standardization of new HTTP headers. At the time, none of us were aware of the standard Prefer header from RFC7240. Prefer can solve this problem in an intuitive way much like their original suggestion of content negotiation.


June 27, 2016

Web Science and Digital Libraries Research Group Blog

On June 16, 2016, the Library of Congress hosted a one-day symposium entitled Saving the Web: The Ethics and Challenges of Preserving What's on the Internet.


with Herbert Van de Sompel and Michael L. Nelson

April 27, 2016

Web Science and Digital Libraries Research Group Blog

While analyzing mementos in a recent experiment, we discovered problems processing archived content. Many web archives augment the mementos they serve with additional archive-specific information, including HTML, text, and JavaScript. We were attempting to compare content across many web archives, and had to develop custom solutions to remove these augmentations.


April 24, 2016

Web Science and Digital Libraries Research Group Blog

I was fortunate to present a poster at the 25th International World Wide Web Conference, held from April 11 to April 15, 2016. Though my primary mission was to represent both the WS-DL and the LANL Prototyping Group, I gained a better appreciation for the state of the art of the World Wide Web. The conference was held in Montréal, Canada at the Palais des congrès de Montréal.


with Harihar Shankar

February 24, 2016

Web Science and Digital Libraries Research Group Blog

Recently, we conducted an experiment using mementos for almost 700,000 web pages from more than 20 web archives. These web pages spanned much of the life of the web (1997-2012). Much has been written about acquiring and extracting text from live web pages, but we believe that this is an unparalleled attempt to acquire and extract text from mementos themselves.
