Publications

Hypercane: Toolkit for Summarizing Large Collections of Archived Webpages

by Shawn M. Jones, Michele C. Weigle, Michael L. Nelson

In the Dark and Stormy Archives (DSA) project, we focus on storytelling techniques to summarize collections of archived web pages. Since collections can have hundreds or even thousands of seeds (initial URLs) and each seed can be recrawled many times, with each version separat...

Hypercane: Intelligent Sampling for Web Archive Collections

Accepted Future Publication

by Shawn M. Jones, Michele C. Weigle, Martin Klein, Michael L. Nelson

Humans can choose individual documents from a web archive collection, but doing so is difficult if they are unfamiliar with the collection. The issue is scale. Most web archive collections consist of thousands of documents. Hypercane is a tool that automates the selection of d...

It's All About The Cards: Sharing on Social Media Probably Encouraged HTML Metadata Growth

Accepted Future Publication

by Shawn M. Jones, Valentina Neblitt-Jones, Michele C. Weigle, Martin Klein, and Michael L. Nelson

In a perfect world, all articles consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate the evolution of the metadata that is present when authors and publishers supply their own. Because applying ...

Improving Collection Understanding for Web Archives with Storytelling: Shining Light Into Dark and Stormy Archives

by Shawn M. Jones

Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections be...

Interoperability for Accessing Versions of Web Resources with the Memento Protocol

by Shawn M. Jones, Martin Klein, Herbert Van de Sompel, Michael L. Nelson, and Michele C. Weigle

Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search eng...

Automatically Selecting Striking Images for Social Cards

by Shawn M. Jones, Michele C. Weigle, Martin Klein, Michael L. Nelson

To allow previewing a web page, social media platforms have developed social cards: visualizations consisting of vital information about the underlying resource. At a minimum, social cards often include features such as the web resource’s title, text summary, striking image, a...

Robustifying Links To Combat Reference Rot

by Shawn M. Jones, Martin Klein, and Herbert Van de Sompel

Links to web resources frequently break, and linked content can change at unpredictable rates. These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information. In this paper, we highlight the significance of reference rot, ...

SHARI -- An Integration of Tools to Visualize the Story of the Day

by Shawn M. Jones, Alexander C. Nwala, Martin Klein, Michele C. Weigle, Michael L. Nelson

Tools such as Google News and Flipboard exist to convey daily news, but what about the past? In this paper, we describe how to combine several existing tools with web archive holdings to perform news analysis and visualization of the “biggest story” for a given date. StoryGrap...

MementoEmbed and Raintale for Web Archive Storytelling

by Shawn M. Jones, Martin Klein, Michele C. Weigle, Michael L. Nelson

For traditional library collections, archivists can select a representative sample from a collection and display it in a featured physical or digital library space. Web archive collections may consist of thousands of archived pages, or mementos. How should an archivist display...

Social Cards Probably Provide For Better Understanding Of Web Archive Collections

by Shawn M. Jones, Michele C. Weigle, Michael L. Nelson

Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search eng...

The Off-Topic Memento Toolkit

by Shawn M. Jones, Michele C. Weigle, and Michael L. Nelson

Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configu...

The Many Shapes of Archive-It

by Shawn M. Jones, Alexander Nwala, Michele C. Weigle, and Michael L. Nelson

Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves, selecting their own original resources...

Avoiding spoilers: wiki time travel with Sheldon Cooper

by Shawn M. Jones, Michael L. Nelson, and Herbert Van de Sompel

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if fans are behind in their viewing they run the risk of encountering “spoilers”—inf...

Uniform Access to Raw Mementos

by Herbert Van de Sompel, Michael L. Nelson, Lyudmila Balakireva, Martin Klein, Shawn M. Jones, and Harihar Shankar

Most web archives augment Mementos when presenting them to the user, often for usability or legal purposes. Research efforts and software projects need access to the original captured “raw” Mementos. So that users and software do not need to resort to archive-specific solutions, ...

Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content

by Shawn M. Jones, Herbert Van de Sompel, Harihar Shankar, Martin Klein, Richard Tobin, and Claire Grover

Increasingly, scholarly articles contain URI references to “web at large” resources including project web sites, scholarly wikis, ontologies, online debates, presentations, blogs, and videos. Authors reference such resources to provide essential context for the research they r...

Persistent URIs Must Be Used To Be Persistent

by Herbert Van de Sompel, Martin Klein, Shawn M. Jones

We quantify the extent to which references to papers in scholarly literature use persistent HTTP URIs that leverage the Digital Object Identifier infrastructure. We find a significant number of references that do not, speculate why authors would use brittle URIs when persisten...

Rules of Acquisition for Mementos and Their Content

by Shawn M. Jones, Harihar Shankar

Text extraction from web pages has many applications, including web crawling optimization and document clustering. Though much has been written about the acquisition of content from live web pages, content acquisition of archived web pages, known as mementos, remains a relativ...

Avoiding Spoilers in Fan Wikis of Episodic Fiction

by Shawn M. Jones, Michael L. Nelson

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing they run the risk of encountering “spoilers” ...

Avoiding Spoilers on Mediawiki Fan Sites Using Memento

by Shawn M. Jones

A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing they run the risk of encountering “spoilers” –...

Bringing Web Time Travel to MediaWiki: An Assessment of the Memento MediaWiki Extension

by Shawn M. Jones, Michael L. Nelson, Harihar Shankar, Herbert Van de Sompel

We have implemented the Memento MediaWiki Extension Version 2.0, which brings the Memento Protocol to MediaWiki, used by Wikipedia and the Wikimedia Foundation. Test results show that the extension has a negligible impact on performance. Two 302 status code datetime negotiatio...
