Tools like Archive-It and Webrecorder allow users to create web archive collections easily. These collections can contain thousands of seeds, and every capture of a seed results in a new memento. A potential user who wants to understand a collection may thus need to review thousands of mementos. The Dark and Stormy Archives (DSA) Project provides a solution through summarization, generating an intelligent sample of mementos from an immense collection. We then visualize that sample with social media storytelling through surrogates, such as cards and browser thumbnails. These stories have many uses. By comparing stories, a researcher can decide which collection meets their information needs. With the DSA Toolkit, educators can sample and demonstrate web archive content to students. Archivists and librarians can generate DSA stories to showcase their collections to patrons. With the SHARI process, we join the DSA Toolkit and the StoryGraph project to summarize the biggest news story for a given day. We highlight the achievements of the Dark and Stormy Archives Project and discuss its future.
For traditional library collections, archivists can select a representative sample from a collection and display it in a featured physical or digital library space. Web archive collections may consist of thousands of archived pages, or mementos. How should an archivist display a sample of these to drive visitors to their collection? Search engines and social media platforms often represent web pages as cards consisting of text snippets, titles, and images. Web storytelling is a popular method for grouping these cards to summarize a topic. Unfortunately, social media platforms are not archive-aware and fail to consistently create a good experience for mementos. They also do not allow any alterations to the UI of their cards. Thus, we created MementoEmbed to generate cards for individual mementos and Raintale to create entire stories that archivists can export to a variety of formats.
Tools such as Google News and Flipboard exist to convey daily news, but what about the past? In this paper, we describe how to combine several existing tools with web archive holdings to perform news analysis and visualization of the "biggest story" for a given date. StoryGraph clusters news articles together to identify a common news story. Hypercane leverages ArchiveNow to store the URLs produced by StoryGraph in web archives. Hypercane then analyzes these URLs to identify the most common terms and entities, and the highest-quality images, for social media storytelling. Raintale uses the output of these tools to produce a visualization of the news story for a given day. We name this process SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration).
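The SHARI pipeline above can be sketched as a simple four-stage orchestration. The function names and return shapes below are hypothetical stand-ins for the actual StoryGraph, ArchiveNow, Hypercane, and Raintale interfaces, stubbed here only to illustrate how each stage's output feeds the next.

```python
# A hypothetical sketch of the SHARI data flow. Each stub stands in for a
# real tool: StoryGraph (clustering), ArchiveNow (archiving), Hypercane
# (analysis), and Raintale (storytelling). The names and return values
# here are illustrative assumptions, not the tools' actual interfaces.

def storygraph_top_story(date):
    """Stub: return the article URLs in the largest news cluster for a date."""
    return [f"https://news.example/{date}/article-{i}" for i in range(3)]

def archivenow_push(urls):
    """Stub: submit each URL to a web archive and return memento URLs."""
    return [f"https://archive.example/memento/{i}" for i, _ in enumerate(urls)]

def hypercane_analyze(memento_urls):
    """Stub: extract common terms, named entities, and striking images."""
    return {"terms": ["storm"], "entities": ["Example City"], "images": []}

def raintale_story(memento_urls, analysis):
    """Stub: render the mementos and their analysis as a social media story."""
    return {"mementos": memento_urls, **analysis}

def shari(date):
    # StoryGraph -> ArchiveNow -> Hypercane -> Raintale, in order.
    urls = storygraph_top_story(date)
    mementos = archivenow_push(urls)
    analysis = hypercane_analyze(mementos)
    return raintale_story(mementos, analysis)

story = shari("2021-02-01")
```

In practice, each stage is a separate tool invoked on its own; this sketch only shows the order of operations and the hand-off of URLs, mementos, and analysis between them.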
Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search engine results and social media links are represented as surrogates, small, easily digestible summaries of the underlying page. Search engines and social media have a different focus, and hence produce different surrogates than web archives. Search engine surrogates help a user answer the question "Will this link meet my information need?" Social media surrogates help a user decide "Should I click on this?" Our use case is subtly different. We hypothesize that groups of surrogates together are useful for summarizing a collection. We want to help users answer the question "What does the underlying collection contain?" But which surrogate should we use? With Mechanical Turk participants, we evaluate six different surrogate types against each other. We find that the type of surrogate does not influence the time to complete the task we presented to the participants. Of particular interest are social cards, surrogates typically found on social media, and browser thumbnails, screen captures of web pages rendered in a browser. At p=0.0569 and p=0.0770, respectively, we find that social cards, and social cards paired side-by-side with browser thumbnails, probably provide better collection understanding than the surrogates currently used by the popular Archive-It web archiving platform. We measure user interactions with each surrogate and find that users interact with social cards less than with other types. The results of this study have implications for our web archive summarization work, live web curation platforms, social media, and more.
This CEDWARC presentation highlights AlNoamany's Algorithm and other research done to apply social media storytelling techniques to web archives, including the MementoEmbed and Raintale projects.
With web archives, journalists find evidence and information to back up their stories, historians store information for later use, and social scientists can study the actions of humans during specific time periods. These different groups gain value not only from creating their own collections but from using the collections of others. As users, we currently have no efficient way of understanding what is in each collection without manually reviewing all of its items. While past work has used mementos for studying how web resources change over time or evaluated the changes to various industries, there is still theoretical work to be done in improving the usability of web archive collections. Our goal is to help collection creators and the public at large make better use of these collections through improvements to collection understanding. We build upon the work of AlNoamany by using visualizations from social media storytelling. In this work, we provide background on the problem, analyze previous work in this area, and highlight our preliminary work before providing a plan for future research.
I presented my conference paper The Many Shapes of Archive-It where I talk about the structural features that can be used to understand web archive collections.
I presented my conference paper The Off-Topic Memento Toolkit where I talk about a software package that can detect off-topic mementos in a web archive collection.
Here I presented the work done on my dissertation so far. The doctoral consortium allows me to acquire feedback from the community before I submit my dissertation proposal to the university. I propose using a visualization of representative mementos to aid in the understanding of web archive collections, as inspired by AlNoamany's work.
This is a presentation of social media storytelling tools that were covered in a blog post written for the Web Science and Digital Libraries research group.
A presentation of the work I had done with the Research Library Prototyping Team at Los Alamos National Laboratory given to the local chapter of the Special Libraries Association in New Mexico.
A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. In this presentation, I detailed issues with using the Internet Archive to avoid spoilers.
The rise in fan-based wikis allows fans to discuss TV shows and books in ways never before seen. Unfortunately, as fans, we have the problem of spoilers. In this talk I demonstrate how one can use the Memento MediaWiki Extension to avoid spoilers in fan wikis.
The Internet Archive attempts to reconstruct web pages via snapshots (mementos) taken of pages at various points in time. Many pages change more frequently than the Internet Archive can capture them, meaning that some revisions of a given web page are lost forever. MediaWiki, however, retains all past revisions of a given page, along with its associated external resources.
This video shows how to use the Memento MediaWiki Extension to avoid spoilers on fan-based wikis. To achieve this, the Memento MediaWiki Extension must be installed on the given fan-based wiki and then used with the Memento Chrome Extension.
The Memento Chrome Extension can help avoid spoilers in fan wikis. This video briefly demonstrates how it does so.
This is a brief presentation on Continuous Integration for the Defense Acquisition University, providing a simple overview of the how and why of Continuous Integration.
I was invited to present Test Driven Development to the ODU chapter of the ACM. Here I provide an introduction to the methodology, explain why one should use it, share results from my personal experience with it, and discuss why some do not think it is worthwhile.