With web archives, journalists find evidence and information to back up their stories, historians store information for later users, and social scientists can study the actions of humans during specific time periods. These different groups gain value not only from creating their own collections but from using the collections of others. As users, we currently have no efficient way of understanding what is in each collection without manually reviewing all of its items. While past work has used mementos for studying how web resources change over time or evaluated the changes to various industries, there is still theoretical work to be done in improving the usability of web archive collections. Our goal is to help collection creators and the public at large to make better use of these collections through improvements to collection understanding. We build upon the work of AlNoamany by using visualizations from social media storytelling. In this work, we provide background on the problem, analyze previous work in this area, and highlight our preliminary work before providing a plan for future research.
Raintale is the latest entry in the Dark and Stormy Archives project. Our goal is to provide research studies and tools for combining web archives and social media storytelling. Raintale provides the storytelling capability. It has been designed to visualize a small number of mementos selected from an immense web archive collection, allowing a user to summarize and visualize the whole collection or a specific aspect of it.
We examine different collections at the web archive collection service Archive-It. We then demonstrate several structural features that can be used to predict the type of a collection.
Raintale is a utility for publishing social media stories from groups of archived web pages (mementos). Raintale uses MementoEmbed to extract memento information and then publishes a story to the given storyteller, a static file or an online social media service.
As part of "Social Cards Probably Provide For Better Understanding Of Web Archive Collections" (recently accepted for publication by CIKM2019), I had to learn how to conduct user studies. One of the most challenging problems to solve while conducting user studies is recruiting participants. Amazon's Mechanical Turk (MT) solves this problem by providing a marketplace where participants can earn money by completing studies for researchers. This blog post summarizes the lessons I have learned from other studies that have successfully employed MT. I have found parts of this information scattered throughout different bodies of knowledge, but not gathered in one place; thus, I hope it is a useful starting place for future researchers.
Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search engine results and social media links are represented as surrogates, small easily digestible summaries of the underlying page. Search engines and social media have a different focus, and hence produce different surrogates than web archives. We hypothesize that groups of surrogates together are useful for summarizing a collection. We want to help users answer the question of "What does the underlying collection contain?" But which surrogate should we use? We evaluate six different surrogate types against each other. We find that the type of surrogate does not influence the time to complete the task we presented to the participants. Of particular interest are social cards, surrogates typically found on social media, and browser thumbnails, screen captures of web pages rendered in a browser. At p=0.0569 and p=0.0770, respectively, we find that social cards and social cards paired side-by-side with browser thumbnails probably provide better collection understanding than the surrogates currently used by the popular Archive-It web archiving platform. We measure user interactions with each surrogate and find that users interact with social cards less than other types. The results of this study have implications for our web archive summarization work, live web curation platforms, social media, and more.
I have worked in industry for more than 18 years, participating in many aspects of systems and software engineering.
Above, you can find out more information about my journey through academia.
Below, you can follow me on social networking.