conference-paper

Synthesizing Web Archive Collections Into Big Data: Lessons From Mining Data From Web Archives

by Shawn M. Jones, Himarsha Jayanetti, Martin Klein, Michele C. Weigle, and Michael L. Nelson

Web archives are sources of big data. When presenting human visitors with archived web pages, or mementos, web archives often apply user interface augmentations to assist them. Unfortunately, these augmentations present challenges for natural language processing, computer visi...

Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine

by Shawn M. Jones, Diane Oyen

Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer visio...

Creating Structure in Web Archives With Collections: Different Concepts From Web Archivists

by Himarsha R. Jayanetti, Shawn M. Jones, Martin Klein, Alex Osborne, Paul Koerbin, Michael L. Nelson, Michele Weigle

As web archives’ holdings grow, archivists subdivide them into collections so they are easier to understand and manage. In this work, we review the collection structures of eight web archive platforms. We note a plethora of different approaches to web archive collection struct...

Robustifying Links With Zotero

by Martin Klein, Shawn M. Jones

Referencing resources on the web has become an integral part of our digital scholarship. However, the long-term availability and accessibility of these resources has rarely been the focus of significant research and development efforts. In this paper we introduce the Zotero Ro...

It's All About The Cards: Sharing on Social Media Probably Encouraged HTML Metadata Growth

by Shawn M. Jones, Valentina Neblitt-Jones, Michele C. Weigle, Martin Klein, and Michael L. Nelson

In a perfect world, all articles consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate the evolution of the metadata that is present when authors and publishers supply their own. Because applying ...

Automatically Selecting Striking Images for Social Cards

by Shawn M. Jones, Michele C. Weigle, Martin Klein, Michael L. Nelson

To allow previewing a web page, social media platforms have developed social cards: visualizations consisting of vital information about the underlying resource. At a minimum, social cards often include features such as the web resource’s title, text summary, striking image, a...

Social Cards Probably Provide For Better Understanding Of Web Archive Collections

by Shawn M. Jones, Michele C. Weigle, Michael L. Nelson

Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search eng...

The Off-Topic Memento Toolkit

by Shawn M. Jones, Michele C. Weigle, and Michael L. Nelson

Web archive collections are created with a particular purpose in mind. A curator selects seeds, or original resources, which are then captured by an archiving system and stored as archived web pages, or mementos. The systems that build web archive collections are often configu...

The Many Shapes of Archive-It

by Shawn M. Jones, Alexander Nwala, Michele C. Weigle, and Michael L. Nelson

Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves, selecting their own original resources...
