Blog posts of Shawn M. Jones

blog-post

Dec 22, 2022

ECCV 2022 and DIRA 2022 Trip Report

I had a paper accepted to the Drawings and abstract Imagery: Representation and Analysis (DIRA) workshop, allowing me to attend the 17th European Conference on Computer Vision 2022 (ECCV 2022) in Tel Aviv, Israel, from October 23 to 27. ECCV 2022 is a large conference with att...

blog-post

Dec 15, 2021

WS-DL Blog

From Researcher to ISTI Postdoctoral Fellow

Dr. Diane Oyen strongly suggested submitting an application to the Information Science & Technology Institute (ISTI) Postdoctoral Fellow program. Each year, out of the hundreds of postdocs at LANL, only two are awarded this prestigious position. Last week, I discovered tha...

blog-post

Nov 18, 2021

WS-DL Blog

HTML Meta Redirects Considered Harmful for Card Generation

A Twitter card often appears in a tweet when a user shares a URL. We found that if the page being shared on Twitter contains an HTML META redirect, Twitter will not follow it to gather information for the card.

blog-post

Nov 2, 2021

WS-DL Blog

The Return of SHARI -- Bringing News and Web Archive Storytelling Together Again

SHARI is back. Each day, the Dark and Stormy Archives Project will apply the SHARI process to gather the news articles for the previous day’s top stories and present them as a social media story, as seen above. Each card in the story links to a memento of a news article in a w...

blog-post

Oct 20, 2021

WS-DL Blog

How the Internet Archive Helped Me Remember CIKM 2019

Last week, I was discussing my publications with a colleague, and I mentioned that I had a paper published at ACM CIKM 2019. They were curious about the conference and its 2019 call for papers (CFP) date. I was trying to recall the conference’s venue and one of the workshop’s ...

blog-post

Aug 31, 2021

WS-DL Blog

From Student To Researcher III

After graduating, I officially accepted a position in Los Alamos National Laboratory’s Information Sciences Division (CCS-3) working for Diane Oyen. On October 4, 2021, I will no longer be a member of the Los Alamos National Laboratory (LANL) Research Library and I will inste...

blog-post

Feb 2, 2021

IIPC Blog

The Dark and Stormy Archives Project: Summarizing Web Archives Through Social Media Storytelling

Individual web archive collections can contain thousands of documents. If a researcher wants to use one of these collections, which one best meets their information need? How does the researcher differentiate them? Ideally, a user would be able to glance at a visualization and ...

blog-post

Jan 8, 2021

WS-DL Blog

Embedded Tweets From @realDonaldTrump: They Won’t Break, But They Can Be Faked

On Friday, Twitter suspended Donald Trump’s account due to concerns that his current and future tweets might continue to foment violence in the United States. Hayes Brown from MSNBC and Marshall Cohen from CNN echoed a concern I had when developing MementoEmbed: what happens t...

blog-post

Aug 2, 2020

WS-DL Blog

ACM SIGIR 2020 Non-Trip Report

On July 25, more than 1,300 registrants around the world opened their laptops and started attending SIGIR 2020. Via Zoom and the streaming capabilities of the conference web portal, we were able to watch speakers, raise hands, ask questions, and chat with attendees. SIGIR 202...

blog-post

Jul 28, 2020

IIPC Blog

Robustify your links! A working solution to create persistently robust links

Links on the web break all the time. We frequently experience the infamous “404 – Page not found” message, also known as “a broken link” or “link rot.” Sometimes we follow a link and discover that the linked page has significantly changed and its content no longer represents w...

blog-post

Jun 16, 2020

WS-DL Blog

Hypercane Part 3: Building Your Own Algorithms

In Part 1, we introduced Hypercane, a tool for automatically sampling mementos from web archive collections. Web archive collections consist of thousands of documents, and humans need tools to intelligently select mementos for a given purpose. Hypercane’s goal is to supply us ...

blog-post

Jun 9, 2020

WS-DL Blog

Hypercane Part 2: Synthesizing Output For Other Tools

In Part 1 of this series of blog posts, I introduced Hypercane, a tool for automatically sampling mementos from web archive collections. If a human wishes to create a sample of documents from a web archive collection, they are confronted with thousands of documents from which ...

blog-post

Jun 2, 2020

WS-DL Blog

Hypercane Part 1: Intelligent Sampling of Web Archive Collections

Yasmin AlNoamany experimented with summarizing a web collection by choosing a small number of exemplars and then visualizing them with social media storytelling. This is in contrast to approaches that try to account for all members of the collection. When I took over the Dark ...

blog-post

Mar 31, 2020

WS-DL Blog

SHARI: StoryGraph Hypercane ArchiveNow Raintale Integration -- Combining WS-DL Tools For Current Events Storytelling

My research focuses on summarizing existing web archive collections through social media storytelling. For this effort, we developed Raintale to tell the stories produced by a selection of mementos. Collections exist at various web archives, like Archive-It and the UK Web Arch...

blog-post

Nov 17, 2019

WS-DL Blog

The 28th ACM International Conference on Information and Knowledge Management (CIKM)

Students, professors, industry experts, and others came to Beijing to attend the 28th ACM International Conference on Information and Knowledge Management (CIKM). This was the first time CIKM had accepted a long paper from the Old Dominion University Web Science and Digital Li...

blog-post

Oct 30, 2019

WS-DL Blog

Continuing Education to Advance Web Archiving (CEDWARC)

On October 28, 2019, web archiving experts met with librarians and archivists at the George Washington University in Washington, DC. As part of the Continuing Education to Advance Web Archiving (CEDWARC) effort, we covered several different modules related to tools and technol...

blog-post

Aug 13, 2019

WS-DL Blog

Building the Better Crowdsourced Study - Literature on Mechanical Turk

One of the most challenging problems to solve while conducting user studies is recruiting participants. Amazon’s Mechanical Turk (MT) solves this problem by providing a marketplace where participants can earn money by completing studies for researchers. This blog post summariz...

blog-post

Jul 10, 2019

WS-DL Blog

Raintale -- A Storytelling Tool For Web Archives

Raintale is the latest entry in the Dark and Stormy Archives project. Our goal is to provide research studies and tools for combining web archives and social media storytelling. Raintale provides the storytelling capability. It has been designed to visualize a small number of ...

blog-post

Jun 4, 2019

WS-DL Blog

Wikis Are Archives: Integrating Memento and Mediawiki

Since 2013, I have been a principal contributor to the Memento MediaWiki Extension. We recently released version 2.2.0 to support MediaWiki versions 1.31.1 and greater. During the extension’s development, I have detailed some of its concepts on this blog, I have presented i...

blog-post

May 28, 2019

WS-DL Blog

In The Battle of the Surrogates: Social Cards Probably Win

On Tuesday, we released our latest pre-print "Social Cards Probably Provide Better Understanding of Web Archive Collections". My work builds on AlNoamany’s work of using social media storytelling to provide a visualization that summarizes web archive collections. In previous b...

blog-post

Feb 7, 2019

WS-DL Blog

Google+ Is Being Shuttered, Have We Preserved Enough of It?

Google+ will be shut down on April 2, 2019. In this blog post I cover how much of Google+ is archived and how to archive its pages.

blog-post

Jul 31, 2018

WS-DL Blog

A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages

With the death of Storify, I’ve been examining alternatives for summarizing web archive collections. Key to these summaries are surrogates. I have discovered that there exist services that provide users with embeds. These embeds allow an author to insert a surrogate into the H...

blog-post

Jul 14, 2018

WS-DL Blog

How well are the National Guideline Clearinghouse and the National Quality Measures Clearinghouse Archived?

There are two US government websites in danger, the National Guideline Clearinghouse (https://www.guideline.gov) and the National Quality Measures Clearinghouse (https://qualitymeasures.ahrq.gov). Both store medical guidelines. Both will “not be available after July 16, 2018”....

blog-post

Jul 2, 2018

WS-DL Blog

Extracting Metadata from Archive-It Collections with Archive-It Utilities

At iPres 2018, I will be presenting “The Many Shapes of Archive-It”, a paper that focuses on some structural features inherent in Archive-It collections. The paper is now available as a preprint on arXiv. As part of the data gathering for “The Many Shapes of Archive-It”, and a...

blog-post

Jul 1, 2018

WS-DL Blog

The Off-Topic Memento Toolkit

Inspired by AlNoamany’s work from “Detecting off-topic pages within TimeMaps in Web archives”, I am pleased to announce an alpha release of the Off-Topic Memento Toolkit (OTMT). The results of testing with this software will be presented at iPres 2018 and those results are now ...

blog-post

Jun 7, 2018

WS-DL Blog

Joint Conference on Digital Libraries (JCDL) Doctoral Consortium Trip Report

On June 3, 2018, PhD students arrived in Fort Worth, Texas to attend the Joint Conference on Digital Libraries Doctoral Consortium. This is a pre-conference event associated with the ACM and IEEE-CS Joint Conference on Digital Libraries. This event gives PhD students a forum i...

blog-post

Apr 23, 2018

WS-DL Blog

Let's Get Visual and Examine Web Page Surrogates

Web resources can be represented in a variety of ways. In this blog post I go over work that has been done to create surrogates, or representations of web resources, for use on social media, search engine results, and more.

blog-post

Dec 13, 2017

WS-DL Blog

Dodging the Memory Hole 2017 Trip Report

We engaged in discussions about a very important topic: the preservation of online news content. Brewster Kahle is well known in digital preservation and especially web archiving circles. I tried to cover elements of all presentations while live tweeting during the event, and ...

blog-post

Dec 11, 2017

WS-DL Blog

Storify Will Be Gone Soon, So How Do We Preserve The Stories?

The Storify platform will be discontinued in May 2018. Here I outline some options for those trying to preserve their work before it disappears.

blog-post

Nov 10, 2017

WS-DL Blog

Association for Information Science and Technology (ASIS&T) Annual Meeting 2017

The crowds descended upon Arlington, Virginia for the 80th annual meeting of the Association for Information Science and Technology. I attended this meeting to learn more about ASIS&T, including its special interest groups. Also attending with me was former ODU Computer Sc...

blog-post

Aug 10, 2017

WS-DL Blog

Where Can We Post Stories Summarizing Web Archive Collections?

This post is a re-examination of the landscape since AlNoamany’s dissertation to see if there are tools other than Storify that the Dark and Stormy Archives project can use. It covers the tools living in the spaces of content curation, storytelling, and social media.

blog-post

Jul 5, 2017

WS-DL Blog

Web Science 2017 Trip Report

I was fortunate to have the opportunity to present Yasmin AlNoamany’s work at Web Science 2017. Dr. Nelson offers an excellent class on Web Science, but it had been years since I had taken it, and I was still uncertain about the current state of the art. Web Science 2017...

blog-post

Apr 25, 2017

WS-DL Blog

Discovering Scholars Everywhere They Tread

Though scholars write articles and papers, they also post a lot of content on the web. Datasets, blog posts (like this one), presentations, and more are posted by scholars as part of scholarly communications. What if we could aggregate the content by scholar, instead of by web...

blog-post

Apr 23, 2017

WS-DL Blog

Pushing Boundaries

Given a scholar’s identity on a portal, how can we crawl the scholarly portal to ensure that we capture all of their content? In this post, I evaluate a number of scholarly portals to find their boundaries, the URI patterns that allow us to capture the content of a user.

blog-post

Apr 19, 2017

WS-DL Blog

Trusted Timestamping of Mementos

In this post, I examine different trusted timestamping methods. I start with some of the more traditional methods before discussing OriginStamp, a solution by Gipp, Meuschke, and Gernandt that uses the Bitcoin blockchain for timestamping.

blog-post

Oct 23, 2016

WS-DL Blog

Fun with Fictional Web Sites and the Internet Archive

As we celebrate the 20th anniversary of the Internet Archive, I realize that using Memento and the Wayback Machine has become second nature when solving certain problems, not only in my research, but also in my life. Those who have read my Master’s Thesis, Avoiding Spoilers on...

blog-post

Aug 29, 2016

WS-DL Blog

Memento at the W3C

We are pleased to report that the W3C has embraced Memento for versioning its specifications and its wiki. Completing this effort required collaboration between the W3C and the Los Alamos National Laboratory (LANL) Research Library Prototyping Team. Here we inform others of th...

blog-post

Aug 14, 2016

WS-DL Blog

Mementos in the Raw, Take Two

In a previous post, we discussed a way to use the existing Memento protocol combined with link headers to access unaltered (raw) archived web content. Interest in unaltered content has grown as more use cases arise for web archives. Ilya Kreymer and David Rosenthal had previous...

blog-post

Jun 26, 2016

WS-DL Blog

Symposium on Saving The Web at the Library of Congress

On June 16, 2016, the Library of Congress hosted a one-day symposium titled Saving the Web: The Ethics and Challenges of Preserving What’s on the Internet.

blog-post

Apr 26, 2016

WS-DL Blog

Mementos in the Raw

While analyzing mementos in a recent experiment, we discovered problems processing archived content. Many web archives augment the mementos they serve with additional archive-specific information, including HTML, text, and JavaScript. We were attempting to compare content acro...

blog-post

Apr 23, 2016

WS-DL Blog

WWW 2016 Trip Report

I was fortunate to present a poster at the 25th International World Wide Web Conference, held April 11-15, 2016. Though my primary mission was to represent both the WS-DL and the LANL Prototyping Group, I gained a better appreciation for the state of the art...

blog-post

Feb 23, 2016

WS-DL Blog

Acquisition of Mementos and Their Content Is More Challenging Than Expected

Recently, we conducted an experiment using mementos for almost 700,000 web pages from more than 20 web archives. These web pages spanned much of the life of the web (1997-2012). Much has been written about acquiring and extracting text from live web pages, but we believe that ...

blog-post

Sep 8, 2015

WS-DL Blog

Releasing an Open Source Python Project, the Services That Brought py-memento-client to Life

The LANL Library Prototyping Team recently received correspondence from a member of the Wikipedia team requesting Python code that could find the best URI-M for an archived web page based on the date of the page revision. Collaborating with Wikipedia, Harihar Shankar, Herbert ...

blog-post

Sep 1, 2015

WS-DL Blog

From Student To Researcher II

After successfully defending my Master’s Thesis, I accepted a position as a Graduate Research Assistant at Los Alamos National Laboratory (LANL) Library’s Digital Library Research and Prototyping Team. I now work directly for Herbert Van de Sompel, in collaboration with my ad...

blog-post

Apr 5, 2015

WS-DL Blog

From Student To Researcher...

In 2010, I decided to again study at the Old Dominion University Computer Science Department for better employment opportunities. After taking some classes, I realized that I did not merely want to take classes and earn a Master’s Degree, but also wanted to contribute knowledg...

blog-post

Jul 8, 2014

WS-DL Blog

Potential MediaWiki Web Time Travel for Wayback Machine Visitors

Over the past year, I’ve been working on the Memento MediaWiki Extension. In addition to trying to produce a decent product, we’ve also been trying to build support for the Memento MediaWiki Extension at WikiConference USA 2014. Recently, we’ve reached out via Twitter to rai...

blog-post

Jun 2, 2014

WS-DL Blog

WikiConference USA 2014 Trip Report

Amid the smell of coffee and bagels, the crowd quieted down to listen to the opening by Jennifer Baek, who, in addition to getting us energized, also paused to recognize Adrianne Wadewitz and Cynthia Sheley-Nelson, two Wikipedians who had, after contributing greatly to the Wik...

blog-post

Apr 17, 2014

WS-DL Blog

TimeGate Design Options For MediaWiki

We’ve been working on the development, testing, and improvement of the Memento MediaWiki Extension. One of our principal concerns is performance. The Memento MediaWiki Extension supports all Memento concepts: Original Resource (URI-R) - in MediaWiki parlance referred to ...

blog-post

Apr 1, 2014

WS-DL Blog

Yesterday's (Wiki) Page, Today's Image?

blog-post

Dec 18, 2013

WS-DL Blog

Avoiding Spoilers with the Memento Mediawiki Extension
