Blog Posts

ECCV 2022 and DIRA 2022 Trip Report

I had a paper accepted to the Drawings and abstract Imagery: Representation and Analysis (DIRA) workshop, allowing me to attend the 17th European Conference on Computer Vision 2022 (ECCV 2022) in Tel Aviv, Israel, from October 23 to 27. ECCV 2022 is a large conference with att...

From Researcher to ISTI Postdoctoral Fellow

Dr. Diane Oyen strongly suggested submitting an application to the Information Science & Technology Institute (ISTI) Postdoctoral Fellow program. Each year, out of the hundreds of postdocs at LANL, only two are awarded this prestigious position. Last week, I discovered tha...

HTML Meta Redirects Considered Harmful for Card Generation

A Twitter card often appears in a tweet when a user shares a URL. We found that if the page being shared on Twitter contains an HTML META redirect, Twitter will not follow it to gather information for the card.

The Return of SHARI -- Bringing News and Web Archive Storytelling Together Again

SHARI is back. Each day, the Dark and Stormy Archives Project will apply the SHARI process to gather the news articles for the previous day’s top stories and present them as a social media story, as seen above. Each card in the story links to a memento of a news article in a w...

How the Internet Archive Helped Me Remember CIKM 2019

Last week, I was discussing my publications with a colleague, and I mentioned that I had a paper published at ACM CIKM 2019. They were curious about the conference and its 2019 call for papers (CFP) date. I was trying to recall the conference’s venue and one of the workshop’s ...

From Student To Researcher III

After graduating, I officially accepted a position in Los Alamos National Laboratory’s Information Sciences Division (CCS-3) working for Diane Oyen. On October 4, 2021, I will no longer be a member of the Los Alamos National Laboratory (LANL) Research Library and I will inste...

The Dark and Stormy Archives Project: Summarizing Web Archives Through Social Media Storytelling

Individual web archive collections can contain thousands of documents. If a researcher wants to use one of these collections, which one best meets their information need? How does the researcher differentiate them? Ideally, a user would be able to glance at a visualization and ...

Embedded Tweets From @realDonaldTrump: They Won’t Break, But They Can Be Faked

On Friday, Twitter suspended Donald Trump’s account due to concerns that his current and future tweets might continue to foment violence in the United States. Hayes Brown from MSNBC and Marshall Cohen from CNN echoed a concern I had when developing MementoEmbed: what happens t...

ACM SIGIR 2020 Non-Trip Report

On July 25, more than 1,300 registrants around the world opened their laptops and started attending SIGIR 2020. Via Zoom and the streaming capabilities of the conference web portal, we were able to watch speakers, raise hands, ask questions, and chat with attendees. SIGIR 202...

Robustify your links! A working solution to create persistently robust links

Links on the web break all the time. We frequently experience the infamous “404 – Page not found” message, also known as “a broken link” or “link rot.” Sometimes we follow a link and discover that the linked page has significantly changed and its content no longer represents w...

Hypercane Part 3: Building Your Own Algorithms

In Part 1, we introduced Hypercane, a tool for automatically sampling mementos from web archive collections. Web archive collections consist of thousands of documents, and humans need tools to intelligently select mementos for a given purpose. Hypercane’s goal is to supply us ...

Hypercane Part 2: Synthesizing Output For Other Tools

In Part 1 of this series of blog posts, I introduced Hypercane, a tool for automatically sampling mementos from web archive collections. If a human wishes to create a sample of documents from a web archive collection, they are confronted with thousands of documents from which ...

Hypercane Part 1: Intelligent Sampling of Web Archive Collections

Yasmin AlNoamany experimented with summarizing a web collection by choosing a small number of exemplars and then visualizing them with social media storytelling. This is in contrast to approaches that try to account for all members of the collection. When I took over the Dark ...

SHARI: StoryGraph Hypercane ArchiveNow Raintale Integration -- Combining WS-DL Tools For Current Events Storytelling

My research focuses on summarizing existing web archive collections through social media storytelling. For this effort, we developed Raintale to tell the stories produced by a selection of mementos. Collections exist at various web archives, like Archive-It and the UK Web Arch...

The 28th ACM International Conference on Information and Knowledge Management (CIKM)

Students, professors, industry experts, and others came to Beijing to attend the 28th ACM International Conference on Information and Knowledge Management (CIKM). This was the first time CIKM had accepted a long paper from the Old Dominion University Web Science and Digital Li...

Continuing Education to Advance Web Archiving (CEDWARC)

On October 28, 2019, web archiving experts met with librarians and archivists at the George Washington University in Washington, DC. As part of the Continuing Education to Advance Web Archiving (CEDWARC) effort, we covered several different modules related to tools and technol...

Building the Better Crowdsourced Study - Literature on Mechanical Turk

One of the most challenging problems to solve while conducting user studies is recruiting participants. Amazon’s Mechanical Turk (MT) solves this problem by providing a marketplace where participants can earn money by completing studies for researchers. This blog post summariz...

Raintale -- A Storytelling Tool For Web Archives

Raintale is the latest entry in the Dark and Stormy Archives project. Our goal is to provide research studies and tools for combining web archives and social media storytelling. Raintale provides the storytelling capability. It has been designed to visualize a small number of ...

Wikis Are Archives: Integrating Memento and Mediawiki

Since 2013, I have been a principal contributor to the Memento MediaWiki Extension. We recently released version 2.2.0 to support MediaWiki versions 1.31.1 and greater. During the extension’s development, I have detailed some of its concepts on this blog, I have presented i...

In The Battle of the Surrogates: Social Cards Probably Win

On Tuesday, we released our latest pre-print "Social Cards Probably Provide Better Understanding of Web Archive Collections". My work builds on AlNoamany’s work of using social media storytelling to provide a visualization that summarizes web archive collections. In previous b...

Google+ Is Being Shuttered, Have We Preserved Enough of It?

Google+ will be shut down on April 2, 2019. In this blog post I cover how much of Google+ is archived and how to archive its pages.

A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages

With the death of Storify, I’ve been examining alternatives for summarizing web archive collections. Key to these summaries are surrogates. I have discovered that there exist services that provide users with embeds. These embeds allow an author to insert a surrogate into the H...

How well are the National Guideline Clearinghouse and the National Quality Measures Clearinghouse Archived?

Two US government websites are in danger: the National Guideline Clearinghouse (https://www.guideline.gov) and the National Quality Measures Clearinghouse (https://qualitymeasures.ahrq.gov). Both store medical guidelines. Both will “not be available after July 16, 2018”....

Extracting Metadata from Archive-It Collections with Archive-It Utilities

At iPres 2018, I will be presenting “The Many Shapes of Archive-It”, a paper that focuses on some structural features inherent in Archive-It collections. The paper is now available as a preprint on arXiv. As part of the data gathering for “The Many Shapes of Archive-It”, and a...

The Off-Topic Memento Toolkit

Inspired by AlNoamany’s work in “Detecting off-topic pages within TimeMaps in Web archives”, I am pleased to announce an alpha release of the Off-Topic Memento Toolkit (OTMT). The results of testing with this software will be presented at iPres 2018 and those results are now ...

Joint Conference on Digital Libraries (JCDL) Doctoral Consortium Trip Report

On June 3, 2018, PhD students arrived in Fort Worth, Texas to attend the Joint Conference on Digital Libraries Doctoral Consortium. This is a pre-conference event associated with the ACM and IEEE-CS Joint Conference on Digital Libraries. This event gives PhD students a forum i...

Let's Get Visual and Examine Web Page Surrogates

Web resources can be represented in a variety of ways. In this blog post I go over work that has been done to create surrogates, or representations of web resources, for use on social media, search engine results, and more.

Dodging the Memory Hole 2017 Trip Report

We engaged in discussions about a very important topic: the preservation of online news content. Brewster Kahle is well known in digital preservation and especially web archiving circles. I tried to cover elements of all presentations while live tweeting during the event, and ...

Storify Will Be Gone Soon, So How Do We Preserve The Stories?

The Storify platform will be discontinued in May 2018. Here I outline some options for those trying to preserve their work before it disappears.

Association for Information Science and Technology (ASIS&T) Annual Meeting 2017

The crowds descended upon Arlington, Virginia for the 80th annual meeting of the Association for Information Science and Technology. I attended this meeting to learn more about ASIS&T, including its special interest groups. Also attending with me was former ODU Computer Sc...

Where Can We Post Stories Summarizing Web Archive Collections?

This post is a re-examination of the landscape since AlNoamany’s dissertation to see if there are tools other than Storify that the Dark and Stormy Archives project can use. It covers the tools living in the spaces of content curation, storytelling, and social media.

Web Science 2017 Trip Report

I was fortunate enough to have the opportunity to present Yasmin AlNoamany’s work at Web Science 2017. Dr. Nelson offers an excellent class on Web Science, but it had been years since I had taken it, and I was still uncertain about the current state of the art. Web Science 2017...

Discovering Scholars Everywhere They Tread

Though scholars write articles and papers, they also post a lot of content on the web. Datasets, blog posts (like this one), presentations, and more are posted by scholars as part of scholarly communications. What if we could aggregate the content by scholar, instead of by web...

Pushing Boundaries

Given a scholar’s identity on a portal, how can we crawl the scholarly portal to ensure that we capture all of their content? In this post, I evaluate a number of scholarly portals to find their boundaries, the URI patterns that allow us to capture the content of a user.

Trusted Timestamping of Mementos

In this post, I examine different trusted timestamping methods. I start with some of the more traditional methods before discussing OriginStamp, a solution by Gipp, Meuschke, and Gernandt that uses the Bitcoin blockchain for timestamping.

Fun with Fictional Web Sites and the Internet Archive

As we celebrate the 20th anniversary of the Internet Archive, I realize that using Memento and the Wayback Machine has become second nature when solving certain problems, not only in my research, but also in my life. Those who have read my Master’s Thesis, Avoiding Spoilers on...

Memento at the W3C

We are pleased to report that the W3C has embraced Memento for versioning its specifications and its wiki. Completing this effort required collaboration between the W3C and the Los Alamos National Laboratory (LANL) Research Library Prototyping Team. Here we inform others of th...

Mementos in the Raw, Take Two

In a previous post, we discussed a way to use the existing Memento protocol combined with link headers to access unaltered (raw) archived web content. Interest in unaltered content has grown as more use cases arise for web archives. Ilya Kremer and David Rosenthal had previous...

Symposium on Saving The Web at the Library of Congress

On June 16, 2016, the Library of Congress hosted a one-day symposium entitled Saving the Web: The Ethics and Challenges of Preserving What’s on the Internet.

Mementos in the Raw

While analyzing mementos in a recent experiment, we discovered problems processing archived content. Many web archives augment the mementos they serve with additional archive-specific information, including HTML, text, and JavaScript. We were attempting to compare content acro...

WWW 2016 Trip Report

I was fortunate to present a poster at the 25th International World Wide Web Conference, held from April 11 to April 15, 2016. Though my primary mission was to represent both the WS-DL and the LANL Prototyping Group, I gained a better appreciation for the state of the art...

Acquisition of Mementos and Their Content Is More Challenging Than Expected

Recently, we conducted an experiment using mementos for almost 700,000 web pages from more than 20 web archives. These web pages spanned much of the life of the web (1997-2012). Much has been written about acquiring and extracting text from live web pages, but we believe that ...

Releasing an Open Source Python Project, the Services That Brought py-memento-client to Life

The LANL Library Prototyping Team recently received correspondence from a member of the Wikipedia team requesting Python code that could find the best URI-M for an archived web page based on the date of the page revision. Collaborating with Wikipedia, Harihar Shankar, Herbert ...

From Student To Researcher II

After successfully defending my Master’s Thesis, I accepted a position as a Graduate Research Assistant at Los Alamos National Laboratory (LANL) Library’s Digital Library Research and Prototyping Team. I now work directly for Herbert Van de Sompel, in collaboration with my ad...

From Student To Researcher...

In 2010, I decided to study again at the Old Dominion University Computer Science Department for better employment opportunities. After taking some classes, I realized that I did not merely want to take classes and earn a Master’s Degree, but also wanted to contribute knowledg...

Potential MediaWiki Web Time Travel for Wayback Machine Visitors

Over the past year, I’ve been working on the Memento MediaWiki Extension. In addition to trying to produce a decent product, we’ve also been trying to build support for the Memento MediaWiki Extension at WikiConference USA 2014. Recently, we’ve reached out via Twitter to rai...

WikiConference USA 2014 Trip Report

Amid the smell of coffee and bagels, the crowd quieted down to listen to the opening by Jennifer Baek, who, in addition to getting us energized, also paused to recognize Adrianne Wadewitz and Cynthia Ashley-Nelson, two Wikipedians who had, after contributing greatly to the Wik...

TimeGate Design Options For MediaWiki

We’ve been working on the development, testing, and improvement of the Memento MediaWiki Extension. One of our principal concerns is performance. The Memento MediaWiki Extension supports all Memento concepts: Original Resource (URI-R) - in MediaWiki parlance referred to ...

Yesterday's (Wiki) Page, Today's Image?

Avoiding Spoilers with the Memento Mediawiki Extension
