- Researcher
- Software Engineer
- Cat Parent
- Child of the '80s
- and more...
Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation
by Manish Bhattarai, Javier E. Santos, Shawn M. Jones, Ayan Biswas, Boian Alexandrov, and Daniel O'Malley
The advent of large language models (LLMs) has significantly advanced the field of code translation, enabling automated translation between programming languages. However, these models often struggle with complex translation tasks due to inadequate contextual understanding. Th...
DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding
by Kehinde Ajayi, Xin Wei, Martin Gryder, Winston Shields, Jian Wu, Shawn M. Jones, Michal Kucer, and Diane Oyen
Recent advances in computer vision (CV) and natural language processing have been driven by exploiting big data on practical applications. However, these research fields are still limited by the sheer volume, versatility, and diversity of the available datasets. CV tasks, such...
Synthesizing Web Archive Collections Into Big Data: Lessons From Mining Data From Web Archives
by Shawn M. Jones, Himarsha Jayanetti, Martin Klein, Michele C. Weigle, and Michael L. Nelson
Web archives are sources of big data. When presenting human visitors with archived web pages, or mementos, web archives often apply user interface augmentations to assist them. Unfortunately, these augmentations present challenges for natural language processing, computer visi...
Summarizing Web Archive Corpora Via Social Media Storytelling By Automatically Selecting and Visualizing Exemplars
by Shawn M. Jones, Martin Klein, Michele C. Weigle, and Michael L. Nelson
People often create themed collections to make sense of an ever-increasing number of archived web pages. Some of these collections contain hundreds of thousands of documents. Thousands of collections exist, many covering the same topic. Few collections include standardized met...
Discovering Image Usage Online: A Case Study With "Flatten the Curve"
by Shawn M. Jones and Diane Oyen
Understanding the spread of images across the web helps us understand the reuse of scientific visualizations and their relationship with the public. The “Flatten the Curve” graphic was heavily used during the COVID-19 pandemic to convey a complex concept in a simple form. It d...
ECCV 2022 and DIRA 2022 Trip Report
I had a paper accepted to the Drawings and abstract Imagery: Representation and Analysis (DIRA) workshop, allowing me to attend the 17th European Conference on Computer Vision 2022 (ECCV 2022) in Tel Aviv, Israel, from October 23-27. ECCV 2022 is a large conference with att...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine
by Shawn M. Jones and Diane Oyen
Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer visio...
Creating Structure in Web Archives With Collections: Different Concepts From Web Archivists
by Himarsha R. Jayanetti, Shawn M. Jones, Martin Klein, Alex Osbourne, Paul Koerbin, Michael L. Nelson, and Michele C. Weigle
As web archives’ holdings grow, archivists subdivide them into collections so they are easier to understand and manage. In this work, we review the collection structures of eight web archive platforms. We note a plethora of different approaches to web archive collection struct...
Robustifying Links With Zotero
by Martin Klein and Shawn M. Jones
Referencing resources on the web has become an integral part of our digital scholarship. However, the long-term availability and accessibility of these resources has rarely been the focus of significant research and development efforts. In this paper we introduce the Zotero Ro...