Me presenting at CEDWARC 2019
  • Researcher
  • Software Engineer
  • Cat Parent
  • Child of the '80s
  • and more...
Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval

by Alexander Most, Joseph Winjum, Ayan Biswas, Shawn M. Jones, Nishath Rajiv Ranasinghe, Dan O'Malley, Manish Bhattarai

Retrieval-Augmented Generation (RAG) has become a popular technique for enhancing the reliability and utility of Large Language Models (LLMs) by grounding responses in external documents. Traditional RAG systems rely on Optical Character Recognition (OCR) to first process scan...

LLM-Assisted Translation of Legacy FORTRAN Codes to C++: A Cross-Platform Study

by Nishath Rajiv Ranasinghe, Shawn M. Jones, Michal Kucer, Ayan Biswas, Daniel O’Malley, Alexander Most, Selma Liliane Wanna, Ajay Sreekumar

Large Language Models (LLMs) are increasingly being leveraged for generating and translating scientific computer codes by both domain experts and non-domain experts. Fortran has served as one of the go-to programming languages in legacy high-performance computing (HPC) for scien...

Metadata Tracking and Analysis of LLM-Based Source-to-Source Code Translation

by Benjamin Knobloch, Christine Sweeney, Ayan Biswas, Shawn M. Jones

Modern software development education and tool chains favor current programming languages. Many important programs still exist in older languages, such as Fortran, yet they are difficult to maintain, augment, and port to new hardware. Translating programs from Fortran to a mod...

Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation

by Manish Bhattarai, Javier E. Santos, Shawn M. Jones, Ayan Biswas, Boian Alexandrov, Daniel O'Malley

The advent of large language models (LLMs) has significantly advanced the field of code translation, enabling automated translation between programming languages. However, these models often struggle with complex translation tasks due to inadequate contextual understanding. Th...

DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding

by Kehinde Ajayi, Xin Wei, Martin Gryder, Winston Shields, Jian Wu, Shawn M. Jones, Michal Kucer, and Diane Oyen

Recent advances in computer vision (CV) and natural language processing have been driven by exploiting big data on practical applications. However, these research fields are still limited by the sheer volume, versatility, and diversity of the available datasets. CV tasks, such...

Synthesizing Web Archive Collections Into Big Data: Lessons From Mining Data From Web Archives

by Shawn M. Jones, Himarsha Jayanetti, Martin Klein, Michele C. Weigle, and Michael L. Nelson

Web archives are sources of big data. When presenting human visitors with archived web pages, or mementos, web archives often apply user interface augmentations to assist them. Unfortunately, these augmentations present challenges for natural language processing, computer visi...

Summarizing Web Archive Corpora Via Social Media Storytelling By Automatically Selecting and Visualizing Exemplars

by Shawn M. Jones, Martin Klein, Michele C. Weigle, and Michael L. Nelson

People often create themed collections to make sense of an ever-increasing number of archived web pages. Some of these collections contain hundreds of thousands of documents. Thousands of collections exist, many covering the same topic. Few collections include standardized met...

Discovering Image Usage Online: A Case Study With "Flatten the Curve"

by Shawn M. Jones and Diane Oyen

Understanding the spread of images across the web helps us understand the reuse of scientific visualizations and their relationship with the public. The “Flatten the Curve” graphic was heavily used during the COVID-19 pandemic to convey a complex concept in a simple form. It d...

ECCV 2022 and DIRA 2022 Trip Report

I had a paper accepted to the Drawings and abstract Imagery: Representation and Analysis (DIRA) workshop, allowing me to attend the 17th European Conference on Computer Vision 2022 (ECCV 2022) in Tel Aviv, Israel, from October 23-27. ECCV 2022 is a large conference with att...

Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine

Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer visio...
