Source Code Curation Tooling for the Code Forager

Abstract

The Web has changed the dynamics of programming. We are in an era where reusing code from the Web is frequently the norm, not the exception. In this era, programmers use question and answer (Q&A) sites like StackOverflow to pose coding questions and receive coding answers. Programmers using these sites often engage in a complex code foraging process: they must understand and adapt the code snippets they encounter to determine whether those snippets are fit for use. While search still dominates modern code retrieval, search alone offers little support for validating whether its results are fit for use. This shortcoming stems in large part from the inherently questionable quality of online source code. Most online source code is not guaranteed to be good, to work properly, or to be trustworthy. This dissertation addresses this challenge by introducing Source Code Curation, along with a set of tools that implement it. Source Code Curation is a blend of filtering, refinement, and validation activities. It can help programmers determine which source code is more likely to be useful and which is not. Specifically, it can help them both address the inherently questionable quality of online source code upfront and complete their source code understanding tasks quickly and accurately.