Mining Billions of AST Nodes to Study Actual and Potential Usage of Java Language Features

By: Robert Dyer, Hridesh Rajan, Hoan Anh Nguyen, and Tien N. Nguyen

Abstract

Programming languages evolve over time, adding additional language features to simplify common tasks and make the language easier to use. For example, the Java Language Specification has four editions and is currently drafting a fifth. While the addition of language features is driven by an assumed need by the community (often with direct requests for such features), there is little empirical evidence demonstrating how these new features are adopted by developers once released. In this paper, we analyze over 31k open-source Java projects representing over 9 million Java files, which when parsed contain over 18 billion AST nodes. We analyze this corpus to find uses of new Java language features over time. Our study gives interesting insights, such as: there are millions of places features could potentially be used but weren’t; developers convert existing code to use new features; and we found almost 200k instances of potential resource handling bugs.

ACM Reference

Dyer, R. et al. 2014. Mining Billions of AST Nodes to Study Actual and Potential Usage of Java Language Features. 36th International Conference on Software Engineering (Jun. 2014), 779–790.

BibTeX Reference

@inproceedings{dyer2014mining,
  author = {Dyer, Robert and Rajan, Hridesh and Nguyen, Hoan Anh and Nguyen, Tien N.},
  title = {Mining Billions of {AST} Nodes to Study Actual and Potential Usage of {Java} Language Features},
  booktitle = {36th International Conference on Software Engineering},
  series = {ICSE'14},
  month = {June},
  year = {2014},
  pages = {779--790},
  location = {Hyderabad, India},
  entrysubtype = {conference},
  abstract = {
    Programming languages evolve over time, adding additional language features to
    simplify common tasks and make the language easier to use. For example, the
    Java Language Specification has four editions and is currently drafting a
    fifth. While the addition of language features is driven by an assumed need by
    the community (often with direct requests for such features), there is little
    empirical evidence demonstrating how these new features are adopted by
    developers once released. In this paper, we analyze over 31k open-source Java
    projects representing over 9 million Java files, which when parsed contain
    over 18 billion AST nodes. We analyze this corpus to find uses of new Java
    language features over time. Our study gives interesting insights, such as:
    there are millions of places features could potentially be used but weren't;
    developers convert existing code to use new features; and we found almost 200k
    instances of potential resource handling bugs.
  }
}