Are Code Examples on an Online Q&A Forum Reliable? A Study of API Misuse on Stack Overflow

By: Tianyi Zhang, Ganesha Upadhyaya, Anastasia Reinhardt, Hridesh Rajan, and Miryung Kim

Abstract

Programmers often consult an online Q&A forum such as Stack Overflow to learn new APIs. This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow. To reduce manual assessment effort, we design Maple, an API usage mining approach that extracts patterns from over 380K Java repositories on GitHub and subsequently reports potential API usage violations in Stack Overflow posts. We analyze 217,818 Stack Overflow posts using Maple and find that around 31% of them have potential API usage violations that may produce symptoms such as program crashes and resource leaks. Such API misuse stems from three main causes: missing control constructs, missing or incorrectly ordered API calls, and incorrect guard conditions. Even posts that are accepted as correct answers or upvoted by other programmers are not necessarily more reliable than other posts in terms of API misuse. These results call for a new human-in-the-loop approach to augment Stack Overflow code snippets and help the user consider better or alternative API usage.
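To make the three misuse categories concrete, the following Java sketch pairs each one with a corrected usage of a standard JDK API. These are illustrative examples written for this page, not snippets from the study's dataset, and the class and method names (`ApiMisuseExamples`, `firstLine`, `firstOrDefault`, `prefix`) are hypothetical. The sketch assumes Java 9 or later for `List.of`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;

public class ApiMisuseExamples {

    // Missing control construct: try-with-resources closes the reader even when
    // readLine() throws. StringReader stands in for a file or socket reader,
    // where skipping close() would leak the underlying handle.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine(); // null if the input is empty
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Missing API call: Iterator.next() on an exhausted iterator throws
    // NoSuchElementException, so the hasNext() check must come first.
    static String firstOrDefault(List<String> items, String fallback) {
        Iterator<String> it = items.iterator();
        if (it.hasNext()) {
            return it.next();
        }
        return fallback;
    }

    // Incorrect guard condition: a guard such as !s.isEmpty() does not protect
    // substring(0, n), which throws IndexOutOfBoundsException unless
    // 0 <= n <= s.length(). This version returns s unchanged when n is out of range.
    static String prefix(String s, int n) {
        if (n >= 0 && n <= s.length()) {
            return s.substring(0, n);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(firstLine("alpha\nbeta"));          // alpha
        System.out.println(firstOrDefault(List.of(), "none")); // none
        System.out.println(prefix("stack", 3));                // sta
        System.out.println(prefix("stack", 10));               // stack
    }
}
```

Returning the whole string from `prefix` when `n` is out of range is only one option; clamping `n` or throwing a descriptive exception would satisfy the guard equally well.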

ACM Reference

Zhang, T. et al. 2018. Are Code Examples on an Online Q&A Forum Reliable? A Study of API Misuse on Stack Overflow. ICSE’18: The 40th International Conference on Software Engineering (May 2018).

BibTeX Reference

@inproceedings{ReliableQA2018,
  author = {Tianyi Zhang and Ganesha Upadhyaya and Anastasia Reinhardt and Hridesh Rajan and Miryung Kim},
  title = {Are Code Examples on an Online {Q\&A} Forum Reliable? {A} Study of {API} Misuse on {Stack Overflow}},
  booktitle = {ICSE'18: The 40th International Conference on Software Engineering},
  location = {Gothenburg, Sweden},
  month = may,
  eventdate = {2018-05-27/2018-06-03},
  year = {2018},
  entrysubtype = {conference},
  abstract = {
   Programmers often consult an online Q\&A forum such as Stack Overflow to learn new APIs. 
   This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow. 
   To reduce manual assessment effort, we design Maple, an API usage mining approach 
   that extracts patterns from over 380K Java repositories on GitHub and subsequently 
   reports potential API usage violations in Stack Overflow posts.
   We analyze 217,818 Stack Overflow posts using Maple and find that around 31\% of them 
   have potential API usage violations that may produce symptoms such as program 
   crashes and resource leaks. Such API misuse stems from three main 
   causes: missing control constructs, missing or incorrectly ordered API calls, and 
   incorrect guard conditions. Even posts that are accepted as correct answers or 
   upvoted by other programmers are not necessarily more reliable than other posts in 
   terms of API misuse. These results call for a new human-in-the-loop approach 
   to augment Stack Overflow code snippets and help the user consider better or 
   alternative API usage.
  }
}