Statistical Learning for Inference between Implementations and Documentation

By: Hung Phan, Hoan Anh Nguyen, Tien N. Nguyen, and Hridesh Rajan

Abstract

API documentation is useful for developers to better understand how to correctly use the libraries. However, not all libraries provide good documentation on API usages. To provide better documentation, existing techniques have been proposed including program analysis-based and data mining-based approaches. In this work, instead of mining, we aim to generate behavioral exception documentation for any given code. We treat the problem of automatically generating documentation from a novel perspective: statistical machine translation (SMT). We consider the documentation and source code for an API method as the two abstraction levels of the same intention. We use SMT to translate documentation from source code and vice versa. Our preliminary results show that the direction of statistical learning for inference between implementations and documentation is very promising.

ACM Reference

Phan, H. et al. 2017. Statistical Learning for Inference Between Implementations and Documentation. Proceedings of the 39th International Conference on Software Engineering: New Ideas and Emerging Results Track (Piscataway, NJ, USA, May 2017), 27–30.

BibTeX Reference

@inproceedings{phan2017statistical,
  author = {Phan, Hung and Nguyen, Hoan Anh and Nguyen, Tien N. and Rajan, Hridesh},
  title = {Statistical Learning for Inference Between Implementations and Documentation},
  booktitle = {Proceedings of the 39th International Conference on Software Engineering: New Ideas and Emerging Results Track},
  series = {ICSE-NIER '17},
  month = {May},
  year = {2017},
  isbn = {978-1-5386-2675-7},
  location = {Buenos Aires, Argentina},
  pages = {27--30},
  numpages = {4},
  url = {https://doi.org/10.1109/ICSE-NIER.2017.9},
  doi = {10.1109/ICSE-NIER.2017.9},
  acmid = {3102971},
  publisher = {IEEE Press},
  address = {Piscataway, NJ, USA},
  keywords = {API documentation generation, machine translation},
  entrysubtype = {conference},
  abstract = {
    API documentation is useful for developers to better understand how to correctly
    use the libraries. However, not all libraries provide good documentation on API
    usages. To provide better documentation, existing techniques have been proposed
    including program analysis-based and data mining-based approaches. In this work,
    instead of mining, we aim to generate behavioral exception documentation for any
    given code. We treat the problem of automatically generating documentation from
    a novel perspective: statistical machine translation (SMT). We consider the
    documentation and source code for an API method as the two abstraction levels of
    the same intention. We use SMT to translate documentation from source code and
    vice versa. Our preliminary results show that the direction of statistical
    learning for inference between implementations and documentation is very
    promising.
  }
}