BCFA Bespoke Control Flow Analysis for CFA at Scale

By: Ramanathan Ramu, Ganesha Upadhyaya, Hoan Nguyen, and Hridesh Rajan

PDF Download Download Paper

Abstract

Many data-driven software engineering tasks such as discovering programming patterns, mining API specifications, etc., perform source code analysis over control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be expensive and performance of the analysis heavily depends on the underlying CFG traversal strategy. State-of-the-art analysis frameworks use a fixed traversal strategy. We argue that a single traversal strategy does not fit all kinds of analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a control flow analysis (CFA) and a large number of CFGs, BCFA selects the most efficient traversal strategy for each CFG. BCFA extracts a set of properties of the CFA by analyzing the code of the CFA and combines it with properties of the CFG, such as branching factor and cyclicity, for selecting the optimal traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a set of representative static analyses that mainly involve traversing CFGs and two large datasets containing 287 thousand and 162 million CFGs. Our results show that BCFA can speedup the large scale analyses by 1%-28%. Further, BCFA has low overheads; less than 0.2%, and low misprediction rate; less than 0.01%.

ACM Reference

Ramu, R. et al. 2020. BCFA: Bespoke Control Flow Analysis for CFA at Scale. ICSE’20: The 42nd International Conference on Software Engineering (May 2020).

BibTeX Reference

@inproceedings{ramu20bcfa,
  author = {Ramanathan Ramu and Ganesha Upadhyaya and Hoan Nguyen and Hridesh Rajan},
  title = {BCFA: Bespoke Control Flow Analysis for CFA at Scale},
  booktitle = {ICSE'20: The 42nd International Conference on Software Engineering},
  location = {Seoul, South Korea},
  month = {May 23-May 29, 2020},
  year = {2020},
  entrysubtype = {conference},
  abstract = {
    Many data-driven software engineering tasks such as discovering
    programming patterns, mining API specifications, etc.,
    perform source code analysis over control flow graphs (CFGs) at scale.
    Analyzing millions of CFGs can be expensive and
    performance of the analysis heavily depends on the underlying CFG
    traversal strategy. State-of-the-art analysis frameworks
    use a fixed traversal strategy. We argue that a single traversal strategy
    does not fit all kinds of analyses and CFGs and propose
    bespoke control flow analysis (BCFA). Given a control flow analysis (CFA)
    and a large number of CFGs, BCFA selects the most
    efficient traversal strategy for each CFG. BCFA extracts a set of
    properties of the CFA by analyzing the code of the CFA and
    combines it with properties of the CFG, such as branching factor and
    cyclicity, for selecting the optimal traversal strategy. We
    have implemented BCFA in Boa, and evaluated BCFA using a set of
    representative static analyses that mainly involve traversing
    CFGs and two large datasets containing 287 thousand and 162 million CFGs.
    Our results show that BCFA can speedup the
    large scale analyses by 1%-28%. Further, BCFA has low overheads; less
    than 0.2%, and low misprediction rate; less than
    0.01%.
  }
}