These pages describe work carried out on the design, implementation, and applications of a technique that we call static approximate phase analysis. The PI is Hridesh Rajan and much of the work is carried out by Tyler Sondag.
News
March 2011: Invited talk on phase-based tuning at SMART '11.
December 2010: Paper on Frances-A tool accepted for CCSC 2011.
November 2010: Paper on phase-guided tuning accepted for CGO 2011.
August 2010: Paper on cache analysis accepted for RTSS 2010.
January 2010: Tutorial on Frances tool accepted for CCSC 2010.
October 2009: Paper on Frances tool accepted for SIGCSE 2010.
Phase-guided Thread-to-core Assignment for Improved Utilization of Performance-Asymmetric Multi-Core Processors
Tyler Sondag and Hridesh Rajan

Abstract
CPU vendors are starting to explore trade-offs between die size, number of cores on a die, and power consumption, leading to performance asymmetry among cores on a single chip. For efficient utilization of these performance-asymmetric multi-core processors, application threads must be assigned to cores such that the resource needs of a thread closely match resource availability at the assigned core. This significantly complicates the task of an average programmer. The contribution of this work is a technique for automatically determining the mapping between threads and performance-asymmetric cores of a processor. Our approach, which we call phase-guided thread-to-core assignment, builds on a well-known insight that programs exhibit phase behavior. We first take code sections and group them into clusters such that each section in a cluster is likely to exhibit similar runtime characteristics. The key idea is that with this clustering, characteristics of a small number of representative sections in a cluster give insight into the behavior of the entire cluster. Thus the exhibited characteristics of the representative sections on different types of cores can be used for automating thread-to-core assignment at a lower runtime cost. Variations of our technique show up to an average 150% improvement in throughput over the stock Linux scheduler for systems with a constant feed of jobs, while maintaining comparable fairness and efficiency.

Bibliographic Information
@inproceedings{Sondag-Rajan-09,
Most recent version: PDF
Previous version appeared as Technical Report 08-14, Computer Science, Iowa State University, January 31, 2009. [PDF]
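
The idea in the abstract (cluster code sections, observe only one representative per cluster on each core type, then reuse that observation for the whole cluster) can be illustrated with a small sketch. This is not the authors' implementation. The feature vectors, the greedy clustering rule, the distance threshold, the core-type names, the "highest score wins" policy, and all numbers below are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    # Hypothetical static features, e.g. (memory-op fraction, float-op fraction).
    features: tuple


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_sections(sections, threshold):
    """Greedy clustering (an assumed, simplistic rule): a section joins the
    first cluster whose representative is within `threshold`; otherwise it
    starts a new cluster. clusters[i][0] is that cluster's representative."""
    clusters = []
    for section in sections:
        for cluster in clusters:
            if distance(section.features, cluster[0].features) <= threshold:
                cluster.append(section)
                break
        else:
            clusters.append([section])
    return clusters


def assign_cores(clusters, measure):
    """Measure only each cluster's representative, then map every section in
    the cluster to the core type with the highest score for that
    representative (a placeholder policy)."""
    assignment = {}
    for cluster in clusters:
        scores = measure(cluster[0])  # dict: core type -> score
        best_core = max(scores, key=scores.get)
        for section in cluster:
            assignment[section.name] = best_core
    return assignment


# Made-up sections and made-up per-core scores for the two representatives.
sections = [
    Section("loop_a", (0.80, 0.05)),
    Section("loop_b", (0.75, 0.10)),
    Section("kernel_c", (0.10, 0.70)),
    Section("kernel_d", (0.15, 0.65)),
]
FAKE_SCORES = {
    "loop_a": {"fast": 0.9, "slow": 1.1},
    "kernel_c": {"fast": 1.8, "slow": 0.7},
}

clusters = cluster_sections(sections, threshold=0.2)
assignment = assign_cores(clusters, lambda rep: FAKE_SCORES[rep.name])
print(assignment)
```

Only `loop_a` and `kernel_c` are ever "measured" here. The other two sections inherit their cluster's decision, which is where the lower runtime cost mentioned in the abstract would come from in a scheme of this shape.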