Publications
- Antonio J. Peña and P. Balaji. A Data-Oriented Profiler to Assist in Data Partitioning and Distribution for Heterogeneous Memory in HPC. Parallel Computing, 2015. [pdf]
- A. J. Peña and P. Balaji. Toward the efficient use of multiple explicitly managed memory subsystems. IEEE Cluster 2014, Madrid, Spain, Sep. 2014.
- Antonio J. Peña and P. Balaji. A framework for tracking memory accesses in scientific applications. 2014 43rd International Conference on Parallel Processing Workshops, Minneapolis, MN, USA, Sep. 2014.
- Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Milind Chabbi, Karthik Murthy, Pavan Balaji, Keith R. Bisset, James S. Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma and Rajeev Thakur. On the Efficacy of GPU-Integrated MPI for Scientific Applications. ACM International Symposium on High Performance Parallel and Distributed Computing (HPDC). Jun. 17–21, 2013, New York, New York. [pdf] [slides]
- Ashwin M. Aji, Pavan Balaji, James S. Dinan, Wu-chun Feng and Rajeev S. Thakur. Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming. Workshop on Accelerators and Hybrid Exascale Systems (AsHES); held in conjunction with the IEEE International Parallel and Distributed Processing Symposium (IPDPS). May 20th, 2013, Boston, Massachusetts. [pdf] [slides]
- John Jenkins, James S. Dinan, Pavan Balaji, Nagiza F. Samatova and Rajeev S. Thakur. Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments. IEEE International Conference on Cluster Computing (Cluster). Sep. 28–30, 2012, Beijing, China. [pdf] [slides]
- Ashwin M. Aji, James S. Dinan, Darius T. Buntinas, Pavan Balaji, Wu-chun Feng, Keith R. Bisset and Rajeev S. Thakur. MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems. IEEE International Conference on High Performance Computing and Communications (HPCC). June 25–27, 2012, Liverpool, UK. [pdf] [slides]
- Feng Ji, Ashwin M. Aji, James S. Dinan, Darius T. Buntinas, Pavan Balaji, Rajeev S. Thakur, Wu-chun Feng and Xiaosong Ma. DMA-Assisted, Intranode Communication in GPU Accelerated Systems. IEEE International Conference on High Performance Computing and Communications (HPCC). June 25–27, 2012, Liverpool, UK. [pdf] [slides]
- Feng Ji, Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-chun Feng and Xiaosong Ma. Efficient Intranode Communication in GPU-Accelerated Systems. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pp. 1838–1847, May 21–25, 2012. [pdf]