Publications

  1. Antonio J. Peña and P. Balaji. A Data-Oriented Profiler to Assist in Data Partitioning and Distribution for Heterogeneous Memory in HPC. Parallel Computing, 2015. [pdf]
  2. A. J. Peña and P. Balaji. Toward the efficient use of multiple explicitly managed memory subsystems. IEEE Cluster 2014, Madrid, Spain, Sep. 2014.
  3. Antonio J. Peña and P. Balaji. A framework for tracking memory accesses in scientific applications. 2014 43rd International Conference on Parallel Processing Workshops, Minneapolis, MN, USA, Sep. 2014.
  4. Ashwin M. Aji, Lokendra S. Panwar, Feng Ji, Milind Chabbi, Karthik Murthy, Pavan Balaji, Keith R. Bisset, James S. Dinan, Wu-chun Feng, John Mellor-Crummey, Xiaosong Ma and Rajeev Thakur. On the Efficacy of GPU-Integrated MPI for Scientific Applications. ACM International Symposium on High Performance Parallel and Distributed Computing (HPDC). Jun. 17–21, 2013, New York, New York. [pdf] [slides]
  5. Ashwin M. Aji, Pavan Balaji, James S. Dinan, Wu-chun Feng and Rajeev S. Thakur. Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming. Workshop on Accelerators and Hybrid Exascale Systems (AsHES); held in conjunction with the IEEE International Parallel and Distributed Processing Symposium (IPDPS). May 20th, 2013, Boston, Massachusetts. [pdf] [slides]
  6. John Jenkins, James S. Dinan, Pavan Balaji, Nagiza F. Samatova and Rajeev S. Thakur. Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments. IEEE International Conference on Cluster Computing (Cluster). Sep. 28–30, 2012, Beijing, China. [pdf] [slides]
  7. Ashwin M. Aji, James S. Dinan, Darius T. Buntinas, Pavan Balaji, Wu-chun Feng, Keith R. Bisset and Rajeev S. Thakur. MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-Based Systems. IEEE International Conference on High Performance Computing and Communications (HPCC). June 25–27, 2012, Liverpool, UK. [pdf] [slides]
  8. Feng Ji, Ashwin M. Aji, James S. Dinan, Darius T. Buntinas, Pavan Balaji, Rajeev S. Thakur, Wu-chun Feng and Xiaosong Ma. DMA-Assisted, Intranode Communication in GPU Accelerated Systems. IEEE International Conference on High Performance Computing and Communications (HPCC). June 25–27, 2012, Liverpool, UK. [pdf] [slides]
  9. Feng Ji, Ashwin M. Aji, James Dinan, Darius Buntinas, Pavan Balaji, Wu-chun Feng and Xiaosong Ma. Efficient Intranode Communication in GPU-Accelerated Systems. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pp. 1838–1847, May 21–25, 2012. [pdf]