Dynamic data layouts for cache-conscious implementation of a class of signal transforms