This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision Next revision Both sides next revision | ||
readings [2015/02/16 20:19] rachata |
readings [2015/02/20 19:26] rachata |
||
---|---|---|---|
Line 147: | Line 147: | ||
* {{p50-fatahalian.pdf|Fatahalian, K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}} | * {{p50-fatahalian.pdf|Fatahalian, K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}} | ||
+ | ===== Lecture 14 (2/18 Wed.) ===== | ||
+ | ** Required ** | ||
+ | * {{04523358.pdf|Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE.}} | ||
+ | * {{p50-fatahalian.pdf|Fatahalian, K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}} | ||
+ | |||
+ | **Mentioned during lecture:** | ||
+ | * {{30470407.pdf|Fung, W. W. L., Sham, I., Yuan, G., & Aamodt, T. M. (2007). Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.}} | ||
+ | * {{p253-suleman.pdf |Suleman, M. A., Mutlu, O., Qureshi, M. K., & Patt, Y. N. (2009). Accelerating critical section execution with asymmetric multi-core architectures. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems.}} | ||
+ | * {{01447203.pdf|Flynn, M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}} | ||
+ | * {{fisher_-_1983_-_very_long_instruction_word_architectures_and_the_eli-512.pdf|Fisher, J. A. (1983). Very Long Instruction Word architectures and the ELI-512. Proceedings of the 10th annual international symposium on Computer architecture.}} | ||
+ | * {{Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf|Smith, J. E. (1982). Decoupled access/execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}} | ||
+ | * {{p289-smith.pdf|Smith, J. E. (1984). Decoupled access/execute computer architectures. ACM Trans. Comput. Syst.}} | ||
+ | * {{p199-smith.pdf|Smith, J. E., Dermer, G. E., Vanderwarn, B. D., Klinger, S. D., & Rozewski, C. M. (1987). The ZS-1 central processor. Proceedings of the second international conference on Architectual support for programming languages and operating systems.}} | ||
+ | * {{00030730.pdf|Smith, J. E. (1989). Dynamic instruction scheduling and the Astronautics ZS-1. IEEE Computer.}} | ||
+ | * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung, H. T. (1982). Why Systolic Architectures? IEEE Computer.}} | ||
+ | * {{annaratone_et_al._-_1986_-_warp_architecture_and_implementation.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. S. (1986). Warp architecture and implementation. Proceedings of the 13th annual international symposium on Computer architecture.}} | ||
+ | * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture, implementation, and performance. IEEE Transactions on Computers.}} | ||
+ | |||
+ | ===== Lecture 15 (2/20 Fri.) ===== | ||
+ | ** Required ** | ||
+ | * {{04523358.pdf|Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE.}} | ||
+ | * {{p50-fatahalian.pdf|Fatahalian, K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}} | ||
+ | |||
+ | **Mentioned during lecture:** | ||
+ | * {{30470407.pdf|Fung, W. W. L., Sham, I., Yuan, G., & Aamodt, T. M. (2007). Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.}} | ||
+ | * {{fisher_-_1983_-_very_long_instruction_word_architectures_and_the_eli-512.pdf|Fisher, J. A. (1983). Very Long Instruction Word architectures and the ELI-512. Proceedings of the 10th annual international symposium on Computer architecture.}} | ||
+ | * {{Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf|Smith, J. E. (1982). Decoupled access/execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}} | ||
+ | * {{p289-smith.pdf|Smith, J. E. (1984). Decoupled access/execute computer architectures. ACM Trans. Comput. Syst.}} | ||
+ | * {{p199-smith.pdf|Smith, J. E., Dermer, G. E., Vanderwarn, B. D., Klinger, S. D., & Rozewski, C. M. (1987). The ZS-1 central processor. Proceedings of the second international conference on Architectual support for programming languages and operating systems.}} | ||
+ | * {{00030730.pdf|Smith, J. E. (1989). Dynamic instruction scheduling and the Astronautics ZS-1. IEEE Computer.}} | ||
+ | * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung, H. T. (1982). Why Systolic Architectures? IEEE Computer.}} | ||
+ | * {{annaratone_et_al._-_1986_-_warp_architecture_and_implementation.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. S. (1986). Warp architecture and implementation. Proceedings of the 13th annual international symposium on Computer architecture.}} | ||
+ | * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture, implementation, and performance. IEEE Transactions on Computers.}} | ||
+ | * {{jog_orchestrated.pdf|Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Ravishankar Iyer, and Chita R. Das. 2013. Orchestrated scheduling and prefetching for GPGPUs. ISCA '13}} | ||
+ | * {{large-gpu-warps_micro11|Veynu Narasiman, Michael Shebanow, Chang Joo Lee, Rustam Miftakhutdinov, Onur Mutlu, and Yale N. Patt. 2011. Improving GPU performance via large warps and two-level warp scheduling.MICRO-44}} |