Differences

This shows you the differences between two versions of the page.

--- readings [2015/02/06 16:58]
rachata
+++ readings [2015/02/20 19:23]
rachata
@@ Line 127: / Line 127: @@
   * {{kim_isca07.pdf|Hyesoon Kim, José A. Joao, Onur Mutlu, Chang Joo Lee, Yale N. Patt, and Robert Cohn. 2007. VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization. ISCA'07}}
+===== Lecture 11 (2/11 Wed.) =====
+**Required:**
+**Mentioned in the Lecture:**
+  * {{p18-hwu.pdf|Hwu and Patt (1987). Checkpoint Repair for Out-of-order Execution Machines.}}
+  * {{00004607.pdf|Smith, J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. Computers, IEEE Transactions on.}}
+  * {{ogehl.pdf | Seznec (2005). Analysis of the O-GEometric History Length Branch Predictor. ISCA}}
+  * {{hpca01.pdf|Daniel A. Jimenez and Calvin Lin. 2001. Dynamic Branch Prediction with Perceptrons. In Proceedings of the 7th International Symposium on High-Performance Computer Architecture (HPCA '01)}}
+===== Lecture 12 (2/13 Fri.) =====
+**Required:**
+  * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler, R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}}
+===== Lecture 13 (2/16 Mon.) =====
+**Required:**
+  * {{kessler_-_1999_-_the_alpha_21264_microprocessor.pdf|Kessler, R. E. (1999). The Alpha 21264 Microprocessor. IEEE Micro.}}
+  * {{00476078.pdf|Smith, J. E., & Sohi, G. S. (1995). The microarchitecture of superscalar processors. Proceedings of the IEEE.}}
+  * {{04523358.pdf|Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE.}}
+  * {{p50-fatahalian.pdf|Fatahalian, K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}}
+===== Lecture 13 (2/18 Wed.) =====
+** Required **
+  * {{04523358.pdf|Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE.}}
+  * {{p50-fatahalian.pdf|Fatahalian, K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}}
+**Mentioned during lecture:**
+  * {{30470407.pdf|Fung, W. W. L., Sham, I., Yuan, G., & Aamodt, T. M. (2007). Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.}}
+  * {{p253-suleman.pdf |Suleman, M. A., Mutlu, O., Qureshi, M. K., & Patt, Y. N. (2009). Accelerating critical section execution with asymmetric multi-core architectures. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems.}}
+  * {{01447203.pdf|Flynn, M. J. (1966). Very high-speed computing systems. Proceedings of the IEEE.}}
+  * {{fisher_-_1983_-_very_long_instruction_word_architectures_and_the_eli-512.pdf|Fisher, J. A. (1983). Very Long Instruction Word architectures and the ELI-512. Proceedings of the 10th annual international symposium on Computer architecture.}}
+  * {{Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf|Smith, J. E. (1982). Decoupled access/execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}}
+  * {{p289-smith.pdf|Smith, J. E. (1984). Decoupled access/execute computer architectures. ACM Trans. Comput. Syst.}}
+  * {{p199-smith.pdf|Smith, J. E., Dermer, G. E., Vanderwarn, B. D., Klinger, S. D., & Rozewski, C. M. (1987). The ZS-1 central processor. Proceedings of the second international conference on Architectual support for programming languages and operating systems.}}
+  * {{00030730.pdf|Smith, J. E. (1989). Dynamic instruction scheduling and the Astronautics ZS-1. IEEE Computer.}}
+  * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung, H. T. (1982). Why Systolic Architectures? IEEE Computer.}}
+  * {{annaratone_et_al._-_1986_-_warp_architecture_and_implementation.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. S. (1986). Warp architecture and implementation. Proceedings of the 13th annual international symposium on Computer architecture.}}
+  * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture, implementation, and performance. IEEE Transactions on Computers.}}
+===== Lecture 14 (2/20 Fri.) =====
+** Required **
+  * {{04523358.pdf|Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A Unified Graphics and Computing Architecture. Micro, IEEE.}}
+  * {{p50-fatahalian.pdf|Fatahalian, K., & Houston, M. (2008). A closer look at GPUs. Commun. ACM.}}
+**Mentioned during lecture:**
+  * {{30470407.pdf|Fung, W. W. L., Sham, I., Yuan, G., & Aamodt, T. M. (2007). Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture.}}
+  * {{fisher_-_1983_-_very_long_instruction_word_architectures_and_the_eli-512.pdf|Fisher, J. A. (1983). Very Long Instruction Word architectures and the ELI-512. Proceedings of the 10th annual international symposium on Computer architecture.}}
+  * {{Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf|Smith, J. E. (1982). Decoupled access/execute computer architectures. Proceedings of the 9th annual symposium on Computer Architecture.}}
+  * {{p289-smith.pdf|Smith, J. E. (1984). Decoupled access/execute computer architectures. ACM Trans. Comput. Syst.}}
+  * {{p199-smith.pdf|Smith, J. E., Dermer, G. E., Vanderwarn, B. D., Klinger, S. D., & Rozewski, C. M. (1987). The ZS-1 central processor. Proceedings of the second international conference on Architectual support for programming languages and operating systems.}}
+  * {{00030730.pdf|Smith, J. E. (1989). Dynamic instruction scheduling and the Astronautics ZS-1. IEEE Computer.}}
+  * {{kung_-_1982_-_why_systolic_architectures.pdf|Kung, H. T. (1982). Why Systolic Architectures? IEEE Computer.}}
+  * {{annaratone_et_al._-_1986_-_warp_architecture_and_implementation.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. S. (1986). Warp architecture and implementation. Proceedings of the 13th annual international symposium on Computer architecture.}}
+  * {{annaratone_et_al._-_1987_-_the_warp_computer_architecture_implementation_and_performance.pdf|Annaratone, M., Arnould, E., Gross, T., Kung, H. T., & Lam, M. (1987). The warp computer: Architecture, implementation, and performance. IEEE Transactions on Computers.}}
+  * {{jog_orchestrated.pdf|Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, Ravishankar Iyer, and Chita R. Das. 2013. Orchestrated scheduling and prefetching for GPGPUs. ISCA '13}}
+  * {{large-gpu-warps_micro11|Veynu Narasiman, Michael Shebanow, Chang Joo Lee, Rustam Miftakhutdinov, Onur Mutlu, and Yale N. Patt. 2011. Improving GPU performance via large warps and two-level warp scheduling.MICRO-44}}

18-447 Introduction to Computer Architecture – Spring 2015

User Tools

Site Tools

Differences

Page Tools