# 台灣聯合大學系統110學年度碩士班招生考試試題

# 類組:<u>電機類</u> 科目:<u>計算機系統(計算機組織)(300A)</u>

共\_5\_頁第1\_頁

1. (10%) The importance of having a good branch predictor depends on how often conditional branches are executed. Together with branch predictor accuracy, this will determine how much time is spent stalling due to mispredicted branches. In this problem, assume a 5-stage pipeline RISC processor with the breakdown of dynamic instructions into various instruction categories is as follows

| R-type | BEQ | JUMP | LW  | SW |
|--------|-----|------|-----|----|
| 40%    | 25% | 5%   | 25% | 5% |

Also, assume the following branch predictor accuracies:

| Always-Taken | Always-Not-Taken | 2-Bit Predictor |
|--------------|------------------|-----------------|
| 45%          | 55%              | 85%             |

- (a) (2%) Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to mispredicted branches with the always-taken predictor? Assume that branch outcomes are determined in the EX stage, that there are no data hazards, and that no delay slots are used.
- (b) (2%) Repeat (a) for the "always-not-taken" predictor.
- (c) (2%) Repeat (a) for the 2-bit predictor.
- (d) (2%) With the 2-bit predictor, what speedup would be achieved if we could convert half of the branch instructions in a way that replaces a branch instruction with an ALU instruction? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced.
- (e) (2%) Some branch instructions are much more predictable than others. If we know that 80% of all executed branch instructions are easy-to-predict loop-back branches that are always predicted correctly, what is the accuracy of the 2-bit predictor on the remaining 20% of the branch instructions?
- 2. (8%) In this problem, we examine how pipelining affects the clock cycle time of the processor. Assume the individual stages of the datapath have the following latencies:

| IF    | ID    | EXE   | MEM   | WB    |
|-------|-------|-------|-------|-------|
| 300ps | 100ps | 600ps | 350ps | 100ps |

And, assume that instructions executed by the processor are broken down as follows:

| ALU | Beq | LW  | SW  |
|-----|-----|-----|-----|
| 35% | 30% | 20% | 15% |

- (a) (2%) What is the clock cycle time in a pipelined and non-pipelined processor?
- (b) (2%) If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor?
- (c) (2%) Assuming there are no stalls or hazards, what is the utilization of the data memory?
- (d) (2%) Assuming there are no stalls or hazards, what is the utilization of the write-register port of the "Registers" unit?

注意:背面有試題

## 類組:<u>電機類</u> 科目:計算機系統(計算機組織)(300A)

共\_5\_頁第\_2\_頁

3. (8%) Assume that logic blocks needed to implement a processor's datapath have the following latencies:

| Instruction- | Add   | Mux   | ALU   | Pagistara | Data-  | Sign-  | Shift- |
|--------------|-------|-------|-------|-----------|--------|--------|--------|
| Memory       | Auu   | IVIUX | ALU   | Registers | Memory | Extend | left-2 |
| 200ps        | 100ps | 10ps  | 250ps | 100ps     | 250ps  | 15ps   | 10ps   |

(a) (2%) Consider a datapath similar to the one in the textbook shown below, but for a processor that only has one type of instruction: unconditional PC-relative branch. What would the cycle time be for this datapath?



- (b) (2%) Repeat (a), but this time we need to support only conditional PC-relative branches. The remaining three problems refer to the datapath element Shift-left-2:
- (c) (2%) For which kind of instructions (if any) requires this resource on the critical path?
- (d) (2%) Assuming that we only support beq and add instructions, discuss how changes in the given latency of this resource affect the cycle time of the processor. Assume that the latencies of other resources do not change.
- 4. (8%) Assume that we would like to expand the MIPS register file from 32 registers to 128 registers and expand the instruction set to contain four times as many instructions.
  - (a) (4%) How would this affect the size of each of the bit fields in the R-type, and I-type, respectively, instructions?
  - (b) (4%) How could each of the two proposed changes decrease the size of an MIPS assembly program? On the other hand, how could the proposed change increase the size of an MIPS assembly program?

注意:背面有試題

## 台灣聯合大學系統 110 學年度碩士班招生考試試題

類組:<u>電機類</u> 科目:<u>計算機系統(計算機組織)(300A)</u>

共\_5\_頁第3頁

5. (12%) Translate the following C code to MIPS assembly code. Please use a minimum number of instructions. Assume that the values of a, b, i and j are in registers \$s0, \$s1, \$t0, and \$t1, respectively. Also, assume that register \$s2 holds the base address of the array D.

- 6. (4%) SISD, SIMD, MISD, and MIMD are one categorization of parallel hardware architectures proposed in the 1960s and still used today. What are SISD, SIMD, MISD, and MIMD? Please give a brief explanation and one example for each architecture.
- 7. [9%] Consider two different implementations, M1 and M2, of the same instruction set. There are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of 100MHz and M2 has a clock rate of 150MHz. The average number of cycles for each instruction class and their frequencies (for a typical program) are as the table below. (Note, the MIPS in these statements stand for Millions of Instruction Per Second)

| Instruction | Machine M1-              | Machine M2-              | Frequency |
|-------------|--------------------------|--------------------------|-----------|
| class       | Cycles/instruction class | Cycles/instruction class |           |
| A           | 1                        | 2                        | 60%       |
| В           | 3                        | 3                        | 30%       |
| С           | 5                        | 4                        | 10%       |

- (a) [3%] What is the CPI for M1 and M2?
- (b) [3%] What is the MIPS rating for MIPS<sub>M1</sub> and MIPS<sub>M2</sub>?
- (c) [3%] A favorite program runs in 10 seconds on machine M1. If we want to build a new machine M3, which will run this program in 6 seconds. But it require 1.2 times as many clock cycles as M1 for this program. What is the clock rate of M3?

#### 8. [13%]

- (a) [3%] Assume 185 and 169 are signed 8-bit decimal integers store in sign-magnitude format. Calculate 185+169. The result should be written in decimal.
- (b) [3%] Assume 185 and 169 are signed 8-bit decimal integers store in two's complement format. Calculate 185+169. The result should be written in decimal.
- (c) [3%] What decimal number does the bit pattern 0x0E000000 represent if it is a floating point number? Use the IEEE 754 standard.
- (d) [4%] Calculate the binary number (1.1100)\*(-0.1101) step by step. You should use guard, round and sticky bit in rounding step. The result should store in IEEE 754 with 5 bits mantissa.

| S    | EXP   | Mantissa |
|------|-------|----------|
| 1bit | 8bits | 5bits    |

## 台灣聯合大學系統 110 學年度碩士班招生考試試題

## 類組:電機類 科目:計算機系統(計算機組織)(300A)

共\_5\_\_頁 第\_4\_頁

9. [16%] For a direct-mapped cache design with 32-bit address, the following bits of the address are used to access the cache:

|         | Tag field | Index field | Offset field |
|---------|-----------|-------------|--------------|
| Case I  | [31—10]   | [9—4]       | [30]         |
| Case II | [31—12]   | [11—5]      | [40]         |

- (a) [3%] What is the block size (in words) of each case?
- (b) [3%] For the total of two cases, how many entries does the cache have??

Starting from power on, the following byte-addressed cache references are recorded.

| address | 0 | 4 | 16 | 132 | 232 | 160 | 1024 | 30 | 140 | 3100 |
|---------|---|---|----|-----|-----|-----|------|----|-----|------|

- (c) [3%] For Case I, how many blocks are replaced?
- (d) [3%] For Case I, what is the hit ratio?
- (e) [4%] For Case I as the final state of the cache, how many valid entries in this cache? Show the index for each valid entry.

#### 10. [6%]

- (a) [3%] Assume the miss rate of an instruction cache is 2% and the miss rate of the data cache is 4%. If a processor has a CPI of 2 without any memory stalls and the miss penalty is 200 cycles for all misses. Assume the frequency of all loads and stores is 36%. Determine how much faster a processor would run with a perfect cache that never missed.
- (b) [3%] If the processor has a base CPI of 1, and clock rate of 4 GHz. Assume the memory access time is 100 ns, including all the miss handling. Suppose the miss rate per instruction at the primary cache is 3%. Now we add a secondary data cache that has a 5 ns access time for either a hit or a miss, and is large enough to reduce the miss rate to main memory to 0.5%. What is the total CPI of this two-level cache?
- 11. [6%] Listed below are key page table parameters.

| Virtual address |  | Physical DRAM | Page size | PTE size |
|-----------------|--|---------------|-----------|----------|
| size            |  | installed     |           |          |
| 42 bits         |  | 8GiB          | 4 KiB     | 4 bytes  |

- (a) [3%] For a single-level page table, how much physical memory is needed for storing the page table?
- (b) [3%] In a memory hierarchy system, we can integrate virtual memory, TLB and cache. A memory reference can encounter three different types of misses: a TLB miss, a page fault and a cache miss. Considering all the combinations of these three events with one or more occurring. Identify the number for the three events which is impossible.

注意:背面有試題

# 台灣聯合大學系統110學年度碩士班招生考試試題

類組:<u>電機類</u> 科目:<u>計算機系統(計算機組織)(300A)</u> 共 5 頁 第 5 頁

| Event number | TLB  | Page table | Cache |
|--------------|------|------------|-------|
| 1            | Miss | Miss       | Miss  |
| 2            | Miss | Miss       | Hit   |
| 3            | Miss | Hit        | Miss  |
| 4            | Miss | Hit        | Hit   |
| 5            | Hit  | Miss       | Miss  |
| 6            | Hit  | Miss       | Hit   |
| 7            | Hit  | Hit        | Miss  |