Chapter 1 Solutions S-3 To protect the rights of the author(s) and publisher we inform you that this PDF is an uncorrected proof for internal business use only by the author(s), editor(s), reviewer(s), Elsevier and typesetter MPS. It is not allowed to publish this proof online or in print. This proof copy is the copyright property of the publisher and is confidential until formal publication.
1.1 Personal computer: Computer that emphasizes delivery of good performance
to a single user at low cost and usually executes third-party software.
Server: Computer used for large workloads and usually accessed via a network.
Embedded computer: Computer designed to run one application or one set
of related applications and integrated into a single system.
1.2
- Performance via Pipelining
- Dependability via Redundancy
- Performance via Prediction
- Make the Common Case Fast
- Hierarchy of Memories
- Performance via Parallelism
- Use Abstraction to Simplify Design
1.3 The program is compiled into an assembly language program, which is itself assembled into a machine language program.
1.4
a. 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/frame.
b. 3,932,160 bytes × (8 bits/byte)/100E6 bits/second = 0.31 seconds
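The arithmetic in 1.4 can be checked with a short Python sketch. The function names and the default of 3 bytes per pixel (taken from the × 3 factor in the solution) are illustrative assumptions, not part of the exercise.

```python
def frame_bytes(width, height, bytes_per_pixel=3):
    """Bytes needed to hold one frame (bytes_per_pixel=3 is assumed from the solution)."""
    return width * height * bytes_per_pixel


def transfer_seconds(num_bytes, bits_per_second):
    """Time to send num_bytes over a link with the given bit rate."""
    return num_bytes * 8 / bits_per_second


frame = frame_bytes(1280, 1024)            # bytes per frame
seconds = transfer_seconds(frame, 100e6)   # 100E6 bits/second link
print(frame, round(seconds, 2))
```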
1.5
Desktop Processor     Year  Tech     Max. Clock Speed (GHz)  Integer IPC/core  Cores    Max. DRAM Bandwidth (GB/s)  SP Floating Point (Gflop/s)  MiB
Westmere i7-620       2010  32       3.33                    4                 2        17.1                        107                          4
Ivy Bridge i7-3770K   2013  22       3.90                    6                 4        25.6                        250                          8
Broadwell i7-6700K    2015  14       4.20                    8                 4        34.1                        269                          8
Kaby Lake i7-7700K    2017  14       4.50                    8                 4        38.4                        288                          8
Coffee Lake i7-9700K  2019  14       4.90                    8                 8        42.7                        627                          12
Imp./year                   20%      4%                      7%                15%      10%                         19%                          12%
Doubles every               4 years  18 years                10 years          5 years  7 years                     4 years                      6 years
Computer Organization and Design MIPS Edition The Hardware Software Interface, 6e, David Patterson, John Hennessy, Solution Manual
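The Imp./year and Doubles every rows of the 1.5 table are compound-growth figures. The sketch below shows that arithmetic under stated assumptions: the helper names are hypothetical, and the number of years is left as a parameter because the table does not state the span that was used. The example call takes the clock-speed column (3.33 GHz to 4.90 GHz) with an assumed span of 10 years.

```python
import math


def annual_improvement(first, last, years):
    """Compound annual growth rate that takes `first` to `last` in `years` years."""
    return (last / first) ** (1 / years) - 1


def doubling_time(rate):
    """Years needed to double at a constant annual growth `rate` (e.g. 0.04 for 4%)."""
    return math.log(2) / math.log(1 + rate)


# Clock speed column; years=10 is an assumption, not a value given in the table.
clock_rate = annual_improvement(3.33, 4.90, years=10)
print(f"{clock_rate:.1%} per year, doubles every {doubling_time(clock_rate):.0f} years")
```

Other columns can be checked the same way by swapping in their first and last values; a different assumed span changes the result.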
1.6
a. performance of P1 (instructions/sec) = 3 × 10^9/1.5 = 2 × 10^9
performance of P2 (instructions/sec) = 2.5 × 10^9/1.0 = 2.5 × 10^9
performance of P3 (instructions/sec) = 4 × 10^9/2.2 = 1.8 × 10^9
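The instruction rates here, and the cycle counts, instruction counts, and scaled clock rates derived in the rest of this exercise, can be checked with a short Python sketch. The dictionary layout and names are illustrative assumptions; the numbers are the ones used in the solution. The script keeps the scaled CPI unrounded, so its clock rate for P3 comes out near 6.86 GHz rather than the 6.75 GHz obtained with the rounded CPI of 2.6.

```python
# Clock rates (Hz) and CPIs as used in the 1.6 solution.
processors = {
    "P1": {"clock_hz": 3.0e9, "cpi": 1.5},
    "P2": {"clock_hz": 2.5e9, "cpi": 1.0},
    "P3": {"clock_hz": 4.0e9, "cpi": 2.2},
}
RUN_TIME_S = 10.0   # execution time behind the cycle counts
NEW_TIME_S = 7.0    # execution time behind the new clock rates
CPI_SCALE = 1.2     # CPI_new = CPI_old × 1.2


def analyze(clock_hz, cpi):
    """Return the quantities computed in 1.6 for one processor."""
    cycles = RUN_TIME_S * clock_hz
    instructions = cycles / cpi
    return {
        "ips": clock_hz / cpi,                # instructions per second
        "cycles": cycles,
        "instructions": instructions,
        # f = No. instr. × CPI_new / time
        "new_clock_hz": instructions * cpi * CPI_SCALE / NEW_TIME_S,
    }


results = {name: analyze(**p) for name, p in processors.items()}
for name, r in results.items():
    print(name, {k: f"{v:.3e}" for k, v in r.items()})
```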
b. cycles(P1) = 10 × 3 × 10^9 = 30 × 10^9
cycles(P2) = 10 × 2.5 × 10^9 = 25 × 10^9
cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
c. CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.6
f = No. instr. × CPI/time, then
f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
f(P2) = 25 × 10^9 × 1.2/7 = 4.28 GHz
f(P3) = 18.18 × 10^9 × 2.6/7 = 6.75 GHz (6.86 GHz with the unrounded CPI of 2.64)
1.7
a. Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr. × CPI/clock rate
Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^-4 s
Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^-4 s
CPI(P1) = 10.4 × 10^-4 × 2.5 × 10^9/10^6 = 2.6
CPI(P2) = 6.66 × 10^-4 × 3 × 10^9/10^6 = 2.0
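The totals in part a can be reproduced with a small Python sketch. The per-class CPIs and clock rates are read off the expressions in the solution; the names and data layout are illustrative assumptions. The average CPI is computed as total cycles divided by total instructions, which is algebraically the same as time × clock rate/instruction count.

```python
# Instruction counts per class, from part a.
counts = {"A": 100_000, "B": 200_000, "C": 500_000, "D": 200_000}

# Per-class CPIs and clock rates as they appear in the total-time expressions.
cpi_by_class = {
    "P1": {"A": 1, "B": 2, "C": 3, "D": 3},
    "P2": {"A": 2, "B": 2, "C": 2, "D": 2},
}
clock_hz = {"P1": 2.5e9, "P2": 3.0e9}


def summarize(name):
    """Return (total cycles, execution time in s, average CPI) for one processor."""
    cycles = sum(counts[c] * cpi_by_class[name][c] for c in counts)
    time_s = cycles / clock_hz[name]
    avg_cpi = cycles / sum(counts.values())
    return cycles, time_s, avg_cpi


for name in ("P1", "P2"):
    cycles, time_s, avg_cpi = summarize(name)
    print(name, cycles, f"{time_s:.3e} s", avg_cpi)
```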
b. clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5
1.8
a. CPI = T_exec × f/No. instr.
Compiler A CPI = 1.1
Compiler B CPI = 1.25
b. f_B/f_A = (No. instr.(B) × CPI(B))/(No. instr.(A) × CPI(A)) = 1.37
c. T_A/T_new = 1.67
T_B/T_new = 2.27
1.9
1.9.1 C = 2 × DP/(V^2 × F)
Pentium 4: C = 3.2E-8 F
Core i5 Ivy Bridge: C = 2.9E-8 F
1.9.2 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
1.9.3 (S_new + D_new)/(S_old + D_old) = 0.90
D_new = C × V_new^2 × F
S_old = V_old × I
S_new = V_new × I
Therefore:
V_new = [D_new/(C × F)]^(1/2)
D_new = 0.90 × (S_old + D_old) − S_new
S_new = V_new × (S_old/V_old)
Pentium 4:
S_new = V_new × (10/1.25) = V_new × 8
D_new = 0.90 × 100 − V_new × 8 = 90 − V_new × 8
V_new = [(90 − V_new × 8)/(3.2E-8 × 3.6E9)]^(1/2)
V_new = 0.85 V
Core i5:
S_new = V_new × (30/0.9) = V_new × 33.3
D_new = 0.90 × 70 − V_new × 33.3 = 63 − V_new × 33.3
V_new = [(63 − V_new × 33.3)/(2.9E-8 × 3.4E9)]^(1/2)
V_new = 0.64 V
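The implicit equation for V_new in 1.9.3 can be solved in closed form, since substituting S_new and D_new gives a quadratic in V_new. The sketch below is one way to do that; the text does not say how the equation was solved, so the quadratic rearrangement and the function names are this sketch's assumptions. D_new is taken exactly as written in 1.9.3 (C × V_new^2 × F, without the factor of 1/2 implied by 1.9.1) so that the printed voltages are reproduced. The static and dynamic power inputs are the ones appearing in 1.9.2 and 1.9.3.

```python
import math


def capacitive_load(dynamic_w, volts, freq_hz):
    """C = 2 × DP / (V^2 × F), as in 1.9.1."""
    return 2 * dynamic_w / (volts ** 2 * freq_hz)


def reduced_voltage(static_w, dynamic_w, volts, freq_hz, target=0.90):
    """Voltage at which S_new + D_new = target × (S_old + D_old)."""
    a = capacitive_load(dynamic_w, volts, freq_hz) * freq_hz  # D_new = a × V^2 (as written in 1.9.3)
    b = static_w / volts                                      # S_new = b × V
    k = target * (static_w + dynamic_w)
    # Positive root of a × V^2 + b × V − k = 0.
    return (-b + math.sqrt(b * b + 4 * a * k)) / (2 * a)


v_p4 = reduced_voltage(static_w=10, dynamic_w=90, volts=1.25, freq_hz=3.6e9)
v_i5 = reduced_voltage(static_w=30, dynamic_w=40, volts=0.9, freq_hz=3.4e9)
print(f"Pentium 4: {v_p4:.3f} V, Core i5: {v_i5:.3f} V")
```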
1.10
1.10.1
p   # arith inst.   # L/S inst.   # branch inst.   cycles    ex. time   speedup
1   2.56E9          1.28E9        2.56E8           1.92E10   9.60       1.00
2   1.83E9          9.14E8        2.56E8           1.41E10   7.04       1.36
4   9.14E8          4.57E8        2.56E8           7.68E9    3.84       2.50
8   4.57E8          2.29E8        2.56E8           4.48E9    2.24       4.29
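The 1.10.1 table can be regenerated with a short Python sketch. The solution does not restate the exercise parameters, so the values below are assumptions chosen because they reproduce the table: CPIs of 1, 12, and 5 for arithmetic, load/store, and branch instructions, a 2 GHz clock, and arithmetic and load/store counts divided by 0.7 × p when p > 1.

```python
# p = 1 instruction counts, taken from the first row of the table.
ARITH, LOAD_STORE, BRANCH = 2.56e9, 1.28e9, 2.56e8

# Assumed parameters (not stated in the solution; they reproduce the table).
CPI_ARITH, CPI_LS, CPI_BRANCH = 1, 12, 5
CLOCK_HZ = 2e9


def row(p):
    """Return (arith, load/store, branch, cycles, time in s) for p processors."""
    scale = 1.0 if p == 1 else 0.7 * p      # assumed per-processor scaling
    arith = ARITH / scale
    load_store = LOAD_STORE / scale
    cycles = arith * CPI_ARITH + load_store * CPI_LS + BRANCH * CPI_BRANCH
    return arith, load_store, BRANCH, cycles, cycles / CLOCK_HZ


base_time = row(1)[4]
for p in (1, 2, 4, 8):
    arith, load_store, branch, cycles, time_s = row(p)
    print(p, f"{arith:.2E} {load_store:.2E} {branch:.2E} {cycles:.2E}",
          f"{time_s:.2f} {base_time / time_s:.2f}")
```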
1.10.2
p   ex. time
1   41.0
2   29.3
4   14.6
8   7.33
1.10.3 3
1.11
1.11.1 die area_15cm = wafer area/dies per wafer = π × 7.5^2/84 = 2.10 cm^2
yield_15cm = 1/(1 + (0.020 × 2.10/2))^2 = 0.9593
die area_20cm = wafer area/dies per wafer = π × 10^2/100 = 3.14 cm^2
yield_20cm = 1/(1 + (0.031 × 3.14/2))^2 = 0.9093
1.11.2 cost/die_15cm = 12/(84 × 0.9593) = 0.1489
cost/die_20cm = 15/(100 × 0.9093) = 0.1650
1.11.3 die area_15cm = wafer area/dies per wafer = π × 7.5^2/(84 × 1.1) = 1.91 cm^2
yield_15cm = 1/(1 + (0.020 × 1.15 × 1.91/2))^2 = 0.9575
die area_20cm = wafer area/dies per wafer = π × 10^2/(100 × 1.1) = 2.86 cm^2
yield_20cm = 1/(1 + (0.03 × 1.15 × 2.86/2))^2 = 0.9082
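The die area, yield, and cost-per-die formulas used in 1.11.1 to 1.11.3 can be wrapped in a few Python helpers. The function names and the dictionary of wafer parameters are illustrative assumptions; the numbers are those in the solution. The script keeps the die area unrounded, so the last digit of a yield may differ slightly from the printed value.

```python
import math


def die_area(wafer_diameter_cm, dies_per_wafer):
    """Die area = wafer area / dies per wafer (cm^2)."""
    return math.pi * (wafer_diameter_cm / 2) ** 2 / dies_per_wafer


def die_yield(defects_per_cm2, area_cm2):
    """Yield = 1 / (1 + defects per area × die area / 2)^2."""
    return 1 / (1 + defects_per_cm2 * area_cm2 / 2) ** 2


def cost_per_die(wafer_cost, dies_per_wafer, yield_):
    """Cost per die = wafer cost / (dies per wafer × yield)."""
    return wafer_cost / (dies_per_wafer * yield_)


wafers = {
    "15cm": {"diameter": 15, "dies": 84, "defects": 0.020, "cost": 12},
    "20cm": {"diameter": 20, "dies": 100, "defects": 0.031, "cost": 15},
}
for name, w in wafers.items():
    area = die_area(w["diameter"], w["dies"])
    y = die_yield(w["defects"], area)
    print(name, f"area={area:.2f} cm^2", f"yield={y:.4f}",
          f"cost/die={cost_per_die(w['cost'], w['dies'], y):.4f}")
```

For 1.11.3, the same helpers apply with the die count multiplied by 1.1 and the defect rate multiplied by 1.15.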
1.11.4 defects per area_0.92 = (1 − y^0.5)/(y^0.5 × die_area/2) = (1 − 0.92^0.5)/(0.92^0.5 × 2/2) = 0.043 defects/cm^2
defects per area_0.95 = (1 − y^0.5)/(y^0.5 × die_area/2) = (1 − 0.95^0.5)/(0.95^0.5 × 2/2) = 0.026 defects/cm^2
1.12
1.12.1 CPI = clock rate × CPU time/instr. count
clock rate = 1/cycle time = 3 GHz
CPI(bzip2) = 3 × 10^9 × 750/(2389 × 10^9) = 0.94
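The CPI formula in 1.12.1 is a one-line computation; a minimal Python sketch, with a hypothetical function name and the bzip2 values from the solution, is:

```python
def cpi(clock_rate_hz, cpu_time_s, instruction_count):
    """CPI = clock rate × CPU time / instruction count."""
    return clock_rate_hz * cpu_time_s / instruction_count


# bzip2: 3 GHz clock, 750 s CPU time, 2389 × 10^9 instructions.
bzip2_cpi = cpi(3e9, 750, 2389e9)
print(round(bzip2_cpi, 2))
```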