Shaodi Wang, Hochul Lee, Pedram Khalili, Cecile Grezes, Kang L. Wang and Puneet Gupta

University of California, Los Angeles

## VARIATION MONITOR-ASSISTED ADAPTIVE MRAM WRITE



### Write mechanism of STT-RAM and MeRAM



- STT-MTJ write
  - Bi-directional current-driven
  - Critical current density (*J<sub>c</sub>*)
  - Deterministic write
  - Slow (5-10 ns)
  - High power (0.2-1 pJ/bit) due to low MTJ resistance (1-10 kΩ)

### Voltage-controlled magnetic tunnel junction (VC-MTJ)

Figure: free-layer energy barrier of the VC-MTJ stack (free layer / MgO / fixed layer) at V = 0.

- VC-MTJ write
  - Uni-directional voltage-driven
  - Critical voltage (V<sub>c</sub>)
  - Non-deterministic write (leads to write errors)
  - Fast (~1 ns)
  - Low power (10-50 fJ/bit) due to high MTJ resistance (20-200 kΩ)
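
The link between MTJ resistance and write energy in the two lists above can be checked with a back-of-the-envelope estimate. The sketch below assumes a rectangular pulse and purely resistive dissipation, E = V²·t/R. The write voltages (0.5 V and 1.0 V) and the specific resistance values are illustrative assumptions chosen from within the quoted ranges, not figures from the slides, and peripheral circuit energy is ignored.

```python
def write_energy_joules(voltage_v, resistance_ohm, pulse_s):
    """Energy dissipated in the MTJ by one rectangular write pulse (E = V^2 * t / R)."""
    return voltage_v ** 2 / resistance_ohm * pulse_s


# Assumed operating points: the voltages are hypothetical, and the resistances
# and pulse widths are picked from inside the ranges quoted in the lists above.
stt_energy = write_energy_joules(voltage_v=0.5, resistance_ohm=5e3, pulse_s=10e-9)
vc_energy = write_energy_joules(voltage_v=1.0, resistance_ohm=100e3, pulse_s=1e-9)

print(f"STT-MTJ: {stt_energy * 1e12:.2f} pJ/bit")
print(f"VC-MTJ:  {vc_energy * 1e15:.1f} fJ/bit")
```

Under these assumptions both estimates land inside the energy ranges quoted for the two devices, which is consistent with the higher resistance and shorter pulse of the VC-MTJ accounting for most of the energy gap.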

#### MRAM write error rate (WER) under variation



#### **MRAM write under variation**



#### Sensing write behavior change under variation

A 30°C temperature change shifts the WER from $10^{-6}$ to $10^{-4}$, which leads to high write energy and long write delay.
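
To see why a higher WER costs energy and delay, consider a write that is repeated until the residual failure probability falls below a budget. The sketch below assumes independent attempts with the same per-pulse WER; the target of $10^{-9}$ and the function name `attempts_needed` are hypothetical and chosen only for illustration.

```python
def attempts_needed(wer_per_pulse, target_failure):
    """Smallest n such that wer_per_pulse**n <= target_failure.

    Assumes every write attempt fails independently with the same probability.
    """
    if not 0.0 < wer_per_pulse < 1.0:
        raise ValueError("wer_per_pulse must lie strictly between 0 and 1")
    if not 0.0 < target_failure < 1.0:
        raise ValueError("target_failure must lie strictly between 0 and 1")
    attempts = 1
    residual = wer_per_pulse
    while residual > target_failure:
        attempts += 1
        residual *= wer_per_pulse
    return attempts


TARGET = 1e-9  # hypothetical per-bit failure budget, not a figure from the slides

nominal = attempts_needed(1e-6, TARGET)  # WER before the temperature change
shifted = attempts_needed(1e-4, TARGET)  # WER after the 30°C change

print(nominal, shifted)
```

With this budget the shifted WER needs one more write attempt than the nominal one, so a fixed worst-case write schedule pays that extra pulse on every write.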





| Monitor             | Latency                | Accuracy                  | Energy                     | Area                    |
|---------------------|------------------------|---------------------------|----------------------------|-------------------------|
| C. Chung et al.     | $0.1\,\mathrm{ms}$     | $9\,^{\circ}\mathrm{C}$   | $0.015\,\mu\mathrm{J}$     | $0.01\,\mathrm{mm}^2$   |
| K. Woo et al.       | $0.2\,\mathrm{ms}$     | $3\,^{\circ}\mathrm{C}$   | $0.24\,\mu\mathrm{J}$      | $0.04\,\mathrm{mm}^2$   |
| P. Chen et al.      | $1\,\mathrm{ms}$       | $2\,^{\circ}\mathrm{C}$   | $0.49\,\mu\mathrm{J}$      | $0.01\,\mathrm{mm}^2$   |
| A. Aita et al.      | $100\,\mathrm{ms}$     | $0.1\,^{\circ}\mathrm{C}$ | $13.8\,\mu\mathrm{J}$      | $0.04\,\mathrm{mm}^2$   |
| This work (STT-RAM) | $1$-$10\,\mu\mathrm{s}$ | $10\,^{\circ}\mathrm{C}$  | $0.12$-$1.2\,\mathrm{nJ}$  | $0.0005\,\mathrm{mm}^2$ |
| This work (MeRAM)   | $1$-$10\,\mu\mathrm{s}$ | $10\,^{\circ}\mathrm{C}$  | $0.27$-$2.7\,\mathrm{nJ}$  | $0.0005\,\mathrm{mm}^2$ |
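
The table can be turned into improvement ratios with a few lines of arithmetic. The sketch below compares the proposed monitors against the C. Chung row, chosen here because its accuracy is the closest to the proposed 10°C. It uses the slowest and most energy-hungry end of each proposed range, and pairing the 10 µs latency with the upper energy bound is an assumption.

```python
# Values copied from the comparison table, converted to SI units (area in mm^2).
chung = {"latency_s": 0.1e-3, "energy_j": 0.015e-6, "area_mm2": 0.01}

# Conservative end of the proposed monitors' ranges (assumed pairing of
# the 10 us latency with the upper energy bound).
this_stt_worst = {"latency_s": 10e-6, "energy_j": 1.2e-9, "area_mm2": 0.0005}
this_me_worst = {"latency_s": 10e-6, "energy_j": 2.7e-9, "area_mm2": 0.0005}


def improvement(baseline, proposed):
    """Ratio baseline/proposed for each metric; values above 1 favour the proposal."""
    return {key: baseline[key] / proposed[key] for key in baseline}


stt_gain = improvement(chung, this_stt_worst)
me_gain = improvement(chung, this_me_worst)

for name, gain in (("STT-RAM", stt_gain), ("MeRAM", me_gain)):
    print(name, {key: round(value, 1) for key, value in gain.items()})
```

Even at the conservative end of the ranges, the ratios come out at roughly an order of magnitude in latency, several times in energy, and 20X in area.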

### Application of the variation monitor: adaptive write

- Dynamically select optimal pulses for multiple-write<sup>1</sup>
  - Write latency variation minimization
    - Three write pulse choices are enough
    - 1.2X write latency improvement for 1-MB STT-RAM
    - 2.4X write latency improvement for 1-MB MeRAM
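
The selection step can be pictured as a small lookup from the monitor reading to one of three pre-characterized pulses. The sketch below is only an illustration of that idea: the normalized reading, the bin edges, and the pulse names and widths are all hypothetical placeholders, not parameters from this work.

```python
from bisect import bisect_right
from collections import namedtuple

WritePulse = namedtuple("WritePulse", ["name", "width_ns"])

# Hypothetical pulse table with three choices, ordered from best-case to
# worst-case variation corner. The widths are placeholders.
PULSE_CHOICES = [
    WritePulse("short", 1.0),
    WritePulse("medium", 2.0),
    WritePulse("long", 4.0),
]

# Hypothetical thresholds on a monitor reading normalized to [0, 1].
BIN_EDGES = [0.33, 0.66]


def select_pulse(monitor_reading):
    """Map a normalized variation-monitor reading to one of the three pulses."""
    if not 0.0 <= monitor_reading <= 1.0:
        raise ValueError("monitor_reading must be normalized to [0, 1]")
    return PULSE_CHOICES[bisect_right(BIN_EDGES, monitor_reading)]


for reading in (0.10, 0.50, 0.90):
    print(reading, select_pulse(reading))
```

Keeping the table to three entries mirrors the observation that three write pulse choices are enough; a real controller would fill the table from device characterization rather than fixed constants.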





### **Evaluation of adaptive write**

- Experimental setup:
  - 32 nm single-core x86 processor with an 8-MB universal MRAM cache
- Simulations
  - MTJ switching simulation (experimentally verified physical models)
  - Circuit simulation (SPICE and NVSim)
  - Architecture simulation (gem5)
  - Thermal simulation (HotSpot)
  - Power simulation (CACTI)
- 1.7X and 1.1X application run time improvement for processors with MeRAM and STT-RAM caches, respectively





## Conclusion

- The proposed variation monitor can sense combined wafer-level process and temperature variation
  - 10X faster, 5X more energy-efficient, and 20X smaller than a conventional 65 nm temperature monitor with the same accuracy
- Adaptive write scheme dynamically selects optimized write pulse through variation monitoring
  - MeRAM receives more benefit than STT-RAM
    - 2.4X and 1.2X cache speed improvement for MeRAM and STT-RAM, respectively
    - MeRAM suffers from more variation impact
    - STT-RAM without multiple-write is expected to see much more improvement in both power and latency (future work)
    - 1.7X application run time reduction for processor with MeRAM cache
    - 1.1X application run time reduction for processor with STT-RAM cache