


## Latency, Bandwidth and Power Benefits of the Simple Universal Parallel intERface (SuperCHIPS) Integration Scheme

SivaChandra Jangam, Saptadeep Pal, Adeel A. Bajwa, Sudhakar Pamarti, Puneet Gupta and Subramanian S. Iyer

Center for Heterogeneous Integration and Performance Scaling (CHIPS)

**Electrical Engineering Department** 

University of California Los Angeles

UCLA Henry Samueli School of Engineering and Applied Science



IEEE 67th ECTC – Orlando, FL, USA

The 67th Electronic Components and Technology Conference

#### Outline

- Motivation
- Simple Universal Parallel intERface (SuperCHIPS) Protocol
  - SuperCHIPS Fine Pitch Interconnect (FPI) Scheme
  - Silicon Interconnect Fabric (Si-IF)
- Interconnect Modelling
  - PCB vs Si-IF links
  - Superior Transfer Characteristics for High Speed Data Transfer
  - Signal Integrity Analysis
- Benefits of SuperCHIPS protocol
  - Si-IF vs Conventional PCB
- Experimental Results
- Conclusion





#### **Motivation**



- High communication Bandwidth & low Power consumption
- Fine pitch interconnects operating at lower speed for lower energy per bit and reduced area per channel.






## **SuperCHIPS Protocol**





## **SuperCHIPS Fine Pitch Interconnect (FPI) Scheme**

- Die-to-Wafer Bonding
  - Metal-metal Thermal Compression Bonding (TCB)
- SuperCHIPS FPI Scheme
  - Silicon Interconnect Fabric (Si-IF)
  - Small dielets (0.5-5 mm edge length)
  - Fine pitch (2-10 µm) interconnects
  - Inter-dielet spacing (50-100 µm)






## Silicon Interconnect Fabric (Si-IF)

- Thermomechanical Properties
  - Rigid and Mechanically robust substrate.
  - Minimize thermomechanical mismatch.
  - Good heat dissipation.
- Electrical Properties
  - Fine traces: 1-5 µm.
  - Fine pitch interconnects: 2-10 µm.
  - Up to 4 levels of dual damascene wiring.



A. A. Bajwa et al., "Fine Pitch Die-to-Si Interconnections using Thermal Compression Bonding," ECTC 2017.

Presented Friday, June 2, 8:00 am, Southern Hemisphere II.





## **Interconnect Modelling**

- 3D interconnect models simulated in ANSYS HFSS.
- BEOL top metal layer dimensions for links
  - 1 µm width, 1.5-10 µm pitch
- Direct Cu-Cu bonding with no intermetallic.
- Different configurations for signal transfer.



(a) GSSG config. (b) GSG config. (c) GSSSSG config.





## PCB vs Si-IF links

| PCB links | Si-IF links |
|-----------|-------------|
| <ul><li>Long channels (several mm)<ul><li>High parasitic inductance</li><li>RLC link behavior</li></ul></li></ul> | <ul><li>Short channels (&lt;500 µm)<ul><li>Low parasitic inductance</li><li>RC link behavior</li></ul></li></ul> |
| <ul><li>Transmission line model<ul><li>Signal reflections &amp; matching</li></ul></li></ul> | <ul><li>RC line model<ul><li>No signal reflections</li></ul></li></ul> |
| <ul><li>Inter-symbol interference<ul><li>Large transceiver: ~0.81 mm<sup>2</sup>*</li><li>Energy/bit: &gt;23 pJ/bit</li></ul></li></ul> | <ul><li>No inter-symbol interference<ul><li>Simple inverter driver: ~0.05 µm<sup>2</sup></li><li>Energy/bit: &lt;0.3 pJ/bit</li></ul></li></ul> |
| <ul><li>Synchronous data transfer</li></ul> | <ul><li>Can be asynchronous</li></ul> |

\* R. Navid et al., "A 40 Gb/s Serial Link Transceiver in 28 nm CMOS Technology," JSSC 2015.
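The RC-versus-transmission-line split above can be illustrated with a common rule of thumb: a wire can be treated as a lumped element while it is shorter than about one tenth of a wavelength. The sketch below is not from the talk; the one-tenth fraction and the SiO2 relative permittivity of 3.9 are assumptions, and `max_lumped_frequency_hz` is a hypothetical helper name.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def max_lumped_frequency_hz(length_m, eps_r=3.9, fraction=0.1):
    """Frequency at which the wire length equals `fraction` of a wavelength.

    eps_r=3.9 (SiO2) and fraction=0.1 are assumptions, not values from the talk.
    """
    velocity = C0 / math.sqrt(eps_r)  # propagation velocity in the dielectric
    return fraction * velocity / length_m


# 100 um and 500 um are Si-IF link lengths; 10,000 um is the PCB inter-die distance.
for length_um in (100, 500, 10_000):
    f_hz = max_lumped_frequency_hz(length_um * 1e-6)
    print(f"{length_um:>6} um link: lumped up to about {f_hz / 1e9:.1f} GHz")
```

Under these assumptions a 500 µm link stays electrically short up to roughly 30 GHz, while a 10 mm link leaves the lumped regime near 1.5 GHz, which is in line with the slide's reasoning for why reflections and matching matter on the PCB but not on the Si-IF.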

#### **Reduced Link Parasitics**

- Ansys Q3D extractor model.
- Low parasitic inductance – RC link behavior.
- Low parasitic capacitance – low latency and power.
- Channel loss below 2 dB for 500 µm wires, even at 100 GHz.

| Interconnect pitch / length | R at 1 GHz* (Ω) | L (nH) | C (fF) |
|-----------------------------|-----------------|--------|--------|
| 2 µm / 100 µm               | 2.09            | 0.1    | 17.3   |
| 10 µm / 100 µm              | 1.89            | 0.1    | 8.54   |

\*Accounting for skin depth
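To put the skin-depth footnote in context, the standard formula for a good conductor is δ = sqrt(ρ / (π f μ0)). The sketch below evaluates it for copper; the bulk resistivity of 1.68e-8 Ω·m is an assumed textbook value (thin damascene lines typically have higher resistivity), and `skin_depth_m` is a hypothetical helper name.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability, H/m
RHO_CU = 1.68e-8       # assumed bulk copper resistivity, ohm*m


def skin_depth_m(freq_hz, resistivity=RHO_CU):
    """Skin depth of a non-magnetic good conductor at freq_hz."""
    return math.sqrt(resistivity / (math.pi * freq_hz * MU_0))


for freq_ghz in (1, 10, 100):
    depth_um = skin_depth_m(freq_ghz * 1e9) * 1e6
    print(f"{freq_ghz:>3} GHz: skin depth of about {depth_um:.2f} um")
```

With these assumptions the skin depth at 1 GHz is about 2 µm, larger than the 1 µm trace width used in the models, so the skin effect should only modestly raise R at that frequency.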



Insertion Loss for 10 µm interconnect pitch.





#### Low Cross-talk

- Excellent dielectric isolation of SiO2.
- Lower Cross-talk than typical acceptable value of -12dB.



NEXT for signals without shared ground



NEXT for signals with shared ground






#### Superior Transfer Characteristics for High Speed Data Transfer

- Digital signals (0.1-100 GHz) transfer with less than 2 dB loss over short channels (<500 µm).
- Cross-talk stays below -15 dB for digital signal transfer.
- Data rates of >20 Gbps/channel are achievable.
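The dB figures quoted above are easier to judge as linear voltage ratios. A minimal sketch of the standard conversion (ratio = 10^(dB/20)); the function name is illustrative only.

```python
def db_to_voltage_ratio(db):
    """Convert a gain or loss in dB to a linear voltage (amplitude) ratio."""
    return 10 ** (db / 20)


# -2 dB: channel loss bound; -12 dB: typical acceptable cross-talk; -15 dB: reported cross-talk bound.
for label, db in (("channel loss", -2.0), ("cross-talk limit", -12.0), ("cross-talk bound", -15.0)):
    print(f"{label:>17}: {db:6.1f} dB -> {db_to_voltage_ratio(db):.3f} of the signal amplitude")
```

Read this way, a 2 dB loss leaves roughly 79% of the amplitude at the receiver, and -15 dB of cross-talk corresponds to about 18% of the aggressor amplitude.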





#### Low Attenuation for THz frequencies

- Short wires of <100 µm.
  - RC behavior. Characteristic impedance not defined.
  - Attenuation: <3 dB even for THz signals.
- Achievable termination > 100 Ω.
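A rough cross-check of the THz claim: if a 100 µm wire is approximated as a single lumped RC low-pass (a simplification, not the HFSS model used in the talk), its -3 dB corner is f = 1 / (2πRC). The R and C values below are the Q3D-extracted numbers from the Reduced Link Parasitics table; the optional `driver_ohm` parameter is an assumption added to show how a driver's output resistance would lower the corner.

```python
import math


def rc_corner_hz(r_ohm, c_farad, driver_ohm=0.0):
    """-3 dB corner of a single-pole RC low-pass, with optional series driver resistance."""
    return 1.0 / (2.0 * math.pi * (r_ohm + driver_ohm) * c_farad)


# (R in ohm, C in farad) for 100 um long wires, taken from the parasitics table.
WIRES = {"2 um pitch": (2.09, 17.3e-15), "10 um pitch": (1.89, 8.54e-15)}

for name, (r, c) in WIRES.items():
    print(f"{name}: wire-only corner of about {rc_corner_hz(r, c) / 1e12:.1f} THz")
```

Under this single-pole approximation the bare wire corners land in the THz range. Any realistic driver resistance passed as `driver_ohm` pulls the corner down considerably, so the estimate describes the interconnect alone rather than a complete link.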





# **Signal Integrity Analysis**

- Simple tapered inverter I/O driver; eliminates SerDes.
- Latency and power dominated by ESD cap.
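To see why the ESD capacitance can dominate, compare the energy needed to charge the wire alone with the energy needed to charge the wire plus ESD loading, using E = C·V². This is a sketch under stated assumptions: the 1 V swing and the 100 fF ESD capacitance per end are hypothetical illustration values that do not appear in the talk; only the wire capacitances come from the parasitics table.

```python
def switching_energy_pj(cap_ff, vdd_v=1.0):
    """Energy drawn from the supply to charge cap_ff (in fF) to vdd_v, in pJ."""
    return cap_ff * 1e-15 * vdd_v ** 2 * 1e12


WIRE_CAP_FF = {"2 um pitch": 17.3, "10 um pitch": 8.54}  # from the parasitics table
ASSUMED_ESD_CAP_FF = 100.0  # hypothetical ESD capacitance per end, not from the talk

for name, c_wire in WIRE_CAP_FF.items():
    wire_only = switching_energy_pj(c_wire)
    with_esd = switching_energy_pj(c_wire + 2 * ASSUMED_ESD_CAP_FF)
    print(f"{name}: wire only {wire_only:.4f} pJ, with assumed ESD loading {with_esd:.4f} pJ")
```

With these illustrative numbers the wire accounts for only a small fraction of the switched capacitance, which is the qualitative point the slide makes.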



Eye-diagram of 2 µm pitch interconnect

Eye-diagram of 10 µm pitch interconnect




## **Si-IF vs Conventional PCB**

| Interconnect pitch / protocol     | ESD    | 10 µm on Si-IF / SuperCHIPS | 50 µm on Si interposer / DDR3 | 400 µm on FR4 PCB / SerDes |
|-----------------------------------|--------|-----------------------------|-------------------------------|----------------------------|
| Dielet size (mm<sup>2</sup>)      |        | 10-100                      | 25-600                        | 25-625                     |
| No. of signal links               |        | 600-2,000                   | 100-1,000                     | 100-500                    |
| Inter-die distance (µm)           |        | <500                        | <5,000                        | 10,000                     |
| Overall latency (ps)              | No ESD | 40.22                       | 200 [23]                      | 1,000                      |
|                                   | ESD    | 58.8                        | 500 [23]                      | ~1,000                     |
| Max data rate per link (Gbps)     | No ESD | 13                          | 1 [24]                        | 40 [37]                    |
|                                   | ESD    | 4.21                        | 1.0 [24]                      | 40 [37]                    |
| Energy per bit (pJ/b)             |        | <0.4                        | 9.48 [24]                     | 23.2 [37]                  |
| Max bandwidth per mm (Gbps/mm)    | No ESD | 1,300                       | 20                            | 100                        |
|                                   | ESD    | 421                         | 52                            | 100                        |
| Total I/O power (W)               |        | 2.13-6.74                   | 6-15                          | 46-230                     |

[23] H. Kalargaris et al., "Interconnect design tradeoffs for silicon and glass interposers," NEWCAS 2014.

[24] M. A. Karim et al., "Power comparison of 2D, 3D and 2.5D interconnect solutions and power optimization of interposer interconnects," ECTC 2016.

[37] R. Navid et al., "A 40 Gb/s Serial Link Transceiver in 28 nm CMOS Technology," JSSC 2015.
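The bandwidth-per-mm row for the Si-IF and PCB columns follows from the per-link data rate and the interconnect pitch, assuming one signal link per pitch along a single row of the die edge. That single-row assumption and the helper name are illustrative, not stated in the talk.

```python
def bandwidth_per_mm_gbps(rate_gbps, pitch_um):
    """Edge bandwidth density, assuming one link per pitch in a single row."""
    links_per_mm = 1000.0 / pitch_um
    return rate_gbps * links_per_mm


# (per-link data rate in Gbps, pitch in um) taken from the comparison table.
CASES = {
    "Si-IF, no ESD": (13.0, 10.0),
    "Si-IF, ESD": (4.21, 10.0),
    "FR4 PCB / SerDes": (40.0, 400.0),
}

for name, (rate, pitch) in CASES.items():
    print(f"{name}: about {bandwidth_per_mm_gbps(rate, pitch):.0f} Gbps/mm")
```

The takeaway is that the fine pitch more than compensates for the lower per-link rate: a 10 µm pitch gives 100 links per mm, against 2.5 links per mm at 400 µm.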






## **Benefits of SuperCHIPS**

- Inter-dielet distance: 10-20x shorter
- I/O pins compared to BGA: 15-80x more
- Latency: 13-27x lower
- Energy per bit: 20-80x lower
- Bandwidth per mm: 30-120x higher





\* M. A. Karim et al., "Power comparison of 2D, 3D and 2.5D interconnect solutions and power optimization of interposer interconnects," ECTC 2016.




## **Experimental Results**

- DC results:
  - Demonstrated continuity with 400 interconnects per daisy chain with 99% yield.
  - Contact resistance: 42 mΩ.


- AC results:
  - High-frequency measurements in progress.

A. A. Bajwa et al., "Fine Pitch Die-to-Si Interconnections using Thermal Compression Bonding," ECTC 2017.
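For a sense of scale on the DC numbers, the sketch below sums the reported contact resistance over one daisy chain and, under one possible reading of the yield figure, back-calculates the per-contact yield. Treating the 99% as a per-chain yield with independent contacts is an assumption; the slide does not say how yield was defined. Wiring resistance between contacts is ignored.

```python
CONTACTS_PER_CHAIN = 400     # interconnects per daisy chain (from the slide)
CONTACT_RESISTANCE_OHM = 42e-3  # reported contact resistance
CHAIN_YIELD = 0.99           # assumed to mean per-chain yield


def chain_contact_resistance_ohm(n_contacts, r_contact_ohm):
    """Total series contact resistance, ignoring the wiring between contacts."""
    return n_contacts * r_contact_ohm


def implied_contact_yield(chain_yield, n_contacts):
    """Per-contact yield if all contacts in a chain must work independently."""
    return chain_yield ** (1.0 / n_contacts)


total_ohm = chain_contact_resistance_ohm(CONTACTS_PER_CHAIN, CONTACT_RESISTANCE_OHM)
per_contact = implied_contact_yield(CHAIN_YIELD, CONTACTS_PER_CHAIN)
print(f"Series contact resistance per chain: {total_ohm:.1f} ohm")
print(f"Implied per-contact yield: {per_contact:.6f}")
```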










## Conclusion

- The SuperCHIPS protocol shows SoC-like performance with technology heterogeneity and flexibility.
- Channel losses are less than 2 dB for digital data transfer at more than 20 Gbps/channel.
- Latencies are up to 27x lower than on PCB.
- Fine pitch interconnects and shorter channels achieve up to 120x improvement in bandwidth per mm.
- Up to 80x lower power due to elimination of SerDes.
- Reduces cost of design and validation through IP reuse.





#### Acknowledgement

 We thank DARPA and ONR (grant N00014-16-1-263). The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

 We also thank the members of the UCLA CHIPS consortium for their support of this work.

chips.ucla.edu








# **THANK YOU**

# **Any Questions?**






# **Backup**






#### Application of SuperCHIPS to Hexa-core CORTEX M0

- Hexa-core CORTEX M0 architecture.
  - 2 cores for high throughput.
  - 4 cores for higher energy efficiency.
- Monolithic vs Heterogeneous technologies
  - 65nm General Purpose (GP): High performance.
  - 65nm Low Power Early (LPE): Energy efficiency.
  - 15% and 37% energy savings.



Figure legend: Iso-Performance Activity Optimized Cores; Heterogeneous Process CMP.

| Design: Cortex M0 (power in mW) | Activity factor 0.001 | Activity factor 0.01 | Activity factor 0.1 |
|---------------------------------|-----------------------|----------------------|---------------------|
| GP+GP: nominal/nominal          | 0.262                 | 0.526                | 3.8                 |
| GP+LPE: nominal/nominal         | 0.174                 | 0.546                | 7.44                |
| LPE+LPE: nominal/nominal        | 0.086                 | 0.564                | 11.8                |
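The table can be read as a lookup for which process combination draws the least power at a given activity factor. A minimal sketch that encodes the tabulated values and picks the minimum per column; the dictionary layout and function name are illustrative only.

```python
# Power in mW per process combination, keyed by activity factor (values from the table).
POWER_MW = {
    "GP+GP": {0.001: 0.262, 0.01: 0.526, 0.1: 3.8},
    "GP+LPE": {0.001: 0.174, 0.01: 0.546, 0.1: 7.44},
    "LPE+LPE": {0.001: 0.086, 0.01: 0.564, 0.1: 11.8},
}


def lowest_power_config(activity_factor):
    """Return the configuration with the lowest tabulated power at this activity factor."""
    return min(POWER_MW, key=lambda name: POWER_MW[name][activity_factor])


for activity in (0.001, 0.01, 0.1):
    best = lowest_power_config(activity)
    print(f"activity {activity}: {best} at {POWER_MW[best][activity]} mW")
```

On the tabulated numbers the low-power process wins at the lowest activity factor, while the general-purpose process is lowest at 0.01 and 0.1, which illustrates why matching process technology to each core's activity can pay off.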



