



# Photonic Integrated Subsystems for Next-Generation Leadership-Class HPC

#### **Keren Bergman**

Lightwave Research Lab, Columbia University

New York, NY, USA













#### High Performance Systems: Trends and Challenges

- SUMMIT (Oak Ridge National Laboratory)
  - Most powerful supercomputer\* (June, 2018)
  - Linpack performance: 122.3 PetaFLOPS
  - Data Analytics applications up to 3.3 ExaFLOPs
  - Power consumption: 13MW
    - Power efficiency: 13.9 GFLOPs/Watt (#5 Green 500)
  - 4608 Nodes with:
    - 200 G (Dual-rail Mellanox EDR 100G InfiniBand)
    - 9216 IBM Power9 CPUs (2 per node)
    - 27648 Nvidia Volta V100 GPUs (6 per node)

Next challenge: reach Exascale+ within 20 MW



Source: www.olcf.ornl.gov/summit/











#### Performance/Communications Trends for Top 10 (2010-2018)



Sunway TaihuLight (Nov 2017) B/F = 0.004; Summit HPC (June 2018) B/F = 0.0005 → 8X decrease





### Performance and the Data Movement Energy Budget

- GFLOPs/Watt = (GFLOP/second) / (Joule/second) = GFLOP/Joule
- Target: 50 GFLOPs/W ⇔ 20 pJ/FLOP
- Total energy-per-bit budget (at 200 bits/FLOP):
  - 14 GFLOPs/W: 72 pJ/FLOP → 0.36 pJ/bit
  - 50 GFLOPs/W: 20 pJ/FLOP → 0.1 pJ/bit
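The budget figures above follow from a unit conversion, sketched here in Python. The function names are illustrative, and the 200 bits/FLOP figure is taken from the slide. The slide's 72 pJ/FLOP appears to correspond to Summit's 13.9 GFLOPs/W rounded up to 14.

```python
def pj_per_flop(gflops_per_watt: float) -> float:
    """Energy per FLOP in picojoules for a given efficiency in GFLOPs/W."""
    # x GFLOPs/W = x * 1e9 FLOP/J, so one FLOP costs 1e-9 / x J = 1000 / x pJ.
    return 1000.0 / gflops_per_watt


def pj_per_bit(gflops_per_watt: float, bits_per_flop: float = 200.0) -> float:
    """Total energy budget per bit if every FLOP touches `bits_per_flop` bits."""
    return pj_per_flop(gflops_per_watt) / bits_per_flop


for efficiency in (13.9, 14.0, 50.0):
    print(
        f"{efficiency:5.1f} GFLOPs/W -> "
        f"{pj_per_flop(efficiency):5.1f} pJ/FLOP, "
        f"{pj_per_bit(efficiency):.2f} pJ/bit"
    )
```

The per-bit budget covers everything (compute, memory access and interconnect), which is why a 0.1 pJ/bit total is so tight.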

| Data movement                      | Energy        |
|------------------------------------|---------------|
| Access SRAM                        | O(10 fJ/bit)  |
| Access DRAM cell                   | O(1 pJ/bit)   |
| Movement to HBM/MCDRAM (few mm)    | O(10 pJ/bit)  |
| Movement to DDR3 off-chip (few cm) | O(100 pJ/bit) |

- Scaling performance under ultra-tight energy budget:
  - Raise cache hit rates (expanded caches, more reuse)
  - Improve memory access (read, write) energy efficiency
  - Improve data movement energy efficiency:
    - Novel interconnect technologies and architectures
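To see why data movement dominates, the order-of-magnitude energies in the table above can be compared against the 20 pJ/FLOP target. This is a rough sketch: it assumes, unrealistically, that the entire per-FLOP budget is spent on moving data, so the results are upper bounds.

```python
# Order-of-magnitude energies from the data-movement table, in pJ/bit.
ENERGY_PJ_PER_BIT = {
    "SRAM access": 0.01,
    "DRAM cell access": 1.0,
    "HBM/MCDRAM (few mm)": 10.0,
    "DDR3 off-chip (few cm)": 100.0,
}

BUDGET_PJ_PER_FLOP = 20.0  # 50 GFLOPs/W target


def affordable_bits_per_flop(budget_pj: float, energy_pj_per_bit: float) -> float:
    """Upper bound on bits moved per FLOP if the whole budget went to movement."""
    return budget_pj / energy_pj_per_bit


affordable = {
    level: affordable_bits_per_flop(BUDGET_PJ_PER_FLOP, energy)
    for level, energy in ENERGY_PJ_PER_BIT.items()
}

for level, bits in affordable.items():
    print(f"{level:24s} <= {bits:8.1f} bits/FLOP")
```

Under these assumptions a single 64-bit operand fetched from off-chip DDR3 would already exceed the full budget of many FLOPs, which motivates the cache, memory and interconnect improvements listed above.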



#### Top 500 and "Green 500"

| June 2016 |             |         | November 2016 |             |         |
|-----------|-------------|---------|---------------|-------------|---------|
| Name      | Top500 rank | GFlop/W | Name          | Top500 rank | GFlop/W |
| Shoubu    | 94          | 6.7     | DGX Sat.V     | 28          | 9.5     |
| Satsuki   | 486         | 6.2     | Piz Daint     | 8           | 7.5     |
| Sunway TL | 1           | 6.1     | Shoubu        | 116         | 6.7     |
|           |             |         | Sunway TL     | 1           | 6.1     |

Legend (system architectures): ZettaScaler 1.6, ZettaScaler 2.0, ZettaScaler 2.2, Tesla P100, DGX-1 station + P100, DGX-1 station + V100, ZettaScaler 1.6 + Tesla P100

| June 2017     |             |         |
|---------------|-------------|---------|
| Name          | Top500 rank | GFlop/W |
| TSUBAME3.0    | 61          | 14.1    |
| kukai         | 465         | 14.0    |
| AIST AI Cloud | 148         | 12.7    |
| RAIDEN        | 305         | 10.6    |
| Wilkes-2      | 100         | 10.4    |
| Piz Daint     | 3           | 10.4    |
| Gyoukou       | 69          | 10.2    |
| GOSAT-2       | 220         | 9.8     |
| Facebook      | 31          | 9.5     |
| DGX Sat.V     | 32          | 9.5     |
| Reedbush-H    | 203         | 8.6     |
| JADE          | 425         | 8.4     |
| Cedar         | 86          | 8.0     |
| DAVIDE        | 299         | 7.7     |
| Shoubu        | 137         | 6.7     |
| Hokule'a      | 466         | 6.7     |
| Sunway TL     | 1           | 6.1     |

| November 2017 |             |         |
|---------------|-------------|---------|
| Name          | Top500 rank | GFlop/W |
| Shoubu B      | 259         | 17.0    |
| Suiren2       | 307         | 16.8    |
| Sakura        | 276         | 16.7    |
| DGX Volta     | 149         | 15.1    |
| Gyoukou       | 4           | 14.2    |
| TSUBAME3.0    | 13          | 13.7    |
| AIST AI Cloud | 195         | 12.7    |
| RAIDEN        | 419         | 10.6    |
| Wilkes-2      | 115         | 10.4    |
| Piz Daint     | 3           | 10.4    |
| Reedbush-L    | 291         | 10.2    |
| GOSAT-2       | 319         | 9.8     |
| Facebook      | 35          | 9.5     |
| DGX Saturn V  | 36          | 9.5     |
| Era-Al        | 109         | 8.6     |
| Reedbush-H    | 295         | 8.6     |
| Cedar         | 94          | 8.0     |
| DAVIDE        | 440         | 7.9     |
| Shoubu        | 180         | 6.7     |
| Sunway TL     | 1           | 6.1     |

| June 2018     |             |           |
|---------------|-------------|-----------|
| Name          | Top500 rank | GFlop/W   |
| Shoubu B      | 359         | 18.4      |
| Suiren2       | 419         | 16.8      |
| Sakura        | 385         | 16.7      |
| DGX Volta     | 227         | 15.1      |
| Summit        | 1           | 13.9      |
| TSUBAME3.0    | 19          | 13.7      |
| AIST AI Cloud | 287         | 12.7      |
| Sunway TL     | 2           | 6.1 (#23) |


### NVidia's GPU/memory Integration Assembly

| June 2016   |         |
|-------------|---------|
| Top500 rank | GFlop/W |
| 94          | 6.7     |
| 486         | 6.2     |
| 1           | 6.1     |
| 440         | 5.3     |
| 446         | 4.8     |

[Figure: Kepler (GK110) GPU with off-package GDDR chips, ~100pJ/bit. Source: techspot.com]

| June 2017     |             |         |
|---------------|-------------|---------|
| Name          | Top500 rank | GFlop/W |
| TSUBAME3.0    | 61          | 14.1    |
| kukai         | 465         | 14.0    |
| AIST AI Cloud | 148         | 12.7    |
| RAIDEN        | 305         | 10.6    |
| Wilkes-2      | 100         | 10.4    |
| Piz Daint     | 3           | 10.4    |

NVidia major new design



- Memory closer to GPU
- CoWoS: Chip-on-Wafer-on-Substrate



#### ZettaScaler 2.2

| November 2017 |             |         |
|---------------|-------------|---------|
| Name          | Top500 rank | GFlop/W |
| Shoubu B      | 259         | 17.0    |
| Suiren2       | 307         | 16.8    |
| Sakura        | 276         | 16.7    |

- ZettaScaler architecture:
  - Modular design
  - Liquid cooled
  - ThruChip Interface (TCI) with <u>sub-pJ/bit efficiency</u>

Architectures with <u>big gains</u> in GFlops/Watt: Innovative Data Movement Solutions



- High Performance Data Centers: Convergence on AI
- Strong interest in energy efficiency of Data Centers on AI
- ...And not only for "small" systems
  - Training Deep Neural Networks (DNN) takes time!
    - "Our network takes between five and six days to train on two GTX 580 3GB GPUs" (Krizhevsky et al., 2012)
    - "On a system equipped with four NVIDIA Titan Black GPUs, training a single net took 2–3 weeks" (Simonyan et al., 2015)
    - "our [...] system trains ResNet-50 [...] on 256 GPUs in one hour" (Goyal et al., 2017)
- Facebook and NVidia's clusters have 1,000 GPUs (3.3 PFlops)

| June 2017     |             |         | November 2017 |             |         | June 2018     |             |           |
|---------------|-------------|---------|---------------|-------------|---------|---------------|-------------|-----------|
| Name          | Top500 rank | GFlop/W | Name          | Top500 rank | GFlop/W | Name          | Top500 rank | GFlop/W   |
| TSUBAME3.0    | 61          | 14.1    | Shoubu B      | 259         | 17.0    | Shoubu B      | 359         | 18.4      |
| kukai         | 465         | 14.0    | Suiren2       | 307         | 16.8    | Suiren2       | 419         | 16.8      |
| AIST AI Cloud | 148         | 12.7    | Sakura        | 276         | 16.7    | Sakura        | 385         | 16.7      |
| RAIDEN        | 305         | 10.6    | DGX Volta     | 149         | 15.1    | DGX Volta     | 227         | 15.1      |
| Wilkes-2      | 100         | 10.4    | Gyoukou       | 4           | 14.2    | Summit        | 1           | 13.9      |
| Piz Daint     | 3           | 10.4    | TSUBAME3.0    | 13          | 13.7    | TSUBAME3.0    | 19          | 13.7      |
| Gyoukou       | 69          | 10.2    | AIST AI Cloud | 195         | 12.7    | AIST AI Cloud | 287         | 12.7      |
| GOSAT-2       | 220         | 9.8     | RAIDEN        | 419         | 10.6    | Sunway TL     | 2           | 6.1 (#23) |
| Facebook      | 31          | 9.5     | Wilkes-2      | 115         | 10.4    |               |             |           |
| DGX Sat.V     | 32          | 9.5     | Piz Daint     | 3           | 10.4    |               |             |           |
| Reedbush-H    | 203         | 8.6     | Reedbush-L    | 291         | 10.2    |               |             |           |
| JADE          | 425         | 8.4     | GOSAT-2       | 319         | 9.8     |               |             |           |
| Cedar         | 86          | 8.0     | Facebook      | 35          | 9.5     |               |             |           |
| DAVIDE        | 299         | 7.7     | DGX Saturn V  | 36          | 9.5     |               |             |           |
| Shoubu        | 137         | 6.7     | Era-Al        | 109         | 8.6     |               |             |           |
| Hokule'a      | 466         | 6.7     | Reedbush-H    | 295         | 8.6     |               |             |           |
| Sunway TL     | 1           | 6.1     | Cedar         | 94          | 8.0     |               |             |           |
|               |             |         | DAVIDE        | 440         | 7.9     |               |             |           |
|               |             |         | Shoubu        | 180         | 6.7     |               |             |           |
|               |             |         | Sunway TL     | 1           | 6.1     |               |             |           |



### Scaling chip 'escape' bandwidth density



- 18 NVLink 2.0 ports  $\rightarrow$  9 per long edge top/bottom
- 50GB/s per port (25GB/s each Tx/Rx)
- 1 NVLink ~ 2mm of linear edge
- 50GB/s per 2mm → <u>200Gb/s/mm</u>
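The edge-density figure above is a bytes-to-bits conversion divided by the edge length each port occupies. A minimal sketch, with illustrative names and the slide's NVLink 2.0 numbers as inputs:

```python
GBITS_PER_GBYTE = 8


def edge_density_gbps_per_mm(port_gbytes_per_s: float, edge_mm_per_port: float) -> float:
    """Escape bandwidth per millimetre of die edge, in Gb/s/mm."""
    return port_gbytes_per_s * GBITS_PER_GBYTE / edge_mm_per_port


NVLINK_PORTS = 18
PORT_GBYTES_PER_S = 50.0   # 25 GB/s Tx + 25 GB/s Rx
EDGE_MM_PER_PORT = 2.0     # approximate linear edge per NVLink port

density = edge_density_gbps_per_mm(PORT_GBYTES_PER_S, EDGE_MM_PER_PORT)
total_gbytes_per_s = NVLINK_PORTS * PORT_GBYTES_PER_S

print(f"Edge bandwidth density: {density:.0f} Gb/s/mm")
print(f"Total NVLink escape bandwidth: {total_gbytes_per_s:.0f} GB/s")
```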







#### The Photonic Opportunity for Data Movement



#### **Reduce Energy Consumption**

Eliminate Bandwidth Taper

R. Lucas et al., "Top ten exascale research challenges," DOE ASCAC subcommittee Report, 2014





#### **Silicon Photonics Dense-WDM**

#### Scalable, >Tb/s/mm, <1pJ/bit <u>"distance transparent"</u> Optical Interconnect







### Only "Power Up" Needed Optical Links: Disaggregated Architecture

Current server



However... Inter Node Bandwidth (10 GB/s) << needed Embedded Bandwidth (100 GB/s – 1 TB/s)



#### Disaggregated System Architecture: flexibly interconnected heterogeneous resources







### Multi-Host/Storage Architecture with Photonic I/Os





ASCR/SBIR Collaborative Project (R. Carlson): Photonic-Storage Subsystem Input/Output (P-SSIO)

#### **Objectives:**

- Energy efficient integrated photonic I/O (0.5pJ/b)
- High bandwidth throughput (256GB/s)





#### **P-SSIO System performance goals**

- 4-8 Server class PCIe version 4.0 x 32 controller chips (CPU or dedicated controller)
- 16-32 Non-Volatile Memory Express (NVMe) based Storage Subsystems connected at 16 GT/s I/O rate each
- Simultaneous access from every PCIe controller to multiple NVMe storage devices (256 GB/s aggregate I/O rate with 4 PCIe controllers)
- WDM optical transceivers matched to the PCIe I/O v 4.0 transmission rates
- Reconfigurable optical interconnect fabric
- Low loss Optical connectors and/or integrated Micro Optical Bench assemblies
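The 256 GB/s aggregate goal is consistent with four x32 controllers at the PCIe 4.0 line rate. The sketch below assumes the figure counts one direction only and uses the raw 16 GT/s rate. The 128b/130b encoding factor is standard for PCIe 3.0 and later and is shown for comparison; neither assumption is stated on the slide.

```python
LANE_RATE_GT_PER_S = 16.0    # PCIe 4.0 per-lane signalling rate
LANES_PER_CONTROLLER = 32    # x32 controller
CONTROLLERS = 4
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line coding


def aggregate_gbytes_per_s(controllers: int, lanes: int, rate_gt_per_s: float,
                           efficiency: float = 1.0) -> float:
    """One-direction aggregate bandwidth in GB/s (1 transfer = 1 bit per lane)."""
    return controllers * lanes * rate_gt_per_s * efficiency / 8


raw = aggregate_gbytes_per_s(CONTROLLERS, LANES_PER_CONTROLLER, LANE_RATE_GT_PER_S)
effective = aggregate_gbytes_per_s(
    CONTROLLERS, LANES_PER_CONTROLLER, LANE_RATE_GT_PER_S, ENCODING_EFFICIENCY
)

print(f"Raw line rate:       {raw:.0f} GB/s")
print(f"After 128b/130b:     {effective:.1f} GB/s")
```

Protocol overhead (packet headers, flow control) would lower the usable figure further and is not modelled here.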

Partners (names recoverable from logos): PLC Connections, nanoPrecision Products



#### **Disaggregation:** <u>Deeper into the Heterogeneous Hierarchy</u>







#### **Optically-Connected Memory Architecture**



Photonic Memory Controller Module (P-MCM)





#### PHOTONIC MEMORY CONTROLLER MODULE (P-MCM) SBIR COLLABORATIVE PROJECT PHASE 2

Memory-in-package: HBM2 approach with 1024 wires at 1 Gb/s

|               | HBM2                                                                     | HMC gen3                               |
|---------------|--------------------------------------------------------------------------|----------------------------------------|
| Bandwidth     | 256 GB/s                                                                 | 320 GB/s                               |
| IO            | 8 Parallel (1-2G), 128b per channel                                      | 30G SerDes, 4 links per HMC            |
| Package type  | Si-interposer                                                            | Discrete (SerDes)                      |
| Memory access | DDR                                                                      | Packet based                           |
| Target market | Graphics, Networking, less frequently accessed memory, Small form-factor | High-performance Computing, Networking |

Disaggregated architecture (CPU 1 and CPU 2 connected through an optical switch):

- 10 compute connected to 10 HMC gen3
- Aggregate bandwidth: 10*4*16*30Gb/s = 2.4TB/s

Partners (names recoverable from logos): Freedom Photonics, PLC Connections
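The aggregate-bandwidth product quoted for the disaggregated architecture can be checked directly. The reading of the four factors below (10 HMCs, 4 links per HMC, 16 lanes per link, 30 Gb/s per lane) is an assumption based on the HMC gen3 column of the table; the slide gives only the numbers.

```python
HMC_COUNT = 10        # 10 compute connected to 10 HMC gen3
LINKS_PER_HMC = 4     # "4 links per HMC"
LANES_PER_LINK = 16   # assumed meaning of the factor 16
LANE_RATE_GBPS = 30   # 30G SerDes

aggregate_gbps = HMC_COUNT * LINKS_PER_HMC * LANES_PER_LINK * LANE_RATE_GBPS
aggregate_tbytes_per_s = aggregate_gbps / 8 / 1000

print(f"Aggregate: {aggregate_gbps} Gb/s = {aggregate_tbytes_per_s} TB/s")
```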

### **P-MCM SiP Subsystems**

Leveraging the AIM fabrication facility, Analog Photonics is developing power-efficient SiP microdisk transceivers:



(a) Thermal tuning speed of micro-disk modulator. (b) 40Gbit/s eye diagram achieved from a micro-disk modulator. (c) The doping profile and connection design of our vertical junction microdisk modulator. (d) SEM of the fabricated microdisk.





- Delivered 4x25Gbaud low-power WDM transceiver
- Co-integration of front and back-end electronics (Drivers/TIAs)



#### **P-MCM WDM Optical Source**



Single DFB laser

Centralized efficient WDM source for multiple transceiver nodes







[Plot: output power vs. bias current (mA)]

Array of DFB lasers coupled to a star coupler and distributed as efficient WDM sources.

> "WDM Source Based on High Power, Efficient 1280nm DFB Lasers for Terabit Interconnect Technologies," PTL 2018 [in review]



- Transform WDM laser source from a "golden box" form factor to an integrated solution with the SiP transceivers.
- DFB laser technology developed by Freedom Photonics exhibits ultra-high efficiency (>35%) when operated at high power.



### **P-MCM SiP Packaging**

Low-loss fiber-array coupling and packaging solutions connecting optical sources to SiP subsystems

#### PLC CONNECTIONS





- 100-port Silica transposer with 20µm output pitch;
- 6-fiber, 127µm-pitch, lid-less fiber array;
- Recessed optical facet on a typical SiP die

# Electrical/Optical fully packaged SiP switch







#### System architecture and testbed for P-MCM









Processor/HMC gen2 testbed FPGA platform

"Reconfigurable Silicon Photonic Platform for Memory Scalability and Disaggregation." OFC 2018.



#### Deep Neural Network (DNN) in the P-MCM PLATFORM



- A DNN model is stored in global memory and computed on devices (GPU, CPU, TPU), each with its own memory (device memory).
- DNN has three bottlenecks: network bandwidth, memory bandwidth and engineer bandwidth (S. Han, W. Dally, DAC'18).
- Memory bandwidth (device) could be saved with model compression: Pruning, Quantization, Decomposition, Distillation.
- Network bandwidth (global) could be saved with gradient compression.
- **Problem**: A compressed model matrix becomes sparse, which leads to sparse memory access (Yu et al., ISCA'17).
- Solution:
  - Current practice: custom accelerators on FPGA designed with optimized memory access.
  - Goal: Photonics could enable scalability, disaggregation and increased bandwidth for DNNs.
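The link between compression and sparse memory access can be illustrated with a toy example. The sketch below applies magnitude pruning (one common pruning method, chosen here for illustration and not necessarily the one used in the cited work) to a small random weight matrix and stores the result in compressed sparse row (CSR) form. All names and sizes are hypothetical.

```python
import random


def magnitude_prune(matrix, keep_fraction):
    """Zero every weight except the largest-magnitude `keep_fraction` of them."""
    magnitudes = sorted((abs(w) for row in matrix for w in row), reverse=True)
    n_keep = max(1, int(len(magnitudes) * keep_fraction))
    threshold = magnitudes[n_keep - 1]
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in matrix]


def to_csr(matrix):
    """Return (values, column indices, row pointers) of the non-zero entries."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


rng = random.Random(0)
dense = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
pruned = magnitude_prune(dense, keep_fraction=0.25)
values, col_idx, row_ptr = to_csr(pruned)

print(f"non-zeros kept: {len(values)} of 64")
print(f"column indices touched per row: "
      f"{[col_idx[row_ptr[i]:row_ptr[i + 1]] for i in range(8)]}")
```

The surviving weights land at irregular column positions, so reading the matching activations becomes a gather with no fixed stride. That irregular access pattern is the problem named above.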



#### The Integrated Photonics Manufacturing Institute's Core Hubs - Albany



- 300mm tools provide unprecedented-quality silicon photonics
- Unmatched 2.5D/3D stacking w/CMOS
- Partnerships drive continuous revitalization investments





#### SUNY POLYTECHNIC INSTITUTE



95nm Si<sub>3</sub>N<sub>4</sub> Taper on Si Waveguide





AIM Photonics Proprietary and Confidential © Copyright 2016




#### **ASIC / Silicon Photonic Interposer Integration**



#### Active Interposer 2.5D





### Active Interposer Full Network on Chip (NoC)

- <u>Active</u> interposer combines PIC and interposer = single platform
- Allows for laser integration
- EIC chips flip-chipped on top of PIC
- Ideal platform for fully integrated network on chip









### Active Interposer Full Network-on-Chip (NoC)

- TX and RX: located on ring around outside of active interposer to shorten RF electrical paths and PCB routing
- All signals route out to PCB through BGAs on back of interposer
- Switch: 8x8 MZI based switch
- **RX EIC:** TIAs for 11.3 Gbps, single channel, integrated on active interposer
- Additional switch, modulators, and laser integration test structures





#### **Photonic Switch Fabrication and Packaging – silicon interposer**

AIM 3<sup>rd</sup> Run 12x12 T-O Clos switch-and-select





- Small pitch of bonding pads on chip with large density: 100µm
- Fine pitch of electrical traces: 8µm
- Enabled complex PIC with reduced footprint improving loss/performance



## Adaptive, Flexible Connectivity → Deep Disaggregation

- <u>Universal</u> photonic WDM-switch fabric
  - Extend TB/s photonic connectivity 'anywhere' in the system
- Flexibly assembled topology, direct connectivity of resources
  - Energy efficient usage
- Transparent for packets
  - Low-Latency direct connectivity





# 2019 SBIR/STTR Phase 1 Collaborative Development Project: Photonic - Universal Accelerator Interconnect

- The photonic UAI must deliver >1 TByte/s of bandwidth to the CPU/Memory/Accelerator at bandwidth densities of >800Gb/s per optical I/O channel.
- Accelerator chips may be located up to 100 meters distance from the CPU/memory.
- Any chip (e.g., CPU core, Memory module, or accelerator) must be able to communicate directly with any other chip.
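Two quick consequences of these requirements can be sketched numerically: the number of optical I/O channels needed at the stated per-channel rate, and the one-way propagation delay over 100 m of fiber. The group index of 1.47 is a typical value for standard silica fiber and is an assumption, as are the function names.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_GROUP_INDEX = 1.47  # assumption: typical silica fiber


def channels_needed(total_tbytes_per_s: float, channel_gbps: float) -> int:
    """Minimum number of optical channels to carry the total bandwidth."""
    total_gbps = total_tbytes_per_s * 8 * 1000
    return math.ceil(total_gbps / channel_gbps)


def one_way_fiber_delay_ns(length_m: float, group_index: float = FIBER_GROUP_INDEX) -> float:
    """Propagation delay through a fiber of the given length, in nanoseconds."""
    return length_m * group_index / SPEED_OF_LIGHT_M_PER_S * 1e9


channels = channels_needed(total_tbytes_per_s=1.0, channel_gbps=800.0)
delay_ns = one_way_fiber_delay_ns(100.0)

print(f"Channels for 1 TB/s at 800 Gb/s each: {channels}")
print(f"One-way delay over 100 m: {delay_ns:.0f} ns")
```

Both requirements are lower bounds (">1 TByte/s", ">800Gb/s"), so the channel count is for the boundary case. The propagation delay is set by distance alone and is independent of bandwidth.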

# COLUMBIA UNIVERSITY



#### **SUMMIT – Node details**

- IBM AC922 nodes
- 2x IBM POWER9 + 512GB DDR4
  - 44 cores / 176 threads; 3GB RAM / thread  $\rightarrow$  240GB/s total BW
- 6x NVIDIA Volta GPU
  - 16GB HBM / GPU  $\rightarrow$  900GB/s / GPU
  - 96GB HBM total  $\rightarrow$  5.4 TB/s total BW
- NVLink
  - 2 groups of 1 CPU + 3 GPUs
  - Within a group: all-to-all connected
  - 100GB/s per link
  - 2.4TB/s total BW
- Node Memory: 608GB
- Node compute: ~40TF/s double-precision
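The node totals above follow from the per-device figures. A short consistency check, using only numbers from this slide (variable names are illustrative):

```python
DDR4_GB = 512
GPUS_PER_NODE = 6
HBM_GB_PER_GPU = 16
HBM_GBYTES_PER_S_PER_GPU = 900
HW_THREADS = 176

hbm_total_gb = GPUS_PER_NODE * HBM_GB_PER_GPU
node_memory_gb = DDR4_GB + hbm_total_gb
hbm_total_tbytes_per_s = GPUS_PER_NODE * HBM_GBYTES_PER_S_PER_GPU / 1000
ddr_gb_per_thread = DDR4_GB / HW_THREADS

print(f"HBM total:        {hbm_total_gb} GB")
print(f"Node memory:      {node_memory_gb} GB")
print(f"HBM bandwidth:    {hbm_total_tbytes_per_s} TB/s")
print(f"DDR4 per thread:  {ddr_gb_per_thread:.1f} GB")
```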



Source: IBM Power System AC922 Introduction and Technical Overview www.redbooks.ibm.com/redpapers/pdfs/redp5472.pdf



**Optically-Connected Heterogeneous Node Architecture** 



= 25 cubes, 12.5 TB capacity, 1875 TB/s

= 300 cubes, 150 TB capacity, 1.9 PB/s





#### **Unified Photonic Fabric**

#### Per node:

- Compute: ~5PF/s (~125x SUMMIT)
- Memory: ~150TB memory (~250x SUMMIT)
- Communications: ~2PB/s (~250x SUMMIT)

- Internal Node Bandwidth = Node Escape Bandwidth
- Large optically connected memory pool, accessible by all compute nodes
- High density, multi core, multi wavelength optical links

Embedded Photonics Potential: 0.4 B/s / FLOP  $\rightarrow$  800 X SUMMIT
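The headline ratios can be reproduced from figures quoted elsewhere in the deck: ~40 TF/s and 608 GB per Summit node, and a Summit B/F of 0.0005. A sketch, assuming the B/F for the photonic node is its ~2 PB/s communications bandwidth divided by its ~5 PF/s compute:

```python
# Proposed optically connected node (per-node figures from the slide).
NODE_FLOPS = 5e15            # ~5 PF/s
NODE_MEMORY_BYTES = 150e12   # ~150 TB
NODE_BYTES_PER_S = 2e15      # ~2 PB/s

# Summit reference values quoted earlier in the deck.
SUMMIT_NODE_FLOPS = 40e12         # ~40 TF/s double precision
SUMMIT_NODE_MEMORY_BYTES = 608e9  # 608 GB
SUMMIT_BYTES_PER_FLOP = 0.0005

compute_ratio = NODE_FLOPS / SUMMIT_NODE_FLOPS
memory_ratio = NODE_MEMORY_BYTES / SUMMIT_NODE_MEMORY_BYTES
bytes_per_flop = NODE_BYTES_PER_S / NODE_FLOPS
bf_ratio = bytes_per_flop / SUMMIT_BYTES_PER_FLOP

print(f"Compute: {compute_ratio:.0f}x SUMMIT")
print(f"Memory:  {memory_ratio:.0f}x SUMMIT")
print(f"B/F:     {bytes_per_flop:.1f} ({bf_ratio:.0f}x SUMMIT)")
```

The ~250x communications figure is not recomputed here because the slide does not state which Summit bandwidth it is measured against.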



- **Summary:** Data Movement is Critical to any Future Performance Scaling
  - Power Consumption
  - Bandwidth Density (and Cost)
  - Photonics: System-Wide PB/s Connectivity Bandwidth
    - 10s of Tb/s per 'wire' and 1 pJ/bit
    - High bandwidth Optically Connected Heterogeneous: Memory/GPUs/CPUs
    - Intra-Node communications bandwidth: PB/s = Inter-Node escape bandwidth
  - Deeply disaggregated Architectures
    - Optical connectivity for <u>flexibly assembled interconnectivity</u> topologies
  - Computer architecture landscape is changing rapidly: Data Analytics, AI
    - Optical bandwidth steering, adaptable architectures for scalability
    - Ultimate energy efficiency: use only required resources for needed time period





#### **Extra Slides**




#### **P-SSIO System Testbed**

Optical PCIe (Gen2 x1) testbed:



Eye diagram when optical link is on:







### Device configuration and data transaction records at Root-port:

[Screenshot: Nios II console log captured at the root port over USB-BlasterII; partially legible.]

- Device configuration:
  - read_DevicePort Result is 4200
  - read BridgeBusNumber result begins 1010 (remainder illegible)
  - The Size of Memory at the Endpoint is fffff000
  - Bar0 cfgrl Result is 100000
  - Command Register at Endpoint is 100006
  - Command Register at Rootport is 100006
  - Endpoint base limit and base address is 100010
- Data transaction:
  - Data before write is 0
  - Data after write is 1234abcd
  - Data before write is 1234abcd

Interleaved trace messages mark entry into the configuration routines (Set_BridgeBusNumber, write_cfgr0, read VenderID, read_cfgr_completion, set Command Register, Write Bar0_cfgrl) and report word alignment ("Word Aligned!" / "Word Unaligned!").









#### Ayar Labs:



- Supply silicon photonic ring-based transceiver chip for system demonstration
- Configure optical TX/RX settings for PCIe data rate

#### Freedom Photonics:







- Delivered Tunable laser source for system demonstration
- Improve temperature performance with uncooled operation
- Design WDM DFB laser array for Phase II






### Fiber attachment and SiP Packaging



#### PLC Connections:





- In-plane (horizontal) design for reduced height fiber attachment with <u>low coupling loss</u>
- Finalized two designs for fiber attached on-chip testing

#### nanoPrecision:




Develop a commercially viable (low-cost, high-volume) approach to packaging the P-SSIO components and electro-optical co-assembly



#### **Conventional Architecture** → "Assembled" with Flexible Interconnect









**GPU Centric / CMPs Data-Accelerators** 

→ Node Architecture "Assembled" with Flexible Interconnect

