

# Lightning (PCIe JBOF): Update, challenges, and solutions

Chris Petersen, Hardware Systems Technologist, Facebook

Wesley Yung, Applications Technical Lead, Microsemi

Bob Pebly, Platform Architect, Intel

OPEN HARDWARE. OPEN SOFTWARE. OPEN FUTURE.



# Lightning update

# Design and validation status = Complete!



#### OCP contributions

- <u>Hardware</u>:
  - Wiwynn is contributing full Lightning design package
  - PCIe retimer specification is available; design package is coming
- <u>Software</u>:
  - OpenBMC (<u>https://github.com/facebook/openbmc</u>)
  - Switch management
  - Drivers

# PCIe JBOF Challenges

- M.2 support
- Enclosure management
- PCIe Hot-plug



## M.2 solution

#### Thermal



#### PCIe Hot-plug





## JBOF Enclosure Management

Wesley Yung, Applications Technical Lead, Microsemi





# Aligning PCIe Switch management to SAS capabilities



SAS JBOD:

- Utilizes SCSI Enclosure Services (SES) to manage the JBOD enclosure
- SAS expander supports in-band and out-of-band management of endpoints

PCIe JBOF:

- Utilizes a PCIe management endpoint to manage the JBOF enclosure
- Supports in-band and out-of-band management of endpoints through Memory-mapped Remote Procedure Calls (MRPC)

# Up-stream/open-source driver and utilities

- Kernel Driver
  - <u>https://github.com/sbates130272/switchtec-kernel</u>
  - Lightweight shim driver providing access to switch internal management endpoint
  - Targeting Kernel 4.11
  - <u>https://lkml.org/lkml/2017/2/2/445</u>

- User-space utility
  - <u>https://github.com/sbates130272/switchtec-user</u>
  - FW Download / Upload
  - Error, Performance counters
  - Temperature monitor
  - Port Status / Health

<pre>
gunthorp@cgy1-donard:~$ switchtec version
switchtec version 0.4
gunthorp@cgy1-donard:~$ switchtec list
/dev/switchtec0    PSX 48XG3    RevB    1.06 B03F    0000:03:00.1
gunthorp@cgy1-donard:~$ switchtec test /dev/switchtec0
/dev/switchtec0: success
gunthorp@cgy1-donard:~$ switchtec temp /dev/switchtec0
48.1 °C
</pre>
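As a rough illustration of how the temperature monitor could feed a health check, the sketch below parses text shaped like the `switchtec temp` output shown above. The helper names and the 85 °C limit are assumptions made for this example, not part of the switchtec tooling, and the output format is taken only from the single sample in the session.

```python
import re


def parse_temp_celsius(output: str) -> float:
    """Extract a temperature from text shaped like '48.1 °C'.

    Hypothetical helper: the format is assumed from the one sample above.
    """
    match = re.search(r"(-?\d+(?:\.\d+)?)\s*°?\s*C\b", output)
    if match is None:
        raise ValueError(f"no temperature found in {output!r}")
    return float(match.group(1))


def is_overheated(output: str, limit_celsius: float = 85.0) -> bool:
    """Return True when the reported temperature exceeds the limit.

    The 85.0 default is an illustrative threshold, not a vendor figure.
    """
    return parse_temp_celsius(output) > limit_celsius


if __name__ == "__main__":
    sample = "48.1 °C"  # the value reported in the session above
    print(parse_temp_celsius(sample), is_overheated(sample))
```

In practice the text would come from running the utility (for example through `subprocess.run`) rather than from a literal string.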

## Driver Architecture / Integration





# PCIe & NVMe Hot Plug

Bob Pebly, Platform Architect, Intel

Wesley Yung, Applications Technical Lead, Microsemi





# High level goal

Do this...



#### ...and THIS doesn't happen

Your PC ran into a problem and needs to restart. We're just collecting some error info, and then we'll restart for you. (0% complete)

If you'd like to know more, you can search online later for this error: HAL\_INITIALIZATION\_FAILED



# Challenges: Completion Timeout



- Posted vs. non-posted transactions
  - Posted: request only, no completion response
  - Non-posted: split transaction, request and completion
- Surprise hot plug can (& will) leave <u>many</u> transactions incomplete
  - Completion Timeout (CTO) is the PCIe mechanism to terminate incomplete transactions
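The distinction above can be sketched as a toy model: a requester tracks only non-posted requests, and a timeout sweep terminates the ones whose completions never arrive. This is a simplified simulation with hypothetical names (`Requester`, `expire`) and illustrative integer timestamps, not a description of any real root port implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Requester:
    """Toy requester that tracks outstanding non-posted requests by tag."""

    timeout_us: int                                   # illustrative CTO value
    outstanding: dict = field(default_factory=dict)   # tag -> issue time (us)
    next_tag: int = 0

    def posted_write(self, now_us: int) -> None:
        # Posted: fire and forget, so nothing is tracked and nothing can time out.
        return None

    def non_posted_read(self, now_us: int) -> int:
        # Non-posted: remember the tag until the completion arrives.
        tag = self.next_tag
        self.next_tag += 1
        self.outstanding[tag] = now_us
        return tag

    def completion(self, tag: int) -> bool:
        # Returns True if the completion matched an outstanding request.
        return self.outstanding.pop(tag, None) is not None

    def expire(self, now_us: int) -> list:
        # Completion Timeout: terminate every request that waited too long.
        timed_out = sorted(
            tag for tag, issued in self.outstanding.items()
            if now_us - issued >= self.timeout_us
        )
        for tag in timed_out:
            del self.outstanding[tag]
        return timed_out


if __name__ == "__main__":
    req = Requester(timeout_us=50_000)
    req.posted_write(0)
    answered = req.non_posted_read(0)
    orphaned = req.non_posted_read(10_000)   # device is pulled before it answers
    req.completion(answered)
    print(req.expire(60_000))                # the orphaned tag is terminated
```

After a surprise removal, every read in flight to the removed device ends up in the `expire` path, which is why many timeouts can fire at once.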



# Challenges: All 1's (All F's) Completions

- A completion with data (CplD) where the data is All 1's
  - Memory Read, Config Read
- Generated when the real completion never returns, to protect the requester from timing out
- Prior to Lightning there was \*<u>NO</u>\* support for All 1's Completions in the NVMe or PCIe service drivers (or in most other Linux drivers)
- Otherwise...

<pre>
[265862.256129] Uhhuh. NMI received for unknown reason 39 on CPU 0.
[265862.268121] Do you have a strange power saving mode enabled?
[265862.279578] Dazed and confused, but trying to continue
</pre>

[Figure: Intel host and Microsemi PCIe Fanout Switch PFX 96xG3 (PM8536), with the callout: Time's up, I better send "something" to the driver...]
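A minimal sketch of the driver-side idea: treat an All 1's read as "device is gone" instead of as valid data. It assumes the register being read can never legitimately hold All 1's, and every name here (`checked_read32`, `DeviceGone`, the fake read functions, the offset) is hypothetical and chosen only for illustration.

```python
class DeviceGone(Exception):
    """Raised when a read returns All 1's, suggesting the device was removed."""


def is_all_ones(value: int, width_bits: int = 32) -> bool:
    """True when every bit of a width_bits-wide value is set."""
    return value == (1 << width_bits) - 1


def checked_read32(read32, offset: int) -> int:
    """Read a 32-bit register and reject an All 1's result.

    read32 is any callable taking an offset and returning an int. This is
    only safe for registers where All 1's is not a legal value.
    """
    value = read32(offset)
    if is_all_ones(value):
        raise DeviceGone(f"read of offset {offset:#x} returned All 1's")
    return value


if __name__ == "__main__":
    def present(offset):   # fake device that is still plugged in
        return 0x00000001

    def removed(offset):   # fake device after surprise removal
        return 0xFFFFFFFF

    print(hex(checked_read32(present, 0x10)))
    try:
        checked_read32(removed, 0x10)
    except DeviceGone as exc:
        print("stop issuing IO:", exc)
```

The point of raising instead of returning is that the caller stops issuing further IO to the missing device, rather than acting on `0xFFFFFFFF` as if it were register contents.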

## Solutions: Intel Surprise Hot Plug Linux Contributions

| Contribution                                                                                                              | Kernel Version |
|---------------------------------------------------------------------------------------------------------------------------|----------------|
| New PCIe Downstream Port Containment (DPC) driver                                                                         | 4.7            |
| Enhancements & optimizations to PCI driver<br>Recognizing All 1's as a missing device on key config registers             | pending        |
| Enhancements & optimizations to AER driver<br>Caching of extended capability pointers                                     | 4.4            |
| Enhancements to NVMe driver<br>Recognizing All 1's as a missing device<br>Cleaning up after hot remove without further IO | 4.7            |
| Enhancements & fixes in the block multi-queue driver<br>Dealing with errors returned on IO following surprise removal     | 4.7            |

Kudos to Keith Busch for Linux NVMe & PCIe driver enhancements

### Before & After: NVMe Surprise Remove PCIe IO Trace



## Challenges / Solutions: Downstream Port Containment (DPC)

- Defined in PCI-SIG Base Specification 3.1
- Enabled by the *new* DPC driver (see the Intel Surprise Hot Plug Linux Contributions table)
- Allows for *ErrFatal+* to be supported in the host
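To show what software sees once containment triggers, here is a hedged sketch that decodes a DPC Status register value. The bit layout used (bit 0 trigger status, bits 2:1 trigger reason, bit 3 interrupt status) is stated as an assumption to be checked against the PCIe Base Specification; the function and type names are hypothetical.

```python
from typing import NamedTuple

# Assumed encoding of the 2-bit trigger reason field; verify against the spec.
TRIGGER_REASONS = {
    0: "unmasked uncorrectable error",
    1: "ERR_NONFATAL received",
    2: "ERR_FATAL received",
    3: "see trigger reason extension",
}


class DpcStatus(NamedTuple):
    triggered: bool   # containment is active, the downstream link is held down
    reason: str       # why containment was triggered
    interrupt: bool   # an interrupt is pending for software to service


def decode_dpc_status(status: int) -> DpcStatus:
    """Decode a 16-bit DPC Status value under the assumed bit layout."""
    return DpcStatus(
        triggered=bool(status & 0x0001),
        reason=TRIGGER_REASONS[(status >> 1) & 0x3],
        interrupt=bool(status & 0x0008),
    )


if __name__ == "__main__":
    # Example value: triggered, reason field = 2, interrupt pending.
    print(decode_dpc_status(0x000D))
```

A handler would typically check `triggered`, log the reason, clean up the devices below the port, and then clear the trigger status so the link can retrain.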



# PCIe JBOF Challenges Solved!

- ✓ M.2 support
- ✓ Enclosure management
- ✓ PCIe Hot-plug



