# DEVELOPMENT OF AN OPEN-SOURCE HARDWARE PLATFORM FOR SIRIUS BPM AND ORBIT FEEDBACK

D. O. Tavares<sup>#</sup>, R. A. Baron, F. H. Cardoso, S. R. Marques, L. M. Russo, LNLS, Campinas, Brazil

A. P. Byszuk, G. Kasprowicz, A. J. Wojeński, Warsaw University of Technology, Warsaw, Poland

#### Abstract

The Brazilian Synchrotron Light Laboratory (LNLS) is developing a BPM and orbit feedback system for Sirius, the new low emittance synchrotron light source under construction in Brazil. In that context, 3 open-source boards and accompanying low-level firmware/software were developed in cooperation with the Warsaw University of Technology (WUT) to serve as the hardware platform for BPM data acquisition and digital signal processing, as well as for orbit feedback data distribution: (i) FPGA board with 2 high-pin count ANSI/VITA FMC slots in PICMG® AMC form factor; (ii) 4-channel 16-bit 130 MS/s ADC board in FMC form factor; (iii) 4-channel 16-bit 250 MS/s ADC board in FMC form factor. The design of these boards and the experience of integrating their prototypes in a COTS MicroTCA.4 crate are reported.

#### **INTRODUCTION**

Sirius is a new 3 GeV synchrotron light source under construction in Brazil [1], targeting a 0.28 nm.rad natural emittance. One of the key challenges of the machine design is to keep the RMS displacement of the vertical electron beam position below the 140 nm level (10% of the minimum vertical beam size) within a 0.1 Hz to 1 kHz bandwidth despite external disturbances. This guarantees proper photon beam stability for the end users.

In order to reach this requirement, an active orbit correction system with sufficient disturbance rejection bandwidth is mandatory, where precise and accurate beam position measurement, low group delay digital signal processing, low latency data distribution, high bandwidth orbit corrector magnet response and optimized design of the feedback controller algorithm are the key ingredients to reach the utmost performance.

In that context, LNLS has initiated a research and development program to build a Beam Position Monitor (BPM) and Fast Orbit Feedback (FOFB) system that would allow meeting the beam stability requirements while giving all flexibility for in-house customizations and collaboration with other institutes worldwide.

The Sirius RF BPM electronics is currently in the prototype phase [2]. Apart from its standalone RF Front-End board [3], all hardware was conceived to be usable in applications outside the Sirius BPM and FOFB projects. Great progress has also been made on building the basic hardware platform, the basic FPGA HDL and the low-level software infrastructure. WUT has contributed on both fronts, providing development services to LNLS and closely collaborating at the hardware specification level.

# SYSTEM REQUIREMENTS

The Sirius BPM system has the ordinary requirements of accelerator data acquisition systems, such as synchronization with the beam's frequency, triggered acquisition of large amounts of data (within the 100 MB to 1 GB range), programmable logic resources for fast and parallel digital signal processing, integration with an accelerator-wide distributed control system via 1 Gb Ethernet and standardized control software running on Linux. Special requirements exist for resolution and accuracy. A signal-to-noise ratio of around 100 dB must be reached in the orbit feedback bandwidth in order to reach a measurement resolution below 140 nm (RMS value integrated in a 0.1 Hz to 1 kHz bandwidth). Long-term drifts of channel gains, mainly caused by temperature and beam current dependences, must be kept below the 1 mdB level.
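As a rough plausibility check of the figures above, the resolution of a difference-over-sum position measurement scales as σ ≈ K·10^(−SNR/20), up to a geometry-dependent factor of order one, where K is the position sensitivity factor of the pick-up. The sketch below evaluates this relation and converts a 1 mdB gain drift into a fractional amplitude change. The value K = 10 mm is an assumed, typical order of magnitude and is not a Sirius design figure.

```python
def resolution_from_snr(snr_db: float, k_m: float) -> float:
    """RMS position resolution in metres for an amplitude SNR given in dB.

    Order-of-magnitude estimate only: constant factors that depend on the
    pick-up geometry and on the number of electrodes are ignored.
    """
    return k_m * 10.0 ** (-snr_db / 20.0)


def gain_drift_fraction(drift_db: float) -> float:
    """Fractional amplitude change corresponding to a gain drift in dB."""
    return 10.0 ** (drift_db / 20.0) - 1.0


K_ASSUMED_M = 10e-3  # assumed sensitivity factor (10 mm), not from the paper

sigma_m = resolution_from_snr(100.0, K_ASSUMED_M)
drift = gain_drift_fraction(1e-3)  # 1 mdB

print(f"resolution at 100 dB SNR: {sigma_m * 1e9:.0f} nm")
print(f"1 mdB gain drift: {drift * 1e6:.0f} ppm of amplitude")
```

Under this assumption, an SNR of 100 dB gives a resolution on the order of 100 nm, which is consistent with the 140 nm target.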

The Sirius FOFB requires a high count of devices to be integrated into a single MIMO feedback control loop, with low latency for the distribution of sensor and actuator data. When fully populated, the Sirius storage ring should have approximately 240 RF BPMs, 17 insertion device gap/phase encoders, 70 X-Ray BPMs (XBPMs) and 440 orbit corrector power supplies integrated in one feedback loop with a 100 kS/s update rate and a closed-loop latency below 25 µs.
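The figures quoted above can be turned into a simple timing budget. The sketch below only restates the device counts and rates from the text; the class and attribute names are illustrative, and the number of readings per device is not stated here and is therefore left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FofbBudget:
    """Device counts and timing targets quoted for the fully populated ring."""
    rf_bpms: int = 240
    id_encoders: int = 17
    xbpms: int = 70
    correctors: int = 440
    update_rate_hz: float = 100e3
    max_latency_s: float = 25e-6

    @property
    def sensors(self) -> int:
        # Total number of sensor devices feeding the MIMO loop.
        return self.rf_bpms + self.id_encoders + self.xbpms

    @property
    def update_period_s(self) -> float:
        return 1.0 / self.update_rate_hz

    @property
    def latency_in_updates(self) -> float:
        # Closed-loop latency expressed in units of the update period.
        return self.max_latency_s / self.update_period_s


budget = FofbBudget()
print(f"sensor devices: {budget.sensors}, actuators: {budget.correctors}")
print(f"update period: {budget.update_period_s * 1e6:.1f} us")
print(f"latency budget: {budget.latency_in_updates:.1f} update periods")
```

In other words, data from more than 300 sensor devices must be collected, processed and delivered to 440 actuators within a few 10 µs update periods.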

#### SYSTEM ARCHITECTURE

A central element of the system is the digital back-end (DBE). As depicted in Fig. 1, the DBE is implemented as an FPGA board with 2 I/O mezzanine slots, large SDRAM memory, synchronization resources, trigger input/outputs, PCI Express connectivity and multigigabit serial lines with SFP+ hardware interface.



Figure 1: Sirius BPM and orbit feedback architecture.

In the case of BPMs, the I/O slots are occupied with 4-channel ADC mezzanine boards used for digitizing the RF BPM signals coming from standalone RF front-end electronics specially designed for Sirius.

<sup>#</sup>daniel.tavares@lnls.br

Besides the roles of digital signal processing and data acquisition for the RF BPMs, the FPGA will be used to concentrate synchronized data coming from all other FOFB sensors (e.g. X-ray BPMs, insertion device gap/phase encoders) and to command actuators using up to 8 multigigabit serial lines with SFP+ hardware interface. The same interfaces can be used to send/receive aggregated data to/from the FOFB controller via a dedicated low-latency network, based either on a 1 GbE/10 GbE network [4] or on a hardware-based low-latency fast communication protocol. LNLS is evaluating the possibility of using the Synchronous Device Interface (SDI) [5], developed by Brookhaven National Laboratory (BNL) for NSLS-II.

All FPGA boards will receive from the Sirius timing system a reference clock and a number of triggers synchronized with the beam, notably the revolution clock, the beam injection trigger and the beam loss trigger. They will also be able to provide interlock signals to the Machine Protection System (MPS) in order to avoid damage due to mis-steered high energy X-ray beams.

The PCIe interface provides fast and low-latency communication with a commodity CPU running the distributed control software (e.g. EPICS IOC) on top of the Linux operating system.

#### HARDWARE DESIGNS

A modular approach greatly based on proven and emerging industrial standards has been chosen for the hardware architecture. The key technologies and standards in use are MicroTCA.4 crates, AMC modules, FMC modules, 1 Gigabit Ethernet and PCI Express connectivity. The following sections describe the hardware developed by LNLS and WUT.

# AMC FMC Carrier (AFC)

The DBE FPGA board was specified by LNLS and designed by WUT as a double-width AMC card fully compliant with the MicroTCA.4 standard, featuring large SDRAM memory (512 MB or 2 GB) with 32-bit interface, 2 fully populated high-pin count FMC mezzanine slots, an option for 8 multigigabit links routed to the MicroTCA Rear Transition Module (µRTM) connector, 8 MicroTCA.4 M-LVDS triggers, connectivity to PCIe at Fat Pipe 1 (x4 link), redundant 1 Gb Ethernet ports, full hardware support for the White Rabbit timing system, flexible clock multiplexing allowing routing of MicroTCA.4 low-jitter clocks to any of the FMC slots, and provision for standalone operation. The FPGA device in use is a Xilinx Artix-7 200T FFG1156, which provides a large I/O pin count at a low price.
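To give a feeling for how the onboard memory sizes relate to raw ADC acquisition, the following sketch estimates the raw data rate of one 4-channel, 16-bit FMC ADC board and how long a capture the SDRAM could hold. It assumes that samples are stored unpacked as 2 bytes each, that 512 MB and 2 GB mean binary megabytes and gigabytes, and that the whole memory is available to a single board; whether the memory interface sustains such a rate is not addressed.

```python
MIB = 2 ** 20
GIB = 2 ** 30


def raw_rate_bytes_per_s(channels: int, bits: int, sample_rate_hz: float) -> float:
    """Raw data rate of an ADC board, assuming byte-aligned sample storage."""
    return channels * (bits // 8) * sample_rate_hz


def capture_seconds(memory_bytes: int, rate_bytes_per_s: float) -> float:
    """Duration of a contiguous raw-data capture that fits in the memory."""
    return memory_bytes / rate_bytes_per_s


for label, fs_hz in (("130 MS/s", 130e6), ("250 MS/s", 250e6)):
    rate = raw_rate_bytes_per_s(channels=4, bits=16, sample_rate_hz=fs_hz)
    for mem_label, mem_bytes in (("512 MB", 512 * MIB), ("2 GB", 2 * GIB)):
        seconds = capture_seconds(mem_bytes, rate)
        print(f"{label}, {mem_label}: {rate / 1e9:.2f} GB/s, {seconds:.2f} s of raw data")
```

Under these assumptions the memory holds between a fraction of a second and a few seconds of raw samples.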

The first prototype of the board, shown in Fig. 2, has been produced and is in test phase at the moment of writing. Basic features such as power supplies, JTAG chain, FPGA configuration, SDRAM read/write, PCIe interface, I<sup>2</sup>C bus, IPMI, M-LVDS triggering lines, FMC slots and EEPROM have been successfully tested.



Figure 2: AFC first prototype in a MicroTCA.4 crate.

#### FMC ADC 130 MS/s 16-bit 4-channel

The FMC ADC 130 MS/s board is an ANSI/VITA 57.1 compliant FMC mezzanine module with 4 LTC2208 ADCs capable of sampling at up to 130 MS/s. It has been designed by LNLS in collaboration with WUT. First prototypes were built, as shown in Fig. 3a.

Two options of clock distribution exist: (i) the passive distribution, which replicates the ADC clocks from an external source using only passive components to minimize uncorrelated low-frequency phase noise on ADC clocks; (ii) the active distribution, using AD9510 as clock distributor and optionally as PLL. When used as PLL, the ADC clock signal can be sourced by one of the onboard oscillators, phase-locked to a reference clock provided at front panel connector or at FMC connector.

Two different options of oscillators are possible: (i) the Silicon Labs Si571 providing programmable center frequencies; and (ii) the Crystek CVHD-950 providing fixed center frequency (customizable when ordering the parts) with better jitter performance.

The board has 2 temperature sensors and provides one bi-directional trigger accessible from the front panel.

The tests were conducted successfully. All available integrated circuits were programmed and worked as expected. Special firmware was developed for fast acquisition of raw data from the ADC chips. The measured return loss of all input channels was approximately -20 dB for frequencies below 500 MHz, and the measured attenuation at 500 MHz was around 4 dB. Despite these good results in terms of impedance matching and bandwidth, and although the board achieved acceptable SNR, SINAD and SFDR, strong distortion (THD > -65 dBc) was observed for input powers greater than -8 dBm. The issue is being investigated and will be addressed in the next version of the board.

# FMC ADC 250 MS/s 16-bit 4-channel

The FMC ADC 250 MS/s board, shown in Fig. 3b, is another FMC module with the same clock resources and a similar mechanical layout as the FMC ADC 130 MS/s board, differing only in the ADC (ISLA216P25) and associated circuitry. The board was designed by WUT for LNLS in order to test a second option of data converter, having a more convenient control interface, calibration features and lower dissipation.

The FMC ADC 250 MS/s board has been successfully tested in a similar way to the FMC ADC 130 MS/s board, with a special version of the FPGA firmware. The measured return loss for frequencies below 500 MHz was about -9 dB. Strong attenuation at 500 MHz (> 12 dB) was observed. Efforts to improve the performance are ongoing.
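The return loss figures reported for the two boards can be compared more intuitively in terms of reflection coefficient, reflected power and VSWR. The sketch below applies the standard conversions; it follows the sign convention used in the text, in which the quoted values are negative (i.e. they are treated as |S11| in dB). The function names are illustrative.

```python
def reflection_magnitude(s11_db: float) -> float:
    """Magnitude of the reflection coefficient for |S11| given in dB (negative)."""
    return 10.0 ** (s11_db / 20.0)


def reflected_power_fraction(s11_db: float) -> float:
    """Fraction of the incident power that is reflected at the input."""
    return reflection_magnitude(s11_db) ** 2


def vswr(s11_db: float) -> float:
    """Voltage standing wave ratio corresponding to |S11| in dB."""
    gamma = reflection_magnitude(s11_db)
    return (1.0 + gamma) / (1.0 - gamma)


for board, s11_db in (("FMC ADC 130 MS/s", -20.0), ("FMC ADC 250 MS/s", -9.0)):
    print(
        f"{board}: |Gamma| = {reflection_magnitude(s11_db):.3f}, "
        f"reflected power = {100 * reflected_power_fraction(s11_db):.1f} %, "
        f"VSWR = {vswr(s11_db):.2f}"
    )
```

A value of -20 dB corresponds to 1% of the incident power being reflected, whereas -9 dB corresponds to roughly an eighth of it, which illustrates why the input matching of the 250 MS/s board still needs improvement.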



Figure 3: (a) FMC ADC 130 MS/s 16-bit 4-channel first prototype; (b) FMC ADC 250 MS/s 16-bit 4-channel first prototype; (c) Details on heat sink and front panel (identical for both boards).

# MicroTCA System

The AFC board, together with both FMC ADC module versions, was tested in a Vadatech VT811 MicroTCA.4 crate with a Vadatech PowerPC AMC717 CPU. The AFC's IPMI was successfully implemented based on the MMC developed for the CMS experiment by the University of Wisconsin. This required porting the software code from an ATmega to an ARM processor.

IPMI incompatibilities between Vadatech MCH and N.A.T. power supplies were found, causing severe startup and cooling failures on the system. The system operated correctly when Vadatech power supplies were used.

Although functional, the PowerPC P2020 processor option was considered inadequate due to its lack of sufficient system storage memory and its strong ties to the vendor's outdated Linux SDK, which would impede software development. The next prototype version will be based on x86 CPU boards.

#### HDL AND SOFTWARE DESIGNS

The FPGA firmware was developed following a modularized approach with Wishbone as the interconnect standard. The firmware modules are connected to the local bus via a Wishbone crossbar switch, allowing shared access to the board peripherals for control and monitoring from master nodes. A PCI Express Wishbone master has been developed to provide direct access to the board peripherals from the Linux operating system running on the MicroTCA crate's CPU.

Due to the modularization and standardization of the firmware on top of Wishbone, other masters can also be used to provide seamless integration with the existing code, for example, the LM32 softcore processor, RS232 system controller or Etherbone.
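The idea of several masters sharing the same peripherals through a crossbar can be illustrated with a small behavioural model. This is only a software sketch of address decoding, not the HDL crossbar used in the firmware; the class names, base addresses and register offsets are hypothetical.

```python
class RegisterBank:
    """Toy Wishbone slave: a block of 32-bit registers addressed by byte offset."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._regs: dict[int, int] = {}

    def read(self, offset: int) -> int:
        return self._regs.get(offset, 0)

    def write(self, offset: int, value: int) -> None:
        self._regs[offset] = value & 0xFFFFFFFF


class Crossbar:
    """Routes accesses from any master to the slave that owns the address."""

    def __init__(self) -> None:
        self._slaves: list[tuple[int, RegisterBank]] = []

    def attach(self, base: int, slave: RegisterBank) -> None:
        # Reject overlapping address ranges so that decoding stays unambiguous.
        for other_base, other in self._slaves:
            if base < other_base + other.size and other_base < base + slave.size:
                raise ValueError("address ranges overlap")
        self._slaves.append((base, slave))

    def _decode(self, addr: int) -> tuple[RegisterBank, int]:
        for base, slave in self._slaves:
            if base <= addr < base + slave.size:
                return slave, addr - base
        raise LookupError(f"no slave mapped at 0x{addr:08x}")

    def read(self, addr: int) -> int:
        slave, offset = self._decode(addr)
        return slave.read(offset)

    def write(self, addr: int, value: int) -> None:
        slave, offset = self._decode(addr)
        slave.write(offset, value)


# Hypothetical address map with two peripherals.
bus = Crossbar()
adc_ctrl = RegisterBank(size=0x100)
clock_ctrl = RegisterBank(size=0x100)
bus.attach(0x1000, adc_ctrl)
bus.attach(0x2000, clock_ctrl)

# One master (e.g. the PCIe bridge) writes, another (e.g. a soft CPU) reads back.
bus.write(0x2004, 0xCAFE)
print(hex(bus.read(0x2004)))
```

Because every master goes through the same decode step, adding a new master does not require any change on the slave side, which is the property the paragraph above relies on.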

#### PCI Express Infrastructure

The PCIe firmware and device driver designs were based on projects from the OpenCores repository [6] and the University of Heidelberg [7], respectively. The original firmware was subject to many modifications, the main ones being the port from the old TRN interface to the new AXI interface, the addition of support for a DDR memory controller, a new datapath for Wishbone slave endpoints and support for the hdlmake tool for synthesis and simulation. The old testbench code was also completely rewritten. The new testbench code is integrated with Xilinx's testbenches for the PCIe and DDR IP cores, allowing better simulation of firmware operation in real-world conditions.

The FPGA firmware has three 64-bit addressable Base Address Register (BAR) ranges: BAR0 for core firmware registers, BAR2 for DDR memory access and BAR4 for Wishbone endpoint access. The BAR2 and BAR4 memory spaces are 1 MB and 512 KB wide, respectively. A paging mechanism is used for Memory Mapped IO (MMIO) operations that need to reach all available resources (e.g. the 512 MB of onboard DDR SDRAM) through these windows. This solution was adopted to work around bugs in some CPU platforms that do not provide enough memory resources, and to ensure portability to as many platforms as possible. The firmware supports DMA transactions to/from the BAR2 (DDR memory) and BAR4 (Wishbone) memory spaces. A DMA transaction can be set up in traditional single-buffer mode, or multiple DMA transactions can be chained to support Scatter-Gather (SG) operation targeting multiple memory buffers. Initial performance results, acquired with the Xilinx ML605 board and a PCIe x1 (Gen 1) link, are presented in Fig. 4.
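The paging mechanism can be pictured as a small window that slides over a larger memory. The following sketch models this in software: a linear address is split into a page number and an offset within the window, and a read that crosses a page boundary is broken into several window accesses. It is a behavioural illustration only; the class, its methods and the way the page is selected are assumptions, not the register interface of the actual firmware.

```python
WINDOW_BYTES = 1 << 20  # 1 MB window, as quoted for BAR2


def split_address(addr: int, window: int = WINDOW_BYTES) -> tuple[int, int]:
    """Split a linear address into (page number, offset inside the window)."""
    return divmod(addr, window)


class PagedMemory:
    """Software model of a large memory reached through a small paged window."""

    def __init__(self, backing: bytearray, window: int) -> None:
        self._backing = backing
        self.window = window
        self.page = 0

    def select_page(self, page: int) -> None:
        # Stand-in for writing a (hypothetical) page-select register.
        if not 0 <= page * self.window < len(self._backing):
            raise ValueError("page out of range")
        self.page = page

    def window_read(self, offset: int, length: int) -> bytes:
        # Stand-in for an MMIO read inside the currently selected window.
        if offset + length > self.window:
            raise ValueError("access exceeds the window")
        start = self.page * self.window + offset
        return bytes(self._backing[start:start + length])

    def read(self, addr: int, length: int) -> bytes:
        """Read an arbitrary range, switching pages whenever a boundary is crossed."""
        out = bytearray()
        while length > 0:
            page, offset = split_address(addr, self.window)
            chunk = min(length, self.window - offset)
            self.select_page(page)
            out += self.window_read(offset, chunk)
            addr += chunk
            length -= chunk
        return bytes(out)


# Small demonstration: 64 bytes of memory seen through a 16-byte window.
memory = PagedMemory(bytearray(range(64)), window=16)
print(list(memory.read(14, 4)))
print(split_address(512 * (1 << 20) - 1))
```

With a 1 MB window, a 512 MB memory is covered by 512 pages, so only the window itself needs to be mapped into the CPU address space.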



Figure 4: DMA performance of PCIe firmware running on Xilinx ML605 development kit, PCIe (Gen1) x1.

The Linux device driver is responsible for 2 tasks: the basic configuration necessary for device operation (memory remapping, interrupt handling, etc.) and memory buffer management. It supports a wide range of Linux kernel versions (currently 2.6.32 to 3.10 are confirmed to work). Currently only x86 and x86-64 platforms are supported. The device driver is accompanied by a set of C/C++ libraries that hide the complexity of the driver's IOCTL (IO ConTroL) calls from the programmer and provide basic error handling. The C/C++ libraries present an API that allows easier device access and memory buffer management.

ISBN 978-3-95450-139-7

# FMC Configuration Software (FCS)

The FMC Configuration Software (FCS) was developed to set up FPGA carriers and FMC boards through a common interface between the crate's CPU and the FPGA boards. It is written in C++ in a modular way, so that parts of the project can be easily replaced to suit different FPGA firmware, FMC and backplane board designs.

The general structure of FCS can be described as different levels of abstraction in terms of interface drivers. At the lowest level, there is a Wishbone Master driver which communicates with the FPGA board through a specified interface (e.g. RS-232, PCIe). At the second level, there is a protocol-specific driver, which implements IP core interface handling for communication protocols (e.g. SPI, I<sup>2</sup>C). The last level of abstraction comprises the chip-specific register map semantics, protocols and function handling (e.g. setting an oscillator frequency). Together, these abstraction levels form a reusable database of drivers for automated communication with the hardware. Each of the drivers at the different levels can be easily swapped to match the hardware that is being used.
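The three abstraction levels described above can be sketched as follows. FCS itself is written in C++; Python is used here only to keep the illustration short. All class names, register offsets and the register layout of the oscillator are hypothetical and do not correspond to the FCS API or to any real chip.

```python
from abc import ABC, abstractmethod


class WishboneMaster(ABC):
    """Lowest level: moves words to/from the FPGA over some transport."""

    @abstractmethod
    def write(self, addr: int, value: int) -> None: ...

    @abstractmethod
    def read(self, addr: int) -> int: ...


class LoopbackMaster(WishboneMaster):
    """Stand-in transport (instead of PCIe or RS-232) that logs every write."""

    def __init__(self) -> None:
        self.mem: dict[int, int] = {}
        self.log: list[tuple[int, int]] = []

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value
        self.log.append((addr, value))

    def read(self, addr: int) -> int:
        return self.mem.get(addr, 0)


class I2cCore:
    """Protocol level: drives a hypothetical I2C IP core with one TX register."""

    TX_REG = 0x0  # hypothetical register offset

    def __init__(self, master: WishboneMaster, base: int) -> None:
        self.master = master
        self.base = base

    def write_register(self, dev_addr: int, reg: int, value: int) -> None:
        # Address byte (write bit = 0), register index, then the data byte.
        for byte in (dev_addr << 1, reg, value):
            self.master.write(self.base + self.TX_REG, byte & 0xFF)


class Oscillator:
    """Chip level: hypothetical device with a 32-bit frequency word in regs 0..3."""

    def __init__(self, i2c: I2cCore, dev_addr: int) -> None:
        self.i2c = i2c
        self.dev_addr = dev_addr

    def set_frequency_hz(self, hz: int) -> None:
        # Most significant byte first.
        for i in range(4):
            self.i2c.write_register(self.dev_addr, i, (hz >> (8 * (3 - i))) & 0xFF)


# Swapping the transport only changes the object passed in at the bottom.
transport = LoopbackMaster()
osc = Oscillator(I2cCore(transport, base=0x1000), dev_addr=0x55)
osc.set_frequency_hz(130_000_000)
print(f"{len(transport.log)} bus writes issued")
```

The chip-level code never touches the transport directly, so replacing `LoopbackMaster` with a PCIe or RS-232 implementation would leave the upper two levels unchanged.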

Due to its flexible implementation, the software can be adapted for many other purposes, such as the ongoing efforts to support multiple boards and its use as a diagnostic tool for measurement systems. There is also a plan to include support for SDB (Self Describing Bus), in order to automate Wishbone register map discovery and driver loading.

FCS is currently used in the LNLS BPM and GSI TMS projects. All of the tests of the FMC cards presented herein were performed using it.

#### **NEW HARDWARE DEVELOPMENTS**

Two key hardware elements of the Sirius FOFB system are still to be built, namely the MicroTCA RTM with 8 SFP+, to be used in conjunction with the AFC, and a compact and universal node with SFP+ hardware interface and a small FPGA, to be used as a translator between the FOFB low-latency data distribution network protocol and all types of digital protocols (e.g. Ethernet, RS-232, RS-485, CAN, SPI, I<sup>2</sup>C) or direct control (e.g. parallel control of peripherals, PWM). Both developments will be made compatible with the White Rabbit timing system to broaden their range of use. Likewise, the RTM SFP+ module will follow DESY's recommendation for RTM connector pin assignment.

Although not essential for the Sirius BPM and FOFB systems, a fully compliant White Rabbit MicroTCA Carrier Hub (MCH) with a PCIe switch and an x16 link to a µRTM COM Express CPU board is currently in the planning and proof-of-concept phase. The MCH tongue 2, dedicated to clocks, will be capable of Distributed DDS over White Rabbit. In addition, a prototype AMC COM Express Type 6 CPU board has been built by WUT and tested at LNLS as a replacement for the PowerPC AMC CPU board.

#### **COLLABORATION**

The project follows an entirely open-source approach, with all related designs made publicly available at the CERN Open Hardware Repository [8]. It aims at evolving the final product towards the state-of-the-art with easy integration with other accelerator systems, avoiding duplication of efforts and vendor lock-in, and providing cost-effective solutions for problems with similar requirements.

The hardware designed by LNLS and WUT is licensed under the CERN Open Hardware License (OHL). HDL and software designs are licensed under the GPL. Besides these projects, several open-source designs and tools have been used and/or extended to build the system, for example: (i) automated tools for Wishbone slave generation, script-based FPGA firmware synthesis and simulation (hdlmake) and the Wishbone crossbar switch, developed by the White Rabbit project team; (ii) the PCIe Scatter-Gather DMA controller HDL and PCIe Linux device driver, written by WUT based on the respective projects hosted at the OpenCores website and the University of Heidelberg; and (iii) the LM32 softcore developed by Lattice Semiconductor and extended by GSI.

#### **CONCLUSION**

The paper has shown the system architecture of the Sirius BPM and FOFB, which follows a modularized approach in terms of hardware, firmware and low-level software. The project is based on proven and emerging industrial standards and is tied to the open-source approach. Although overall functionality and performance have been proven, efforts for improvements are ongoing. A new round of board prototypes is foreseen for December 2013.

#### ACKNOWLEDGMENT

The authors would like to thank Bartek Juszczyk for porting the IPMI code developed by the University of Wisconsin for CMS from an ATmega to an ARM processor.

#### REFERENCES

- [1] L. Liu, et al., "A new 5BA low emittance lattice for Sirius", IPAC'13, Shanghai, May 2013, TUPWO001, p. 1874 (2013).
- [2] D. O. Tavares, et al., "Development of the Sirius RF BPM Electronics", IBIC'13, Oxford, September 2013, MOPC09.
- [3] R. A. Baron, et al., "Development of the RF Front-End Electronics for the Sirius BPM System", IBIC'13, Oxford, September 2013, WEPC07.
- [4] J. G. R. S. Franco, et al., "Sirius Control System: Conceptual Design", these proceedings.
- [5] Y. Tian, et al., "Synchronous Device Interface and Power Supply Control at NSLS-II", PAC'09, Vancouver, May 2009, FR5REP005.
- [6] "PCIe Scatter-Gather DMA controller", http://opencores.org/project,pcie\_sg\_dma
- [7] "PCIe SG DMA Device Driver", http://li5.ziti.uniheidelberg.de/mprace
- [8] "Sirius BPM project at the Open Hardware Repository," http://www.ohwr.org/projects/bpm/wiki
