# A VERSATILE BPM SIGNAL PROCESSING SYSTEM BASED ON THE XILINX ZYNQ SOC\*

R. Hulsart<sup>†</sup>, P. Cerniglia, N.M. Day, R. Michnoff, Z. Sorrell, Brookhaven National Laboratory, Upton, Long Island, New York, USA

## Abstract

A new BPM electronics module (V301) has been developed at BNL that uses the latest System on a Chip (SoC) technologies to provide better performance and lower cost per module than previous designs. Future RHIC ion runs will include new RF conditions as well as a wider dynamic range in intensity. Plans for the use of electron beams, both in ion cooling applications and in a future electron-ion collider, have also driven this architecture toward a highly configurable approach [1]. The RF input section has been designed so that jumpers can be changed to allow a single board to provide ion- or electron-optimized analog filtering. These channels are sampled with four 14-bit 400MSPS A/D converters. The SoC's ARM processor allows a Linux OS to run directly on the module along with a controls system software interface. The FPGA is used to process samples from the ADCs and perform position calculations. A suite of peripherals, including dual Ethernet ports, µSD storage, and an interface to the RHIC timing system, is also included. A second-revision board, which includes ultra-low-jitter ADC clock synthesis and distribution and improved power supplies, is currently being commissioned.

## SYSTEM ARCHITECTURE

A VME form factor was chosen for the initial design of this BPM system (Fig. 1), which leverages the existing VME infrastructure found throughout RHIC and its injector facilities. A VME bus interface was not included in the design; only power and serial timing links are distributed on the VME backplane. The V301 name designates it as a third-generation VME design.

## ANALOG INPUT SECTION

#### Analog Filters


Much of the versatility of this BPM platform comes from the options for the RF filtering section (Fig. 2). Three separate signal paths are available on the PCB, selected by altering soldered jumpers. For most ion beam measurements (RHIC), the pulses received from the pickup electrodes are long enough to be oversampled by the 400MSPS A/D converters, so only a low-pass filter (nominally 39MHz) is used to remove unwanted high-frequency components. Filters with different cut-off frequencies can also be substituted, such as a 200MHz low pass, which has been used for the BLIP raster BPM. The second and third signal paths connect band pass filters in different package styles, used for measuring very narrow electron beam pulses. For low-repetition-rate or single-bunch e-beam measurements, the ringing response of a 503MHz filter is used to extend the sampling period. Narrowband processing at other frequencies is also possible by choosing similar band pass filters in the same package style. The third option is a smaller-footprint SAW-type band pass filter centered at 707MHz, which will be used in the LEReC project with its 704MHz bunch frequency.

Figure 1: BPM Electronics

Low pass filters (nominally 800MHz) are also included at the board input and around each gain stage to remove high-frequency noise and components that could otherwise alias into the sampled band.

#### Gain and Attenuation

Each RF input channel has two gain stages, nominally +20dB each. Each stage uses an Analog Devices ADL5536 RF amplifier, which has good response over the frequency range of interest and requires only a single +5V supply. Before each gain stage, a 7-bit programmable digital step attenuator allows adjustment in 1/4dB increments from 0 to 31.75dB of attenuation. In addition, each gain stage can be bypassed by moving soldered jumpers. This is important for applications (such as RHIC stripline BPMs) where large input signals are present and high gain is not necessary.
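The attenuator arithmetic above (7 bits × 1/4dB = 0 to 31.75dB) can be checked with a short sketch. It assumes a plain binary control word in which each count adds 1/4dB; the function names, the clamping policy and the treatment of a bypassed stage are hypothetical and not taken from the V301 firmware.

```python
STEP_DB = 0.25                      # attenuator resolution (1/4 dB per count)
CODE_BITS = 7                       # 7-bit control word
MAX_CODE = (1 << CODE_BITS) - 1     # 127 counts -> 31.75 dB

def attenuation_to_code(atten_db: float) -> int:
    """Round a requested attenuation to the nearest 1/4 dB step, clamped to 0..127."""
    code = round(atten_db / STEP_DB)
    return max(0, min(MAX_CODE, code))

def code_to_attenuation(code: int) -> float:
    return code * STEP_DB

def net_gain_db(codes, stage_gain_db=20.0, bypassed=(False, False)):
    """Nominal channel gain: each attenuator precedes one +20 dB stage.
    Assumption: a bypassed stage removes its gain but keeps its attenuator."""
    gain = 0.0
    for code, bypass in zip(codes, bypassed):
        gain -= code_to_attenuation(code)
        if not bypass:
            gain += stage_gain_db
    return gain

print(code_to_attenuation(MAX_CODE))   # 31.75
print(net_gain_db((8, 8)))             # 36.0
```

With both stages in circuit and 2dB of attenuation ahead of each, the nominal gain drops from +40dB to +36dB.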

<sup>\*</sup>Work supported by Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy.

<sup>†</sup>rhulsart@bnl.gov



Figure 2: Analog Signal Processing Chain.

## ANALOG TO DIGITAL CONVERSION

#### ADCs

Each analog channel is sampled by a Texas Instruments ADS5474 converter, which has 14-bit resolution and can be clocked at up to 400MSPS. The analog input of this device requires a differential signal, so RF transformers are used to convert the single-ended input. The ADC's internal 2.2V reference is used in a bipolar configuration, where ±1.1V of input voltage corresponds to a (signed) full-scale digital output. The digital outputs consist of 14 differential data pairs terminated into 100 ohms, using the LVDS signalling standard. Careful impedance control and trace-length matching were applied to over 128 traces on the printed circuit board in order to achieve reliable data transfers at the rated 400MSPS × 14 bits (equivalent to 5.6Gbit/s per ADC).
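The conversion from output code to input voltage, and the per-ADC data rate quoted above, can be sketched as follows. The sketch assumes the ADC is configured for two's-complement output (the paper only says the output is signed) and treats ±1.1V as exactly ±8192 counts; the function names are illustrative.

```python
BITS = 14
FULL_SCALE_V = 1.1                          # +/-1.1 V maps to signed full scale
LSB_V = FULL_SCALE_V / (1 << (BITS - 1))    # about 134 uV per count

def raw_to_signed(raw: int) -> int:
    """Sign-extend a 14-bit two's-complement word (assumed output format)."""
    raw &= (1 << BITS) - 1
    return raw - (1 << BITS) if raw & (1 << (BITS - 1)) else raw

def code_to_volts(raw: int) -> float:
    return raw_to_signed(raw) * LSB_V

def data_rate_bps(sample_rate_hz: float, bits: int = BITS) -> float:
    """Aggregate rate on one ADC's parallel LVDS bus (data pairs only)."""
    return sample_rate_hz * bits

print(data_rate_bps(400e6) / 1e9)   # 5.6 (Gbit/s per ADC)
```

Multiplying by the four ADCs on the board gives the total raw data rate the FPGA must accept.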

#### ADC Clock Synthesis

The major addition to this design in its second hardware revision was an external PLL clock synthesis IC, used as the clock source for the ADCs. The Analog Devices AD9517-4 was chosen for its range of features, including digitally controlled fine phase shifting and low additive jitter. Either a local oscillator or an external input clock can serve as the reference to which the PLL locks. The internal VCO has a frequency range of 1450-1800MHz, allowing a variety of ADC clock rates to be synthesized. The serial link carrier received over the VME P2 connector includes an RF-synchronous 28MHz clock (for RHIC applications), which, when used as the PLL reference, allows the ADCs to be phase locked to the machine RF. The PLL/synthesizer can be configured through software (SPI bus) and changed as needed to suit many different accelerator BPM requirements.
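The relationship between the 28MHz reference, the VCO range and the resulting ADC clock can be explored by brute force. This is a simplified model that assumes a single integer feedback multiplier N (reference divider of 1) and a single integer output divider D. The real AD9517-4 spreads these across its reference, feedback, VCO and channel dividers, each with its own allowed values, so any candidate should be checked against the datasheet. The function name and divider range are hypothetical.

```python
import math

VCO_MIN_HZ = 1450e6
VCO_MAX_HZ = 1800e6

def candidate_adc_clocks(f_ref_hz, f_adc_max_hz=400e6, dividers=range(2, 33)):
    """Return (N, D, f_vco, f_adc) tuples with f_vco = N * f_ref inside the
    VCO range and f_adc = f_vco / D no faster than the ADC's maximum rate."""
    results = []
    n_min = math.ceil(VCO_MIN_HZ / f_ref_hz)
    n_max = math.floor(VCO_MAX_HZ / f_ref_hz)
    for n in range(n_min, n_max + 1):
        f_vco = n * f_ref_hz
        for d in dividers:
            f_adc = f_vco / d
            if f_adc <= f_adc_max_hz:
                results.append((n, d, f_vco, f_adc))
    return results

# Fastest ADC clock that is also an integer multiple of a nominal 28 MHz reference
rf_locked = [c for c in candidate_adc_clocks(28e6) if c[0] % c[1] == 0]
best = max(rf_locked, key=lambda c: c[3])
print(best[:2], best[3] / 1e6)   # (56, 4) 392.0
```

Requiring D to divide N keeps the ADC clock at an integer harmonic of the reference (here 14 × 28MHz), so every reference edge lands on a sample.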

## DIGITAL PROCESSING SYSTEM

#### System Architecture

The Zynq System on a Chip (SoC) device is a hybrid of an ARM dual-core microprocessor and a Field Programmable Gate Array (FPGA). So far in this application, all of the sampling and position calculations have been implemented as logic in the FPGA. The processor runs a Linux-derived operating system that handles communication with higher-level accelerator controls systems, along with some local data processing written in C++. The main system memory is 1GB of DDR3 RAM, and a µSD card slot provides non-volatile storage. Two Ethernet ports are included. One is connected directly to the ARM processor peripheral bus and is managed by the Linux OS as a standard Ethernet interface for communication with the rest of the accelerator controls network. The other is connected to the FPGA logic and can be used for custom high-speed data distribution for fast feedback systems.

#### Sample Processing

Up to this point in development, very little narrowband processing has been used in practice. Instead, a high-speed multiply-accumulate block controlled by a state machine loads a set of n samples from the A/D inputs and computes their sum of squares. A small set of pipeline registers allows accumulation to start when a software-set threshold is crossed; alternatively, an external trigger signal can be used. When a sample counter expires, the sums for each channel are passed to a calculation block. This operation can be sustained at up to the full 400MSPS sample rate, allowing streaming calculations with only an occasional gap of a few samples to reset the accumulators.
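A behavioural model of the accumulator makes the triggering and windowing concrete. It is a software sketch, not the FPGA logic: the `pretrigger` parameter is an assumed stand-in for the pipeline registers (delaying the data lets the window start a few samples before the threshold crossing), and all names are hypothetical.

```python
from typing import List, Optional, Sequence

def accumulate(channels: Sequence[Sequence[int]], n: int, threshold: int,
               trigger_channel: int = 0, pretrigger: int = 0) -> Optional[List[int]]:
    """Sum of squares of n samples per channel, starting when |sample| on the
    trigger channel first exceeds the threshold (minus `pretrigger` samples)."""
    trig = channels[trigger_channel]
    crossing = next((i for i, s in enumerate(trig) if abs(s) > threshold), None)
    if crossing is None:
        return None                      # no trigger: nothing accumulated
    start = max(0, crossing - pretrigger)
    if start + n > len(trig):
        return None                      # sample counter would run past the data
    # Same window on every channel, as a shared state machine would do
    return [sum(x * x for x in ch[start:start + n]) for ch in channels]

adc = [[0, 0, 5, 10, 5, 0], [0, 0, 1, 2, 1, 0]]
print(accumulate(adc, n=3, threshold=4))                 # [150, 6]
print(accumulate(adc, n=3, threshold=4, pretrigger=1))   # [125, 5]
```

In hardware the squares are accumulated one sample per clock rather than summed over a stored slice, but the window and trigger semantics are the same.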

#### Position Calculation

The two accumulated sums of squares for a pair of channels, usually corresponding to the horizontal or vertical plane of measurement (although the system can also be configured for a diagonally mounted BPM), are used to find the beam position from the normalized ratio of the difference over the sum. After conversion to floating-point representation, the square root of each sum is taken, and the difference and sum terms are used to form the ratio, which is 0 near the center and approaches unity near the extents of measurement. Linear and cubic scaling coefficients, determined by the particular BPM geometry, are then applied to the ratio. The entire calculation takes ~300ns to complete. To provide bunch-by-bunch, turn-by-turn streaming measurements for a RHIC bunch spacing of 106.5ns, this block is pipelined. Since its longest path (the square root) is ~60ns, it can calculate a position for each separate RHIC bunch.
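Written out, with $S_A$ and $S_B$ the two accumulated sums of squares, the ratio is $r = (\sqrt{S_A}-\sqrt{S_B})/(\sqrt{S_A}+\sqrt{S_B})$. The sketch below assumes the two scaling coefficients enter as $x = k_1 r + k_3 r^3$; the paper does not give the exact form, and the coefficient values in the example are made up. Because both sums cover the same n samples, the common $\sqrt{n}$ factor cancels in the ratio.

```python
import math

def bpm_position(sum_sq_a: float, sum_sq_b: float, k1: float, k3: float) -> float:
    """Difference-over-sum position from two accumulated sums of squares.
    Assumed scaling: x = k1*r + k3*r**3 (k1, k3 set by the BPM geometry)."""
    a = math.sqrt(sum_sq_a)      # electrode A amplitude (times sqrt(n), which cancels)
    b = math.sqrt(sum_sq_b)
    total = a + b
    if total == 0.0:
        raise ValueError("no signal on either electrode")
    r = (a - b) / total          # 0 at the center, approaches +/-1 near the electrodes
    return k1 * r + k3 * r ** 3

print(bpm_position(9.0, 1.0, k1=10.0, k3=1.0))   # r = 0.5 -> 5.125
```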

#### Data Delivery

Another finite state machine manages the accumulation of arrays of bunch samples, which are then delivered as a set (or turn) depending on the application. Dedicated blocks of memory store sets of averaged turn-by-turn data that are 1K deep or deeper. Another block uses a running average to effectively low-pass filter the position data for slower data delivery. A memory-mapped interface transfers data to the software running on the ARM processor, at either a fast 1kHz or a slow 1Hz rate depending on the data type. C++ software routines then package this data and deliver it via the ADO software interface used throughout the CAD complex at BNL. Additional signal processing algorithms, such as FFTs, have also been added to the processing code. Another set of memory-mapped registers allows software to configure and diagnose lower-level logic functions on the FPGA.
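The paper does not say which running-average form is used. One common FPGA-friendly choice, assumed here purely for illustration, is a first-order exponential average whose divisor is a power of two, so each update is a subtract, shift and add: `y += (x - y) / 2**shift`. The class name and `shift` parameter are hypothetical.

```python
class RunningAverage:
    """First-order exponential running average: y += (x - y) / 2**shift."""

    def __init__(self, shift: int):
        self.shift = shift
        self.y = None

    def update(self, x: float) -> float:
        if self.y is None:
            self.y = x               # seed with the first sample to skip the start-up transient
        else:
            self.y += (x - self.y) / (1 << self.shift)
        return self.y

avg = RunningAverage(shift=1)
print([avg.update(x) for x in (0.0, 8.0, 8.0, 8.0)])   # [0.0, 4.0, 6.0, 7.0]
```

Larger `shift` values give heavier smoothing; the averaging time constant is roughly `2**shift` updates, which sets how far the slow data stream lags the fast one.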

This flexibility is one of the great advantages of a SoC architecture. New interfaces between the programmable logic and the microprocessor can be added or modified with relative ease, without hardware modifications. This has allowed continuous improvement of the BPM processing algorithms and opened the possibility of applications at accelerators for which the module was not originally designed.

#### Boot Process

Envisioning a future large-scale BPM deployment at an accelerator such as RHIC with hundreds of BPMs, a remote boot capability was designed into the system. Each V301 module looks in the root folder of its µSD card for a boot loader along with the kernel image of the Linux operating system. A RAM disk image of the file system, stored on the card, is uncompressed into RAM and serves as the root file system. After the Ethernet network is established, a remote file system share (NFS) is used to load individual scripts based on the hostname of the BPM, which is the only module-specific information stored on the card. The FPGA bit file containing the logic configuration is also loaded at runtime over a network share. This allows a single file stored on a controls server to be deployed to many BPM systems simultaneously. In addition, hardware replacement in the field is performed simply by moving the µSD card from the failed module to a new one and plugging the new module in.
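The key point of this scheme is that everything module-specific is derived from the hostname. A minimal sketch of that lookup follows; the NFS mount point, directory layout and file names are entirely hypothetical, since the paper does not give them.

```python
import socket
from pathlib import PurePosixPath

# Hypothetical layout of the controls-server NFS share
NFS_ROOT = PurePosixPath("/nfs/bpm")
SHARED_BITFILE = NFS_ROOT / "fpga" / "v301.bit"   # one bit file for every module

def boot_plan(hostname: str) -> dict:
    """Map the only module-specific datum on the uSD card (the hostname) to
    the per-module startup script and the shared FPGA bit file."""
    return {
        "startup_script": str(NFS_ROOT / "hosts" / hostname / "startup.sh"),
        "bitfile": str(SHARED_BITFILE),
    }

if __name__ == "__main__":
    print(boot_plan(socket.gethostname()))
```

Moving the µSD card into a replacement module carries the hostname, and therefore the whole boot plan, with it.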

#### Ancillary Peripherals

A block of ten connectors is located on the front panel of the V301. Two of these connectors use high-speed buffers to connect to the FPGA and are used as external trigger inputs (or outputs if desired). In addition, there are four isolated digital outputs and four isolated digital inputs, using magnetically isolated buffers. These can be used for interlock applications: for example, a position excursion beyond a set limit can generate an interlock output to a machine protection system (MPS). The inputs can also be used to respond to MPS faults, for example by logging and delivering a post-mortem BPM data set.
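The position-excursion interlock can be sketched as a simple limit comparison. The limits, the intensity gate (added here as an assumption, so that positions computed from noise when no beam is present do not trip the output) and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class InterlockLimits:
    x_max_mm: float
    y_max_mm: float
    min_intensity: float    # assumed gate: ignore positions when there is no beam

def interlock_output(x_mm: float, y_mm: float, intensity: float,
                     limits: InterlockLimits) -> bool:
    """True means assert the isolated digital output to the MPS."""
    if intensity < limits.min_intensity:
        return False
    return abs(x_mm) > limits.x_max_mm or abs(y_mm) > limits.y_max_mm

limits = InterlockLimits(x_max_mm=2.0, y_max_mm=2.0, min_intensity=0.1)
print(interlock_output(2.5, 0.0, 1.0, limits))   # True
```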

A high-speed data connector was also added in the second revision, which breaks out the remaining three high-speed multi-gigabit transceiver (MGT) serial links of the FPGA. This connector can be used to communicate over a fiber optic link at gigabit speeds, or possibly to interface with other A/D converters, DACs, etc. A test cable connecting two BPM modules has been used to effectively double the number of channels to eight, for future BPM applications such as CBETA, where more than four buttons are required for a single BPM (rectangular beam pipes).

## VERSATILE APPLICATIONS

#### RHIC BPM System

The existing BPM system at RHIC consists of over 700 individual measurement planes, each connected to a two-channel integrated electronics module, most of which have been in service since the late 1990s. Despite its excellent performance to date, many of its components are obsolete and repairs are becoming more difficult. Much of the motivation for this new BPM design has been to replace the existing electronics with a low-cost next-generation system of equal or better performance. Earlier prototypes of this system were developed extensively with real RHIC beam. The 39MHz low pass filter that was chosen has been shown to work well with the various RF frequencies that RHIC uses.

To date, a few of these BPM electronics modules have been installed alongside existing hardware so that the results can be compared. Responding to data requests and delivering data via the beam sync serial link has been another area of development. This will allow data from these modules to be synchronized with other BPM data, making their use in higher-level orbit feedback and management systems possible.

One location in RHIC that benefits from bunch-by-bunch position measurements is the BPM installed between the abort kicker and the beam dump, which measures the trajectory of each bunch as it is kicked into the dump. Due to fluctuations in the kicker magnet voltage waveform, the bunches do not all receive the same kick and follow slightly different paths. Data taken with a prototype of this hardware (Fig. 3) closely match the expected positions, following a shape similar to the magnet voltage waveform.

#### RHIC DX BPMs

At each of the six intersection regions of RHIC, a pair of BPMs is located in the common beam pipe on either side of the collision point; these BPMs see both ion beams. Their longitudinal separation is such that when the two rings are 'cogged' into collisions, the bunches at the DX BPMs are separated by approximately 50ns, half of the 106.5ns single-beam bunch spacing (nominally 9MHz). Since the existing electronics can only process a single sample per turn (~100kHz), each DX BPM signal is split between two modules, each timed separately to capture the blue or yellow ring bunch. Calibration of these two modules becomes critical when measuring the difference between the two ion beam positions within the same BPM pickup assembly.

Figure 3: RHIC dump BPM bunch-by-bunch position.

With its much higher sample rate, a single V301 module can sample both beams. Using the same analog hardware channels for both beams eliminates the dependence on calibration when computing a position difference. A low pass filter with a higher cut-off frequency (200MHz) is used instead of the 39MHz filter, as interference between adjacent bunches (beams) was found when the lower-frequency filter was used. A successful test run of the electronics at the 2 o'clock RHIC intersection region was completed this past year, and there are plans to install modules at all of the remaining DX BPM locations during the current shutdown period.
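One way to picture a single module sampling both beams is as two interleaved accumulation windows per bunch period on each electrode channel. The sketch assumes the ADC clock is locked so that the bunch spacing is an integer number of samples; the 42-sample spacing and 21-sample blue-to-yellow offset in the example are illustrative values (about 106ns and 53ns at roughly 394MSPS), not parameters from the paper, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

def split_beams(samples: Sequence[float], first_blue: int, spacing: int,
                yellow_offset: int, window: int) -> Tuple[List[float], List[float]]:
    """Sum of squares per bunch for each beam on one electrode channel."""
    if window > yellow_offset or yellow_offset + window > spacing:
        raise ValueError("windows for the two beams would overlap")
    blue, yellow = [], []
    start = first_blue
    while start + yellow_offset + window <= len(samples):
        blue.append(sum(s * s for s in samples[start:start + window]))
        y0 = start + yellow_offset
        yellow.append(sum(s * s for s in samples[y0:y0 + window]))
        start += spacing
    return blue, yellow

# Synthetic stream: blue pulses of amplitude 2, yellow of amplitude 1, 3 bunch periods
stream = [0.0] * 126
for k in range(3):
    stream[42 * k:42 * k + 4] = [2.0] * 4
    stream[42 * k + 21:42 * k + 25] = [1.0] * 4
print(split_beams(stream, first_blue=0, spacing=42, yellow_offset=21, window=4))
# ([16.0, 16.0, 16.0], [4.0, 4.0, 4.0])
```

Each beam's sums then feed the same difference-over-sum calculation, and because both beams pass through identical channels, the same analog gains apply to both.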

#### BLIP Raster Upgrade BPM System

A dual-plane BPM was installed in the BLIP beam line at the BNL Linac as part of the BLIP Raster Upgrade Project [2]. The V301 was also chosen to process signals from the two split-can pickups installed in the new beam line. One unique requirement was the ability to measure the beam motion while it is scanned in a circle at a 5kHz rate. Each Linac pulse of ~450µs duration produces about 2.25 rotations (450µs × 5kHz) in a circular pattern.

The same firmware used to measure bunch-by-bunch positions at RHIC is used to slice the very long Linac pulse into smaller ~100-sample chunks, with the sample rate lowered to 25MHz. Each pseudo-bunch is therefore spaced a few microseconds (~4µs) apart and has its own position calculated; these positions are assembled into an array. When this array is correlated against a similar array of the other plane's positions, a circle emerges.
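The pulse-slicing idea can be illustrated with a toy simulation. It assumes electrode amplitudes of 1 ± u(t) in each plane, with u tracing a circle of normalized radius 0.5 at 5kHz, and it ignores the RF bunch structure and the real split-can response entirely; the function names are hypothetical. With 100-sample chunks at 25MHz each pseudo-bunch spans 4µs, so a 450µs pulse yields 112 positions per plane covering 2.25 turns of the circle.

```python
import math
from typing import List, Sequence

def chunk_positions(a: Sequence[float], b: Sequence[float], chunk: int) -> List[float]:
    """Difference-over-sum ratio for each complete chunk of samples."""
    ratios = []
    for i in range(0, len(a) - chunk + 1, chunk):
        sa = math.sqrt(sum(v * v for v in a[i:i + chunk]))
        sb = math.sqrt(sum(v * v for v in b[i:i + chunk]))
        ratios.append((sa - sb) / (sa + sb))
    return ratios

def simulate_raster(f_s=25e6, pulse_s=450e-6, f_raster=5e3, radius=0.5, chunk=100):
    """Return per-chunk (x, y) ratios for a beam circling at f_raster."""
    n = round(f_s * pulse_s)                       # 11250 samples per pulse
    t = [k / f_s for k in range(n)]
    ux = [radius * math.cos(2 * math.pi * f_raster * tk) for tk in t]
    uy = [radius * math.sin(2 * math.pi * f_raster * tk) for tk in t]
    xs = chunk_positions([1 + u for u in ux], [1 - u for u in ux], chunk)
    ys = chunk_positions([1 + u for u in uy], [1 - u for u in uy], chunk)
    return xs, ys

xs, ys = simulate_raster()
print(len(xs), min(math.hypot(x, y) for x, y in zip(xs, ys)))
```

Plotting `ys` against `xs` traces the circle; overlaying many pulses gives a persistence-style picture of the raster pattern.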



Figure 4: BLIP raster BPM x-y position.

This display (Fig. 4), used in a persistence mode showing the past few thousand pulses, has proven very useful in monitoring the rastering patterns at BLIP.

Future upgrades at the BNL Linac will also install V301s on existing BPM pickups.

#### Electron Beam Measurements

The Brookhaven Accelerator Test Facility (ATF) has been used on numerous occasions to prototype and test this BPM hardware with electron beams as a signal source. Different pickups (buttons and striplines) have been installed in the beam line and connected to these electronics. The ATF can produce single-bunch electron pulses at a 1Hz repetition rate. All tests so far have used a 503MHz band pass filter as the main analog processing component. This filter 'rings', providing a series of 15-20 samples that are compared with those of the other channel to compute position. Measurement noise levels are close to what is expected without heavy time-averaging, and raster scans have been used to measure the linearity of a few BPM assemblies.
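Because both channels see the same filter ringing, scaled by their respective electrode coupling, the sum of squares over the ringing samples gives the amplitude ratio directly. The sketch below models the ringing as a decaying 503MHz sinusoid sampled at an assumed 400MSPS (so it appears aliased, which does not matter for the sum of squares); the 20ns decay time, phase and amplitudes are made-up illustration values, and the function names are hypothetical.

```python
import math
from typing import List

def ringing(amplitude: float, n: int = 20, f_ring: float = 503e6,
            f_s: float = 400e6, tau: float = 20e-9, phase: float = 0.3) -> List[float]:
    """Samples of a decaying sinusoid standing in for the band-pass ringing."""
    return [amplitude * math.exp(-k / (f_s * tau)) *
            math.sin(2 * math.pi * f_ring * k / f_s + phase) for k in range(n)]

def ratio(a: List[float], b: List[float]) -> float:
    sa = math.sqrt(sum(v * v for v in a))
    sb = math.sqrt(sum(v * v for v in b))
    return (sa - sb) / (sa + sb)

# Electrode couplings of 1.2 and 0.8 give (1.2 - 0.8) / (1.2 + 0.8) = 0.2
print(round(ratio(ringing(1.2), ringing(0.8)), 12))   # 0.2
```

In a real measurement, noise on the later, smaller ringing samples does not cancel, which is one reason the number of usable samples is limited to the first 15-20.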

A future upgrade to RHIC, the LEReC project, will install a number of new BPM pickups [3], both in a RHIC warm-bore section and in a new electron transport beam line. Many of these BPMs will be connected to V301 modules, which have configuration options for both electron and ion signals. In the cooling section of LEReC, relative phase and position measurements will be used to overlap the ion and electron beams to produce the cooling effect. Work is ongoing to develop the firmware and software to support these capabilities.

## REFERENCES

- [1] R. Michnoff, et al., "Preliminary Design of a Real-Time Hardware Architecture for eRHIC," *ICALEPCS 2015*, Melbourne, Australia (2015), paper THHB2O01.
- [2] R. Michnoff, et al., "The Brookhaven LINAC Isotope Production Facility (BLIP) Raster Scanning System First Year Operation with Beam," *IBIC 2016*, Barcelona, Spain (2016), paper MOPG28.
- [3] Z. Sorrell, et al., "Beam Position Monitors for LEReC," presented at *IBIC 2016*, Barcelona, Spain (2016), paper MOPG08.