# THE GLOBAL TRIGGER PROCESSOR: A VXS SWITCH MODULE FOR TRIGGERING LARGE SCALE DATA ACQUISITION SYSTEMS

S. Kaneta, C. Cuevas, H. Dong, W. Gu, E. Jastrzembski, N. Nganga, B. Raydo, J. Wilson, Jefferson Lab, Newport News, VA, U.S.A.

## Abstract

The 12 GeV upgrade for Jefferson Lab's Continuous Electron Beam Accelerator Facility requires the development of a new data acquisition system to accommodate the proposed 200 kHz Level 1 trigger rates expected for fixed target experiments at 12 GeV. As part of a suite of trigger electronics composed of VXS switch and payload modules, the Global Trigger Processor (GTP) will handle up to 32,768 channels of preprocessed trigger data from multiple detector systems that surround the beam target at a system clock rate of 250 MHz. The GTP is configured with user programmable physics trigger equations and, when trigger conditions are satisfied, the GTP will activate the storage of data for subsequent analysis. The GTP features an Altera Stratix IV GX FPGA allowing interface to 16 Sub-System Processor modules via 32 5-Gbps links, DDR2 and flash memory devices, two gigabit Ethernet interfaces using Nios II embedded processors, fiber optic transceivers, and trigger output signals. The GTP's high-bandwidth interconnect with the payload modules in the VXS crate, the Ethernet interface for parameter control, status monitoring, and remote update, and the inherent nature of its FPGA give it the flexibility to be used for a large variety of tasks and to adapt to future needs. This paper details the responsibilities of the GTP, the hardware's role in meeting those requirements, and elements of the VXS architecture that facilitated the design of the trigger system. Also presented will be the current status of development including significant milestones and challenges.

## **GTP RESPONSIBILITIES**

### Level 1 Trigger

As part of the 12 GeV trigger upgrade, the GTP is the central processor for the Level 1 trigger in Jefferson Lab's new data acquisition system. It is designed to simultaneously generate up to 32 independent, programmable triggers with a fixed latency from the event occurrence at a 4 ns resolution.

The new system will essentially eliminate trigger dead time by pipelining the detector sampling and trigger decision calculations. The data pipeline replaces long spools of delay cable previously used and instead stores the data and time samples in memory devices on the VXS ADC and TDC payload boards. When triggers are received, the associated data are read out of these buffers for long-term storage.

The trigger calculations will also be pipelined and computed in parallel to maintain continuous availability of triggers. To ensure there is no trigger processing dead time, the triggers must be able to fire successively at the 250 MHz global clock. The trigger must also reach the front end modules before the buffers run out of space and overwrite data which satisfy trigger conditions. Hardware limitations define a maximum buffer size of $3.2\ \mu s$ [1].
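The timing budget implied by these figures can be worked through in a few lines. The following is a minimal sketch, assuming the 3.2 µs buffer is filled with one sample per 4 ns clock period and that a trigger arriving exactly when the buffer wraps is already too late; the function names are illustrative and are not taken from the GTP firmware.

```python
CLOCK_HZ = 250_000_000   # global trigger clock quoted in the text
BUFFER_NS = 3200         # maximum front-end buffer size, 3.2 us [1]


def buffer_depth_samples(buffer_ns: int = BUFFER_NS, clock_hz: int = CLOCK_HZ) -> int:
    """Number of clock periods that fit in the buffer (assumes one sample per clock)."""
    return buffer_ns * clock_hz // 1_000_000_000


def trigger_arrives_in_time(latency_clocks: int) -> bool:
    """True if a trigger with this total latency reaches the front end
    before the corresponding sample is overwritten (assumed strict limit)."""
    return latency_clocks < buffer_depth_samples()
```

Under these assumptions the whole trigger path, from detector sample to trigger arrival at the front end, has a budget of 800 clock periods.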

Synchronization of the trigger data is also critical to ensure triggers are calculated correctly. A clock synchronization signal is used to align the data of all transmitters so that the summation across channels uses samples from the same point in time. It also helps align transceiver channels themselves to remove skew caused by the deserializers. This is done by aligning the reading of the transceiver memory buffers after all channels have received valid data.
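The buffer alignment described above can be illustrated with a small software model. This is a sketch only, assuming each channel delivers one word per clock once it becomes valid and stays valid afterwards; `align_channels` and the use of `None` for "no valid data yet" are illustrative choices, not the firmware implementation.

```python
from collections import deque


def align_channels(streams):
    """Model of deskewing by delayed FIFO readout.

    streams: one list per channel, in arrival order, one entry per clock.
             None marks a clock on which the channel has no valid word yet.
    Returns a list of tuples, each holding words from the same point in time.
    """
    fifos = [deque() for _ in streams]
    aligned = []
    reading = False
    for words in zip(*streams):              # one iteration per clock
        for fifo, word in zip(fifos, words):
            if word is not None:
                fifo.append(word)            # buffer every valid word
        if not reading and all(fifos):       # every channel holds valid data
            reading = True
        if reading:
            aligned.append(tuple(fifo.popleft() for fifo in fifos))
    return aligned
```

For example, three channels skewed by zero, one, and two clocks come out with their first valid words paired together, at the cost of a readout delay equal to the largest skew.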

Figure 1 shows the flow and concentration of data through the trigger system data path. Each Crate Trigger Processor (CTP) accepts data from its TDC or flash ADC modules in one of two forms, threshold crossings (hit bits) or energy sums, depending on the detector type. Fiber optic cables link the CTPs to the Sub-System Processors (SSPs), which align and combine these data by detector system and forward them to the GTP for trigger calculation. Not shown is the Trigger Supervisor (TS), which makes the final decision among the multiple trigger signals. It is linked to the GTP via a 32-bit parallel bus operating at the trigger clock rate of 250 MHz.

Figure 1: Level 1 Trigger System Data Flow.
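To make the last stage of this data flow concrete, the sketch below evaluates a set of trigger equations on per-subsystem sums and packs the results into a 32-bit word, one bit per trigger, as could be carried on the 32-bit bus to the TS. The form of the equation (a weighted sum compared with a threshold), the subsystem names, and the bit assignment are all assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TriggerEquation:
    """Hypothetical trigger equation: weighted sum of subsystem data vs. threshold."""
    coefficients: Dict[str, float]   # weight per subsystem (names are illustrative)
    threshold: float

    def fires(self, sums: Dict[str, float]) -> bool:
        total = sum(c * sums.get(name, 0.0) for name, c in self.coefficients.items())
        return total >= self.threshold


def trigger_word(equations: List[TriggerEquation], sums: Dict[str, float]) -> int:
    """Pack up to 32 trigger decisions into one word, bit i for equation i."""
    if len(equations) > 32:
        raise ValueError("at most 32 independent triggers are supported")
    word = 0
    for bit, equation in enumerate(equations):
        if equation.fires(sums):
            word |= 1 << bit
    return word
```

In hardware each equation would be evaluated in parallel on every clock; the loop here only models the result for a single time sample.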

### User Interface

With an increase in complexity and quantity of trigger equations, the ability to intuitively configure them becomes more important. To help with the setup of trigger equations, a direct link to the GTP in the form of an Ethernet connection will be used. The interface will be a web page served from the GTP and will allow manipulation of coefficients and some adjustment to the form of the trigger equations. To provide greater levels of customization, reconfiguration of the FPGA is available using images in non-volatile memory. Users will be able to select between multiple pages already stored or upload new configurations.

## **VXS ELEMENTS**

The VXS architecture has several advantages compared with other commercial off-the-shelf (COTS) chassis and backplane solutions. Making use of these elements has shortened the design process and reduced some of the associated risks.

### VME Extension

Development on VME systems over the years has produced a set of verified hardware and software blocks which can be used in VXS platforms. Connectors P1 and P2 on VXS Payload modules remain unchanged from their VME counterparts allowing backward compatibility with existing designs. Designs consisting of VME and VXS modules can coexist in VXS chassis.

It is important to note that since the VXS Switch module was not previously defined in the VME standard, it contains no direct connections to VME boards, requiring at least one VXS Payload with P0 to enable communication between all modules.

### VXS Architecture

VXS Payload modules include a high speed P0 connector in addition to the standard VME P1 and P2. This connector provides eight high speed differential signal paths and two single ended signals to each of the two VXS Switch modules specified by the VITA 41.0 VXS standard [2]. Several VXS standards, including VITA 41.2, optionally refine these signals for specific communication protocols as four transmit pairs and four receive pairs, which are reversed on the backplane, plus two single ended I<sup>2</sup>C lines. If full duplex communication is required, this convention helps reduce pin mapping errors [3].

VXS has a dual star architecture specifying two Switch modules, each with high speed and single ended connections to up to 18 payload ports [2]. Figure 2 shows the connections between the two Switch cards and a Payload. Where applicable, the transmitter and receiver are specified with respect to Switch A. The transmit, receive, and I<sup>2</sup>C designations are optional; they are specified in VITA 41.1 and 41.2 and are shown here to illustrate one possible configuration [4].



Figure 2: Example VXS Signal Mapping.

## HARDWARE IMPLEMENTATION

### Ethernet

The inclusion of Ethernet in an embedded system requires several hardware and software components. Full TCP/IP over Ethernet is normally processed by a microprocessor running an embedded operating system and an Ethernet stack, a collection of functions related to the processing of Ethernet packets. Depending on the level of Ethernet implementation required, a simpler protocol such as UDP (user datagram protocol) can be implemented, saving costs on development tools and licenses. Non-volatile memory is required to store microprocessor boot code, while fast on-chip and DDR2 memory is needed for many functions including instruction and data caching as well as dynamic memory allocation, program code space, and packet buffering. The electrical interface includes an Ethernet jack, magnetics, and a physical interface device (PHY). The Ethernet MAC (medium access controller) is available as an FPGA IP core and can be used with both TCP/IP and UDP protocols.

Given the flexibility that must be built into the design at its early stages, the GTP will use its FPGA resources, either Altera's embedded soft processor Nios II or logic elements, to provide Ethernet capabilities. A processor design helps deal with uncertainty inherent to TCP/IP networks including packet loss, retransmission and packet reordering. A pure logic element implementation of UDP benefits from very high throughput and efficiency but may prove impractical if the network quality impacts the data integrity.
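The trade-off can be made concrete: UDP itself gives no delivery or ordering guarantees, so whatever sits above it must at least detect loss and reordering. The sketch below shows one common way to do that with a sequence number in each datagram. The header layout (32-bit sequence number, 16-bit payload length, network byte order) is hypothetical and is not a GTP packet format.

```python
import struct

# Hypothetical header: 32-bit sequence number, 16-bit payload length (big-endian).
HEADER = struct.Struct("!IH")


def pack_datagram(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with its sequence number and length."""
    return HEADER.pack(seq, len(payload)) + payload


def unpack_datagram(datagram: bytes):
    """Return (sequence number, payload); reject truncated datagrams."""
    seq, length = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated datagram")
    return seq, payload


def reassemble(datagrams):
    """Restore sequence order and report which sequence numbers never arrived."""
    received = dict(unpack_datagram(d) for d in datagrams)
    if not received:
        return [], []
    missing = [s for s in range(min(received), max(received) + 1) if s not in received]
    ordered = [received[s] for s in sorted(received)]
    return ordered, missing
```

Detecting a gap is simple enough for a logic-only implementation; requesting and handling retransmission is where a processor-based design becomes attractive.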

VXS also allows for Gigabit Ethernet over the backplane using high speed signals. Due to the rates involved and FPGA capabilities, only simple circuitry is required. This consists of LVDS transceivers to buffer the backplane connection from the FPGA. A second processor may be instantiated in the FPGA to establish this link, minimizing the initial hardware investment.

### External Memory

The GTP prototype contains both volatile and nonvolatile memory in the form of DDR2 and NOR flash devices, respectively. A total of 512 Mb of flash storage and 2048 Mb of DDR2 memory are onboard.

The DDR2 memory is available in two 1024 Mb (128 MB) devices with fully independent address and data busses to accommodate two separate Ethernet interfaces, each with its own processor and memory buffers.

To hold multiple FPGA configurations, one flash device is logically segmented to contain several pages which are loaded using a CPLD (complex programmable logic device) configuration controller. The flash is programmed either via the Ethernet connection or through the front panel JTAG port. A second flash device is used to hold instructions and data for the CPU in addition to the Ethernet web server file structure.

### Fiber Optic Transceivers

The CTP and GTP share many design aspects and could be used interchangeably when configured appropriately. To ensure full hardware compatibility, a four channel fiber optic transceiver was included on the GTP. These transceiver channels occupy four of the 36 lanes available on the FPGA. In cases where the additional functions of the GTP including Ethernet connectivity are required at the crate level, a GTP can be substituted. A firmware update would be required to provide CTP functionality in the FPGA.

### Front Panel

### Configuration

The CPLD's primary function is to provide safe configuration and reconfiguration control for the FPGA. It has access to a 256 Mb flash memory dedicated for storage of multiple FPGA images. The FPGA, which can also access the flash, can program images downloaded via the Ethernet interface and then command the CPLD to initiate reconfiguration. A safe FPGA image is stored without access by the user, allowing restoration of the GTP if the update over Ethernet fails.
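The fallback behaviour can be sketched as a page-selection rule. The example below assumes that each stored image carries a CRC-32 checksum and that page 0 holds the protected safe image; both are illustrative assumptions, since the source does not describe how the configuration controller validates an image.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Optional

SAFE_PAGE = 0  # assumption: page 0 holds the safe image that users cannot overwrite


@dataclass
class FlashImage:
    data: bytes
    crc32: int  # checksum recorded when the image was written (assumed scheme)


def make_image(data: bytes) -> FlashImage:
    """Create an image record with a matching checksum."""
    return FlashImage(data, zlib.crc32(data))


def image_is_valid(image: Optional[FlashImage]) -> bool:
    """An image is usable only if it exists and its checksum still matches."""
    return image is not None and zlib.crc32(image.data) == image.crc32


def select_page(pages: Dict[int, FlashImage], requested: int) -> int:
    """Page the controller should load: the requested one, else the safe page."""
    if image_is_valid(pages.get(requested)):
        return requested
    return SAFE_PAGE
```

With such a rule, an interrupted or corrupted Ethernet upload leaves the board reachable, because the controller falls back to the safe image instead of loading the damaged one.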

Figure 3: Simplified Global Trigger Processor Hardware Block Diagram.

### Transceivers

To support the bandwidth and quantity of high speed signals arriving from the 16 SSP modules, an FPGA with integrated transceivers is well suited. Using discrete devices with large parallel busses would not be practical given the number of pins and traces that would be required.

The selection of the Altera Stratix IV GX was made based on several criteria. While rated for speeds greater than 6 Gbps on all speed grades, the clocking structure places some limitations on a subset of the transceiver channels when all are utilized. With 32 transceivers allocated for receiving data from SSPs, four channels remain for use with the fiber transceivers, using all 36 lanes available on the FPGA.

A modified version of the Xilinx Aurora protocol has been selected for communication between the SSP and GTP. It is an open standard designed to encapsulate high speed links [5]. Because of the synchronization of the trigger data rate with the transceiver link rate, some of the Aurora protocol overhead has been removed including flow control and error handling. Lane initialization, channel bonding, and 8B/10B encoding are maintained. Low bit error rates are being targeted to account for the lack of error handling.
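The link budget behind these choices follows from the numbers already quoted. The sketch below works through it, assuming two lanes per SSP (32 links shared among 16 SSPs) and ignoring any protocol overhead that remains beyond the 8B/10B encoding; the resulting bits-per-clock figures are derived here and are not stated in the source.

```python
LINE_RATE_BPS = 5_000_000_000    # transceiver line rate per lane
TRIGGER_CLOCK_HZ = 250_000_000   # trigger data rate
SSP_COUNT = 16
LANES_PER_SSP = 2                # assumption: 32 links / 16 SSPs
FIBER_LANES = 4


def payload_rate_bps(line_rate_bps: int = LINE_RATE_BPS) -> int:
    """8B/10B encoding carries 8 data bits in every 10 line bits."""
    return line_rate_bps * 8 // 10


def payload_bits_per_clock() -> int:
    """Data bits one lane delivers per 4 ns trigger clock period."""
    return payload_rate_bps() // TRIGGER_CLOCK_HZ


def lanes_used() -> int:
    """Total FPGA transceiver lanes occupied by SSP and fiber links."""
    return SSP_COUNT * LANES_PER_SSP + FIBER_LANES
```

Each lane therefore carries a whole number of bits, 16, per trigger clock, which is what allows the data rate to stay locked to the link rate without flow control, and the SSP and fiber links together account for all 36 lanes.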

### PCB Construction

While only a single GTP is required for each of the four experimental halls, efforts were made to keep the PCB costs to a minimum. The design was primarily driven by the interconnect required to tie all the components together and signal integrity, largely for the high speed signals and fast interfaces such as DDR2.

The FPGA requires almost a dozen different power rails between unique voltages and isolated supplies, driving the layer count for power planes. No power nets were run on signal layers and all are routed as split planes, minimizing impedance by providing wide plane connections. Power and ground on adjacent layers also add high frequency decoupling for the FPGA and other sensitive components.

To maximize signal integrity, all signal and power layers are referenced to ground to provide close proximity, low impedance return paths. The result is a 16 layer board with six signal layers, four power layers and six ground layers. While conservative, all layers have critical signals which could be affected by inadequate or unbalanced routing. For example, matched DDR2 signals routed on two different layers could have increased skew and reduced margins if their return paths were not also matched.

With a transceiver link rate requirement of 5 Gbps, additional steps were taken to reduce stubs and minimize reflections. Since the concentration of trigger data requires data flow in only one direction, receive channels were given routing precedence. On incoming lanes from the SSP, all via stubs were eliminated by routing them on external layers rather than adding the cost of backdrilling. Only two full height vias were used, one for the VXS connector pin to the bottom layer and another from the bottom layer to the FPGA pad on top.

The VXS switch connector mapping distributes signals evenly by Payload Port across its four differential connectors. Since only two channels from each payload were used, this allowed all receive signals to be broken out on a single layer and all transmit lines on another.

## CURRENT STATUS

The GTP prototype, shown in Figure 4, is complete and hardware evaluation has begun. FPGA vendors provide many tools to help test hardware, in some cases with minimal development effort. In particular, Altera's Quartus II software comes equipped with a Transceiver Toolkit which allows rapid testing of transceiver channels. This allows manipulation of settings and data patterns in addition to bit-error rate (BER) monitoring and eye opening measurement to optimize the link signal integrity. Using a pseudo-random bit sequence (PRBS), the SSP to GTP links are performing well at 5 Gbps with PRBS31, a rigorous test pattern containing long strings of consecutive ones and zeros.
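For readers unfamiliar with PRBS testing, the sketch below shows how such a pattern and a BER figure can be produced in software. It uses the generator polynomial x^31 + x^28 + 1 commonly associated with PRBS31; whether the output is inverted varies between tools, and this model is not a description of how the Transceiver Toolkit is implemented.

```python
MASK31 = (1 << 31) - 1


def prbs31(seed: int, nbits: int):
    """Generate nbits of a PRBS31 sequence with a 31-bit shift register.

    Each new bit is the XOR of the bits produced 31 and 28 steps earlier
    (polynomial x^31 + x^28 + 1). The seed must be non-zero.
    """
    state = seed & MASK31
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(nbits):
        new_bit = ((state >> 30) ^ (state >> 27)) & 1
        state = ((state << 1) | new_bit) & MASK31
        bits.append(new_bit)
    return bits


def bit_error_rate(sent, received) -> float:
    """Fraction of bit positions in which the two sequences differ."""
    if len(sent) != len(received) or not sent:
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(a != b for a, b in zip(sent, received))
    return errors / len(sent)
```

Because the receiver can regenerate the same sequence from its own shift register, every received bit can be checked without transmitting a reference copy, which is what makes continuous BER monitoring on a live link practical.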

Altera's DDR2 memory controller core and External Memory Interface Toolkit help verify external memory data, address and command signals. Using a JTAG link between the PC and FPGA, calibration and margining can be performed and the results reported. At a memory speed of 666 MHz, both memory devices pass by wide margins on all data and strobe lines.

Ethernet hardware development and TCP/IP software development are complex compared with most other interfaces. In order to provide the most flexibility, the GTP was designed to integrate all protocol layers above the physical interface within the FPGA. Altera's Qsys system builder allows the creation of customized systems by placing components in FPGA logic using a graphical user interface (GUI) and defining the interconnect. The architecture which is normally contained in a special purpose IC must now be developed and debugged in order to begin the Ethernet hardware evaluation.



Figure 4: GTP Prototype.

## CONCLUSION

Jefferson Lab's new Global Trigger Processor is a powerful, flexible platform well suited to handle the processing and communication requirements of the upgraded trigger system. It leverages the VXS architecture to build on both the experience and hardware of legacy VME systems. However, the GTP is also capable of a myriad of other tasks given its interconnect with the payload boards within the VXS crate, Ethernet and fiber optic connections to external devices, and plentiful onboard storage. While still in development, the GTP promises to adapt to evolving demands throughout the duration of the 12 GeV experimental program.

## REFERENCES

- [1] C. Cuevas, B. Raydo, and S. Kaneta, "Description and Requirements for the VXS Global Trigger Processor," D00000-16-08-S005, Jefferson Lab (2011).
- [2] "VXS VMEbus Switched Serial Standard," ANSI/VITA 41.0 (2006); http://www.vita.com
- [3] "VXS 4X Serial RapidIO<sup>TM</sup> Protocol Layer Standard," ANSI/VITA 41.2 (2006); http://www.vita.com
- [4] "VXS Overview," VMEbus International Trade Association (2010); http://www.vita.com/home/MarketingAlliances/vxs/VXS%20Overview.pdf
- [5] "Aurora 8B/10B Protocol Specification," SP002, v2.2, Xilinx (2010).