# **FGC3.2: A NEW GENERATION OF EMBEDDED CONTROLS COMPUTER FOR POWER CONVERTERS AT CERN**

S. Page, C. Ghabrous Larrea, Q. King, B. Todd, S. Uznanski, D. Zielinski, CERN, Geneva, Switzerland

# Abstract

Modern power converters (power supplies) at CERN are controlled by devices known as Function Generator/Controllers (FGCs), which are embedded computer systems providing function generation, current and field regulation, and state control. FGCs were originally conceived for the Large Hadron Collider (LHC) in the early 2000s, though later generations are now increasingly being deployed in the LHC Injector Chain (Linac4, Booster, Proton Synchrotron and Super Proton Synchrotron).

A new generation of FGC, known as the FGC3.2, is currently in development. It is intended to provide for the evolving needs of the CERN accelerator complex and of other High Energy Physics (HEP) laboratories via CERN's Knowledge and Technology Transfer programmes. This paper describes the evolution of FGCs, summarises tests performed to evaluate candidate components for the FGC3.2 and details the final hardware and software architectures chosen. The FGC3.2 will make use of a multi-core ARM-based System-on-Chip (SoC) running an embedded Linux operating system, in contrast to earlier generations, which combined a microcontroller and Digital Signal Processor (DSP) with software running on "bare metal".

# **EVOLUTION OF FGC POWER CONVERTER CONTROLLERS**

The first Function Generator/Controller, FGC1, was an evolution of the controls developed in the 1980s for the Large Electron Positron (LEP) collider power converters, with one small controller embedded in each converter. In 1997, the microcontroller (MCU) was updated and a Texas Instruments C32 DSP was added to support digital regulation of the current. This evolved into a second version, FGC2, used in the LHC, with error-corrected memory to improve radiation tolerance. In 2007, development started on a third generation, FGC3, with the same MCU + DSP architecture but with newer and more powerful components. The resulting FGC3.1 was put into operation in 2012 [1]. Some 2700 FGC3.1s have been produced to date for use in the CERN accelerator complex.

# **NEW REQUIREMENTS**

The FGC3.1 embeds a Renesas RX610 32-bit MCU running at 100 MHz for global interfaces and a Texas Instruments C6727 DSP running at 300 MHz for function generation and regulation. This limits the bandwidth for the rejection of perturbations to around 1 kHz.

In 2018, the FGC3.2 development project was launched for two main reasons:

- The FGC3.1 uses components which are now end-of-life, making procurement progressively more difficult and expensive.
- Certain circuits at CERN need more than 1 kHz current regulator bandwidth.

The goal for the FGC3.2 is to be a plug-compatible replacement for the FGC3.1, ready for operation from 2022, that can be manufactured until at least 2027 and which can provide a significantly higher regulator bandwidth.

There is no specific bandwidth requirement; rather, the objective is to create the fastest possible controller for a similar price to the FGC3.1. The new design will also increase the size of the memory, upgrade the networking speed and address other issues, such as the complexity of writing software for two different processors.

# **HARDWARE CHOICES**

It was evident that a multi-core System-on-Chip (SoC) would provide the processing resources in the FGC3.2. Extensive market research was carried out, followed by feasibility studies on a few selected SoCs. The following processor families were considered: ARM, Intel Atom, AMD Embedded (G) and High Performance (R), Power Architecture, Intel Quark, AVR32, AVR and PIC microcontrollers. Some of these are better suited to small, low-performance embedded systems, and only ARM and Intel SoCs were studied further. In total, twenty-one SoCs were compared using twenty-seven criteria. Only ARM-based SoCs were retained due to their widespread use in embedded systems, their extensive community support and their favourable Thermal Design Power (TDP) to performance ratios. The ARM Cortex family consists of three main series: M, R and A. M and R series parts were predicted to be too slow for the FGC3.2's requirements, leaving the A series as the preferred choice. After comparing the ARM Cortex-A-based SoCs, the four listed in Table 1 were selected for further testing.

Table 1: SoCs and Evaluation Kits

| SoC                 | Evaluation Kit    | Primary Cores          |
|---------------------|-------------------|------------------------|
| Xilinx Zynq XCZU9EG | EK-U1-ZCU102-G    | 4 x ARM A53 @ 1.50 GHz |
| TI AM5728           | Beaglebone Black  | 2 x ARM A15 @ 1.50 GHz |
| NXP LS1046A         | LS1046ARDB-PB     | 4 x ARM A72 @ 1.80 GHz |
| HiSilicon Kirin 960 | Lemaker HiKey 960 | 4 x ARM A73 @ 2.36 GHz |

17th Int. Conf. on Acc. and Large Exp. Physics Control Systems ISBN: 978-3-95450-209-7 ISSN: 2226-0358

Evaluation kits for these four SoCs were purchased but the Kirin 960 SoC was later dropped due to a lack of support and suitable documentation. Of the others, the Xilinx SoC is a special case as it integrates the programmable logic of an FPGA with the four ARM cores. The TI SoC is the only one with 32-bit cores (A15), the A53 and A72 both being ARMv8 64-bit devices.

#### SoC Benchmarking

Benchmarking included nine CPU tests (simple iteration, PI calculation, bubble sort, quick sort, fast Fourier transform, n-body problem, secure hash algorithms, CoreMark and Linpack) and twelve memory tests (read, write and copy operations, each for single and burst char and double access). All benchmarks were run twice: once compiled with no optimisation (gcc flag -O0) and once with optimisation (-O2). The relative results of the CPU tests with optimisation for the three remaining SoCs are presented in Fig. 1 (*simple iteration* was optimised away).



Figure 1: Relative SoC performance with -O2 optimisation (longer is better).

The total times of all benchmarks for the NXP, Xilinx and TI SoCs were 129, 303 and 335 seconds respectively, making NXP's LS1046A around 2.5 times faster than the other two SoCs.
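The "around 2.5 times" figure can be checked directly from the three totals quoted above. The short sketch below does only that arithmetic; the dictionary keys reuse the part names from Table 1, and the function name is illustrative rather than taken from the benchmark suite.

```python
# Total benchmark times in seconds, as quoted in the text.
total_time_s = {"NXP LS1046A": 129, "Xilinx XCZU9EG": 303, "TI AM5728": 335}


def speedup_over(reference: str, times: dict[str, float]) -> dict[str, float]:
    """Return how many times faster `reference` is than every other entry."""
    ref = times[reference]
    return {name: t / ref for name, t in times.items() if name != reference}


ratios = speedup_over("NXP LS1046A", total_time_s)
for name, ratio in ratios.items():
    print(f"LS1046A vs {name}: {ratio:.2f}x faster")
```

The two ratios come out at roughly 2.3 and 2.6, which brackets the rounded value given in the text.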

Since the FGC3.2 requires an FPGA anyway, the Xilinx SoC would have been a very convenient solution; however, the performance of its A53 cores is insufficient compared to the A72 cores of the NXP LS1046A. It is worth mentioning that Xilinx is planning to release a new family called the *Versal Adaptive Compute Acceleration Platform* that will contain dual A72 cores and FPGA logic on a single chip. However, this will come to market too late for the FGC3.2 and is likely to be too expensive compared to the four-core LS1046A plus a separate Xilinx Artix-7 FPGA.

To conclude, the NXP LS1046A was chosen because of the superior performance, the relatively new 64-bit ARM A72 cores, the wide range of integrated peripherals and the long 15-year lifetime guaranteed by NXP, thanks to the automotive version of the component.

# **FINAL SPECIFICATION**

As mentioned, the FGC3.2 needs to be plug-compatible with the FGC3.1, so the I/O is essentially unchanged. A summary of the final specification follows:

#### Typical Environment

- Enclosed in a 3U 220 mm chassis with a DIN41612 connector that is pin-compatible with the FGC3.1.
- Powered by +15V, -15V and +5V supplies, limited to 6W, 2W and 17W respectively.
- Designed for industrial environments, for 0 to 40 °C ambient temperature with natural convection cooling at 25 W full load.
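The three rail limits above add up to the 25 W full-load figure. As a minimal sketch of how such a budget might be checked during design, the snippet below sums the limits and flags any rail whose estimated draw exceeds its allowance. The rail limits come from the specification; the example draw figures are hypothetical and serve only to exercise the check.

```python
# Per-rail power limits in watts, taken from the specification above.
RAIL_LIMIT_W = {"+15V": 6.0, "-15V": 2.0, "+5V": 17.0}


def total_budget_w(limits: dict[str, float]) -> float:
    """Sum of all rail limits, i.e. the full-load dissipation to be cooled."""
    return sum(limits.values())


def rails_over_budget(draw_w: dict[str, float],
                      limits: dict[str, float] = RAIL_LIMIT_W) -> list[str]:
    """Return the rails whose estimated draw exceeds the specified limit."""
    return [rail for rail, power in draw_w.items() if power > limits[rail]]


# Hypothetical draw estimate, for illustration only (not measured data).
example_draw_w = {"+15V": 4.5, "-15V": 1.0, "+5V": 18.2}

print(total_budget_w(RAIL_LIMIT_W))       # 25.0
print(rails_over_budget(example_draw_w))  # ['+5V']
```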

#### Analogue Input / Output

- Four analogue input channels and two analogue output channels, supporting CERN accuracy class 3, as defined in Table 2 [2].

Table 2: Definition of CERN Power Converter Accuracy Class 3

| Parameter                             | Value (ppm) | Definition |
|---------------------------------------|-------------|------------|
| Resolution                            | 1           |            |
| Initial uncertainty after calibration | 10          | 2 x r.m.s. |
| Linearity                             | 10          | max        |
| Stability (12 h)                      | 10          | max-min    |
| Short-term stability (20 min)         | 2           | 2 x r.m.s. |
| Noise (500 Hz bandwidth)              | 15          | 2 x r.m.s. |
| Repeatability                         | 50          | 2 x r.m.s. |
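To make the "2 x r.m.s." and "max-min" definitions in Table 2 concrete, the sketch below computes both from a list of samples. It assumes that ppm is expressed relative to a nominal (full-scale) value and that the r.m.s. is taken about the sample mean; neither convention is stated in this paper, so both should be treated as assumptions. The sample values are invented for illustration.

```python
import math

# Limits from Table 2, in ppm.
NOISE_LIMIT_PPM = 15.0          # 2 x r.m.s., 500 Hz bandwidth
STABILITY_12H_LIMIT_PPM = 10.0  # max-min


def two_rms_ppm(samples: list[float], nominal: float) -> float:
    """Twice the r.m.s. deviation about the mean, in ppm of `nominal`."""
    mean = sum(samples) / len(samples)
    rms = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
    return 2.0 * rms / nominal * 1e6


def max_min_ppm(samples: list[float], nominal: float) -> float:
    """Peak-to-peak spread of the samples, in ppm of `nominal`."""
    return (max(samples) - min(samples)) / nominal * 1e6


# Hypothetical readings around a 10 V nominal value (illustration only).
readings = [10.00001, 9.99999, 10.00001, 9.99999]
noise = two_rms_ppm(readings, nominal=10.0)   # approximately 2 ppm
spread = max_min_ppm(readings, nominal=10.0)  # approximately 2 ppm
print(noise <= NOISE_LIMIT_PPM, spread <= STABILITY_12H_LIMIT_PPM)
```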

#### Digital Input / Output

- 16 status inputs, 8 command outputs.
- 2 interlock opto-coupled inputs and 2 interlock relay outputs.
- 2 diagnostic buses, supporting up to 32 diagnostic interface modules (DIMs). Each DIM has 24 digital inputs, 4 analogue inputs and one trigger input for the first-fault latch with 8 µs timestamping.
- 6 identification busses, based on 1-Wire.
- Local chassis broadcast and point-to-point serial communications.
- 2 banks of 5 general purpose digital signals.

#### Network

- 100 Mbit/s FGC\_Ether [3], and 1 Gbit/s network support.
- Capable of communicating over the control network with an un-programmed FPGA.
- FPGA reprogrammable via the SoC.
- 1 Gbit/s Ethernet to communicate with other FGCs or devices such as fast orbit feedback controllers or field measurement devices.

**MOPHA106** 





Figure 2: FGC3.2 hardware prototype in 4.5U board dimensions, embedding full controller functionality: Ethernet communication ports (left), main processing and memories (middle), powering (top right) and all peripherals with glue logic embedded in the Artix-7 FPGA (bottom right).

# **HARDWARE PROTOTYPE**

A hardware prototype FGC3.2 has been produced, a diagram of which is shown in Fig. 2, with the key aspects highlighted. The objective of the prototype was to validate the core design hardware choices for the critical aspects of the FGC3.2, in particular:

- Programmable logic implementation, programming strategy and powering.
- I/O implementation, including the rear-side connector circuits as well as networking and local non-volatile storage using a commercial SD card.
- SoC implementation, programming, powering, and exploitation together with a commercial DDR4 memory.

To facilitate these objectives, a larger form factor was used for this prototype. This implementation will be modified for a second prototype which is intended to adhere to the final dimensions, and address notably:

- Consolidation of powering into a single topology.
- Modification of the RJ-45 network ports to match the final preferences of the design.
- Calculations for the power dissipation and thermal performance of the constituent electronics.
- Integration of the analogue acquisition chain, implemented on a separate board which is in parallel development.

#### Production and Assembly

The NXP LS1046A is housed in a 23 x 23 mm, 780-pin FC-PBGA package, requiring dense multi-layer PCB routing as well as highly integrated 0201 decoupling capacitors using via-in-pad technology. The prototype PCB has been manufactured using 16 copper layers with tracks of 4 mils and clearances of 4 mils. This requires a high-quality PCB manufacturer.

These constraints are more demanding than those of the FGC3.1, and the final version of the FGC3.2 is likely to have a similar complexity to the prototype. The FGC3.2 will therefore be more complex to assemble and rework, and additional effort will be needed for the design-for-test phase and for the debugging features that will facilitate production and assembly of the board.

#### Prototype Observations

The SoC, DDR4 RAM and Xilinx FPGA and its peripherals all require very specific powering.

The prototype has 14 different powering voltages ranging from 0.6 V to 15 V. Moreover, the sequencing of powering between the SoC, DDR4 and the FPGA is challenging. Avoiding parasitic powering of components through voltage translators and buffers is difficult; this has yet to be fully solved and will be addressed in the next prototype.

To facilitate debugging, power domains were decoupled to allow flexible sequencing and measurements, profiting from the larger PCB area available. The next hardware version will optimise the powering to achieve the 3U 220mm dimensions compatible with the FGC3.2 enclosure.

One critical aspect remains thermal management. The SoC 1.0 V core voltage is rated at 14.3 A under full load conditions. This will require a custom heat sink to be designed for the enclosure.
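The scale of the thermal problem follows from the rail figures: at 1.0 V and 14.3 A the core rail alone can dissipate 14.3 W, more than half of the 25 W full-load figure in the specification. The sketch below works this through. It uses the rated maximum current, so it is a worst-case estimate, and the converter efficiency is a hypothetical value, not one given in this paper.

```python
CORE_VOLTAGE_V = 1.0        # SoC core rail, from the text
CORE_CURRENT_A = 14.3       # rated full-load current, from the text
FULL_LOAD_BUDGET_W = 25.0   # full-load figure from the specification


def rail_power_w(voltage_v: float, current_a: float) -> float:
    """Power delivered to a rail: P = V * I."""
    return voltage_v * current_a


def input_power_w(output_w: float, efficiency: float) -> float:
    """Power drawn upstream of a DC-DC converter with the given efficiency."""
    return output_w / efficiency


core_power_w = rail_power_w(CORE_VOLTAGE_V, CORE_CURRENT_A)
share_of_budget = core_power_w / FULL_LOAD_BUDGET_W

# Hypothetical 90 % converter efficiency, for illustration only.
upstream_w = input_power_w(core_power_w, efficiency=0.90)

print(f"core rail: {core_power_w:.1f} W ({share_of_budget:.0%} of full load)")
print(f"upstream draw at assumed 90 % efficiency: {upstream_w:.1f} W")
```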

The LTC3866 DC-DC converter has a QFN-24 package, and whilst it succeeds in meeting the electrical requirements of the processor, being highly integrated, it causes difficulties in debugging due to limited visual inspection of the QFN device soldering.

#### SoC Integration

Even though SoCs from many manufacturers use the same ARM Cortex cores, there is a wide diversity of approaches to common tasks. The NXP LS1046A uses some very specific and rather uncommon solutions. For example, pin functionalities cannot be set from software; they must be configured using Reset Configuration Word (RCW) bits that are loaded before the bootloader. Moreover, different internal modules have different endianness, and information about this is limited. The main reference manual does not provide a global overview and sometimes uses different names and abbreviations for the same thing. Given the difficulty with the documentation, support from NXP and the user community is crucial; however, to date, neither has been very responsive.
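The practical consequence of mixed endianness is that the same four bytes read from a register yield different values depending on the byte order assumed. The sketch below illustrates this with an invented register value; it does not model any actual LS1046A register or module.

```python
import struct

# Four bytes as they might appear on the bus (hypothetical register contents).
raw = bytes([0x12, 0x34, 0x56, 0x78])

as_big_endian = struct.unpack(">I", raw)[0]     # most significant byte first
as_little_endian = struct.unpack("<I", raw)[0]  # least significant byte first


def swap32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned value."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


print(hex(as_big_endian))     # 0x12345678
print(hex(as_little_endian))  # 0x78563412
```

A driver that accesses a module with the opposite byte order to the CPU has to apply a swap such as `swap32` on every read and write, which is why incomplete documentation on this point is costly.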

Nevertheless, the prototype board has been proven to boot correctly and run Linux. At the time of writing, several key components such as powering, serial communication, Ethernet controllers and DDR4 RAM have been validated.

# **FIELDBUS AND INTEGRATION**

The FGC3.2 will be physically connected to the control system via an Ethernet-based fieldbus known as FGC\_Ether [3] using an RJ45 Ethernet port on the front panel. Up to 64 FGC3.2s can be connected to a single FGC\_Ether network, which is managed by a real-time front-end computer system known as the FGC gateway. The gateway integrates the FGCs into services across the wider control system such as middleware, access control, timing and alarms. The software of the FGC gateway is based upon a framework called FGCD, a more modular form of which will be used for the FGC3.2 (see Controls Software below).

# **OPERATING SYSTEM**

Two options were considered for the operating system: so-called *bare metal* (possibly augmented with a simple real-time OS) and Linux. During benchmarking, the two options were compared and no significant difference in performance was observed. The most important and desirable characteristics of the bare-metal environment are its deterministic behaviour and rapid interrupt response. However, both can be greatly improved on Linux using specialised techniques, and Linux comes with numerous drivers, saving the time that would otherwise have been spent developing similar functionality in a bare-metal environment.
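The text does not name the specialised techniques used. One widely used approach on Linux is to run the time-critical process under a real-time scheduling policy such as `SCHED_FIFO`, so that it is not pre-empted by ordinary tasks. The sketch below shows only that step, as an assumption about the kind of technique meant; the priority value is arbitrary, and the call needs suitable privileges, so the function reports failure rather than raising.

```python
import os


def request_realtime(priority: int = 50) -> bool:
    """Try to move the calling process to SCHED_FIFO; return True on success.

    Returns False where the scheduler API is unavailable (non-Linux systems)
    or where the process lacks the privilege to change its policy.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:  # includes PermissionError for unprivileged processes
        return False


if request_realtime():
    print("running under SCHED_FIFO")
else:
    print("real-time scheduling not granted; using the default policy")
```

Other measures often combined with this, such as locking memory or isolating a CPU core, are outside the scope of this sketch.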

NXP offers an OS based upon the Ubuntu Linux distribution; however, a more cut-down, minimalist distribution is more appropriate for an embedded system such as the FGC3.2. To create a working OS stack, a project called *Buildroot* was used. The other major popular alternative is *Yocto*. NXP also provides its own build system called *Flexbuild*, but this seems to be complex and inflexible. *Buildroot* was chosen over *Yocto* for its simplicity, given that it has sufficient functionality for the FGC project, and it proved straightforward to use. *U-Boot* was used as the bootloader.

To summarise the configuration used, the first-stage and second-stage bootloaders are both realised using *U-Boot*, which then loads the Linux kernel stored in the *ITB* file (a file containing the kernel and device tree blob). No specific Linux distribution is used: the user space is based on *BusyBox* and *Buildroot* packages. All the images are stored on and loaded from an SD card. A Python script was developed to automate configuration, creation and testing of all OS components.
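The automation script itself is not reproduced in this paper. As a rough illustration of what its build step could look like, the sketch below assembles the two `make` invocations of an out-of-tree Buildroot build (apply a defconfig, then build) and, by default, only renders them without executing anything. The paths and the defconfig name `fgc32_defconfig` are hypothetical.

```python
import subprocess
from pathlib import Path


def buildroot_commands(buildroot_dir: Path, output_dir: Path,
                       defconfig: str) -> list[list[str]]:
    """Return the make invocations for an out-of-tree Buildroot build."""
    base = ["make", "-C", str(buildroot_dir), f"O={output_dir}"]
    return [
        base + [defconfig],  # apply the board configuration
        base,                # build toolchain, kernel, root filesystem, images
    ]


def run_build(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Run each command in order, or only render it when dry_run is set."""
    rendered = []
    for cmd in commands:
        rendered.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return rendered


# Hypothetical paths and defconfig name, for illustration only.
cmds = buildroot_commands(Path("/opt/buildroot"), Path("/tmp/fgc32-build"),
                          "fgc32_defconfig")
plan = run_build(cmds)  # dry run: nothing is executed
print("\n".join(plan))
```

Keeping command construction separate from execution makes the configuration step testable without a Buildroot tree present.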

# **CONTROLS SOFTWARE**

The current software architecture of the FGC-based control system is composed of two main components: FGC embedded firmware written entirely in C, and front-end computer software written mostly in C with C++ elements. As both components are now being completely reworked or rewritten, there is an opportunity to modernise both the technology and the approach. In the upcoming versions of these components, a modern C++ standard will be used, with the aim of sharing as much code as possible between the FGC and the front-end computer and of composing the whole software from plug-and-play modules in a common framework. The code is compiled using Linaro GCC.

Pluggable modules in the new version of the framework will include basic controls services such as logging, communication and fieldbus integration as node or master, as well as services specifically for the operation of power converters. The latter will be based upon the CCLIBS libraries and will provide, amongst others, regulation, function generation, signal calibration and logging [4,5].
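The framework design is not detailed here, and the real implementation will be in modern C++. Purely to illustrate the plug-in idea, the Python sketch below shows a common interface, a registry, and ordered start and stop of modules. Every class and method name is hypothetical.

```python
from abc import ABC, abstractmethod


class Module(ABC):
    """Interface that every pluggable module implements (hypothetical)."""

    name = "unnamed"

    @abstractmethod
    def start(self, framework: "Framework") -> None: ...

    @abstractmethod
    def stop(self, framework: "Framework") -> None: ...


class Framework:
    """Holds registered modules and drives their life cycle."""

    def __init__(self) -> None:
        self._modules: dict[str, Module] = {}
        self.events: list[str] = []

    def register(self, module: Module) -> None:
        if module.name in self._modules:
            raise ValueError(f"module {module.name!r} already registered")
        self._modules[module.name] = module

    def start_all(self) -> None:
        for module in self._modules.values():  # registration order
            module.start(self)

    def stop_all(self) -> None:
        for module in reversed(list(self._modules.values())):  # reverse order
            module.stop(self)


class LoggingModule(Module):
    name = "logging"

    def start(self, framework: Framework) -> None:
        framework.events.append("logging started")

    def stop(self, framework: Framework) -> None:
        framework.events.append("logging stopped")


class FieldbusNodeModule(Module):
    name = "fieldbus-node"

    def start(self, framework: Framework) -> None:
        framework.events.append("fieldbus-node started")

    def stop(self, framework: Framework) -> None:
        framework.events.append("fieldbus-node stopped")


def build_stack() -> Framework:
    """Assemble a device-side stack; a gateway would register other modules."""
    fw = Framework()
    fw.register(LoggingModule())
    fw.register(FieldbusNodeModule())
    return fw
```

With this shape, the same module code can be shared between the FGC and the front-end computer, with each platform differing only in which modules it registers.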

# **KNOWLEDGE TRANSFER**

CERN has a team who actively help groups to license their technologies to member-state companies and accelerator labs around the world. In 2016, they funded a project to allow FGC3 controls to be integrated into the EPICS framework at other labs. In 2019, a second phase was funded to integrate FGC controls with the TANGO framework [6].

Given that it will soon no longer be possible to manufacture the FGC3.1, requests for small quantities can be satisfied from the stock at CERN, but large quantities will need to wait for the FGC3.2 to become available in 2022. Furthermore, for high-bandwidth applications, the FGC3.2 with its extra performance is the only option.

# **SUMMARY**

Developing new controls electronics and software using a modern SoC is challenging, in part because of the increased complexity and feature set. To benefit from this more advanced silicon requires Linux and the drivers that it provides, which is a major shift compared to the bare metal of the previous generation. It solves many problems but cannot hide the increase in overall complexity.

Once this shift to Linux is made, it should ease the way to future generations that will refresh the SoC and other components, but otherwise not change the architecture dramatically.

# **REFERENCES**

- [1] https://edms.cern.ch/item/EDA-03372-V1-0
- [2] https://edms.cern.ch/document/1854843
- [3] S. T. Page, Q. King, H. Lebreton, and P. F. Semanaz, "Migration from WorldFIP to a Low-Cost Ethernet Fieldbus for Power Converter Control at CERN", in Proc. ICALEPCS'13, San Francisco, CA, USA, Oct. 2013, paper TUPPC096, pp. 805-808.
- [4] Q. King, S. T. Page, H. Thiesen, and M. Veenstra, "Function Generation and Regulation Libraries and their Application to the Control of the New Main Power Converter (POPS) at the CERN CPS", in Proc. ICALEPCS'11, Grenoble, France, Oct. 2011, paper WEPMN008, pp. 886-889.
- [5] Q. King, K. T. Lebioda, M. Magrans de Abril, M. Martino, R. Murillo-Garcia, and A. Nicoletti, "CCLIBS: The CERN Power Converter Control Libraries", in Proc. ICALEPCS'15, Melbourne, Australia, Oct. 2015, pp. 950-953. doi:10.18429/JACoW-ICALEPCS2015-WEPGF106
- [6] S. T. Page, J. Afonso, C. Ghabrous Larrea, J. Herttuainen, Q. King, and B. Todd, "Adaptation of CERN Power Converter Controls for Integration into Other Laboratories using EPICS and TANGO", presented at ICALEPCS'19, New York, NY, USA, Oct. 2019, paper MOPHA105, this conference.

Content from this work may be used under the terms of the CC BY 3.0 licence (© 2019). Any distribution of this work must maintain attribution to the author(s), title of the work, publisher, and DOI.
