## POST-MORTEM ANALYSIS OF FAILURES IN ACCELERATOR SYSTEMS

Martin Heiniger, Steven Hunt, PSI, Villigen, Switzerland

#### Abstract

Accelerators consist of many highly complex interrelated systems. Identifying the causes of a failure, among many effects generated by that failure, can be difficult without good tools.

At the Swiss Light Source (SLS) a system has been developed to continuously record all analogue and digital inputs directly connected to the control system at high speed into a circular buffer.

On a significant event, such as beam loss, this buffer is frozen and its contents are saved for later off-line analysis. Analogue signals can be saved with a resolution of $10\ \mu s$, and digital inputs with a resolution of $1\ \mu s$. Each channel has a memory depth of 64 k samples. In normal circumstances, if a beam loss occurs, this event is detected in an EPICS system connected to a parametric beam current transformer. An EPICS message is then sent to the SLS event system transmitter, which generates a beam loss event. This event, like all others, is received by all VME crates.

For those crates where post mortem analysis is activated, the timing receiver is programmed to generate a hardware output (TTL pulse) on receipt of this event, which is connected to all analogue and binary input cards. This signal freezes the circular buffer, but still allows normal reading of the latest value through the control system.

The event also generates a software interrupt causing the reading of the circular buffers as EPICS waveform records. These records are flagged to be archived on processing, which means they will be saved to the central archiver, from where they can be retrieved using the standard EPICS tools. Triggering the system using the timing system and hardware triggers ensures the memories are frozen in all crates at the same time ($\pm 1\ \mu s$), so it is possible to compare signals in different crates.

### **MOTIVATION**

At a user facility, such as the Swiss Light Source, one of the primary goals is to maximise availability of beams for the experimenters. One problem faced is the occasional unexplained beam loss. It can be particularly difficult to differentiate between cause and effect in these cases, as loss of beam will, in itself, cause many sensors to change their value. It is therefore necessary to be able to record the history of a large number of sensors with sufficient time resolution to identify the "first fault". Ideally, the timestamps of the recorded data should also be synchronised to better than the time resolution of the readings. The control system must continue to function during and after beam loss.

#### HARDWARE

It was necessary to select standard hardware components for SLS [1] that could support the features that we required, even if they were not fully exploited in the initial phase of commissioning and operation. In addition to the requirements for post mortem analysis, components were chosen to support the features expected in a modern control system including

- High I/O density
- Hot swap
- Plug and play
- EMC protection

# Analogue Input modules

The analogue input modules used at the SLS are the IP-ADC-8401 industry pack analogue input modules from Hytec Electronics Ltd. [2]. They are used with the Hytec VICB8002 or 8003 IP carrier boards to give the advantages of VME64x (hot swap, EMC protection etc.) while keeping the flexibility of industry pack, so that only the number of channels required needs to be installed. These modules have the following features:

- 8 non-multiplexed channels per module
- 100 kHz sample rate
- 16 bit (15 bit + sign) resolution
- Differential input
- History buffer (64 k samples per channel)
- Internal or external clock
- Continuous or programmed number of samples
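The sample rate and buffer depth listed above fix how much signal history is available when the buffer is frozen. A minimal worked calculation follows; it assumes that "64 k" means 65536 samples (the text does not say whether k is 1000 or 1024), and the function name is illustrative only.

```python
# Assumption: "64 k samples" is taken as 64 * 1024 = 65536 samples per channel.
SAMPLES_PER_CHANNEL = 64 * 1024


def history_seconds(sample_rate_hz: float, depth: int = SAMPLES_PER_CHANNEL) -> float:
    """Length of signal history held in a full buffer, in seconds."""
    return depth / sample_rate_hz


# 100 kHz sampling corresponds to one sample every 10 microseconds.
analogue_history = history_seconds(100e3)
print(f"Analogue history at 100 kHz: {analogue_history:.3f} s")
```

Under this assumption, an analogue channel running at the full 100 kHz holds roughly two thirds of a second of data before the oldest samples are overwritten.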

# Binary input modules

The binary input modules used at SLS are the VICB8001 binary input/output modules from Hytec Electronics Ltd. These modules, when used with appropriate rear transition boards, offer the following features:

- VME64x single width module
- 64 input/output channels
- 1 MHz (max) sample rate
- Rear I/O
- TTL or 24 volt isolated operation
- Debounce
- Change of state detection
- External trigger mode
- 64 sample (64 bit wide) history buffer.

# SYSTEM OPERATION

During normal operation (no beam loss) all analogue inputs are read at a programmed rate (typically 10 Hz), or are triggered by a hardware or software trigger on an event such as storage ring injection. Typically a number of samples are read from the memory and averaged to reduce noise and, optionally, to increase resolution.
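The effect of averaging several samples can be illustrated with a small simulation. This is a sketch under stated assumptions, not the SLS driver code: the ADC is modelled as an ideal quantiser with Gaussian noise of about one count, and `read_adc` and `averaged_reading` are hypothetical names.

```python
import random


def read_adc(true_counts: float, noise_counts: float, rng: random.Random) -> int:
    """Simulate one conversion: add Gaussian noise, then quantise to an integer count."""
    return round(rng.gauss(true_counts, noise_counts))


def averaged_reading(samples) -> float:
    """Arithmetic mean of the most recent samples read from the history memory."""
    return sum(samples) / len(samples)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
# Assumed signal: 100.3 counts, i.e. a value between two ADC codes.
samples = [read_adc(100.3, 1.0, rng) for _ in range(64)]
average = averaged_reading(samples)
print("first raw samples:", samples[:5])
print("average of 64 samples:", round(average, 2))
```

Each raw sample is an integer count, whereas the average can settle between two codes. This is the sense in which averaging noisy samples can increase the effective resolution as well as reduce noise.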

# Detecting beam loss

A parametric beam current transformer is used to detect beam loss. This device is normally used to measure beam current, from which the beam lifetime can be calculated. This measurement is made at ~3 Hz, synchronized to the Linac Trigger, and so is too slow for beam loss detection. For this reason a second reading is made in parallel, but triggered at 100 Hz. Each reading is compared to the last, and if the value drops to zero, or close to zero, a beam loss is assumed. When this occurs, an EPICS channel access message is sent to the crate containing the timing event generator, requesting the transmission of a beam loss message to all VME crates.
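The comparison of successive 100 Hz readings can be sketched as a falling-edge test against a threshold. The threshold value, the units (mA) and the function names below are assumptions made for illustration; the text does not specify how "near to zero" is defined.

```python
from typing import Optional, Sequence


def beam_lost(previous_ma: float, current_ma: float, lost_ma: float = 0.1) -> bool:
    """True when the current was above the threshold and has now dropped to or below it.

    lost_ma is a hypothetical "near zero" threshold, not a value from the SLS system.
    """
    return previous_ma > lost_ma and current_ma <= lost_ma


def first_loss_index(readings_ma: Sequence[float]) -> Optional[int]:
    """Index of the first reading at which a beam loss would be declared, or None."""
    for i in range(1, len(readings_ma)):
        if beam_lost(readings_ma[i - 1], readings_ma[i]):
            return i
    return None


# Readings taken every 10 ms (100 Hz); the beam disappears between samples 1 and 3.
print(first_loss_index([200.0, 199.9, 0.5, 0.02, 0.0]))
```

Testing for a transition, rather than for a low value alone, means that no event is requested while the ring is simply empty.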

# Distributing the beam loss event

When the request is received in the timing system crate, the beam loss event is transmitted on the timing system. The timing system can send out 16-bit messages (events) consisting of an 8-bit event number and an optional further 8 bits that can be used to distribute fiducial clocks or high-priority events. Events come from a number of sources:

- Cyclically from a programmed memory,
- From a hardware input pulse, or
- Via an EPICS channel access request.

The output event stream is transmitted at 1 GHz over multi mode fiber optic cable. Fanout units are used to provide an output stream to each area (Linac, booster sector, ring sector, and beam line), where a further fanout unit provides a dedicated fiber to each VME crate.
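The 16-bit event word described above can be modelled as two packed bytes. In this sketch the byte order (event number in the high byte) and the event code `BEAM_LOSS_EVENT` are assumptions chosen for illustration; the actual frame layout of the SLS timing system is not given in the text.

```python
BEAM_LOSS_EVENT = 0x20  # hypothetical event number, for illustration only


def pack_event(event_number: int, extra_bits: int = 0) -> int:
    """Combine an 8-bit event number and 8 optional bits into one 16-bit word.

    Assumption: the event number occupies the high byte.
    """
    if not 0 <= event_number <= 0xFF:
        raise ValueError("event number must fit in 8 bits")
    if not 0 <= extra_bits <= 0xFF:
        raise ValueError("extra bits must fit in 8 bits")
    return (event_number << 8) | extra_bits


def unpack_event(word: int) -> tuple:
    """Split a 16-bit word back into (event_number, extra_bits)."""
    return (word >> 8) & 0xFF, word & 0xFF


word = pack_event(BEAM_LOSS_EVENT, 0x01)
print(hex(word), unpack_event(word))
```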

## Receiving the event

Each VME crate is equipped with an event receiver card, programmed to detect and act on the arrival of certain events. Action taken can include:

- Generating a hardware pulse after a programmable delay
- Generating a software event causing EPICS records to process

For post mortem analysis the boards are configured (by EPICS) to generate, on receipt of a beam loss event, a one-second-long, negative-polarity hardware pulse and an EPICS event which can trigger the processing of any number of EPICS records.

### Freezing the history buffer

The TTL output of the event receiver is fed in parallel to the front panel inhibit input of all 8002 carrier boards which have 8401 analogue input industry packs fitted. The ADCs have been configured with the front panel inhibit enabled. When this signal is low, new values are not written to the memory, but the latest value register is still updated. The EPICS software driver reads the Control and Status Register (CSR) of the board before reading data from memory. If the inhibit active bit is set, the driver does not read data from memory, but from the latest value register. The signal is therefore still available to the control system (Fig. 1), but for this short time it will not benefit from the improved noise rejection and increased resolution that averaging provides.
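The driver decision described above can be sketched with a simulated channel. The register model is an assumption: the bit position `INHIBIT_ACTIVE`, the class `SimulatedAdcChannel` and the function `read_channel` are hypothetical and are not taken from the Hy8401 documentation or the SLS driver.

```python
from dataclasses import dataclass, field

INHIBIT_ACTIVE = 0x0001  # hypothetical CSR bit position, for illustration only


@dataclass
class SimulatedAdcChannel:
    csr: int                 # control and status register contents
    latest_value: int        # latest value register, always updated
    memory: list = field(default_factory=list)  # history memory, oldest sample first


def read_channel(channel: SimulatedAdcChannel, n_average: int = 4) -> float:
    """Return the value a normal (non post mortem) read would deliver."""
    recent = channel.memory[-n_average:]
    if channel.csr & INHIBIT_ACTIVE or not recent:
        # Memory is frozen (or empty): fall back to the single latest conversion.
        return float(channel.latest_value)
    # Normal operation: average the most recent samples from the history memory.
    return sum(recent) / len(recent)


running = SimulatedAdcChannel(csr=0, latest_value=103, memory=[100, 101, 99, 100])
frozen = SimulatedAdcChannel(csr=INHIBIT_ACTIVE, latest_value=103, memory=[100, 101, 99, 100])
print(read_channel(running), read_channel(frozen))
```

While the inhibit is active the returned value is a single unaveraged conversion, which is why the reading remains live but is temporarily noisier.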



Figure 1: EPICS channel display

# Reading the data

For each analogue input channel a waveform record is configured, in addition to its normal analogue input record, to hold the history data. These records are processed by the EPICS event generated by the event receiver at the same time that the TTL signal freezes the history buffers of each ADC channel. When a waveform record reading a Hy8401 channel processes, the requested number of samples is read from the ADC memory (Fig. 2). After a post mortem event the memory is not being updated, so the data will not be overwritten while the inhibit signal is held low by the timing card. In practice, inhibiting the memory update for 1 second allows a 2000-point waveform to be read from each of the 82 ADC channels in our most complicated crate into an EPICS waveform record. From this point on, even after the inhibit signal is deactivated, the waveform records contain the post mortem data, which can be read using the normal EPICS tools.
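The behaviour of the history memory during readout can be modelled as a circular buffer with a freeze flag. This is a software analogy written under stated assumptions, not a description of the module's internals; the class `HistoryBuffer` and its methods are hypothetical.

```python
class HistoryBuffer:
    """Fixed-depth circular buffer that stops recording while frozen."""

    def __init__(self, depth: int):
        self._data = [0] * depth
        self._next = 0       # index where the next sample will be written
        self._count = 0      # number of valid samples stored so far
        self.frozen = False  # models the inhibit signal being held low

    def write(self, sample: int) -> None:
        if self.frozen:
            return  # inhibit active: the history is left untouched
        self._data[self._next] = sample
        self._next = (self._next + 1) % len(self._data)
        self._count = min(self._count + 1, len(self._data))

    def read_last(self, n: int) -> list:
        """Return up to n most recent samples, oldest first."""
        n = min(n, self._count)
        depth = len(self._data)
        start = (self._next - n) % depth
        return [self._data[(start + i) % depth] for i in range(n)]


buf = HistoryBuffer(depth=4)
for value in range(6):       # 0..5: the two oldest samples are overwritten
    buf.write(value)
buf.frozen = True            # beam loss event: freeze the history
buf.write(99)                # arrives during the inhibit, not recorded
print(buf.read_last(3))
```

Because writes are ignored while the buffer is frozen, the readout can take as long as the inhibit pulse lasts without the history changing underneath it.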



Figure 2: EPICS waveform plot

## Saving the data

By configuring the EPICS archiver to save these waveform records using the monitor feature, whenever the records are processed, the waveforms are saved to permanent storage. Normal archiver tools (like archive export) can then be used to extract the data for analysis.



Figure 3: Extracted waveform data from archiver



Figure 4: As Fig. 3, but with finer time resolution

### **STATUS**

This system has been put into operation, for analogue signals only, on one of the five SLS RF plants. It is performing as expected, and data has been saved on each (intentional or unintentional) beam loss. The RF specialists are now able to access and download this data (Fig. 3 and Fig. 4). We are in the process of determining how best to analyse the information gathered. The system is limited to those subsystems that use the standard SLS hardware. Other signals, for instance machine interlock signals read through a PLC, cannot be integrated cleanly into the system.

#### **FUTURE IMPROVEMENTS**

The system will now be extended to all other RF plants, before adding other systems, particularly vacuum and beam pipe temperature readings. The system will also be extended to binary signals, but this will require the addition of EPICS waveform support to the Hy8001 EPICS driver. It is intended to develop analysis tools to aid users in identifying the first fault, as well as general correlations.
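One possible shape for a first-fault tool is sketched below. It is purely illustrative and does not describe an existing SLS tool: it assumes that all waveforms share a common time base of $10\ \mu s$ per sample, that the start of each waveform represents normal operation, and that a fixed tolerance in counts is adequate. The channel names and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def first_deviation(waveform: Sequence[float], baseline_points: int = 100,
                    tolerance: float = 5.0) -> Optional[int]:
    """Index of the first sample that leaves the baseline band, or None."""
    n = min(baseline_points, len(waveform))
    if n == 0:
        return None
    baseline = sum(waveform[:n]) / n  # assumed to represent normal operation
    for i, value in enumerate(waveform):
        if abs(value - baseline) > tolerance:
            return i
    return None


def rank_first_faults(waveforms: Dict[str, Sequence[float]],
                      sample_period_s: float = 10e-6) -> List[Tuple[float, str]]:
    """Channels that deviated, sorted by the time of their first deviation."""
    hits = []
    for name, waveform in waveforms.items():
        index = first_deviation(waveform)
        if index is not None:
            hits.append((index * sample_period_s, name))
    return sorted(hits)


# Hypothetical channels: the reflected power changes before the forward power.
example = {
    "RF:FORWARD": [10.0] * 300 + [0.0] * 100,
    "RF:REFLECTED": [1.0] * 250 + [50.0] * 150,
    "VACUUM": [3.0] * 400,
}
for time_s, name in rank_first_faults(example):
    print(f"{name}: first deviation at {time_s * 1e6:.0f} us")
```

Such a ranking is meaningful across crates only because the buffers are frozen by a common hardware trigger, so that sample indices in different crates refer to the same instants.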

### **CONCLUSIONS**

By selecting hardware that supports our requirement for post mortem analysis, and making use of the features available in our timing system, we are able to provide a powerful tool for analysing the causes of beam loss at the SLS.

## REFERENCES

- [1] http://www.sls.psi.ch/controls/hardware
- [2] http://www.hytec-electronics.co.uk