DAQ2000

 
Control of large scale distributed DAQ/trigger systems in the networked PC era

This was a one-day workshop held as part of the NAS2000 workshop. I was a scientific organizer and also presented this paper.

Abstract

The HEP community is moving towards larger and more complex experiments, as exemplified by the coming LHC ATLAS and CMS detectors. The triggering and data acquisition hardware has matched this increase in complexity. With the availability of cheap, high-power PC-based computing, many of the DAQ and higher-level trigger components have been moving to a PC- and network-based model. These new trigger and DAQ systems are distributed, sometimes involving hundreds of computers. The control and configuration requirements are larger than anything HEP has seen previously. These large-scale systems must also be fault tolerant: 99% uptime for the system does not mean that every component will be working all the time, yet the DAQ/trigger cannot stop when a component fails. The DZERO Run 2 upgrade DAQ and trigger are just such a distributed and fault-tolerant trigger/DAQ system. A sophisticated trigger/DAQ resource manager has been implemented that can tolerate failures in the system and even reconfigure the system on the fly. As systems grow, monitoring the health of the system is more important than ever before. It is no longer acceptable to present a screen with an indicator light for every single component in the trigger/DAQ system. Instead, some effort must be made to present the information intelligently, in a compact and understandable form, for shift personnel. This information must aid them in the quick diagnosis of hardware difficulties (especially those caused by other systems the trigger/DAQ depends upon for operation) and give early warning of problems. We present the control system, the monitor system, and the general software design and integration for the DZERO DAQ and Level 3 trigger, including our design plans for controlling the expansion of the farm and DAQ bandwidth with minimal work, and for smart monitoring.

Presentations

Links
