Processing SCADA Alarm Data Records Offline with ELK

Open-source tools for industrial automation

Patrick Berry

Towards Data Science

~7 min read · April 28, 2022 (updated October 5, 2022)

This article continues a series where we use open-source Data Science tools to analyze alarm and event logs produced by SCADA systems.

This is the third article in the series; the two previous articles are summarized below.

In Processing SCADA Alarm Data Offline with ELK, the following was presented:

An introduction to industrial control systems (a.k.a. SCADA systems), their architecture, and their purpose
The Elasticsearch stack (ELK) was introduced and installed
Alarm and event logs from a commercial SCADA system were cleaned and loaded into ELK
A simple analysis of the raw alarm events was completed using Elasticsearch and Kibana, and simple dashboards were created

In SCADA Alarm Standards & Metrics, the following was covered:

Alarm management standards
The alarm lifecycle
Alarm system performance metrics

That is where we have been, but where are we going? In this article, we will…

Examine the alarm lifecycle
Process alarm events into alarm records
Augment the alarm records with information about the process area and module, which is extracted from the alarm tag
Augment the alarm records with useful statistics

In the next article, this will let us create dashboards that analyze the alarm records using the industry-standard metrics described in SCADA Alarm Standards & Metrics.

SCADA Systems

As discussed in a previous article, SCADA (Supervisory Control and Data Acquisition) is a generic term for a computer-based industrial control system. A SCADA system can be used to monitor and control industrial processes such as water treatment, manufacturing, power stations, and food and beverage production; in fact, almost any industrial process with automated equipment.

Alarms

Definition

The purpose of alarms is to alert operators to unusual or dangerous process conditions that require intervention.

Alarm Lifecycle

The diagram below presents a simplified alarm lifecycle, aligned with the one presented in the IEC standard.

For each alarm point, the monitored physical process can be in one of two states: normal or abnormal.

Each alarm therefore has two attributes: an alarm status (normal | alarm) and an acknowledgment status (acknowledged | unacknowledged). Since each attribute takes one of two values, the alarm can be in one of four (2 × 2) states at any time. The allowable state transitions are shown in the diagram below:

Image by the Author
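As a minimal sketch, the four states can be derived directly from the two attributes. Only states B (in alarm, unacknowledged) and C (in alarm, acknowledged) are confirmed by the transitions discussed later in this article; the meaning of A and D, and the table of allowed transitions, are assumptions that follow the common alarm-management convention (A normal, D returned to normal but still unacknowledged). The names AlarmState, ALLOWED, state_of, and is_allowed are hypothetical.

```python
from enum import Enum


class AlarmState(Enum):
    """The four alarm states, keyed by (alarm_active, acknowledged).

    B and C follow the article; A and D are an assumed mapping.
    """
    A = (False, True)   # normal, acknowledged (assumed)
    B = (True, False)   # in alarm, unacknowledged
    C = (True, True)    # in alarm, acknowledged
    D = (False, False)  # returned to normal, still unacknowledged (assumed)


# Assumed reading of the state transition diagram: from_state -> reachable states.
ALLOWED = {
    AlarmState.A: {AlarmState.B},                # alarm raised
    AlarmState.B: {AlarmState.C, AlarmState.D},  # acknowledged, or returned to normal first
    AlarmState.C: {AlarmState.A},                # returned to normal after acknowledgment
    AlarmState.D: {AlarmState.A, AlarmState.B},  # acknowledged, or re-alarmed
}


def state_of(active: bool, acked: bool) -> AlarmState:
    """Map the two attributes onto one of the four states (Enum lookup by value)."""
    return AlarmState((active, acked))


def is_allowed(src: AlarmState, dst: AlarmState) -> bool:
    """True if the (assumed) diagram permits moving from src to dst."""
    return dst in ALLOWED[src]
```

For example, state_of(True, False) returns AlarmState.B, and is_allowed(AlarmState.A, AlarmState.C) is False because an alarm must pass through B before it can be acknowledged.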

The timing diagram below shows the typical interaction between the two signals and defines three timing values that will be used later in the analysis.

Image by the Author
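Assuming the three timing values are the t_ack, t_active, and t_interval fields of the alarm record defined below, the arithmetic is a set of timestamp differences. The sketch below assumes the timestamps are already parsed into datetime objects; the function name alarm_timings and its next_raise_time parameter are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def alarm_timings(
    raise_time: datetime,
    ack_time: Optional[datetime],
    rtn_time: Optional[datetime],
    next_raise_time: Optional[datetime] = None,
) -> dict:
    """Return t_ack, t_active and t_interval in seconds (None when a timestamp is missing)."""

    def seconds(start: Optional[datetime], end: Optional[datetime]) -> Optional[float]:
        if start is None or end is None:
            return None
        return (end - start).total_seconds()

    return {
        "t_ack": seconds(raise_time, ack_time),            # raised -> acknowledged
        "t_active": seconds(raise_time, rtn_time),         # raised -> returned to normal
        "t_interval": seconds(rtn_time, next_raise_time),  # RTN -> next activation of the tag
    }


if __name__ == "__main__":
    t0 = datetime(2022, 4, 28, 8, 0, 0)
    print(alarm_timings(t0, t0 + timedelta(seconds=30),
                        t0 + timedelta(seconds=90), t0 + timedelta(seconds=150)))
    # {'t_ack': 30.0, 't_active': 90.0, 't_interval': 60.0}
```

Note that t_interval can only be filled in once the same tag alarms again, so in practice it is computed after the next activation has been seen.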

Creating Alarm Records from Alarm Logs

The log files generated by our SCADA system describe each state transition (these were analyzed in Processing SCADA Alarm Data Offline with ELK). To perform a more detailed alarm analysis (as described in SCADA Alarm Standards & Metrics), we will process the state transition events and generate one alarm record covering the full lifecycle of each alarm.

Format of Alarm Log

The alarm log files available to me were generated by Schneider Electric's CitectSCADA product. The files were originally space-delimited text files, which we previously converted to CSV. Below are the first five records of a typical file:

Image by the Author

Format of Alarm Record

Below is the definition of the Alarm Record that we will store in the Elasticsearch database:

Image by the Author

The individual fields are as follows:

timestamp:

A timestamp for when the alarm was raised. This is the same value as raiseTime.

desc:

Description of the device creating the alarm (e.g., Booster Pump #1 Current)

priority:

A numeric value indicating the alarm priority. For this data set, 1–5.

status:

The alarm acknowledgment status. (This is an internal variable used while processing the alarm events and should not have been written to the alarm record.)

p_state:

'Process State', corresponding to the Alarm Status. (This is an internal variable used while processing the alarm events and should not have been written to the alarm record.)

a_state:

Alarm state: A, B, C, or D, matching the state transition diagram. (This is an internal variable used while processing the alarm events and should not have been written to the alarm record.)

raiseTime:

A timestamp for when the alarm became active.

ackTime:

A timestamp indicating when the alarm was acknowledged.

RTNTime:

A timestamp indicating when the alarm transitioned back to the normal state (Return To Normal).

t_active:

The length of time that the alarm was in the active state.

t_ack:

The length of time from the alarm being raised until it was acknowledged.

t_interval:

The length of time from RTN to the tag again entering the alarm state.

chatter:

A Boolean indicating that this is a chattering alarm (i.e., an alarm that triggers more than twice per second).

fleeting:

A Boolean indicating that this is a fleeting alarm (i.e., an alarm that returns to the normal state before the operator has a chance to act, taken here to mean less than 1 second).

equip_code, module_no, process_no, subprocess_no, equipment_no:

Asset information extracted from the alarm tag, described below under Data Augmentation.

Tag:

An alphanumeric identifier for the alarm signal.
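The chatter and fleeting flags can both be derived from the record timestamps. In the minimal sketch below, the 1-second fleeting threshold comes from the definition above, but the interpretation of "more than twice per second" as three or more activations of the same tag within any 1-second span is an assumption, as are the function and constant names.

```python
from datetime import datetime, timedelta
from typing import List

FLEETING_LIMIT = timedelta(seconds=1)  # "less than 1 second", per the field definition
CHATTER_WINDOW = timedelta(seconds=1)  # assumed sliding window for counting activations
CHATTER_COUNT = 3                      # "more than twice per second" -> 3 or more (assumption)


def is_fleeting(raise_time: datetime, rtn_time: datetime) -> bool:
    """True if the alarm returned to normal in under FLEETING_LIMIT."""
    return rtn_time - raise_time < FLEETING_LIMIT


def chatter_flags(raise_times: List[datetime]) -> List[bool]:
    """Flag every activation of ONE tag that belongs to a burst of at least
    CHATTER_COUNT activations spanning less than CHATTER_WINDOW.

    raise_times must be sorted in ascending order.
    """
    flags = [False] * len(raise_times)
    start = 0
    for end in range(len(raise_times)):
        # Advance the window start until the window spans less than CHATTER_WINDOW.
        while raise_times[end] - raise_times[start] >= CHATTER_WINDOW:
            start += 1
        if end - start + 1 >= CHATTER_COUNT:
            for i in range(start, end + 1):
                flags[i] = True
    return flags
```

For three activations 0.2 s apart followed by one 5 s later, chatter_flags marks the first three as chattering and the last one as not.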

Data Augmentation

Many asset numbering schemes embed asset data into the asset identifier. Analysis of alarm records is made easier if the individual components of the asset identifier are extracted and placed into individual fields of the alarm record.

Below is the numbering scheme used in the data analyzed here:

Image by the Author

A simple regular expression was used to extract these components.
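Since the actual numbering scheme is only shown in the figure, the sketch below uses a HYPOTHETICAL tag layout purely to show the technique: named groups in a regular expression map directly onto the asset fields of the alarm record. The pattern, the example tag "123-PMP04", and the function name parse_tag are all assumptions; the real pattern would follow the scheme in the figure.

```python
import re
from typing import Optional

# HYPOTHETICAL layout for illustration only (the real scheme is in the figure):
# <module><process><subprocess>-<equip_code><equipment_no>, e.g. "123-PMP04".
TAG_PATTERN = re.compile(
    r"^(?P<module_no>\d)"
    r"(?P<process_no>\d)"
    r"(?P<subprocess_no>\d)"
    r"-(?P<equip_code>[A-Z]{2,4})"
    r"(?P<equipment_no>\d{2,3})$"
)


def parse_tag(tag: str) -> Optional[dict]:
    """Split an alarm tag into asset fields, or return None if it does not match."""
    match = TAG_PATTERN.match(tag.strip())
    return match.groupdict() if match else None
```

Here parse_tag("123-PMP04") returns {'module_no': '1', 'process_no': '2', 'subprocess_no': '3', 'equip_code': 'PMP', 'equipment_no': '04'}. Returning None for non-matching tags lets the caller leave the asset fields empty instead of failing on tags that do not follow the scheme.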

Code — Record Creation

The code that converts the CSV alarm event data into alarm records is in the file csv-2-record.py, which is available in the accompanying Gist.

Main Function

The main function:

Processes the command-line input parameters
Sets up loops to iterate over all data files
Reads each data file into a Pandas DataFrame
Processes each row of the DataFrame using a state machine
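The steps above can be sketched as follows. This is not the code from csv-2-record.py: the command-line arguments (data_dir, --pattern), the helper names iter_files and process_files, and the handle_row callback (which stands in for one step of the state machine) are hypothetical. Sorting file names assumes they sort chronologically, since the state machine needs events in time order.

```python
import argparse
import glob
import os
from typing import Callable, Iterable, List

import pandas as pd


def iter_files(data_dir: str, pattern: str = "*.csv") -> List[str]:
    """Return the alarm-log CSV files in data_dir, sorted by name (assumed chronological)."""
    return sorted(glob.glob(os.path.join(data_dir, pattern)))


def process_files(paths: Iterable[str], handle_row: Callable[[pd.Series], None]) -> int:
    """Read each file into a DataFrame and pass every row to handle_row.

    Returns the number of rows processed.
    """
    count = 0
    for path in paths:
        df = pd.read_csv(path)
        for _, row in df.iterrows():
            handle_row(row)  # e.g. one step of the alarm state machine
            count += 1
    return count


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Convert alarm events to alarm records")
    parser.add_argument("data_dir", help="directory containing alarm-log CSV files")
    parser.add_argument("--pattern", default="*.csv", help="glob pattern for input files")
    args = parser.parse_args(argv)
    n = process_files(iter_files(args.data_dir, args.pattern), handle_row=print)
    print(f"processed {n} events")


if __name__ == "__main__":
    main()
```

Passing the row handler in as a callback keeps file handling separate from the alarm logic, so the state machine can be tested without any files on disk.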

Data Structure

As we need to process multiple alarm log entries before we have a complete Alarm Record entry that can be written out to file, we need an internal data structure to store the partially processed Alarm Records.

An AlarmRecord class stores the data and includes a writeOut method that writes a completed record out to file. AlarmRecord is a Python @dataclass; the dataclasses module is useful for managing classes that consist mainly of data.

Alarm Records that are still being processed are stored in the data variable, a Python dictionary keyed by the alarm tag (a unique identifier).
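A minimal sketch of such a dataclass follows, using the field names from the record definition above. The field types and defaults, the writeOut signature (taking a csv.DictWriter), the columns helper, and the INTERNAL set are assumptions; unlike the original, this sketch leaves the three internal variables out of the written record, as the field descriptions suggest.

```python
import csv
from dataclasses import asdict, dataclass, fields
from datetime import datetime
from typing import Dict, Optional

# Internal processing variables that should not be written to the alarm record.
INTERNAL = {"status", "p_state", "a_state"}


@dataclass
class AlarmRecord:
    Tag: str
    desc: str = ""
    priority: Optional[int] = None
    # internal processing state (types assumed)
    status: bool = False   # acknowledgment status
    p_state: bool = False  # process (alarm) status
    a_state: str = "A"     # A, B, C or D
    # lifecycle timestamps
    raiseTime: Optional[datetime] = None
    ackTime: Optional[datetime] = None
    RTNTime: Optional[datetime] = None
    # statistics, in seconds
    t_active: Optional[float] = None
    t_ack: Optional[float] = None
    t_interval: Optional[float] = None
    chatter: bool = False
    fleeting: bool = False
    # asset fields extracted from the tag
    equip_code: Optional[str] = None
    module_no: Optional[str] = None
    process_no: Optional[str] = None
    subprocess_no: Optional[str] = None
    equipment_no: Optional[str] = None

    @property
    def timestamp(self) -> Optional[datetime]:
        """The record timestamp is the same value as raiseTime."""
        return self.raiseTime

    @classmethod
    def columns(cls) -> list:
        """CSV header: timestamp plus every non-internal field."""
        return ["timestamp"] + [f.name for f in fields(cls) if f.name not in INTERNAL]

    def writeOut(self, writer: csv.DictWriter) -> None:
        """Write the completed record as one CSV row, omitting internal fields."""
        row = {k: v for k, v in asdict(self).items() if k not in INTERNAL}
        row["timestamp"] = self.timestamp
        writer.writerow(row)


# Partially processed records, keyed by the (unique) alarm tag.
data: Dict[str, AlarmRecord] = {}
```

Making timestamp a property rather than a stored field guarantees it can never drift from raiseTime.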

State Machine

To process the alarm log entries, a state machine is required.

Image by the Author

Since there are only four states, if-then-else constructs have been used to create the state machine. This is about the limit at which this approach remains viable; with any more states, a more sophisticated implementation would be needed (the GoF State pattern, for example).

We will examine two transitions only.

If we process an alarm event where the alarm is active (on is true) and the alarm is unacknowledged (ack is false), then from our alarm state transition diagram we know that we should transition to state B.

This transition is implemented in csv-2-record.py, which is available in the Gist.

If we process an alarm event where the alarm is active (on is true) and the alarm is acknowledged (ack is true), then from our alarm state transition diagram we know that we should transition from state B to C.

This transition is also implemented in csv-2-record.py (see the Gist).
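Putting the two transitions above together with the remaining ones, an if/elif state machine might look like the sketch below. It is not the code from csv-2-record.py: the cut-down Rec dataclass, the step function, and its on/ack/when parameters are hypothetical, and the handling of states A and D (and of a re-alarm while still in D) is an assumed reading of the state transition diagram.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Rec:  # hypothetical, cut-down stand-in for AlarmRecord
    tag: str
    a_state: str = "A"
    raiseTime: Optional[datetime] = None
    ackTime: Optional[datetime] = None
    RTNTime: Optional[datetime] = None


def step(data: Dict[str, Rec], done: List[Rec], tag: str,
         on: bool, ack: bool, when: datetime) -> None:
    """Advance one tag's state machine for a single alarm event.

    on: alarm condition active; ack: alarm acknowledged.
    Completed lifecycles are moved from data to done.
    """
    rec = data.get(tag)
    state = rec.a_state if rec else "A"  # no open record means normal (A)

    if state in ("A", "D") and on and not ack:
        # Alarm raised: transition to B.
        if rec is not None:              # re-alarm from D: close the old record (assumption)
            done.append(data.pop(tag))
        data[tag] = Rec(tag, a_state="B", raiseTime=when)
    elif state == "B" and on and ack:
        rec.a_state, rec.ackTime = "C", when   # acknowledged while active: B -> C
    elif state == "B" and not on and not ack:
        rec.a_state, rec.RTNTime = "D", when   # returned to normal before acknowledgment
    elif state == "C" and not on:
        rec.a_state, rec.RTNTime = "A", when   # returned to normal after acknowledgment
        done.append(data.pop(tag))
    elif state == "D" and ack:
        rec.a_state, rec.ackTime = "A", when   # acknowledged after returning to normal
        done.append(data.pop(tag))
    # Any other combination (e.g. a duplicate event) is ignored in this sketch.
```

Feeding the events raise, acknowledge, and return-to-normal for one tag moves it through B and C and then appends a single completed record to done, ready to be written out.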

Loading into Elasticsearch

The file csv-record-2-es.py (available in the Gist) loads the AlarmRecord CSV files into Elasticsearch.

Its code reads all records into a Pandas DataFrame and then uses the bulk upload function in the Python Elasticsearch library to load the data into the database.
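A minimal sketch of this approach, using the helpers.bulk function from the official elasticsearch Python client, is shown below. The index name "alarm-records", the localhost URL, and the function names to_actions and load_csv are hypothetical. Missing values are converted to None because NaN is not valid JSON and Elasticsearch would reject it. The client is imported inside load_csv only so that to_actions can be tried without the client installed.

```python
from typing import Iterator

import pandas as pd


def to_actions(df: pd.DataFrame, index: str) -> Iterator[dict]:
    """Turn each DataFrame row into a bulk-index action for Elasticsearch."""
    for record in df.to_dict(orient="records"):
        # NaN is not valid JSON, so send missing values as null instead.
        source = {k: (None if pd.isna(v) else v) for k, v in record.items()}
        yield {"_index": index, "_source": source}


def load_csv(path: str, index: str = "alarm-records",
             url: str = "http://localhost:9200") -> int:
    """Read an AlarmRecord CSV file and bulk-load it; returns the number of documents indexed."""
    from elasticsearch import Elasticsearch, helpers  # pip install elasticsearch

    es = Elasticsearch(url)
    df = pd.read_csv(path)
    success, _errors = helpers.bulk(es, to_actions(df, index))
    return success
```

Because to_actions is a generator, helpers.bulk can stream the documents in chunks rather than building every action in memory at once.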

Future

To date, we have processed and analyzed our raw alarm log files and created alarm records from them. In SCADA Alarm Standards & Metrics, we saw that the alarm management standards define 12 performance measures for alarm management systems. In the next installment of this series, we will use ELK to create dashboards that evaluate our system's performance against these 12 measures.

Resources

The full code is located in the following Gist.

Read Further

Thanks for reading, hope you enjoyed this article.

To explore further,

Read Part 1 of this series, Processing SCADA Alarm Data Offline with ELK
Read Part 2 of this series, SCADA Alarm Standards & Metrics
For all things Industry 4.0, check out my Industrial Digital Transformation & Industry 4.0 publication
Feel free to join my network on LinkedIn (remember to mention that you have read the article)

