
How It All Began

In 2016, the OneBridge team was accepted into Microsoft's Machine Learning Accelerator program, an incubator supporting the development of new products and technologies built on Microsoft's cloud and machine learning platforms. OneBridge was one of only 10 companies selected from thousands of applicants around the world. There, the team worked closely with Microsoft engineers and experts in data science, big data analytics, AI, and machine learning as they developed the core algorithms and the foundation of what became Cognitive Integrity Management (CIM).

What was the initial problem they were trying to solve? Data alignment, specifically for inline inspections conducted by different vendors: a persistent problem that no one in the industry had been willing or able to reliably solve.

Around this time, Phillips 66 (P66) had built an internal software application to manage the workflow and analysis of inline inspection data, the operational side of ILI data management, if you will. The two efforts merged to become an early version of CIM, which ingests, standardizes, and aligns inline inspections with previous inline inspections (regardless of vendor) and then allows the user to run automated analyses of the aligned and integrated data.

What started as a small team from Seattle and one partner client is now an enterprise-wide software solution utilized by 20 different pipeline operators and 500+ users. CIM has now “seen” millions of anomalies and hundreds of thousands of assessment miles. But what does CIM do? What started as a seemingly straightforward problem of data alignment has morphed into helping pipeline operators manage and analyze all of the data that's critical to asset integrity. 

What's the Big Problem? Big Data.

Effectively analyzing large data sets requires validating and standardizing the data so that you can 1) trust the data and 2) turn it into meaningful information and actionable insights that support confident, data-informed decisions.

If you have lots of data but cannot validate and standardize it so that it's turned into information, what good is the data doing?

And that's the "issue" with big data. Manual analysis processes that worked in the past are no longer sufficient. Automated algorithms, data science, and machine learning are not just preferred; they are required for processing big data. 

Machine learning provides the ability to normalize data by standardizing and validating its accuracy so that any insight created is associated with a higher level of confidence. – Matt Stevenson, Phillips 66

The First Step: Data Ingestion, Validation & Normalization

This is why the first process within CIM, called Assessment Planning, is a bit of a misnomer. Yes, users plan and manage integrity assessments within Assessment Planning. However, this is also where the *magic* happens or, more specifically, where data ingestion, validation, and normalization using machine learning occur.

A series of classification models, trained on over 5,000 ILI reports in hundreds of different formats containing over 40 million reported anomalies (2022 figures, which have since grown), is used to review each ILI report. The classifier model "looks" at all of the available data in the ILI report, e.g., the anomaly type, description, any additional comments, and anomaly attributes (position, length, width, depth, orientation). It then decides what exactly the vendor is describing and how it should be classified within a standard taxonomy.
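
To make the idea concrete, here is a minimal sketch of how a classifier could map vendor-specific anomaly text to a standard taxonomy. It is not CIM's actual model; the fields, labels, and training rows are illustrative assumptions.

```python
# Minimal sketch of a vendor-text-to-standard-taxonomy classifier.
# Illustrative only: the fields, labels, and training rows are assumptions,
# not CIM's actual model or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each training example combines the free-text fields a vendor reports.
training_text = [
    "EXT ML metal loss external corrosion",
    "Metal Loss EXTERNAL pitting",
    "INT ML internal corrosion",
    "DENT plain dent top of line",
]
# Standardized taxonomy labels assigned during prior analyst review.
training_labels = [
    "External Metal Loss",
    "External Metal Loss",
    "Internal Metal Loss",
    "Dent",
]

# TF-IDF over the report text feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(training_text, training_labels)

# Classify a new vendor call by concatenating its type, comments, and attributes.
new_call = "EXT Metal Loss, cluster, depth 23% WT"
print(model.predict([new_call])[0])  # expected: External Metal Loss
```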

Another critical part of ingestion is data validation: hundreds of data quality checks are performed to highlight gaps or inconsistencies in the data before moving on to the analysis process, ensuring confidence in the data being analyzed.
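
The sketch below shows the flavor of such checks on a single reported anomaly; the specific rules, field names, and thresholds are assumptions for illustration, not CIM's actual validation suite.

```python
# Minimal sketch of the kind of data quality checks run at ingestion.
# The rules, field names, and thresholds are illustrative assumptions.
def validate_anomaly(row: dict) -> list:
    """Return a list of data-quality issues found for one reported anomaly."""
    issues = []
    if not (0 <= row.get("depth_pct", -1) <= 100):
        issues.append("depth (% of wall thickness) outside 0-100")
    if row.get("length_in", 0) <= 0 or row.get("width_in", 0) <= 0:
        issues.append("non-positive length or width")
    if not (0 <= row.get("orientation_deg", -1) < 360):
        issues.append("orientation outside 0-360 degrees")
    if row.get("odometer_ft") is None:
        issues.append("missing odometer position")
    return issues

# Example: flag a call with an impossible depth before analysis begins.
print(validate_anomaly({"depth_pct": 123, "length_in": 1.2, "width_in": 0.8,
                        "orientation_deg": 90, "odometer_ft": 10432.5}))
```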

There are multiple ways that vendors can describe a given feature. CIM's machine learning classification model identifies and classifies all of the reported anomalies into a standard classification system. For example, instead of external corrosion being reported in four different ways, as seen in Figure 1 (e.g., EXT ML, EXT Metal Loss, and Metal Loss EXTERNAL), it's reported one way in all five reports. This provides an apples-to-apples understanding of what has been reported over time on this pipeline, what has changed, and whether a given inline inspection report is accurate or needs to be revised by the vendor.


Figure 1: Assessment Summary pre-analysis report showing the anomaly types and corresponding numbers per ILI report.

 

ILI to ILI Alignment 

Once data has been ingested into a data model and interpreted into a standardized structure, the next stage is alignment. This is accomplished via an automated alignment algorithm that uses distinctive patterns in joint lengths and anomaly geometry to match welds and anomaly boxes between two or more inline inspections, which ultimately allows anomaly changes and growth rates to be calculated.

Weld to Weld Matching

Girth welds are aligned by minimizing the length difference of joints that are matched between assessments while allowing for odometer calibration discrepancies, pipe changes (repairs and re-routes), and missing data. Through this alignment process, each pipe joint is assigned a master joint number that provides complete traceability for every joint in the system.

Such algorithms can identify common portions of the pipeline even when ILI tools are launched from different locations or when traps have been added or removed, as long as the remaining joint lengths form a sufficiently distinctive pattern to identify the common segments. The alignment algorithm can also automatically detect flow direction and align datasets where, for example, the tool was run in the opposite direction, as well as handle re-routes and changes to line configuration.
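
To illustrate the underlying idea, the sketch below aligns two joint-length sequences with a classic dynamic-programming alignment that tolerates missing joints and small odometer discrepancies. It is a simplified stand-in, not CIM's algorithm; the tolerance and gap penalty values are assumptions.

```python
# Minimal sketch of weld-to-weld matching as a sequence alignment over
# joint lengths. Simplified stand-in for illustration only; the scoring,
# tolerance, and gap penalty are assumptions, not CIM's algorithm.
def align_joints(lengths_a, lengths_b, tol_ft=0.5, gap_penalty=3.0):
    """Return matched joint index pairs (i, j) between two ILI runs."""
    n, m = len(lengths_a), len(lengths_b)
    # dp[i][j] = minimum alignment cost of the first i joints of run A
    # against the first j joints of run B.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        dp[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = abs(lengths_a[i - 1] - lengths_b[j - 1])
            match_cost = 0.0 if diff <= tol_ft else diff
            dp[i][j] = min(dp[i - 1][j - 1] + match_cost,  # match joints
                           dp[i - 1][j] + gap_penalty,     # joint missing in B
                           dp[i][j - 1] + gap_penalty)     # joint missing in A
    # Trace back to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        diff = abs(lengths_a[i - 1] - lengths_b[j - 1])
        match_cost = 0.0 if diff <= tol_ft else diff
        if dp[i][j] == dp[i - 1][j - 1] + match_cost:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap_penalty:
            i -= 1
        else:
            j -= 1
    return list(reversed(pairs))

# Example: run B is missing one joint (e.g., a replaced pup) relative to run A.
run_a = [40.1, 39.8, 60.2, 40.0, 39.9]
run_b = [40.0, 39.9, 40.1, 39.8]
print(align_joints(run_a, run_b))  # -> [(0, 0), (1, 1), (3, 2), (4, 3)]
```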

Figure 2.1: ILI Alignment: Metal loss anomaly depths by distance for multiple ILI reports.
Figure 2.2: ILI Alignment: Metal loss anomaly orientation by distance for multiple ILI reports.

 

Anomaly to Anomaly Matching

Anomalies are assigned boxes based on their width, length, odometer, and orientation, and these are tested for overlap with anomaly boxes in other ILIs. The matching algorithm accounts for one-to-one and multi-matching, i.e., one-to-many and many-to-one. Common scenarios resulting in multi-matches include aligning a clustered MFL run to unclustered data, corrosion growth that merges multiple individual pits into single anomaly calls, and significant run-to-run (and possibly vendor-to-vendor) discrepancies in where anomaly boxes are called when the density of corrosion is high.
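
A minimal sketch of this kind of box-overlap test is shown below; the padding value and data structures are illustrative assumptions rather than CIM's matching algorithm, and circumferential wrap-around at 0/360 degrees is ignored for brevity.

```python
# Minimal sketch of anomaly-to-anomaly matching as a box-overlap test in
# (axial distance, orientation) space. Illustrative assumptions only;
# wrap-around at 0/360 degrees is ignored for brevity.
from dataclasses import dataclass

@dataclass
class AnomalyBox:
    id: str
    odo_start_ft: float      # axial start (odometer)
    odo_end_ft: float        # axial end
    orient_start_deg: float  # circumferential start
    orient_end_deg: float    # circumferential end

def overlaps(a: AnomalyBox, b: AnomalyBox, pad_ft: float = 0.2) -> bool:
    """True if two boxes overlap axially (with padding) and circumferentially."""
    axial = (a.odo_start_ft - pad_ft <= b.odo_end_ft
             and b.odo_start_ft - pad_ft <= a.odo_end_ft)
    circ = (a.orient_start_deg <= b.orient_end_deg
            and b.orient_start_deg <= a.orient_end_deg)
    return axial and circ

def match_boxes(run_a: list, run_b: list) -> dict:
    """Map each anomaly in run A to all overlapping anomalies in run B
    (supports one-to-one, one-to-many, and many-to-one matches)."""
    return {a.id: [b.id for b in run_b if overlaps(a, b)] for a in run_a}

# Example: one clustered call in run A spans two individual pits in run B.
run_a = [AnomalyBox("A1", 100.0, 101.5, 80, 120)]
run_b = [AnomalyBox("B1", 100.1, 100.6, 85, 95),
         AnomalyBox("B2", 101.0, 101.4, 100, 115)]
print(match_boxes(run_a, run_b))  # -> {'A1': ['B1', 'B2']}
```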


Figure 3: Joint view of anomaly boxes for two ILI reports from different vendors. 
 

Data Integration: Putting It All Together

Each of the previous steps, data ingestion and normalization followed by alignment of those datasets, works toward data integration. Multiple inline inspections can also be aligned with pipeline data from GIS, e.g., MAOP, CP, coating, depth of cover, and pipe type, as well as with integrity data sets from other modules, such as Close Interval Surveys from external corrosion. Anomalies in the current ILI are also automatically matched with any NDE measurements where the anomaly was previously evaluated and recoated/repaired. This information can be used to generate “historical unity plots” and API 1163 ILI performance evaluations.
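
As a simple illustration of one such integration, the sketch below pairs ILI-called depths with nearby field NDE measurements to produce the points behind a unity plot. The matching tolerance and field names are assumptions; this is not CIM's data model or an API 1163 procedure.

```python
# Minimal sketch of pairing ILI-called depths with field NDE measurements
# to build the data behind a unity plot. Tolerance and field names are
# illustrative assumptions.
def unity_plot_points(ili_calls, nde_measurements, tol_ft=1.0):
    """Pair each ILI call with the nearest NDE dig measurement within tol_ft
    and return (ILI depth %, field-measured depth %) points."""
    points = []
    for call in ili_calls:
        nearby = [m for m in nde_measurements
                  if abs(m["odometer_ft"] - call["odometer_ft"]) <= tol_ft]
        if nearby:
            best = min(nearby,
                       key=lambda m: abs(m["odometer_ft"] - call["odometer_ft"]))
            points.append((call["depth_pct"], best["depth_pct"]))
    return points

ili_calls = [{"odometer_ft": 1250.0, "depth_pct": 32},
             {"odometer_ft": 5480.3, "depth_pct": 48}]
nde_measurements = [{"odometer_ft": 1250.4, "depth_pct": 38},
                    {"odometer_ft": 5480.0, "depth_pct": 45}]
# Points on a unity plot: y = x means the tool matched the field measurement.
print(unity_plot_points(ili_calls, nde_measurements))
```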

Figure 4.1: Understanding corrosion depth over multiple ILI reports to determine corrosion growth, but also how an anomaly was "called" in previous reports.
Figure 4.2: Comparing the depths of anomalies remaining in the pipeline that have been rendered inactive due to recoat or temporary repair to generate a "historical unity plot." This is utilized to determine ILI performance before analysis or selection of anomalies for further action.

This integration of pertinent integrity data allows for a deeper understanding of how an anomaly changes over time, revealing corrosion growth and interacting features (e.g., a dent called in the current data set where metal loss was called in a previous data set), as well as ILI performance, measurement bias, and missed or misclassified calls.

All of this data processing and pre-analysis reporting happens in the first step before the analysis of said data even begins. Stay tuned for Part 2 where we discuss the automated analysis! 

Recently, we've taken data standardization to a whole new level with the Vendor Portal - where ILI data is processed BEFORE it’s uploaded into CIM.

Click here to learn more or request a demo.