Technical Deep Dive Series Part 3 of 4
Data Science in manufacturing is really, really difficult. That’s not necessarily anybody’s fault; rather, it’s a by-product of years of PLC and SCADA companies masquerading as data software companies — and of data software companies refusing to adapt to advances in technology. It’s the worst-case nexus of poor network infrastructure in most manufacturing plants, expensive and low-functioning MES products, and data structures driven by the business domain that don’t properly aggregate time-series data.
A big part of the solution was covered in Part 2 of this series, where we discussed a system governed by a single MES ontology. But that only covers the ingress of data into the system of record. Just as important to usability and data science is retrieving data from the system, along with the ability to establish and iterate on Analysis Ready Datasets (ARD) efficiently, both in terms of human interaction and computational cost.
Querying Data and MES
Querying data from an MES is the Achilles’ heel of the manufacturing software industry as a whole. On one side sit siloed databases that demand massive queries polluted by complex joins. On the other, the hyperscale cloud providers clamour for manufacturing companies to “just put it all in the cloud and our data lake and our AI will solve everything!” Neither approach will ever produce the efficient outcome that continuous improvement requires. Why? Several reasons.
First, both are massively failure-prone. Gartner estimates that almost 70% of all MES implementations experience some level of failure because the project cannot deliver on its promises. The most common culprit appears to be the system’s inability to query data in a way that is efficient and useful to the business.
It’s also important to remember that the theoretical dataset representing a manufacturing process is not unknown, and it should not be poorly formed. ISA-95 establishes a schema for a single MES ontology, yet the typical outcome is silos of data that must be queried with complex SQL statements or, in the best case, retrieved through inefficient REST endpoints that require chaining or return excessive payloads that must then be filtered.
How Data Science is Effective in Manufacturing
For data science to be effective in manufacturing, the dataset on which it runs must be well defined and consistent. The data must also be of high quality, with no need for cleansing. It doesn’t really matter whether the Analysis Ready Dataset comes from a SQL query that is 3,200 columns wide and takes 4 hours to run or from a “just throw it in there” data lake environment. Both approaches are inefficient and unlikely to yield meaningful results. Data scientists tend to be quite bullish on their ability to extract meaning from large datasets. But finding meaning that translates to profitability in manufacturing takes significantly more planning, and that is where the real value resides.
Libre Provides Solutions
Libre mitigates these failures by streamlining the data structure and improving the process of data retrieval. The first part of the solution was covered in Part 2: one ontology for the entire MES knowledge domain. This removes the nightmare caused by data lakes and replaces it with a consistent dataset that is known to be well governed.
The other part of the solution comes from using a GraphQL API to retrieve the data. We’ll dig into that in Part 4.