Technical Deep Dive Series – Part 4 of 4
Data Infrastructure and MES – The superiority of GraphQL
In Part 3, we discussed why existing MES offerings fall over when attempting to implement data science, and how the Manufacturing Knowledge Graph (MKG) overcomes the problem of siloed data and massive, complex joins.
There’s more to the story though, and that’s data retrieval. How do we find and return the potentially massive data sets that constitute Analysis Ready Data (ARD) without needing a supercomputer? GraphQL.
GraphQL has been debated endlessly in the world of web application development. Which is superior, REST API or GraphQL? Is it worth it to take on the learning curve of a new query language for marginal gains in performance? In terms of your e-commerce app, we don’t know. But when it comes to manufacturing data, the answer is a resounding YES. It is absolutely, unquestionably worth it.
We’ll address the top three reasons:
Faster performance
REST APIs are chatty. If not built properly, they can force the client into long chains of successive queries just to produce the result presented in the user interface. This often requires n+1 calls to the API, where n is the number of records to retrieve: one call to list the records, then one call per record. Imagine trying to build an Analysis Ready Data (ARD) set by retrieving all Electronic Batch Records (EBR) for a single line for a year. If the API is not properly built (which it probably isn’t), there will be n+1 GETs to the API for those records.
In addition, if a portion of the record is not present in the EBR but is joined through a key or series of keys, the client will need to add that API call to the first set of calls, producing a very long wait for your data set to be fetched.
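To make the n+1 pattern concrete, here is a minimal Python sketch. It is not any particular MES API: FakeRestApi, its two methods, the line name "L1", and the generated records are all hypothetical stand-ins that simply count how many GETs a client would issue.

```python
from dataclasses import dataclass


@dataclass
class FakeRestApi:
    """Hypothetical in-memory stand-in for a REST API; counts every GET."""
    records: dict[int, dict]
    calls: int = 0

    def list_batch_ids(self, line: str) -> list[int]:
        # Stands in for something like GET /lines/{line}/batch-records,
        # which returns record IDs only.
        self.calls += 1
        return [rid for rid, rec in self.records.items() if rec["line"] == line]

    def get_batch_record(self, record_id: int) -> dict:
        # Stands in for something like GET /batch-records/{id}.
        self.calls += 1
        return self.records[record_id]


def fetch_all_records(api: FakeRestApi, line: str) -> list[dict]:
    ids = api.list_batch_ids(line)                    # 1 call
    return [api.get_batch_record(i) for i in ids]     # n calls


# One made-up batch record per day for a year on a single line.
api = FakeRestApi(
    records={i: {"id": i, "line": "L1", "yield_pct": 90 + i % 5} for i in range(365)}
)
rows = fetch_all_records(api, "L1")
print(len(rows), api.calls)  # 365 records cost 366 GETs
```

Every joined record (a material lot, an equipment entry) would add another round of calls on top of this.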
GraphQL queries are not inherently more efficient than REST queries. Rather, GraphQL addresses the problem in two ways. First, it allows the dataset to be aggregated on the server side, using a single query, so that the exchange between the user interface and the back end is less “chatty.”
Second, GraphQL allows multiple entities to be fetched in a single query. Instead of fetching an entity and then making another call to fetch a related entity based on the returned dataset, you get both with one targeted query. And because RESTful APIs typically only let you fetch a whole “entity” rather than individual attributes, the same mechanism also helps with our second issue: over-fetching data.
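As a rough illustration of what "one targeted query" looks like from the client, the sketch below builds the body of a single GraphQL request in Python. The schema is an assumption made up for this example: batchRecords, yieldPct, material, lotNumber, supplier, and the year 2023 are hypothetical names and values, not the MKG's actual schema. The only convention relied on is that GraphQL requests over HTTP are commonly sent as a JSON object with "query" and "variables" keys.

```python
import json

# Hypothetical query: batch records for one line and year, plus the related
# material entity, in a single request.
QUERY = """
query LineBatchRecords($line: String!, $year: Int!) {
  batchRecords(line: $line, year: $year) {
    id
    yieldPct
    material {
      lotNumber
      supplier
    }
  }
}
"""


def build_request_body(line: str, year: int) -> str:
    """Serialize the query and its variables as one JSON request body."""
    return json.dumps({"query": QUERY, "variables": {"line": line, "year": year}})


body = build_request_body("L1", 2023)
print(len(body), "bytes in a single request body")
```

The related material data rides along in the same request, so the client never has to chain a second call off the first response.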
More targeted retrieval
As we already stated, RESTful APIs generally fetch an “entity.” To get to the attributes of that entity, you need to wade through the entire dataset to pluck out the one or two useful bits that you were after. If you need an attribute of another entity based on the value of the attribute that you retrieved from the first entity, you get to make another call to the API and then parse that entity as well. Generally speaking, this results in the retrieval of too much data.
GraphQL allows us to name exactly the data we need in the query and get only that back, seriously reducing the payload in the response and the computation time needed to parse the returned data. GraphQL uses declarative fetching, which makes it an excellent fit for large ARD. We return an attribute-based dataset, gleaned from multiple entities if necessary, aggregated on the server side and returned to the user interface ready to be presented. The end result is a streamlined exchange that returns only the analysis-ready data.
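The effect of declarative fetching can be imitated in a few lines of Python. This is a toy projection, not a GraphQL implementation: the project function, the nested-dict "selection" format, and the sample record are all assumptions made up for illustration. It only shows the idea that the caller names the attributes and everything else is left behind on the server.

```python
import json


def project(entity: dict, selection: dict) -> dict:
    """Keep only the fields named in `selection`.

    A value of None selects a leaf attribute; a nested dict selects
    sub-fields of a related entity (or of each item in a list).
    """
    out = {}
    for name, sub in selection.items():
        value = entity[name]
        if sub is None:
            out[name] = value
        elif isinstance(value, list):
            out[name] = [project(item, sub) for item in value]
        else:
            out[name] = project(value, sub)
    return out


# Hypothetical batch record as a REST endpoint might return it, whole.
record = {
    "id": 42,
    "line": "L1",
    "operator": "hypothetical-operator",
    "yieldPct": 93.5,
    "steps": [
        {"name": "mix", "durationMin": 30, "notes": "n/a"},
        {"name": "fill", "durationMin": 12, "notes": "n/a"},
    ],
    "material": {"lotNumber": "LOT-7", "supplier": "Acme", "certificate": "x" * 500},
}

# The client declares only the attributes it wants.
selection = {
    "id": None,
    "yieldPct": None,
    "material": {"lotNumber": None},
    "steps": {"name": None},
}

slim = project(record, selection)
print(len(json.dumps(record)), "->", len(json.dumps(slim)), "bytes")
```

Multiply that saving by a year of batch records and the difference in payload and parsing time becomes the difference between a usable ARD and a very long wait.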
Decoupling of the application
It could be argued that REST APIs serve the purpose of application decoupling, and to a certain degree they do. But not completely. If one were to design a RESTful API in a fashion that produced similar payload results to GraphQL, two things would happen. First, the API would need to be designed with methods to target attributes, not entities, making for a huge API. Second, the API design would produce methods so targeted to a single application that there would be no point in publishing it.
GraphQL and GraphQL Federation overcome these problems. We already discussed how GraphQL uses declarative fetching to target only the data necessary, but GraphQL Federation further decouples the application by solving the problem that is the Achilles’ heel of current MES data fetching.
The MKG is a knowledge graph of the domain of an organization, based on the schema set forth in the ISA-95 standard. GraphQL Federation allows us to extend our data structure to the entirety of the Enterprise. By wrapping the REST endpoints of non-graph-based datasets in GraphQL, we “federate” the schema of those data sources into our MKG, transforming it into an EKG – an Enterprise Knowledge Graph – the holy grail of manufacturing data science.
The result of the EKG is a single API endpoint for the entire enterprise, which allows any user interface to declaratively state its required dataset and have it aggregated on the server side so that it is presentation-ready.
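The idea of wrapping a REST source so that it appears as part of one graph can be sketched in Python as follows. This is a deliberately simplified toy, not the API of any real federation gateway: the Gateway class, both data-source functions, and every field name are hypothetical. It assumes one graph-backed source (standing in for the MKG) and one REST-only source, and shows a single entry point stitching a field from the second onto an entity from the first.

```python
from typing import Callable


def mkg_batch_record(batch_id: int) -> dict:
    """Hypothetical lookup in the graph-backed MKG."""
    return {"id": batch_id, "yieldPct": 93.5, "materialLot": "LOT-7"}


def erp_rest_get_lot(lot_number: str) -> dict:
    """Hypothetical REST call to a non-graph system, e.g. GET /lots/{lot_number}."""
    return {"lotNumber": lot_number, "supplier": "Acme", "unitCost": 12.4}


class Gateway:
    """Toy single entry point: fields are served by whichever source owns them."""

    def __init__(self) -> None:
        self._resolvers: dict[str, Callable[[dict], object]] = {}

    def register(self, field: str, resolver: Callable[[dict], object]) -> None:
        # A resolver computes one field from the parent entity, possibly by
        # calling out to another system.
        self._resolvers[field] = resolver

    def resolve(self, parent: dict, fields: list[str]) -> dict:
        result = {}
        for name in fields:
            if name in self._resolvers:
                result[name] = self._resolvers[name](parent)
            else:
                result[name] = parent[name]
        return result


gateway = Gateway()
# "Federate" the REST source: expose its supplier attribute as a field on
# the batch record, joined through the material lot key.
gateway.register(
    "supplier", lambda batch: erp_rest_get_lot(batch["materialLot"])["supplier"]
)

row = gateway.resolve(mkg_batch_record(42), ["id", "yieldPct", "supplier"])
print(row)
```

The client asks one endpoint for id, yieldPct, and supplier and never learns that the supplier came from a different system.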
Fewer API calls. Less data parsing. For manufacturing data, GraphQL is clearly the superior API approach.
BTW – You can subscribe to your GraphQL queries. We’ll get more into that later…