Data Management for Manufacturing: from databases to Digital Twins

Data, data, data. Next to your personnel and products, data is one of your company’s most precious commodities. And as Industry 4.0, digital transformation, and smart technologies continue to reshape the manufacturing landscape, data is key to staying competitive. As a business leader, you need a clear understanding of data management tools and how they impact manufacturing.

Not only does this ensure your business uses each tool correctly and effectively in your ecosystem, but it also allows you to maximize the benefits and value these tools bring to your business.

The following is a data management primer for your reference. You can learn more about the various data management tools, how they’re used in the manufacturing landscape, and what value they bring.

Data storage definitions

Data management tools can be confusing for any manufacturing company. Let’s unpack some of the most common data solution terms. 

Traditional data management solutions

Data lake

A data lake is a central repository for storing data in its native format. Just like a lake, it has a flat architecture; data is simply dumped into it. Each piece of data is assigned a unique identifier that allows it to be sourced during queries.

Schema and data requirements are not defined until the data is used. Data lakes are frequently used as a holding area for data before it’s moved into a more structured storage container. Data lakes can be on-prem or in the cloud. 
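To make the "schema is not defined until the data is used" idea concrete, here is a minimal sketch. It assumes a hypothetical in-memory lake (a plain dictionary) and invented field names such as temp_c; a real data lake would sit on file or object storage, but the principle is the same: payloads are stored untouched under a unique identifier, and structure is only imposed by the query that reads them.

```python
import json
import uuid

# Hypothetical in-memory "lake": raw payloads stored as-is, keyed by a unique ID.
lake = {}

def ingest(raw: str) -> str:
    """Store a raw payload untouched and return its unique identifier."""
    object_id = str(uuid.uuid4())
    lake[object_id] = raw
    return object_id

def read_temperatures(lake_store):
    """Apply a schema only at read time: keep JSON objects with a numeric 'temp_c'."""
    readings = []
    for object_id, raw in lake_store.items():
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON (e.g. a CSV line or log text); skip for this query
        if isinstance(record, dict) and isinstance(record.get("temp_c"), (int, float)):
            readings.append((object_id, float(record["temp_c"])))
    return readings

ingest('{"machine": "press-1", "temp_c": 71.5}')   # JSON with a temperature
ingest("2024-01-01,press-2,OK")                    # CSV line, stored as-is
ingest('{"machine": "press-3", "rpm": 1200}')      # JSON without a temperature

temps = read_temperatures(lake)
```

Nothing is rejected at ingest time, which is what makes the lake cheap to fill. The cost shows up later: every query has to decide for itself what counts as a valid record.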

Database

A database is an organized collection of information or data stored and accessed electronically. Databases allow information to be easily accessed, managed, modified, updated, controlled, and organized.

There are numerous types of databases, for example relational, hierarchical, and object-oriented. Relational databases, the most common type, have a two-dimensional column-row structure. Database examples include Oracle, SQL Server, MS Access, and FileMaker.

Data warehouse

A data warehouse is also a data repository. However, unlike a data lake, a data warehouse is structured.

Typically, data warehouses are subject-oriented, integrated, time-variant, nonvolatile (data can’t be changed or deleted once it’s in), and summarized. A business can have multiple data warehouses for multiple subjects. They store historical or summarized data. 
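As a rough illustration of "subject-oriented, time-variant, and summarized", the sketch below uses Python's built-in sqlite3 module as a stand-in for a warehouse. The table names, column names, and figures are invented for the example; a production warehouse would run on a dedicated platform and be loaded by a scheduled process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Detailed operational readings, as they might arrive from a production database.
conn.execute("CREATE TABLE readings (day TEXT, line TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2024-01-01", "L1", 100),
        ("2024-01-01", "L1", 120),
        ("2024-01-01", "L2", 90),
        ("2024-01-02", "L1", 110),
    ],
)

# Warehouse-style table: one row per subject (line) and time period (day),
# holding a summary rather than every individual reading.
conn.execute(
    """CREATE TABLE daily_output AS
       SELECT day, line, SUM(units) AS total_units
       FROM readings
       GROUP BY day, line"""
)

summary = conn.execute(
    "SELECT day, line, total_units FROM daily_output ORDER BY day, line"
).fetchall()
```

Here summary holds one total per line per day. The detail is traded away for a structure that is quick to report on.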

Innovative data management solutions for manufacturing

SCADA systems and Digital Twins aren’t usually viewed as typical data storage environments. However, we included them because they both capture real-time data. 

SCADA

This acronym stands for Supervisory Control And Data Acquisition. SCADA is a control system architecture that includes computers, networked data communications, and graphic user interfaces for high-level machine and process supervision, such as temperature control, energy use, machine speed, etc.

SCADA systems collect data via networked devices and sensors connected to PLCs and/or RTUs. The information is seen on a human-machine interface (HMI). SCADA systems collect and report real-time data. 

Digital Twin

A Digital Twin is a digitized replica of an object, product, process, supply chain, or complete business ecosystem. For manufacturers, Digital Twins can be built for assets, specific production lines, end products, or any other “real world” scenario within a production process.

It’s important to note that Digital Twins are not limited to simulations: they use real-time data coming from the physical entities they represent. Data is run through a model that makes it more usable for analysis and understanding. In the case of Braincube’s Digital Twin, the model captures, maps, and structures process variables before adding them into a continuously updated database.

Accessing this database in a virtual environment, such as via an IIoT Platform, enables teams to modify and test changes in applications, models, or third-party programs. This enables teams to validate changes before introducing them to the physical system. Regardless of how you build a Digital Twin or what kind of Digital Twin you have, the end result is a digital representation you can use to obtain deeper visibility into your production. 

Your manufacturing company is likely already using most (if not all) of these data management tools. They are the backbone of a virtual environment.

Understanding how they can be optimized for different uses and tasks allows you to make the most of your technology investments now and in the future.  

Here’s a breakdown of these manufacturing data management solutions in four key value areas.

Cost of data management solutions

Technology is expensive and costs are always top of mind. Having a clear understanding of your data investments helps you maximize ROI.

Data lakes are on the low end of the investment scale. A data lake gives you a lot of data storage relatively cheaply; you get a large volume for a lower upfront cost. If you want to capture every piece of data your company produces, data lakes fit the bill. 

A downside of data lakes is that they are not designed to aggregate information quickly. Data lakes’ lack of structure makes retrieving data slow and cumbersome. What you save in initial pricing may be spent in time or other systems once you start trying to retrieve data from your lake. Oftentimes, companies use data lakes as a holding zone before moving data into another more structured system or to a team of data scientists to categorize it.

Databases are on the more affordable end of the pricing scale and they’re very good at storing large volumes of data. Their flexible storage costs typically ebb and flow with needs. Their simple structure (rows and columns) makes retrieving and using information relatively quick and easy. Every technology infrastructure has a database component, and some have more than one. 

There are additional, hidden costs associated with databases, particularly around data quality. As your databases are used, data quality typically erodes. Users don’t follow data entry protocols, customizations and workarounds are added in, and redundancies are created. These issues usually come to light during upgrades. Over time, this creates performance and latency issues that are costly to fix. 

Data warehouses are on the higher end of the cost scale. Costs are determined by data volume, which quickly drives up prices. However, that cost gives teams a highly structured data environment since you’re paying to have historical data sorted as it’s stored. This can result in cost savings later in terms of ease of use and accessibility (though it may still have some latency). 

Both SCADA and Digital Twins make data readily—if not immediately—available to teams. This capability isn’t something that other data storage options can do, and it has huge ramifications for manufacturers. The ability to “see” what’s happening on a machine or production line at the moment is key to efficiency, product quality, machine health, and cost savings. 

Because these are highly specific systems, they sit at the top of the manufacturing data management tool price scale. Given the insights and savings they bring to production and business operations, though, they more than earn their keep and their ROI.

System agility and data accessibility

Data management systems aren’t just for storing data. You want to mine data for insights and trends that can keep your business competitive. Therefore, it’s important to understand each option’s agility and accessibility. How easy is it to access and manipulate the data? What level of expertise is needed to do this work? 

Data lakes are extremely agile but have limited accessibility. Their flat architecture and unstructured data make configuring and reconfiguring the data for data models, queries, and applications easy. 

However, because data is unstructured, it requires technical expertise to truly use it. This means data isn’t readily accessible to everyone. Instead, data access is limited to data scientists and data developers (or IT teams). This, in turn, creates a bottleneck for other users at your organization, including operators, engineers, plant managers, and the C-Suite.  

By comparison, databases and data warehouses have limited agility but high accessibility. Adjusting their structures requires technical expertise and can take significant time. It’s a slow, delicate process. However, their structured data protocols make data accessible to a wider range of users. 

In this era of citizen data scientists, accessibility is key to success. Employees at every level depend on data to be effective in the workplace. Using data management technology that facilitates data democratization allows manufacturers to harness the collective knowledge and expertise of all their personnel, instead of narrowing it to only technical teams. It’s a way of equipping process experts with self-service analytics and people-powered AI to drive discovery, without needing to write code.

SCADA systems and Digital Twins both facilitate access. With a seamless, real-time flow of information coming from both these technologies, teams are granted better access to data. However, just because both SCADA systems and Digital Twins facilitate better data access, it doesn’t mean that they grant equal data usability.

SCADA systems have simplistic visual interfaces (HMIs) for communicating with machines. SCADA systems make it easy to monitor and control what is happening on the shop floor, which is valuable. 

However, most SCADA systems lack the ability to do anything with data. Teams can’t use a SCADA system to make discoveries or improvements. Visibility from a SCADA system can go a long way, but visibility alone won’t generate transformational changes that move your organization towards optimal performance. 

Furthermore, it may not be possible to bring together data coming from different devices in a single SCADA system. Most SCADA systems require all the sensors and devices at your facility to be from the same brand or company in order to compile data from across the production line. Most manufacturers have different brands of machinery and equipment in their factories. This significantly impacts data usability because it means teams can’t access all their production data at the same time. Teams will likely continue working in silos, making it difficult to transform an entire organization using data.

Digital Twins, like those powered by Braincube, incorporate and enhance SCADA data for broader use cases across the organization. Digital Twins give better context to data, which can then be leveraged in apps or other systems.

Digital Twin data can be leveraged in Braincube’s apps, like the Advanced Analysis App, to make production optimizations.

For example, a SCADA system can pull in temperature, vibration, power usage, and other key process variables. Braincube-powered Digital Twins take this data and add context through metadata tagging, lag times, and other sources of variation (e.g. supplier information). This level of detail equips teams with meaningful information they can put to use right away, instead of data that is stored and only consulted occasionally.
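To show what adding lag times and metadata tags can look like in practice, here is a small sketch. It is not Braincube's implementation; the function names, the 60-second lag, and the sample values are all assumptions made for illustration. The idea is that a quality measurement taken downstream is matched with the upstream process value that was in effect when that material was actually being processed.

```python
from bisect import bisect_right

def value_at(series, t):
    """Return the latest value in a time-sorted [(timestamp, value), ...] at or before t."""
    times = [ts for ts, _ in series]
    i = bisect_right(times, t)
    return series[i - 1][1] if i > 0 else None

def contextualize(quality, oven_temp, lag_s, tags):
    """Pair each quality reading with the oven temperature lag_s seconds earlier, plus tags."""
    rows = []
    for ts, thickness in quality:
        rows.append({
            "timestamp": ts,
            "thickness_mm": thickness,
            "oven_temp_c": value_at(oven_temp, ts - lag_s),
            **tags,  # metadata such as supplier or line (hypothetical keys)
        })
    return rows

# Timestamps are seconds since the start of the run (illustrative values).
oven_temp = [(0, 180.0), (60, 185.0), (120, 190.0)]
quality = [(90, 2.01), (150, 2.05)]

rows = contextualize(quality, oven_temp, lag_s=60, tags={"supplier": "A", "line": "L1"})
```

Without the lag correction, the reading at second 90 would be paired with the oven temperature at second 90 (185.0) rather than the 180.0 the material actually experienced, and any analysis built on that pairing would be skewed.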

Braincube’s advanced Business Intelligence (BI) applications make it even easier to access and use the data coming from a Digital Twin. These plug-and-play applications are designed specifically for manufacturing use cases and have user-friendly interfaces. Apps make Digital Twin data immediately accessible and usable to employees at every level of the organization. 

As a result, teams don’t need advanced coding, modeling, or analytical skills in order to make meaningful discoveries from data, which is the heart of the Citizen Data Scientist movement. Apps perform heavy, complex calculations so that employees can focus on using their intrinsic process knowledge to drive results.

Processing speed

When it comes to data management systems, processing speeds vary. Given that the manufacturing industry moves at lightning speed, understanding how quickly each system lets you access and use data helps manage expectations. It can also help you determine which system is appropriate for your needs and goals.

Data lakes are extensive data repositories. Their ability to accept data in any format is convenient. As stated above, a flat architecture and unstructured data schema make it possible to use the information in a variety of helpful ways—but doing so will take time. When a query is performed, the system must look at every single piece of data one at a time to parse out the requested data. How long this takes will depend on the size of the data lake. If your data lake becomes a data swamp, which can happen easily, processing times increase.

Databases and data warehouses process data more quickly. Their structured data and architecture allow for more targeted searches; the system can go directly to the pertinent data area. Latency is minimal and results are, in most cases, instantaneous. This makes them good choices for daily business functions: reporting, analysis, visualization, etc. They can keep up with the demand. 
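The difference between scanning everything and going directly to the pertinent data area can be sketched in a few lines. This is a simplified model with invented records, not a benchmark of any real product: the scan stands in for a query over unstructured lake data, and the dictionary stands in for a database index.

```python
# 10,000 illustrative records spread across 50 hypothetical machines.
records = [
    {"id": i, "machine": f"m{i % 50}", "temp_c": 60 + i % 10}
    for i in range(10_000)
]

def scan(records, machine):
    """Lake-style query: inspect every record, counting how many were examined."""
    examined, hits = 0, []
    for r in records:
        examined += 1
        if r["machine"] == machine:
            hits.append(r)
    return hits, examined

# Database-style: build an index once, keyed by the column you query on.
index = {}
for r in records:
    index.setdefault(r["machine"], []).append(r)

def lookup(index, machine):
    """Indexed query: go straight to the matching rows without touching the rest."""
    return index.get(machine, [])

scan_hits, examined = scan(records, "m7")
index_hits = lookup(index, "m7")
```

Both approaches return the same rows, but the scan has to examine all 10,000 records to find them, and that work grows with the size of the lake. The index costs effort up front, which is the structure you pay for with a database or data warehouse.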

However, legacy systems and data quality can impact processing speeds for databases and data warehouses. Noticeable latency can indicate it’s time for an upgrade or data cleansing.

SCADA systems and Digital Twins supply information in real time, with even less lag than the other systems. SCADA systems have a web of interconnected devices that deliver a holistic view of your production line as it is running. Immediate data helps maintain efficiencies and product quality while staying ahead of machine health and maintenance.

Digital Twins go one step beyond SCADA systems: data is immediately available, usable, and ready for analysis. They make it easy to explore data coming from your SCADA system without impacting live operations. The only processing speed hindrance could be your company’s bandwidth. Even with bandwidth limitations, processing delays are usually a fraction of the other data solutions listed here. 

With Braincube-powered Digital Twins, teams have continuous access to contextualized data that can be used in software for data analysis and visualization, optimization, and other continuous improvement efforts. These software tools may include Braincube’s BI apps, Tableau, Altair, or a variety of other systems. This facilitates faster decision-making, innovation, and troubleshooting.

Scalability

The ability to scale a data management system is a vital aspect to consider when choosing which route to go. Ensuring data solutions are interoperable across your infrastructure is key to scaling successfully. 

Like other traditional data management systems—data lakes, data warehouses, and databases—SCADA systems run on servers and have a traditional storage architecture. These hardware limitations mean that a SCADA system’s performance will be drastically impacted as new users are added to the system and start pulling data. In other words, the system’s architecture means it can only handle so many requests at a given time. 

Additionally, since the data lives on physical servers located on-prem, it is difficult to pull SCADA data from remote locations. Teams may not have difficulty accessing data within the plant where they are working, but it will take more time to run reports or pull data from other plants. This makes it difficult to cross-reference performance, collaborate on shared issues, or look at collective data for company-wide initiatives or goals. In the era of the Connected Worker and democratized data, these limitations can have serious implications for business outcomes.

In other words, it is immensely difficult to scale a SCADA system to encompass an entire enterprise. SCADA systems cannot easily and quickly pull in data from multiple plants simultaneously. They work well at the plant level but don’t serve well as enterprise-wide solutions, a limitation that we don’t see going away anytime soon.

This key scalability limitation of SCADA systems is overcome by the use of Digital Twins. Digital Twins use (and produce) Internet of Things (IoT) data. IoT data can come from any relevant source within your organization, regardless of the sensor or processor vendor.

Because Digital Twins use IoT data, they are not limited to a single sensor brand or device family the way SCADA systems are. Digital Twins can accommodate data from any system or device, transforming it and making it readily available for anyone to use.

For example, IoT makes it possible to pull data into your Digital Twin over MQTT (e.g. data exchanges between an MES and a SCADA system), XMPP (e.g. communications coming from computerized equipment), CoAP (e.g. web-style data transfer from constrained devices), or REST (e.g. data exchanged between software applications).
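Whatever protocol carries the data, the practical step is translating each source's payload into one common record. The sketch below assumes two hypothetical vendors with made-up payload formats and source labels; it does not use a real MQTT or REST client, only the kind of adapter that would sit behind one.

```python
import json

def from_vendor_a(payload: str) -> dict:
    """Hypothetical vendor A sends JSON like {"dev": "...", "t": <Celsius>}."""
    msg = json.loads(payload)
    return {"device_id": msg["dev"], "temp_c": float(msg["t"])}

def from_vendor_b(payload: str) -> dict:
    """Hypothetical vendor B sends 'device;temperature' with Fahrenheit values."""
    device, temp_f = payload.split(";")
    return {"device_id": device, "temp_c": round((float(temp_f) - 32) * 5 / 9, 2)}

# One adapter per source; the labels are illustrative, not real topic or endpoint names.
ADAPTERS = {
    "vendor_a/mqtt": from_vendor_a,
    "vendor_b/rest": from_vendor_b,
}

def normalize(source: str, payload: str) -> dict:
    """Convert a source-specific payload into the shared record format."""
    record = ADAPTERS[source](payload)
    record["source"] = source
    return record

unified = [
    normalize("vendor_a/mqtt", '{"dev": "oven-1", "t": 180}'),
    normalize("vendor_b/rest", "oven-2;356"),
]
```

Once both ovens report in the same units and field names, they can be compared side by side. Adding a third vendor means writing one more adapter rather than replacing equipment.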

Leveraging the Internet also means there are fewer limitations in terms of the number of users, pull request volume, or other usage roadblocks typically associated with on-premise architectures. It is also easier to seamlessly access data from other plants, countries, or networks via an Internet connection—something that on-prem systems will always struggle to provide. 

When the time comes to scale Digital Twins to other facilities, your third-party vendor handles almost everything. Here at Braincube, our Cloud and interoperability capabilities allow us to scale every aspect of your company’s data systems (including Digital Twins) quickly and easily—with very little involvement from your IT team. By comparison, on-prem solutions like SCADA systems require much more effort on behalf of your internal teams to achieve the same scalability. 

If you’re weighing the scalability options between a SCADA system and IoT, consider this: in time, you may pay significantly more for multiple SCADA systems instead of one IIoT (Industrial IoT) solution.

Conclusion

In reality, manufacturing infrastructures use a combination of these data management tools. Data warehouses store historical data. Databases handle daily functions. SCADA systems monitor floor operations. Digital Twins collect live data (including SCADA data), contextualize it, and make it readily available for exploring your production in new ways.

Understanding each data management tool’s capabilities and drawbacks ensures you use each one to your advantage and make the most of its individual strengths.

Most importantly, clearly understanding these data management solutions helps you optimize your greatest manufacturing asset: your personnel. Empowering teams with data keeps your production lines humming and your business moving forward. Aligning your data management tools to the employee and the task ensures efficiency and effectiveness.
