Extracting valuable insights from machine sensor data can be hard, and often requires analyzing data from many sensors in parallel.
Because of this complexity, machine learning (ML) methods are becoming increasingly popular for analyzing these datasets.
In this article we will discuss why and how you can run ML models at the edge.
Why ML @ Edge?
There are two main use cases for ML in industrial IoT:
- Anomaly detection
- Extracting higher-value features, such as remaining uptime
For both of these use cases there are many situations where it is most natural to execute the ML model at the edge, close to the source of the data. The reasons typically fall into one of three categories:
- Results are only needed locally
For an anomaly detector that analyses data from one machine and then triggers an action in another machine or a local system, sending data to the cloud for analysis just adds complexity and cost.
- Latency is critical
For machine-to-machine triggers, latency is often critical, and timing requirements may not be met if data is analyzed in the cloud.
- Data volume must be reduced
In many cases bandwidth to the cloud is limited, unreliable or costly, especially on mobile assets. Using ML to analyze the full dataset locally and then sending only triggers or higher-value features to the cloud can be a better solution.
The ML Workflow
The workflow for machine learning consists of two main steps: developing the model and executing the model. The first step is an offline operation where stored data is used to train and tune a model. Once satisfactory results are achieved, the trained model is deployed in an execution environment to make predictions based on real-time data. The edge is typically used only for executing the ML model. However, ML model development is an iterative process where the model may be improved over time, as more data becomes available or the architecture is refined. Hence you should expect the ML model at the edge to be updated several times during its life cycle.
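The two steps can be illustrated with a deliberately small sketch. It assumes a trivial "model" (mean and standard deviation of historical values) and a JSON file as the deployment artifact; the function names and file format are illustrative assumptions, not part of any specific product.

```python
import json
import statistics


def train(history):
    """Offline step: fit a very simple model from stored sensor values."""
    return {
        "mean": statistics.mean(history),
        "std": statistics.pstdev(history),
    }


def save_model(model, path):
    """Serialize the trained model so it can be shipped to an edge node."""
    with open(path, "w") as f:
        json.dump(model, f)


def load_model(path):
    """Edge step: load the deployed model once, before processing live data."""
    with open(path) as f:
        return json.load(f)


def is_anomaly(model, value, n_sigmas=3.0):
    """Edge step: flag a live value that is far from the training mean."""
    return abs(value - model["mean"]) > n_sigmas * model["std"]
```

Retraining on more data produces a new file, which replaces the old one at the edge without changing the execution code.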
Machine Learning for Streaming Data
When developing and training an ML model, data is prepared so that it matches the requirements of the ML environment. Some common preparation steps are:
- Remove outliers and invalid data, fill in blanks
- Scale sensor values
- Extract features, such as mean/variance or convert to frequency domain
- Align values on time
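The first three steps above can be sketched as follows for a single sensor and one window of values. This is a minimal illustration that assumes a known valid range per sensor (the `low` and `high` parameters are hypothetical) and simply drops invalid readings instead of filling them in; time alignment is not covered here.

```python
import math
import statistics


def clean(values, low, high):
    """Drop invalid readings (None, NaN) and values outside [low, high]."""
    return [
        v for v in values
        if v is not None and not math.isnan(v) and low <= v <= high
    ]


def scale(values, low, high):
    """Min-max scale values from the range [low, high] to [0, 1]."""
    return [(v - low) / (high - low) for v in values]


def extract_features(values):
    """Summarize a window of values as mean and variance."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.pvariance(values),
    }


def prepare(window, low, high):
    """Apply the steps in a fixed order; return None if no valid data remains."""
    cleaned = clean(window, low, high)
    if not cleaned:
        return None
    return extract_features(scale(cleaned, low, high))
```

Whatever the actual steps are, the same parameters and the same order used during training have to be used at execution time, otherwise the model sees data it was not trained on.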
When executing the ML model in a streaming environment, all of these operations must be applied before the data can be sent to the model. The last operation, aligning data on time, requires special attention.
When training a model, data is usually stored in files or a database with all sensor values present for each time step, so that the model gets the same set of data each time. In a streaming environment sensor data is received serially, with each sensor sending data at regular intervals but independently of all other sensors, and possibly at different rates. Before we can deliver streaming data to an ML model we must therefore align the data on regular time boundaries and potentially repeat data from sensors that deliver data at a lower rate.
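A minimal sketch of this alignment is shown below. It assumes messages arrive as `(timestamp, sensor, value)` tuples sorted by timestamp, and it repeats the last known value of a sensor until a new one arrives. The message layout and function name are assumptions for illustration, not a description of how any particular product implements it.

```python
def align(messages, interval, sensors):
    """Align (timestamp, sensor, value) messages on regular time boundaries.

    Emits one row per interval with the latest known value of every sensor.
    Values from slower sensors are repeated until a new value arrives.
    Rows are emitted only once every sensor has reported at least once.
    """
    latest = {}      # most recent value seen per sensor
    rows = []        # aligned output: (boundary, [values in sensor order])
    boundary = None  # end of the interval currently being filled
    for ts, sensor, value in messages:
        if boundary is None:
            boundary = (ts // interval + 1) * interval
        # Close every interval that ended before this message arrived.
        while ts >= boundary:
            if all(s in latest for s in sensors):
                rows.append((boundary, [latest[s] for s in sensors]))
            boundary += interval
        latest[sensor] = value
    return rows


messages = [
    (0, "temp", 20.0), (0, "pressure", 1.0),
    (1, "temp", 21.0),
    (2, "temp", 22.0), (2, "pressure", 1.1),
    (3, "temp", 23.0),
    (4, "temp", 24.0),
]
rows = align(messages, interval=1, sensors=["temp", "pressure"])
```

Here the temperature sensor reports every time unit and the pressure sensor every second one, so each pressure value appears in two consecutive rows.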
Machine Learning with Crosser
The Crosser Edge Streaming Analytics solution simplifies the development and maintenance of edge computing by offering a flow-based programming model, through the FlowStudio visual design tool, and central orchestration of edge nodes through the EdgeDirector. Both of these tools are available through the Crosser Cloud service. At the edge, the Crosser Edge Node software is installed as a single Docker container, and flows are then deployed and updated through the cloud service on any group of nodes with a single operation.
In addition to the standard tools for cleaning and preparing data, the following features are available to support running ML models at the edge:
- Standard Python environment accessible from within flows
- Central Resource catalog to manage ML models and Python scripts
- Join module for aligning streaming data on time intervals
Bring Your Own AI
Python is the most common environment for developing ML models today. Even so, a large number of alternative setups are in use. There are Python versions 2 and 3, and a large number of ML frameworks, such as scikit-learn, TensorFlow, PyTorch and several commercial options. To make sure we can host your ML model regardless of the choices made by your data science team, Crosser has decided to introduce the “Bring Your Own AI” concept.
In the Crosser Edge Node you have access to a standard Python environment (version 2 or 3) and can then install any additional libraries/frameworks that can be installed using the standard Python tool chain. This environment is configured through a standard flow module, where you set up the libraries needed and the code you want to run. In this way you can be sure your model runs in the environment your model developers expect.
Initiatives such as the ONNX format, initiated by Microsoft and Facebook, aim to provide a standard exchange format for ML models, so that any runtime environment that supports ONNX can execute a model, regardless of which framework was used to build it. This will also make it easier to run ML models at the edge. Native ONNX support is scheduled for mid 2019 on the Crosser Edge Node.
Managing ML Resources
To execute an ML model at the edge, the trained model must be present in each edge node that needs it. In addition, you may need some Python code for final adaptation of the streaming data and to map the model results into a format that can be used by local consumers.
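Such adaptation code is often only a few lines. The sketch below assumes that aligned data arrives as a dictionary and that the model returns a single score; the field names, the `FEATURE_ORDER` list, the threshold and the `handle` function are all hypothetical and do not represent the interface of the Crosser Python module.

```python
# Hypothetical feature names; the order must match what the model was trained on.
FEATURE_ORDER = ["temperature", "pressure", "vibration"]


def to_features(message):
    """Turn an aligned message (dict) into the ordered list the model expects."""
    return [float(message[name]) for name in FEATURE_ORDER]


def to_result(message, score, threshold=0.5):
    """Map a raw model score to a message that local consumers can act on."""
    return {
        "timestamp": message["timestamp"],
        "score": score,
        "anomaly": score > threshold,
    }


def handle(message, predict):
    """Adapt the input, run the model, and format the output.

    `predict` is any callable that takes a feature list and returns a float,
    for example a thin wrapper around a loaded model's prediction method.
    """
    return to_result(message, predict(to_features(message)))
```

Passing the model in as a plain callable keeps the adaptation code independent of the ML framework, so the same script can be reused when the model is replaced.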
The central Resource catalog in Crosser Cloud is used to manage all your ML resources. You can upload trained ML models as well as Python scripts and then easily reference them when building flows. When you deploy these flows onto edge nodes the system will make sure that all resources needed are downloaded into the relevant edge nodes.
When a flow is updated, e.g. by referencing a new ML model, all edge nodes are automatically updated with a single operation. Flow versioning is used to keep track of your changes.