
Machine Learning at the Edge

Extracting valuable insights from machine sensor data can be hard, since it often requires analyzing data from many sensors in parallel.

Because of this complexity, machine learning (ML) methods are becoming increasingly popular for analyzing these datasets.

In this article we will discuss why and how you can run ML models at the edge.

[Figure: Crosser Edge Analytics with ML models]

Why ML @ Edge?

There are two main use cases for ML in industrial IoT:

- Anomaly detection
- Extraction of higher-value features, such as remaining uptime

For both use cases there are many situations where it is most natural to execute the ML model at the edge, close to the source of the data. The reasons typically fall into one of three categories:

Results are only needed locally
For an anomaly detector that analyzes data from one machine and then triggers an action in another machine or a local system, sending data to the cloud for analysis just adds complexity and cost.
Latency is critical
For machine-to-machine triggers, latency is often critical, and timing requirements may not be met if the data has to travel to the cloud and back for analysis.
Data volume must be reduced
In many cases bandwidth to the cloud is limited, unreliable or costly, especially for mobile assets. Analyzing the full dataset locally with ML and sending only triggers or higher-value features to the cloud can be a better solution.
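To illustrate the data-reduction argument, the sketch below condenses a window of raw samples into a few summary features and only produces an outgoing message when a simple z-score check flags the window as anomalous. The sample values, the baseline, the threshold of 3.0 and the message fields are all hypothetical, and the fixed-threshold rule is a stand-in for a trained ML model.

```python
import statistics
from typing import Optional


def summarize_window(samples: list[float]) -> dict:
    """Reduce a window of raw sensor samples to a few higher-level features."""
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.pstdev(samples),
        "min": min(samples),
        "max": max(samples),
    }


def message_for_cloud(samples: list[float], baseline_mean: float,
                      baseline_stdev: float, threshold: float = 3.0) -> Optional[dict]:
    """Return a compact trigger message if the window looks anomalous, else None.

    Hypothetical rule: flag the window when its mean deviates from a known
    baseline by more than `threshold` baseline standard deviations.
    """
    features = summarize_window(samples)
    z = abs(features["mean"] - baseline_mean) / baseline_stdev
    if z > threshold:
        return {"anomaly": True, "z_score": round(z, 2), **features}
    return None  # nothing worth sending; the raw samples stay local


normal = [20.1, 19.9, 20.0, 20.2, 19.8]
overheated = [27.5, 28.0, 27.8, 28.3, 27.9]
print(message_for_cloud(normal, baseline_mean=20.0, baseline_stdev=0.5))      # None
print(message_for_cloud(overheated, baseline_mean=20.0, baseline_stdev=0.5))
```

Only the second window results in a message, and that message is a handful of numbers rather than the full stream of samples.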

The ML Workflow

The machine learning workflow consists of two main steps: developing the model and executing it. The first step is an offline operation where stored data is used to train and tune a model. Once satisfactory results are achieved, the trained model is deployed in an execution environment to make predictions on real-time data. The edge is typically used only for executing the ML model. However, ML model development is an iterative process, and the model may be improved over time as more data becomes available or the architecture is refined. You should therefore expect the model at the edge to be updated several times during its life cycle.
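The split between offline training and edge execution can be sketched with a deliberately tiny model: a one-variable linear regression fitted by least squares, exported as JSON together with a version number, and loaded by a separate predictor that accepts newer versions as they arrive. The model type, the JSON layout and the `version` field are assumptions for illustration; real projects would export a model from their ML framework of choice.

```python
import json
from dataclasses import dataclass


def train_offline(xs: list[float], ys: list[float], version: int) -> str:
    """Offline step: fit y = a*x + b by ordinary least squares and export as JSON."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return json.dumps({"version": version, "a": a, "b": b})


@dataclass
class EdgePredictor:
    """Edge step: holds the currently deployed model and makes predictions."""
    version: int = 0
    a: float = 0.0
    b: float = 0.0

    def load(self, exported: str) -> bool:
        """Load an exported model, accepting it only if it is newer than the current one."""
        params = json.loads(exported)
        if params["version"] <= self.version:
            return False
        self.version, self.a, self.b = params["version"], params["a"], params["b"]
        return True

    def predict(self, x: float) -> float:
        return self.a * x + self.b


# A first training round, then a retrained model delivered later.
v1 = train_offline([0, 1, 2, 3], [1, 3, 5, 7], version=1)   # y = 2x + 1
v2 = train_offline([0, 1, 2, 3], [0, 3, 6, 9], version=2)   # y = 3x

edge = EdgePredictor()
edge.load(v1)
print(edge.version, edge.predict(10))   # 1 21.0
edge.load(v2)
print(edge.version, edge.predict(10))   # 2 30.0
```

Keeping the exported artifact separate from the code that runs it is what allows the model to be replaced at the edge without touching the surrounding logic.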

[Figure: Crosser ML workflow at the edge]

Machine Learning for Streaming Data

When developing and training an ML model, the data is prepared to match the requirements of the ML environment. Some common preparation steps are:

- Remove outliers and invalid data, fill in blanks
- Scale sensor values
- Extract features, such as mean/variance or convert to frequency domain
- Align values on time
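The first three preparation steps can be written as small functions whose parameters are fitted once on the training data and then reused unchanged on streaming data. This is a minimal sketch: the valid range, forward-filling of blanks and min-max scaling are hypothetical choices rather than the only options, and time alignment is left out.

```python
import math
import statistics
from typing import Optional


def clean(values: list[Optional[float]], lo: float, hi: float) -> list[float]:
    """Replace invalid readings (None, NaN or outside [lo, hi]) with the last valid value.

    Invalid readings that occur before the first valid one are dropped.
    """
    cleaned: list[float] = []
    last_good: Optional[float] = None
    for v in values:
        if v is not None and not math.isnan(v) and lo <= v <= hi:
            last_good = v
        if last_good is not None:
            cleaned.append(last_good)
    return cleaned


def fit_scaler(training_values: list[float]) -> tuple[float, float]:
    """Learn min-max scaling parameters from the training data only."""
    return min(training_values), max(training_values)


def scale(values: list[float], params: tuple[float, float]) -> list[float]:
    """Apply the scaling learned during training; streaming data reuses the same params."""
    lo, hi = params
    return [(v - lo) / (hi - lo) for v in values]


def features(window: list[float]) -> dict:
    """Extract simple window features: mean and population variance."""
    return {"mean": statistics.fmean(window), "variance": statistics.pvariance(window)}


params = fit_scaler([10.0, 20.0, 30.0])          # fitted offline on training data
raw = [15.0, None, 25.0, 999.0, float("nan")]    # streaming window with a gap, a spike and a NaN
window = scale(clean(raw, lo=0.0, hi=100.0), params)
print(window)   # [0.25, 0.25, 0.75, 0.75, 0.75]
print(features(window))
```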

When the ML model runs in a streaming environment, all these operations must be applied to the data before it is sent to the model. The last one, aligning data on time, requires special attention.

When training a model, the data is usually stored in files or a database with all sensor values present for each time step, so the model receives a complete set of inputs every time. In a streaming environment, sensor data arrives serially: each sensor sends data at regular intervals, but independently of all other sensors and possibly at a different rate. Before streaming data can be delivered to an ML model, it must therefore be aligned on regular time boundaries, and values from sensors that report at a lower rate may have to be repeated.
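The alignment described above can be sketched as a zero-order hold: at each regular time boundary every sensor contributes its most recent value, so a sensor that reports at a lower rate simply has its last value repeated. The `(timestamp, sensor, value)` event format, the 1-second grid and the choice to skip boundaries until every sensor has reported are assumptions for illustration; this is not a description of the internals of Crosser's Join module.

```python
from typing import Iterable


def align(events: Iterable[tuple[float, str, float]], sensors: list[str],
          start: float, step: float, end: float) -> list[dict]:
    """Align asynchronous sensor events on a regular time grid.

    `events` must be sorted by timestamp. At each boundary t, the latest value
    received at or before t is used for every sensor (zero-order hold).
    Boundaries reached before all sensors have reported are skipped.
    """
    latest: dict[str, float] = {}
    rows: list[dict] = []
    it = iter(events)
    pending = next(it, None)
    n_steps = int((end - start) // step)
    for i in range(n_steps + 1):
        t = start + i * step  # computed from the index to avoid accumulating float error
        # Consume every event up to and including the current boundary.
        while pending is not None and pending[0] <= t:
            _, sensor, value = pending
            latest[sensor] = value
            pending = next(it, None)
        if all(s in latest for s in sensors):
            rows.append({"t": t, **{s: latest[s] for s in sensors}})
    return rows


# Temperature reports every second, pressure only every two seconds.
events = [
    (0.0, "temp", 20.0),
    (0.1, "pressure", 1.01),
    (1.0, "temp", 20.5),
    (2.0, "temp", 21.0),
    (2.1, "pressure", 1.03),
    (3.0, "temp", 21.5),
]
for row in align(events, ["temp", "pressure"], start=0.0, step=1.0, end=3.0):
    print(row)
```

At t = 2.0 the pressure value from t = 0.1 is repeated, because the next pressure reading only arrives at t = 2.1.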

[Figure: Preparing data for ML at the edge with Crosser]

Machine Learning with Crosser

The Crosser Edge Streaming Analytics solution simplifies developing and maintaining edge computing solutions by offering a flow-based programming model, through the FlowStudio visual design tool, and central orchestration of edge nodes through the EdgeDirector. Both tools are available through the Crosser Cloud service. At the edge, the Crosser Edge Node software is installed as a single Docker container, and flows are then deployed and updated through the cloud service on any group of nodes with a single operation.


In addition to the standard tools for cleaning and preparing data, the following features are available to support running ML models at the edge:

- Standard Python environment accessible from within flows
- Central Resource catalog to manage ML models and Python scripts
- Join module for aligning streaming data on time intervals

Bring Your Own AI

Python is the most common environment for developing ML models today. Even so, many different setups are in use: both Python 2 and Python 3, and a large number of ML frameworks, such as scikit-learn, TensorFlow, PyTorch and several commercial options. To make sure we can host your ML model regardless of the choices made by your data science team, Crosser has decided to introduce the “Bring Your Own AI” concept.

On the Crosser Edge Node you have access to a standard Python environment (version 2 or 3), and you can install any additional library or framework that can be installed with the standard Python tool chain. This environment is configured through a standard flow module, where you set up the libraries you need and the code you want to run. This way you can be sure your model runs in the environment your model developers expect.
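One way to act on this is to let the Python code check at startup that the installed packages match the versions the model was trained with. The sketch below uses the standard-library `importlib.metadata` module (Python 3.8 or later); the package names and pinned versions in `REQUIRED` are hypothetical, and how such a check would be wired into a Crosser flow module is not shown.

```python
from importlib import metadata

# Hypothetical version pins delivered by the data science team alongside the model.
REQUIRED = {"numpy": "1.16.1", "scikit-learn": "0.20.2"}


def check_environment(required: dict[str, str]) -> list[str]:
    """Return a list of mismatches; an empty list means the environment matches."""
    problems = []
    for package, wanted in required.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (expected {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{package} {installed} is installed, but the model expects {wanted}")
    return problems


for problem in check_environment(REQUIRED):
    print("WARNING:", problem)
```

Failing loudly at startup is usually easier to diagnose than a model that loads but silently behaves differently under another framework version.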

[Figure: Crosser machine learning models]

Even though Python may be the most common choice for machine learning projects today, and the obvious choice for the initial ML support on the Crosser Edge Node, there are other alternatives, such as R, JavaScript, Java and Julia. Crosser intends to support the most relevant runtime environments to make running ML models at the edge as simple as possible.

Initiatives such as the ONNX format, initiated by Microsoft and Facebook, aim to provide a standard exchange format for ML models, so that any runtime environment that supports ONNX can execute a model regardless of which framework was used to build it. This will also make it easier to run ML models at the edge. Native ONNX support on the Crosser Edge Node is scheduled for mid-2019.

Managing ML Resources

To execute an ML model at the edge, the trained model must be present on each edge node that uses it. You may also need some Python code for the final adaptation of the streaming data and to map the model results into a format that local consumers can use.
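Such adaptation code typically sits on either side of the model call: it turns an incoming message into the input vector the model expects, and turns the prediction into a message that local consumers understand. In this sketch the message fields, the feature order, the alarm threshold and the output format are all hypothetical, and `fake_model` is a plain function standing in for a trained model so the example runs without any ML framework.

```python
from typing import Callable, Sequence

# Hypothetical feature order the model was trained with.
FEATURE_ORDER = ["temperature", "pressure", "vibration"]


def to_model_input(message: dict) -> list[float]:
    """Map an incoming streaming message onto the model's input vector."""
    missing = [f for f in FEATURE_ORDER if f not in message]
    if missing:
        raise ValueError(f"message lacks features: {missing}")
    return [float(message[f]) for f in FEATURE_ORDER]


def to_consumer_output(message: dict, score: float, threshold: float = 0.5) -> dict:
    """Map the raw model score onto a simple alarm format for a local system."""
    return {
        "machine": message.get("machine_id"),
        "timestamp": message.get("timestamp"),
        "anomaly_score": score,
        "alarm": score >= threshold,
    }


def process(message: dict, predict: Callable[[Sequence[float]], float]) -> dict:
    """Adapt input, call the model, adapt output."""
    return to_consumer_output(message, predict(to_model_input(message)))


def fake_model(x: Sequence[float]) -> float:
    """Stand-in for a trained model: a fixed weighted sum, for illustration only."""
    return 0.01 * x[0] + 0.1 * x[1] + 0.5 * x[2]


msg = {"machine_id": "press-7", "timestamp": 1549000000,
       "temperature": 40, "pressure": 2.0, "vibration": 0.3}
print(process(msg, fake_model))
```

Keeping this mapping in a separate script means a retrained model with the same inputs and outputs can be swapped in without changing it.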

The central Resource catalog in Crosser Cloud is used to manage all your ML resources. You can upload trained ML models as well as Python scripts and then reference them when building flows. When you deploy these flows onto edge nodes, the system makes sure that all required resources are downloaded to the relevant edge nodes.

When a flow is updated, e.g. by referencing a new ML model, all edge nodes are automatically updated with a single operation. Flow versioning is used to keep track of your changes.


01 Feb 2019

About the author

Göran Appelquist (PhD) | CTO

Göran has 20 years of experience leading technology teams. He is the lead architect of our end-to-end solution and is extremely focused on securing the lowest possible total cost of ownership for our customers.

"Hidden lifecycle (employee) costs can amount to 5-10 times the purchase price of software. Our goal is to offer a solution that automates and removes most of the tasks that are costly over the lifecycle.

My career started in academia, where I earned a PhD in physics researching large-scale data acquisition systems for physics experiments, such as those at the LHC at CERN. Since leaving academia I have held management positions at several tech startups over the last 20 years.

In most of these positions I have stood with one foot in the R&D team and the other in the product/business teams. My passion is learning new technologies, using them to develop innovative products, and explaining the solutions to end users, technical or non-technical."