When does it make sense to use Crosser as an alternative to Kafka?
Kafka has become almost synonymous with real-time streaming and is often the starting point when building a new streaming application.
But is it always the best approach? In this article, we show that there are many applications where Kafka may not be the best choice, and that these applications can be implemented more easily, quickly, and cheaply using Crosser's Streaming Analytics solution.
Although the underlying technology differs between Crosser and Kafka, there is considerable overlap between the use cases the two platforms can address.
Comparing the Technology
At its core, Kafka is a message broker with persistence that can be scaled to handle huge data volumes. A message broker is useful when building large applications in which producers and consumers of data need to be decoupled. Persistence is also helpful when data needs to be correlated over long time spans or across multiple out-of-sync streams. However, these benefits come at a significant cost: Kafka requires multiple servers, is complex to configure and operate, and each cluster needs to be managed locally, one by one.
Crosser is a low-code, in-memory streaming analytics solution optimized to be lightweight, modular, flexible, and fast. It has a built-in message broker and persistent storage, intended mainly for buffering and for guaranteeing “at-least-once” delivery of messages. The Crosser Node (the processing runtime) requires only a single server (two for redundancy) and is centrally managed through the Crosser Control Center.
As the table shows, there are significant differences in the minimum requirements for running a redundant installation. The added infrastructure complexity brings operational hurdles and costs on top of the cost of the hardware itself.
Illustration: Kafka and Crosser Technical Requirements Comparison. Reference article here →
Complexity and barriers vs Simplicity and Agility
Kafka is built around concepts such as events, topics, partitions, ZooKeeper, Kafka Connect, and the Streams API, and is engineered to be highly scalable, elastic, fault-tolerant, and secure. It is optimized for advanced requirements such as processing ultra-high volumes of social media events, securely processing bank transactions, and meeting the other demands of the very largest companies.
This comes at the price of complexity and demanding development skills. Kafka clusters need to be deployed and configured by highly trained IT teams, and developing and implementing use cases requires access to developers. These teams are typically in high demand and often become bottlenecks, which slows down implementation and increases cost.
Crosser is built around event streaming pipelines, which we call Flows. In a low-code, drag-and-drop visual design studio, non-developers can easily build Flows by connecting existing building blocks, which we call modules. Simplicity and agility are guiding principles for the Crosser Platform.
Illustration: Kafka and Crosser Organisational Comparison. Reference article here →
Use cases ideal for non-Kafka solutions
There are many applications where the high-end Kafka features are unnecessary, and you can get up and running much faster and at a lower cost using alternative tools like Crosser. Let's take a look at some examples of such applications:
Event-driven synchronization of changes between applications/data integration
Synchronizing changes between two systems, either in one direction or both ways, is a common use case. For example, updating order information from a CRM system to an ERP system, or keeping contact information synchronized between a CRM system and a marketing tool. In such cases, we deal with streams of changes that need to be transformed before they can be applied to the destination system.
These change events can either be pushed from the source system whenever there is a change or pulled from the source system at regular intervals. If events are pulled, persistence is typically unnecessary; we only need to keep track of the timestamp of the last successfully delivered event, and the events can be pulled again from the source system.
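To make the pull-based approach concrete, here is a minimal Python sketch of incremental synchronization with a timestamp watermark. It is a generic illustration, not how Crosser implements this: `fetch_changes_since` and `apply_to_destination` are hypothetical stand-ins for the source system's change query and the destination system's write API (including any transformation), and `fetch_changes_since` is assumed to return only changes strictly newer than the watermark.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


@dataclass
class ChangeEvent:
    record_id: str
    modified_at: datetime  # when the record last changed in the source system
    payload: dict


def sync_once(
    fetch_changes_since: Callable[[datetime], Iterable[ChangeEvent]],  # hypothetical source query
    apply_to_destination: Callable[[ChangeEvent], None],  # hypothetical destination write
    last_synced: datetime,
) -> datetime:
    """Pull changes newer than the watermark, apply them oldest-first,
    and return the timestamp of the last successfully applied change."""
    watermark = last_synced
    for event in sorted(fetch_changes_since(last_synced), key=lambda e: e.modified_at):
        try:
            apply_to_destination(event)
        except Exception:
            # Stop at the first failure; the next poll resumes from the last good watermark.
            break
        watermark = event.modified_at
    return watermark
```

Because the watermark only moves past events that were applied successfully, a failed delivery is simply retried on the next poll, and only the watermark itself needs to be stored. One caveat of a strict "newer than" query: changes sharing the exact watermark timestamp can be skipped, so real integrations often re-read inclusively from the watermark and rely on idempotent upserts at the destination.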
If events are pushed, however, we will need persistence, since we will receive the events only once and must ensure that they are delivered to the destination. Still, if the volume of events is reasonable, we can implement persistence without having to set up a full-blown Kafka system.
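For the push case, the persistence needed is often no more than a local durable buffer. The sketch below uses SQLite from Python's standard library; `DurableBuffer` and its methods are hypothetical names for illustration, not a Crosser or Kafka API. Each event is written to disk before the push is acknowledged and deleted only after the destination accepts it, which gives at-least-once delivery.

```python
import json
import sqlite3
from typing import Callable


class DurableBuffer:
    """Minimal SQLite-backed queue giving at-least-once delivery of pushed events."""

    def __init__(self, path: str) -> None:
        # Use a file path for real durability; ":memory:" is only suitable for testing.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.conn.commit()

    def receive(self, event: dict) -> None:
        # Commit before returning so the event survives a crash once the push is acknowledged.
        self.conn.execute("INSERT INTO pending (body) VALUES (?)", (json.dumps(event),))
        self.conn.commit()

    def drain(self, deliver: Callable[[dict], None]) -> int:
        """Deliver pending events in arrival order; return how many were delivered."""
        delivered = 0
        rows = self.conn.execute("SELECT id, body FROM pending ORDER BY id").fetchall()
        for row_id, body in rows:
            try:
                deliver(json.loads(body))
            except Exception:
                break  # keep this and all later events for the next attempt
            # Delete only after successful delivery; a crash right before this line
            # means the event is re-sent later (at-least-once, not exactly-once).
            self.conn.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            self.conn.commit()
            delivered += 1
        return delivered

    def pending_count(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM pending").fetchone()[0]
```

Since an event can occasionally be delivered twice, the destination should tolerate duplicates, for example by upserting on a business key.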
Analyze and take actions on real-time data/automation flows
Analyzing streams of data to trigger actions is another common use case for streaming, with anomaly detection being the most frequent scenario. This may involve simple range checks or the use of machine learning models to analyze the data.
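The simplest form of anomaly detection mentioned above, a range check, fits in a few lines. This Python sketch assumes per-sensor limits are known in advance; `Reading`, `Alert`, and `range_check` are illustrative names, and a machine learning model could take the place of the comparison inside the loop.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass
class Reading:
    sensor: str
    value: float


@dataclass
class Alert:
    sensor: str
    value: float
    reason: str


def range_check(
    readings: Iterable[Reading], limits: Dict[str, Tuple[float, float]]
) -> Iterator[Alert]:
    """Yield an alert for each reading outside its sensor's [low, high] range.
    Readings from sensors without configured limits are ignored."""
    for r in readings:
        if r.sensor not in limits:
            continue
        low, high = limits[r.sensor]
        if r.value < low:
            yield Alert(r.sensor, r.value, f"below {low}")
        elif r.value > high:
            yield Alert(r.sensor, r.value, f"above {high}")
```

Because it is a generator, the check processes readings one at a time as they arrive and never needs the full stream in memory.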
The message broker functionality could be useful if multiple analysis applications are used to analyze the same streams. Persistence might be necessary to ensure that the analysis results are delivered correctly, but it is generally not required to store the input streams, as analysis is typically performed over short time periods. These requirements can be met without needing to set up a full-blown Kafka system.
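When several analysis applications consume the same stream, the broker's job is fan-out: every subscriber receives every message. A minimal in-process Python version follows. `InMemoryBroker` is a hypothetical name, and it deliberately has no persistence.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Minimal in-process publish/subscribe: every subscriber of a topic
    receives every message published to it. Nothing is stored, so messages
    published before a subscriber registers are not replayed."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to all subscribers; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

For example, a spike detector and a logger can both subscribe to the same temperature topic. The lack of replay is acceptable when, as noted above, analysis covers short time periods.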
Send machine data to a data lake/warehouse
Machines, whether on the factory floor or in the field, can generate massive amounts of data. To facilitate large-scale analysis and machine learning model training, it is often desirable to collect this data in a central data lake or data warehouse, typically hosted in the cloud.
The data sources, such as sensors and counters, produce streams of data that need to be sent to central storage as soon as the data becomes available. Apart from basic cleaning, filtering, transformations, and possibly aggregations, no further processing is needed. Since data flows from each source to a single destination, a message broker adds no value. And because this basic processing requires no correlation over long time spans, persistent queues are unnecessary.
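The light processing described here, cleaning followed by aggregation, can be sketched as a per-machine tumbling-window average in Python. The `Sample` structure, the 60-second default window, and the use of `None` for an invalid reading are assumptions for illustration. The resulting rows would then be handed to whatever writer the data lake or warehouse requires, which is out of scope here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Sample:
    machine: str
    ts: float               # seconds since epoch
    value: Optional[float]  # None models a missing or invalid reading


def clean_and_aggregate(samples: Iterable[Sample], window_s: int = 60) -> List[dict]:
    """Drop invalid samples, then average each machine's values over
    fixed, non-overlapping (tumbling) windows of window_s seconds."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for s in samples:
        if s.value is None:
            continue  # basic cleaning: discard missing readings
        window_start = int(s.ts // window_s) * window_s
        buckets[(s.machine, window_start)].append(s.value)
    return [
        {"machine": m, "window_start": w, "avg": sum(vals) / len(vals), "count": len(vals)}
        for (m, w), vals in sorted(buckets.items())
    ]
```

The function depends only on the readings from its own site, so the same code can be deployed unchanged at every location, in line with the scaling approach in the next paragraph.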
The data may need to be temporarily buffered to ensure successful delivery, but that's all. As the data sources are naturally distributed (machines, process areas, sites, etc.), the same ingest application can be replicated in each location to achieve the necessary scaling.
Other use cases
Examples of other use cases where a lightweight, low-code solution can be considered instead of Kafka:
- Event & Stream processing with lower volume requirements
- Non-mission-critical applications
- Use-cases where exactly-once delivery guarantee is not needed
- Event-driven application integration
- Streaming ETL and ELT pipelines for data warehouses and data lakes
Non-Kafka use cases
- Industrial Edge: Requires connectivity to new and legacy industrial protocols together with data transformation and harmonization before data can be used.
- Closed-loop edge optimization with ML: Analyzing machine data with ML models to derive optimized settings or detect anomalies typically requires low latency.
- Distributed Edge: Mass roll-out of the same application to many distributed edge locations.
Use cases where Crosser is not a good fit
- Fraud detection: Requires access to long sequences of data to detect fraudulent patterns.
- Web site analytics: Several independent and out-of-sync streams of data must be joined in order to derive insights.
Besides being an alternative to Kafka, Crosser can also complement Kafka, either by ingesting data from Kafka or by sending data to Kafka. Crosser has pre-built Kafka connectors that allow customers to combine Crosser and Kafka, for example:
- Run Crosser at distributed remote edge locations to pre-process data and send it to a central Kafka cluster, for instance for industrial IoT data
- Run Kafka as a central cluster where IT resources are available and data volumes are significant, and run Crosser in branch offices where skill levels are lower
- Use Crosser as a low-code streaming analytics solution on top of the Kafka message broker for easier and faster development of more advanced processing
- Run trained machine learning and AI models on data coming from Kafka
Read the full article: Why using Crosser and Kafka together makes sense →
Crosser Streaming Analytics
Smart. Lightweight. Easy-to-use.
Crosser’s solution can be used to implement many streaming use cases, like the ones above, without requiring an external platform such as Kafka. This has several advantages:
All-in-one platform - Connectivity and processing. An extensive library of pre-built connectors and processing modules, which you can complement with your own custom modules.
Low-code design - Build ‘visual’ pipelines by combining modules. Interactive testing from within the design tool.
Centralized management, distributed processing - Manage all use cases centrally from our Control Center (hosted by Crosser or by you). Deploy a single Docker container for on-premise processing or host your pipelines on Crosser.
Low complexity - Crosser’s simple architecture translates into reduced cost for both infrastructure and operations.
As outlined above, Crosser is not a one-to-one replacement for Kafka: even though there is some overlap, the two systems address different problems. Kafka is a message broker with persistence, while Crosser focuses on stream processing and connectivity. You can do some stream processing with Kafka using KSQL, but many use cases require separate streaming applications. Likewise, Crosser has a built-in message broker and some persistence, but it is not designed for the type of use cases that Kafka targets.
What we have tried to show in this article is that some use cases can be implemented much more easily and cheaply by choosing other technology. For some use cases, it might also make sense to use Crosser in combination with Kafka, see here.
Learn more about the Crosser Platform for Intelligent Data Pipelines & Automations here →
Look into the rich library of connectors here →
| Resource type | Kafka min. requirement | Crosser min. requirement |
|---|---|---|
| RAM (GB) | 245 | 0.6 |
| Processing | Read/write to storage | In-memory |
| Hardware cost | High | Very low |
| Infrastructure complexity | High | Very low |
Table: Comparing the system requirements for a redundant installation. Reference article here →
| Capability | Kafka | Crosser |
|---|---|---|
| Management | Local | Crosser Control Center |
| Development | Commands & code | Low-code Flow Studio with drag-and-drop |
| IT team burden | High | Very low |
Table: Comparing capabilities for management & development. Reference article here →