Tuesday, November 05, 2019

Cloud Native Event Message Transformation

When designing a cloud native architecture on an enterprise scale, it is unlikely that you can start from a green-field situation. In reality, the enterprise landscape has most likely grown organically over the years, and the “legacy” systems are a given for a longer period of time.

Even though building a strategy and an architecture for a complete green-field landscape is much easier than a transformation, the transformation is the common reality: you are far more likely to face one than a complete green-field.

Transform to event driven
Within enterprises there is a need to drive the standing architecture more towards a real-time, event-based architecture. Different types of event driven architecture exist, such as event-carried state transfer, event sourcing and others. Which type of event driven architecture to choose is outside the scope of this post.

When moving to an event driven architecture the main change, from a high-level perspective, is that previously isolated systems will start to generate events and publish them to a central location where other systems can subscribe to the events and act upon them.

A commonly seen implementation is the use of Apache Kafka as a central “event hub” where (sub-)systems publish events and other (sub-)systems subscribe to a specific topic containing events from any number of (sub-)systems.
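
The publish and subscribe flow can be illustrated with a minimal in-memory sketch. This is not Kafka and does not use any Kafka API; the `EventHub` class, the `sales.orders` topic name and the event fields are all assumptions made up for illustration.

```python
from collections import defaultdict


class EventHub:
    """In-memory stand-in for a central event hub (illustration only)."""

    def __init__(self):
        # Maps a topic name to the list of subscribed handlers.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(event)


hub = EventHub()
shipping_inbox = []
hub.subscribe("sales.orders", shipping_inbox.append)

# Two different (hypothetical) sales systems publish to the same topic.
hub.publish("sales.orders", {"source": "sales-region-a", "order_id": "1001"})
hub.publish("sales.orders", {"source": "sales-region-b", "order_id": "B-77"})
```

The subscriber never talks to the publishing systems directly; it only knows the topic.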

The need for a single message format
Moving from a point to point integration model between different systems towards a model where all systems publish events to a central location solidifies the need for a more standardized message format.

When developing point to point integrations, the fact that each (sub-)system has a different or slightly different message format is commonly accepted. When moving to an event driven architecture the need for one single message format becomes crucial.

As an example, an enterprise might have multiple local sales systems, each used in its own region to register new sales. Due to the way in which the enterprise has grown, both from an organizational as well as a technology point of view, each region has a different type of sales system.

When the need arises to integrate all local sales systems with a central shipping system you can apply an event driven architecture. Every sales order that changes its status to “ready to ship” will have to result in a published event so that it can be picked up by the central shipping system and, in the future, also by other central systems.

As we see a many to one relationship, we will have to ensure that the events from all the different sales systems have the same format, so that no transformation or customization is needed in the central system. In addition, every other system that subscribes to these events in the future will only have to “understand” one single message format, without needing to understand each individual sales system.
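
As a minimal sketch of this idea, the example below maps two different native sales records onto one standard event. The event fields, the native record shapes and the system names are assumptions chosen for illustration; they are not an actual enterprise standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StandardOrderEvent:
    """Hypothetical enterprise-standard event; field names are illustrative."""
    event_type: str
    schema_version: str
    source_system: str
    order_id: str


def from_region_a(native: dict) -> StandardOrderEvent:
    # Assumed native shape for region A: {"orderNo": 1001, "state": "READY TO SHIP"}
    return StandardOrderEvent("order.ready_to_ship", "1.0", "sales-region-a", str(native["orderNo"]))


def from_region_b(native: dict) -> StandardOrderEvent:
    # Assumed native shape for region B: {"id": "B-77", "shippable": True}
    return StandardOrderEvent("order.ready_to_ship", "1.0", "sales-region-b", native["id"])


def consume(raw: str) -> str:
    """The central shipping system only has to understand the one standard format."""
    event = json.loads(raw)
    return f"ship order {event['order_id']} from {event['source_system']}"


events = [
    json.dumps(asdict(from_region_a({"orderNo": 1001, "state": "READY TO SHIP"}))),
    json.dumps(asdict(from_region_b({"id": "B-77", "shippable": True}))),
]
results = [consume(raw) for raw in events]
```

The consumer contains no region specific logic; all differences are resolved before the event is serialized.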

Event transformation in the pre-publish state
As a good practice, an enterprise architecture statement could be that each sub-system can have their own data and message format within the boundaries of that specific sub-system. All external connections to other sub-systems or systems need to comply with the enterprise wide standards.

In other words, a sub-system has total freedom to work with its own (proprietary) formats. However, when crossing the borders of the sub-system it has to comply with standard rules. When a sub-system generates an event, it can only publish it when it is compliant with the enterprise standards.


This means that for most (if not all) sub-systems (in our example, the local sales systems) there is a need for a transformation service which takes an event, transforms it into an enterprise wide standard event and only then publishes it to the central message hub. By stating that the transformation is the responsibility of the individual system (team) and not of a central team, you ensure fewer dependencies between sub-systems and central systems.

By doing so you will be able to achieve a higher development velocity, as each team can build its own transformation service. Replacing one or more of the “legacy” sub-systems also becomes easier, because this can be done in relative isolation as long as the new sub-system complies with the enterprise wide standards.

Prevent reinventing the wheel again and again
The potential danger in stating that each team is responsible for its own transformation function, transforming application native event messages into an enterprise wide standard, is that multiple teams will work on a component which in essence is largely the same for each sub-system.

In an ideal world, the enterprise architecture team will not only focus on defining the enterprise wide standard. It will also consult with all individual teams and design a common architecture for the transformation service, which on one side is “pluggable” into each sub-system and on the other side is highly configurable and extensible, so that each sub-system specific requirement can be handled.

Providing the foundation 
In an ideal scenario the enterprise architecture team will provide the foundation for the enterprise wide event message format and an architecture for a common translation service. In addition to this a central team can be made responsible for the foundation of the translation service.

The difficult part of building a foundation for a translation service which needs to be pluggable into each individual sub-system is that you want it to be as complete and rich as possible, to prevent individual teams from creating large custom parts and spending a lot of resources on development. On the other side, each individual sub-system will have unique requirements that should not be part of the foundation.

A solution to this is building a generic foundation service which can easily be re-configured for each specific case and to which a sub-system team can add custom rules and logic. The diagram below provides a high-level insight into such a service.

This diagram does not show a scaled model of the transformation function. However, when developing a generic foundation service for transformation it is good to consider the need to scale each individual component. This also requires that you take into account how each component communicates with the other components; REST or gRPC can be good candidates for inter-component communication. As an example, if your logic requires a compute intensive transformation and you expect a high number of events, it might be good to scale the actual transformer component so that multiple transformations can be done in parallel.

Also not included in the diagram below are supporting functionalities such as central configuration, logging and metrics, retry logic, throttle logic and others.


We have to consider that a sub-system might or might not be able to send an event to the translation service, or in some cases might only be able to send an event notification and not the entire event message. For this reason, the example transformation service has both a “listener service” and a “pull and schedule service”.

Listener service:
Able to receive an event message or an event notification. This could be a REST call; however, depending on your enterprise standards this could also be, for example, a gRPC call. In case of an event message the message will be forwarded to the “receiver”. In case of an event notification the notification will be sent to the “pull and schedule service”, which collects the actual event message from the sub-system data store. This component needs to be fully configurable and contain a “framework” for adding custom rules and logic.
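
The routing decision of the listener can be sketched as follows. This is only an outline of the logic, without a real REST or gRPC endpoint; the `Listener` class, the `payload` and `event_id` keys and the rule mechanism are hypothetical names chosen for this illustration.

```python
class Listener:
    """Routes full event messages to the receiver and notifications to the pull service."""

    def __init__(self, to_receiver, to_puller):
        self._to_receiver = to_receiver
        self._to_puller = to_puller
        self._rules = []  # custom rules added by the sub-system team

    def add_rule(self, rule):
        # A rule is any callable that takes the incoming dict and returns True or False.
        self._rules.append(rule)

    def handle(self, incoming: dict) -> str:
        if not all(rule(incoming) for rule in self._rules):
            return "rejected"
        if "payload" in incoming:
            # A complete event message: forward it to the receiver.
            self._to_receiver(incoming["payload"])
            return "forwarded"
        # Only a notification: ask the pull service to collect the message.
        self._to_puller({"event_id": incoming["event_id"]})
        return "pull_requested"


to_receiver, to_puller = [], []
listener = Listener(to_receiver.append, to_puller.append)
listener.add_rule(lambda message: "event_id" in message)

outcomes = [
    listener.handle({"event_id": "e1", "payload": {"orderNo": 1001}}),
    listener.handle({"event_id": "e2"}),
    listener.handle({"payload": {}}),
]
```

In this sketch the third call is rejected by the custom rule because it carries no `event_id`.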

Pull and schedule service:
Able to pull information from the sub-system data store. The service should be able to support the most common ways of retrieving information from a data store. Depending on your enterprise landscape this could be a direct database connection using JDBC/ODBC like connections; however, this can also be a REST or gRPC like call. The pull can be initiated by a pre-defined schedule or by a notification from the listener service. The retrieved message will be forwarded to the receiver. This component needs to be fully configurable and contain a “framework” for adding custom rules and logic.
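
A minimal sketch of both pull triggers is shown below, using an in-memory SQLite database as a stand-in for the sub-system data store. The table layout, the `pulled` marker column and the `PullService` class are assumptions for illustration; a real sub-system will have its own schema and access method.

```python
import sqlite3


def make_store():
    """Create a hypothetical sub-system data store with three orders."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT, pulled INTEGER DEFAULT 0)")
    db.executemany(
        "INSERT INTO orders (id, status) VALUES (?, ?)",
        [("A-1", "ready to ship"), ("A-2", "open"), ("A-3", "ready to ship")],
    )
    return db


class PullService:
    def __init__(self, db, forward):
        self._db = db
        self._forward = forward  # callable that hands a message to the receiver

    def _emit(self, rows):
        for order_id, status in rows:
            self._forward({"id": order_id, "status": status})
            # Mark the row so it is not forwarded a second time.
            self._db.execute("UPDATE orders SET pulled = 1 WHERE id = ?", (order_id,))
        return len(rows)

    def on_notification(self, order_id):
        """Pull a single message after a notification from the listener."""
        rows = self._db.execute(
            "SELECT id, status FROM orders WHERE id = ? AND pulled = 0", (order_id,)
        ).fetchall()
        return self._emit(rows)

    def on_schedule(self):
        """Pull everything that is ready and not yet forwarded (scheduled run)."""
        rows = self._db.execute(
            "SELECT id, status FROM orders "
            "WHERE status = 'ready to ship' AND pulled = 0 ORDER BY id"
        ).fetchall()
        return self._emit(rows)


received = []
service = PullService(make_store(), received.append)
counts = [service.on_notification("A-3"), service.on_schedule(), service.on_schedule()]
```

The second scheduled run finds nothing new, because both shippable orders were already forwarded.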

Receiver:
The receiver is one of the most standard components and is the “gateway” that provides the information in the right manner to the transformer. The receiver is also responsible for message throttling and buffering towards the transformer. This also includes retries and, if required, balancing over multiple transformers in case there is more than one transformer component in the transformation function.
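
Buffering, retries and balancing can be sketched as below. The `Receiver` class, the buffer size, the retry count and the round-robin strategy are assumptions made for this illustration; a production version would also add a delay or back-off between retries.

```python
from collections import deque
from itertools import cycle


class Receiver:
    def __init__(self, transformers, max_buffer=100, max_retries=3):
        self._targets = cycle(transformers)  # round-robin over transformer instances
        self._buffer = deque()
        self._max_buffer = max_buffer
        self._max_retries = max_retries
        self.dead_letters = []  # messages that failed on every attempt

    def accept(self, message) -> bool:
        # Refuse new messages when the buffer is full (simple throttling).
        if len(self._buffer) >= self._max_buffer:
            return False
        self._buffer.append(message)
        return True

    def drain(self) -> int:
        delivered = 0
        while self._buffer:
            message = self._buffer.popleft()
            for _ in range(self._max_retries):
                try:
                    # Each attempt goes to the next transformer in the cycle.
                    next(self._targets)(message)
                    delivered += 1
                    break
                except Exception:
                    continue
            else:
                self.dead_letters.append(message)
        return delivered


handled = []


def flaky(message):
    # Simulates a transformer instance that is currently unavailable.
    raise RuntimeError("busy")


receiver = Receiver([flaky, handled.append], max_buffer=2)
accepted = [receiver.accept(m) for m in ("m1", "m2", "m3")]
delivered = receiver.drain()
```

Here the third message is refused because the buffer is full, and both buffered messages are delivered on the retry that reaches the healthy transformer.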

Transformer:
The transformer is one of the most customizable and extensible components in the transform function. It receives messages from the receiver and needs to apply custom logic to transform them from a sub-system native format into the enterprise standard format. When developing a generic foundation service, you will need to provide as much freedom as possible to extend this component with custom logic, in a way that keeps it upgradable in the future. This is the “hard” part of designing such a generic foundation service and requires a clear and well-defined extension framework. All transformed messages will be provided to the validator.
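
One possible shape for such an extension framework is a registry of custom steps, sketched below. The `Transformer` class, the decorator based registration and the native field names are assumptions for illustration, not a prescribed design.

```python
class Transformer:
    """Generic core; sub-system teams only register their own steps."""

    def __init__(self, source_system):
        self._source = source_system
        self._steps = []

    def step(self, func):
        # Decorator that registers a custom step; steps run in registration order.
        self._steps.append(func)
        return func

    def transform(self, native: dict) -> dict:
        # The foundation sets the common fields, custom steps fill in the rest.
        event = {"source_system": self._source, "schema_version": "1.0"}
        for step in self._steps:
            event = step(native, event)
        return event


transformer = Transformer("sales-region-a")


@transformer.step
def map_order_id(native, event):
    # Assumed native field name: "orderNo".
    return {**event, "order_id": str(native["orderNo"])}


@transformer.step
def map_status(native, event):
    # Assumed native field name: "state", e.g. "READY TO SHIP".
    return {**event, "status": native["state"].lower().replace(" ", "_")}


result = transformer.transform({"orderNo": 1001, "state": "READY TO SHIP"})
```

Because the custom logic lives in registered functions rather than in the core class, the core can be upgraded without touching the team specific steps.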

Validator:
The validator component is a component that should not be customizable. The intention of this component is to receive a message from the transformer and validate it against the rules for an enterprise standard message. Here it is important that the validator component is able to support multiple versions of the enterprise standard message as they might go through an evolution. The validator should, in an ideal scenario, be able to “fetch” the standards from a central repository and should not have the standards as a hard-coded part of the validator function. 
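
A minimal sketch of version aware validation is shown below. The schema repository is simulated with a dictionary and a `fetch_schema` function; in practice this would be a call to a central repository. The schema contents and field names are assumptions for illustration.

```python
# Stand-in for a central schema repository: version -> required fields and types.
SCHEMAS = {
    "1.0": {"order_id": str, "status": str},
    "1.1": {"order_id": str, "status": str, "region": str},
}


def fetch_schema(version):
    # Hypothetical lookup; returns None for unknown versions.
    return SCHEMAS.get(version)


class Validator:
    def __init__(self, fetch):
        self._fetch = fetch  # injected, so standards are not hard-coded here

    def validate(self, event: dict) -> list:
        """Return a list of problems; an empty list means the event is valid."""
        version = event.get("schema_version")
        schema = self._fetch(version)
        if schema is None:
            return [f"unknown schema_version: {version!r}"]
        errors = []
        for field, expected_type in schema.items():
            if field not in event:
                errors.append(f"missing field: {field}")
            elif not isinstance(event[field], expected_type):
                errors.append(f"wrong type for {field}")
        return errors


validator = Validator(fetch_schema)
ok_v1 = validator.validate({"schema_version": "1.0", "order_id": "1001", "status": "ready_to_ship"})
bad_v11 = validator.validate({"schema_version": "1.1", "order_id": 1001, "status": "ready_to_ship"})
unknown = validator.validate({"schema_version": "9.9"})
```

Adding a new version of the standard then only requires a new entry in the repository, not a change to the validator itself.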

Publisher:
The publisher will only receive validated messages in a correct enterprise standard format. The role of the publisher is to publish the validated message to a central event hub. This component should carry all the logic needed for the ensured delivery of the message to the correct event hub. Depending on your landscape you might have a production event hub and others, and there might be a need to differentiate between them in the publisher component. In an ideal situation all technical configuration is retrieved from a central location. This provides the benefit that all changes to, for example, IP addresses and the number of nodes for the event hub will be automatically picked up by the publisher component, without the need to manually make changes on each transformation function.
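
The sketch below illustrates retrying delivery while re-reading the configuration on every attempt. The `Publisher` class, the configuration layout, the node address and the `fake_send` function are all hypothetical; the actual send would be done with the client library of your event hub.

```python
class Publisher:
    def __init__(self, load_config, send, max_attempts=3):
        self._load_config = load_config  # callable returning the central configuration
        self._send = send                # callable that delivers to the event hub
        self._max_attempts = max_attempts

    def publish(self, event: dict, environment: str = "production") -> int:
        """Deliver the event and return the number of attempts that were needed."""
        for attempt in range(1, self._max_attempts + 1):
            # Re-read the configuration each time so node changes are picked up.
            hub = self._load_config()[environment]
            try:
                self._send(hub["nodes"], hub["topic"], event)
                return attempt
            except ConnectionError:
                continue
        raise RuntimeError(f"delivery failed after {self._max_attempts} attempts")


# Hypothetical central configuration and a fake send that fails once.
config = {"production": {"nodes": ["10.0.0.1:9092"], "topic": "sales.orders"}}
sent = []
calls = {"count": 0}


def fake_send(nodes, topic, event):
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("node unreachable")
    sent.append((nodes[0], topic, event["order_id"]))


publisher = Publisher(lambda: config, fake_send)
attempts = publisher.publish({"order_id": "1001"})
```

Keeping the configuration lookup and the send function outside the class is one way to let the same publisher code serve production and non-production event hubs.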

Conclusion
Designing and developing a generic foundation service might be a challenge in some parts, but the benefits are directly visible. When done correctly, all teams responsible for a sub-system will be able to use the generic foundation service while at the same time having all the freedom to implement the specific logic needed for their individual system. When designed correctly, the transform function will be a highly reliable and resilient solution which ensures a more standardized and unified way of publishing events in the wider enterprise landscape.