Tuesday, November 05, 2019

Cloud Native Event Message Transformation

When designing a cloud native architecture on an enterprise scale it is most likely that you cannot start with a green-field situation. In reality it is very likely that the enterprise landscape has grown organically over the years and the “legacy” systems will remain a given for a longer period of time.

Even though building a strategy and an architecture for a complete green-field landscape is much easier than driving a transformation, transformation is the far more common reality; you are much more likely to encounter a transformation than a complete green-field.

Transform to event driven
Within enterprises there is a need to drive the existing architecture more towards real-time, event-based principles. Different types of event-driven architecture exist, such as event-carried state transfer, event sourcing and others. The specific type of event-driven architecture is not the topic of this post.

When moving to an event driven architecture the main change, from a high-level perspective, is that previously isolated systems will start to generate events and publish them to a central location where other systems can subscribe to the events and act upon them.

A commonly seen implementation is the use of Apache Kafka as a central “event hub” where (sub-)systems publish events and other (sub-)systems subscribe to a specific topic containing events from any number of (sub-)systems.

The need for a single message format
Moving from a point-to-point integration model between different systems towards a model where all systems publish events to a central location solidifies the need for a more standardized message format.

When developing point-to-point integrations, the fact that each (sub-)system has a different or slightly different message format is commonly accepted. When moving to an event driven architecture, the need for one single message format becomes crucial.

As an example, an enterprise might have multiple local sales systems, each used in its own region to register new sales. Due to the way the enterprise has grown, both from an organizational as well as a technology point of view, each region has a different type of sales system.

When the need arises to integrate all local sales systems with a central shipping system you can apply an event driven architecture. Whenever a sales order changes its status to “ready to ship”, an event will have to be published so that it can be picked up by the central shipping system and, in the future, by other central systems as well.

As we see a many-to-one relationship, we will have to ensure that all events from all the different sales systems share the same format, so that we do not have to do any transformation and customization in the central system. In addition, every other system that subscribes to these events in the future will only have to “understand” one single message format without having to understand each individual sales system.
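As a purely hypothetical illustration of what this single message format could look like, the sketch below shows an event as one regional sales system might emit it natively, next to the enterprise-standard event it would be transformed into. All system and field names are invented for this example.

# Hypothetical illustration; all system and field names are invented.

# Event as produced by one specific regional sales system (native format).
native_event = {
    "ordr_nr": "EU-2019-00042",
    "stat": "RDY_SHIP",
    "cust": "ACME GmbH",
}

# The same event after transformation into the enterprise-wide standard format.
standard_event = {
    "eventType": "sales.order.ready_to_ship",
    "eventVersion": "1.0",
    "sourceSystem": "sales-eu",
    "orderId": "EU-2019-00042",
    "customerName": "ACME GmbH",
}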

Event transformation in the pre-publish state
As a good practice, an enterprise architecture statement could be that each sub-system can have its own data and message format within the boundaries of that specific sub-system. All external connections to other sub-systems or systems need to comply with the enterprise-wide standards.

In other words, a sub-system has total freedom to work with its own (proprietary) formats, however, when crossing the borders of the sub-system it has to comply with standard rules. When a sub-system generates an event, it can only publish it when it is compliant with the enterprise standards.


This means that for most (all) sub-systems (in our example the local sales systems) there is a need for a transformation service which will take an event, transform it to an enterprise-wide standard event and only after that publish it to the central message hub. By stating that the transformation is the responsibility of the individual system (team) and not the responsibility of a central team, you ensure fewer dependencies between sub-systems and central systems.

By doing so you will achieve a higher development velocity, as each team can build its own transformation service. Replacing one or more of the “legacy” sub-systems also becomes easier, because this can be done in relative isolation as long as the new sub-system complies with the enterprise-wide standards.

Prevent reinventing the wheel again and again
The potential danger in stating that each team is responsible for its own transformation function, to transform application-native event messages into an enterprise-wide standard, is that multiple teams will work on a component which in essence is largely the same for each sub-system.

In an ideal world, the enterprise architecture team will not only focus on defining the enterprise-wide standard; it will also consult with all individual teams and design a common architecture for the transformation service which on one side is “pluggable” into each sub-system and on the other side is highly configurable and extendible with custom logic, to ensure each sub-system specific requirement can be handled.

Providing the foundation 
In an ideal scenario the enterprise architecture team will provide the foundation for the enterprise-wide event message format and an architecture for a common transformation service. In addition to this, a central team can be made responsible for the foundation of the transformation service.

The difficult part of building a foundation for a transformation service which needs to be pluggable into each individual sub-system is that you want it to be as complete and rich as possible, to prevent individual teams from having to create large custom parts and spend a lot of resources on development. On the other side, each individual sub-system will have unique requirements that should not be part of the foundation.

A solution to this is building a generic foundation service which can easily be re-configured for each specific case and where a sub-system team can add custom rules and logic. The below diagram provides a high-level insight into such a service.

This diagram does not show a scaled model of the transformation function; however, when developing a generic foundation service for transformation it is good to consider the need to scale per individual component. This also requires that you take into account how each component communicates with the other components; REST or gRPC can be good candidates for inter-component communication. As an example, if your logic requires a compute-intensive transformation and you expect a high number of events, it might be good to scale the actual transformer component to ensure multiple transformations can be done in parallel.

Also not included in the below diagram are all the supporting functionalities such as central configuration, logging and metrics, retry logic, throttling logic and others.


We have to consider that a sub-system might or might not be able to send an event to the transformation service, or in some cases will only be able to send an event notification and not the entire event message. For this reason, the example transformation service has both a “listener service” and a “pull and schedule service”.

Listener service:
Able to receive an event message or an event notification. This could be a REST call; however, depending on your enterprise standards this could also be, for example, a gRPC call. In case of an event message, the message will be forwarded to the “receiver”. In case of an event notification, the notification will be sent to the “pull and schedule service” for collecting the actual event message from the sub-system data store. This component needs to be fully configurable and contain a “framework” for adding custom rules and logic.

Pull and schedule service:
Able to pull information from the sub-system data store. The service should support the most common ways of retrieving information from a data store. Depending on your enterprise landscape this could be a direct database connection using JDBC/ODBC-like connections, however it can also be a REST or gRPC-like call. The pull can be initiated by a pre-defined schedule or by a notification from the listener service. The retrieved message will be forwarded to the receiver. This component needs to be fully configurable and contain a “framework” for adding custom rules and logic.

Receiver:
The receiver is one of the most standard components and is the “gateway” that provides the information in the right manner to the transformer. The receiver is also responsible for message throttling and buffering towards the transformer. This also includes retries and, if required, balancing over multiple transformers in case there is more than one transformer component in the transformation function.

Transformer:
The transformer is one of the most customizable and extendible components in the transformation function. It will receive messages from the receiver and needs to apply custom logic to transform them from a sub-system native format into the enterprise standard format. When developing a generic foundation service, you will need to provide as much freedom as possible to extend this component with custom logic in a way that remains upgradable in the future. This is the “hard” part of designing such a generic foundation service and requires a clear and well-defined extension framework. All transformed messages will be provided to the validator.

Validator:
The validator component is a component that should not be customizable. The intention of this component is to receive a message from the transformer and validate it against the rules for an enterprise standard message. Here it is important that the validator component is able to support multiple versions of the enterprise standard message, as the standard may evolve over time. The validator should, in an ideal scenario, be able to “fetch” the standards from a central repository and should not have the standards as a hard-coded part of the validator function.

Publisher:
The publisher will only receive validated messages in a correct enterprise standard format. The role of the publisher is to publish the validated message to a central event hub. This component should carry all the logic needed for the assured delivery of the message to the correct event hub. Depending on your landscape you might have a production event hub as well as others, and there might be a need to differentiate between them in the publisher component. In an ideal situation all technical configuration is retrieved from a central location; this provides the benefit that changes to, for example, IP addresses or the number of nodes of the event hub will be picked up automatically by the publisher component without the need to manually change each transformation function.
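To make the chain of components described above more tangible, the below sketch strings a minimal transformer, validator and publisher together in Python. It is purely illustrative: the field names, the validation rules and the print-based publisher are assumptions for this example, and a real publisher would use an actual event hub client rather than printing.

import json


def transform(native_event):
    """Transformer: map a sub-system native event to the enterprise standard format."""
    return {
        "eventType": "sales.order.ready_to_ship",
        "eventVersion": "1.0",
        "sourceSystem": native_event["system"],
        "orderId": native_event["ordr_nr"],
    }


def validate(standard_event):
    """Validator: check the event against a (simplified) enterprise standard."""
    required_fields = {"eventType", "eventVersion", "sourceSystem", "orderId"}
    missing = required_fields - standard_event.keys()
    if missing:
        raise ValueError("event not compliant, missing fields: {}".format(missing))
    return standard_event


def publish(standard_event):
    """Publisher: hand the validated event to the central event hub (printed here)."""
    print("publishing to topic sales-events:", json.dumps(standard_event))


if __name__ == "__main__":
    native = {"system": "sales-eu", "ordr_nr": "EU-2019-00042", "stat": "RDY_SHIP"}
    publish(validate(transform(native)))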

Conclusion
Designing and developing a generic foundation service might be a challenge in some parts, however the benefits are directly visible. When done correctly, all teams responsible for a sub-system will be able to use a generic foundation service while at the same time having all the freedom to implement the specific logic needed for their individual system. When designed correctly, the transformation function will be a highly reliable and resilient solution which will ensure a more standardized and unified way of publishing events in the wider enterprise landscape.



Thursday, October 31, 2019

Decomposing the last monolith - The UI-Monolith with web components

Enterprises tend to move away from monolith-based architectures and decompose existing solutions into smaller services. The use of functions and microservices is a common implementation model at this moment. However, a common observation is that enterprises tend to build (and maintain) one last monolith even when adopting a modern architecture. In many microservice based applications the UI tends to be the last monolith.

Where all backend services are broken down into smaller pieces, the frontend services tend to stay as one single monolith. In this model multiple teams are responsible for building and operating their own individual (micro)services. Another team, the frontend team, is responsible for consuming those services, potentially as REST services, and creating an appealing user experience with them.

The problem with this is that there is a tight coupling, and with this a dependency, between the backend teams and the frontend team. The below image, drawn by Johannes Giani, shows the common model as well as the alternative where the frontend is also broken into much smaller pieces.


Micro-UI and domain driven architecture
Breaking the UI-monolith into UI fragments will allow for full decoupling between teams and will push the UI responsibility into the team responsible for the backend service. This enables you to achieve a much higher level of domain driven architecture.

As an example, the team responsible for the shopping-cart, originally responsible for the backend services, can now also take the responsibility of the front-end services. There will no longer be a direct shared, or delegated responsibility, with the UI team.

Technical components
Decomposing the UI-monolith into UI-fragments can be done in multiple ways, all having their pros and cons. The most promising one is the use of web components. Web components are a set of web platform APIs that allow you to create new custom, reusable, encapsulated HTML tags to use in web pages and web apps. Custom components and widgets built on the Web Component standards will work across modern browsers and can be used with any JavaScript library or framework that works with HTML.

Web Components basically allow you to include rich, custom-built features by creating your own "HTML tags". While you have the freedom of custom-built tags, the entire "framework" of web components is a W3C standard, and the web component idea is strongly driven by Google, which is pushing for the general adoption of web components in browsers.

The architecture view
From an architectural view you have the more traditional and common deployment view on how microservices are used to decompose a monolith in the backend while still maintaining a monolith in the UI layer. This is shown in the below architecture diagram:


If we look at the use of web components you will have an architecture drawing as shown below.


Here you can observe a number of things:

  • The UI elements are pushed down into the responsibility of the domain/service. This means that the team responsible for the backend services is now also responsible for the development and operation of the web component. This has become a complete and isolated service, no longer depending on the UI team to provide the end-2-end service.

  • A new layer is introduced, the stitching layer (for lack of a better word). The stitching layer is still developed and operated by a central UI team. The responsibility of this team is to ensure that all the (web) components are jointly integrated into a single UI.


How the individual web components are served and presented to the outside world is a discussion in itself; it is primarily driven by the architecture views on routing and is not discussed in this specific post.

Conclusion
When decomposing a monolith, or building a greenfield microservice based application, it is good to understand that the UI in itself can become a "new" monolith. Using technologies like web components can provide a solution to this, support a clearer domain driven architecture and reduce dependencies between teams, while creating a higher level of end-2-end responsibility within an individual team.



Wednesday, October 30, 2019

Add Oracle Linux 8 Vagrant box

When using Oracle Linux (a lot) for local development you most likely want to use Vagrant to quickly deploy new environments locally. Oracle officially supports the use of Vagrant by providing Oracle Linux Vagrant boxes. In line with this tradition, Oracle provides a Vagrant box for the latest version, Oracle Linux 8. Below are instructions to add an Oracle Linux 8 box to your local machine.

First you can check which boxes are already available. The below command shows this:

louwersj-macbook:OL8Test louwersj$ vagrant box list
ol69     (virtualbox, 0)
ol73     (virtualbox, 0)
ol74     (virtualbox, 0)
oracle69 (virtualbox, 0)

As you can see in the above example there are a number of boxes already available on my local machine. To add Oracle Linux 8 you can use the below example:

louwersj-macbook:OL8Test louwersj$ vagrant box add --name ol80 https://yum.oracle.com/boxes/oraclelinux/ol80/ol80.box
==> box: Box file was not detected as metadata. Adding it directly...
==> box: Adding box 'ol80' (v0) for provider: 
    box: Downloading: https://yum.oracle.com/boxes/oraclelinux/ol80/ol80.box
==> box: Successfully added box 'ol80' (v0) for 'virtualbox'!

You will now have a new Oracle Linux 8 Vagrant box available for use. 

Wednesday, September 25, 2019

Creating a training set table for machine learning in Oracle Database

When building a machine learning model, you will require a learning / training set of data. To enable you to quickly create a set of training data you can make use of the SQL SAMPLE clause in a select statement. Using the SAMPLE clause you instruct the database to select from a random sample of data from the table, rather than from the entire table. This provides a very simple way of getting the random collection of records you require for training your model.

Situation
You have a large (or small) table of data in your database, in our case an Oracle Autonomous Data Warehouse, and you intend to use part of this data as training data while using the remaining part for testing your model.


Assume we have a table named louwersj.loans which we want to use for both our training data as well as our test data. A simple way of splitting it in a 70/30 fashion is to use the below commands:

Step 1:
Check the total number of records in the table:

SELECT
    COUNT(1)
FROM
    louwersj.loans;

In our case this will give us the result of 614, as we have 614 records in our dataset.

Step 2:
Take 70% of the total and use this to create a table; this is where we use the SAMPLE clause in the SQL statement to ensure we get a random 70% of the records. By issuing the below command, the table loans_traindata will have exactly the same structure as the original loans table while containing only a random subset of its records.

CREATE TABLE louwersj.loans_traindata
    AS
        SELECT
            *
        FROM
            louwersj.loans SAMPLE ( 70 ) SEED ( 1 )


To validate that this gives us what we wanted, we can do another count to see if we indeed get a training set which contains roughly 70% of the original table; in our case the below command returns 455.

SELECT
    COUNT(1)
FROM
    louwersj.loans_traindata

Step 3:
In addition to the training data we need some test data to validate the working of our machine learning model after we have trained it. For this we can use the remaining 30% of the data from the original table. With the following command we create a new table which will contain exactly that:

CREATE TABLE louwersj.loans_testdata
    AS
        SELECT
            *
        FROM
            louwersj.loans
        MINUS
        SELECT
            *
        FROM
            louwersj.loans_traindata
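If you prefer to verify the split from Python rather than from a SQL client, a small cx_Oracle sketch like the one below could be used. It assumes cx_Oracle is installed and configured for your ADW (including the wallet); the user, password and DSN are placeholders.

# Hypothetical verification of the split; user, password and DSN are placeholders.
import cx_Oracle

connection = cx_Oracle.connect("louwersj", "your-password", "your_adw_high")
cursor = connection.cursor()

for table in ("louwersj.loans", "louwersj.loans_traindata", "louwersj.loans_testdata"):
    cursor.execute("SELECT COUNT(1) FROM {}".format(table))
    print(table, cursor.fetchone()[0])

connection.close()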

Conclusion
Using the SAMPLE clause as part of a CREATE TABLE AS statement in the Oracle Database helps you to quickly create a good training set and test set for your machine learning model. There is no need to extract data from the database and re-insert it; you can do everything within the database without actually moving the data.

Tuesday, September 24, 2019

Groovy - AST Transformation

Groovy is a powerful language that gives its users the opportunity to plug into the compilation process to create what we call AST transformations, i.e. the ability to customize the Abstract Syntax Tree representing your programs before the compiler walks this tree to generate Java bytecode.

When writing a lot of Groovy code, especially as part of a wider team, it is very beneficial to take some time to look into the inner workings of AST. As AST transformations can be built yourself to extend how Groovy works, they can help ensure that code written by different developers is much more consistent than it would be without AST transformations.

When you are new to Groovy AST transformation the below talk can be a good starting point.

Monday, September 23, 2019

Google Cloud Function call Oracle ADW Rest end-point

When running an Oracle Autonomous Database, for example an Oracle Autonomous Data Warehouse (ADW for short), it is very likely that multiple applications and solutions want to have access to the data available in the ADW. A common scenario is that a department in the enterprise has been developing an application in isolation and at some point requires additional data from the data warehouse. In this case the data warehouse is the Oracle Autonomous Data Warehouse.

Call Oracle ADW from Google Cloud Functions
When developing an application in the Google Cloud you can make use of Google Cloud Functions. As Google Cloud Functions support development in Python, you can write a generic function to retrieve, for example, customer details based upon a customer ID. We deployed the Oracle ADW RESTful data service in a previous blogpost; in this blogpost we want to call it with a GET request from the Google Cloud.

One generic function
When building an application using Google Cloud Functions which needs to interact with data in the Oracle ADW at several points, you do not want to code this interaction multiple times. A more logical way of doing things is building one function to interact with Oracle ADW to obtain the needed data.


Every time your application calls the Google Cloud Function with the proper JSON payload containing a valid customer ID, the Google Cloud Function will call the Oracle ORDS endpoint which we developed as part of the Oracle ADW. The return message from the Oracle ADW will be the return message of the Google Cloud Function.

By building this "interaction layer" developers will only have to build the interaction with the Oracle Cloud based Oracle ADW once and after that they can work within Google Cloud to complete their specific Google Cloud based application.

Deploy a Google Cloud Function for Oracle Database
Deploying a Google Cloud function for Oracle Database starts with the same steps as deploying any cloud function. In our case we build a Python based application. The below image showcases the initial creation of the function:



We indicate that we want to use Python 3.7 and that the function inside our code, which is the entrypoint for execution, is named getCustomer.

The code used is shown below. Do note: when developing a production solution you most likely want to add additional security and a lot more error handling than shown in this example. This is just a very (very, very) non-production-ready example. Additionally, the full URL of the Oracle ADW has been substituted with XXXX.

import urllib.request


def getCustomerResponse(requestedCustomerId):
    """Call the ORDS endpoint on the Oracle ADW and return the raw response.

    :param requestedCustomerId: the customer ID used to build the endpoint URL
    :return: the response body, or an error message
    """
    baseUrl = "https://XXXX.oraclecloudapps.com/ords/louwersj/parties/b2b/customers/"
    fullUrl = baseUrl + requestedCustomerId
    operUrl = urllib.request.urlopen(fullUrl)

    if operUrl.getcode() == 200:
        data = operUrl.read()
    else:
        data = "Error receiving data from ADW, status code: {}".format(operUrl.getcode())
    return data


def getCustomer(request):
    """Entry point of the Google Cloud Function.

    :param request: the Flask request object passed in by Google Cloud Functions
    :return: the response from the Oracle ADW ORDS endpoint or an error message
    """
    requestJson = request.get_json(silent=True)
    requestArgs = request.args

    if requestJson and 'customer_id' in requestJson:
        customerId = requestJson['customer_id']
    elif requestArgs and 'customer_id' in requestArgs:
        customerId = requestArgs['customer_id']
    else:
        customerId = 'ERROR'

    if customerId == 'ERROR':
        responseData = "No customer_id provided"
    else:
        responseData = getCustomerResponse(customerId)
    return responseData

Testing the function
Upon deployment you can test the Google Cloud Function using the test functionality in the Google UI (or by calling it directly from another location). If all is working you should receive a JSON style return message as shown in the below screenshot.



In the above screenshot the trigger event field contains our test JSON payload and the function output contains a JSON response which originates from the Oracle ADW.
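Outside the Google UI you can also invoke the deployed function directly over HTTPS. The below minimal sketch uses the requests library; the function URL is a placeholder for your own deployment, and it assumes the function allows the caller to invoke it (in production you would add authentication).

# Hypothetical invocation of the deployed Cloud Function; the URL is a placeholder.
import requests

functionUrl = "https://REGION-PROJECT.cloudfunctions.net/getCustomer"
payload = {"customer_id": "some-customer-uuid"}

response = requests.post(functionUrl, json=payload)
print(response.status_code)
print(response.text)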

Conclusion
When developing applications on multiple platforms, multiple clouds and multiple technologies that require access to one central source of truth, you can use multiple technologies to connect to a centrally located Oracle Autonomous Data Warehouse. However, using a REST interface is in most cases a very simple and "fit for the job" kind of solution.

A solution like this will require stricter error handling and strict authentication and authorization; however, the base principle stands that hybrid multi-cloud applications can integrate with an Oracle Autonomous Data Warehouse in a very easy and cloud native manner.

Create REST endpoint in Oracle Autonomous Database

Oracle provides, as part of the Oracle Cloud portfolio, an Autonomous Database solution. The Autonomous Database is provided in both an OLTP and a Data Warehouse deployment model. Without going into the technical details or the technical and operational benefits, in this article we will focus on how to build REST interfaces in conjunction with the Oracle Autonomous Database. In this example we will use an Oracle Autonomous Data Warehouse.

The example environment
For this example, we will have an Oracle Autonomous Data Warehouse, or ADW for short. As part of our example we will have a table called customers which holds a generic structure of all our global customers and the parent/child relationship between customers in our table.

In the below screenshot you can see the table definition using the Oracle APEX object browser which is provisioned as part of the ADW deployment.



The example goal
The goal we will try to achieve in this example is providing a REST endpoint for applications to connect to and get some basic information about a customer, as well as providing a REST endpoint which will enable an application to retrieve all subsidiaries of a given customer. All interactions are done based upon the customer ID, which in our case is a UUID.

Creating the first REST endpoint
The example shows the entire creation of the REST endpoint using the Oracle ADW APEX interface; however, this can also be achieved using any compatible SQL client and does not rely on the UI.

Creating a REST endpoint in Oracle ADW follows a certain hierarchy of components. Oracle REST Data Services requires a module which can hold one or more templates (endpoints), and each template can hold one or more handlers. Handlers are responsible for handling the request for a certain request type, for example a POST or a GET request.

In our example we first create a module, which in our case we name ADW.backend.parties, with a base path of /parties/.



When the module has been defined we can create the ORDS template. In this example we create a template for b2b/customers/:id; in this notation the intention is that :id will be substituted with a customer ID. As we have a module with the base path /parties, the full path will become, as an example, /parties/b2b/customers/{some-customer-id}.



Having the template without any handlers to handle an incoming request will not provide any added functionality. As we want users to be able to get information based upon a customer ID, we will create a GET request handler which will be triggered on any GET request executed against the endpoint. The handler is also the location where the actual PL/SQL code is defined that will be executed when a GET request is sent. The below screenshot shows this.



Trigger the first REST endpoint
Having the first REST endpoint fully deployed, we can test the endpoint by executing a GET request against it from an external location. As this is a GET request we can do this from a browser; however, you could use anything from cURL up to custom-written Python code to call the endpoint with a GET request.
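As an illustration of such a custom-written Python call, the below sketch performs a GET request against the endpoint; the hostname and customer ID are placeholders for your own ADW instance and data.

# Hypothetical GET request against the ORDS endpoint; hostname and ID are placeholders.
import urllib.request

url = "https://XXXX.oraclecloudapps.com/ords/louwersj/parties/b2b/customers/some-customer-id"
with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))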

When providing the endpoint in a browser we get the below response:


For readability purposes we can format the message so it becomes easier for humans to read.


Building the subsidiary endpoint
As stated, we will also build, as part of this example, a way to look up all subsidiaries of a given company. The previous endpoint provided the details of one company, including the ID of the parent company. However, in some cases someone would like to retrieve a list of subsidiaries.

We already have an endpoint /parties/b2b/customers/{some-customer-id} and we can expand that with /subsidiaries, which would make the endpoint /parties/b2b/customers/{some-customer-id}/subsidiaries.

To achieve this we build a second ORDS template specifically for this purpose and we create a GET request handler for this newly created template as well. The below screenshot shows the creation of the ORDS template to provide the required endpoint:


When the ORDS template is created we can create the GET handler. The GET handler is shown in the screenshot below and reacts to the :id which is part of the URI.


We have now created our second endpoint, which will provide a JSON response containing all the subsidiaries for a given customer ID. If we call the endpoint and format the response we will see a message as shown below:


Conclusion
When you are using an Oracle Autonomous Database you automatically get a very simple way of building RESTful data services in the form of REST endpoints. Even though the above example only scratches the surface of the possibilities, and much more complex and much more secure implementations can be built, it showcases the ease of use and how quickly you can build a comprehensive REST interface while only leveraging the Oracle Cloud based solution in the form of an Oracle Autonomous Database.

Tuesday, May 07, 2019

Fn Project - quick install guide

The Fn project is an open source project originally started within Oracle as part of the drive for more enterprise open source solutions. The Fn project is an open source serverless compute platform. With Fn, you deploy your functions to an Fn server which automatically executes and manages them. Each function is executed in a Docker container enabling the platform to provide broad support for development languages including Java, JavaScript (Node), Go, Python, Ruby, and others. The Fn client and server are simple and elegant. You can run the server locally on your laptop, or on a server in your data center or in the cloud. The Fn project has a strong enterprise focus with emphasis on security, scalability, and observability.

With companies moving more and more to cloud native solutions and requiring solutions that will not tie them to a single cloud platform, you can observe a move away from vendor-specific and proprietary serverless solutions.

The Fn project is a fully open source solution to build cloud native serverless solutions and you will be able to run it anywhere. Oracle provides a managed service which allows you to consume Fn based serverless functions, while at the same time you can run Fn within your own datacenter and on any cloud provider of your choice.

The fastest way to get started with the Fn project is to install it on a (virtual) machine and experience it hands-on. The below presentation gives a quick guide on how to get started with the Fn project on a local (virtual) machine.



More information can be found on the Fn project website and in other locations such as the Fn Slack channel and GitHub.

Monday, March 18, 2019

Oracle Linux & Cloud Automation - clean yum data with ansible

Building infrastructure, configuring servers and deploying software has been a manual task for many years. With the rise of cloud computing, virtual machines, containers and CI/CD processes we see more and more that those manual tasks are being diminished and fully automated. When building your infrastructure in the Oracle Cloud you can make use of a wide set of automation tools to make everything software defined and automated. Solutions like Ansible and Terraform provide some of the building blocks which can help you automate all the previously manual tasks. Ensuring you leverage solutions like this will increase speed and agility supported by Cloud computing.

In this series, "Oracle Linux & Cloud Automation", we will go into more detail on several solutions you can leverage to automate large parts of your IT footprint lifecycle. Please use the tagged label to find all posts on this subject.

Ansible
Ansible is an open-source software provisioning, configuration management, and application deployment tool. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows. It includes its own declarative language to describe system configuration.

Cleaning yum
Ansible makes it easy to install (remove or update) packages using yum. As a good practice you should clean the cached yum data so it does not take up unneeded space on your local file system. Ansible provides a module as a wrapper around the yum command, which means that you can use Ansible instead of interacting directly with the yum command itself.

Missing from the module is an option to clean the cache. You can do so by putting the following in your Ansible playbook.

# clean all the yum data.
  - name: Clean all yum data
    command: yum clean all
    args:
      warn: yes 

As you can see this is not using the yum module; it is using the command module instead. This is also the reason that we have the warn flag, currently set to yes, though it is advisable to set it to no. With warn set to yes, the below message will be printed:

 [WARNING]: Consider using the yum module rather than running yum.  If you need to use 
command because yum is insufficient you can add warn=False to this command task or set 
command_warnings=False in ansible.cfg to get rid of this message.

The reason for the warning is that Ansible expects that all yum related commands are being done via the yum module and not the command module. As the yum module is missing the "clean all" option we have to do this via the command module and use the full command.

Oracle Linux tested
The below playbook is tested on an Oracle Linux 7 instance with Ansible 2.7.7.

Show the playbook:
[root@localhost ansi]# cat webserver_playbook.yml 
- hosts: localhost
  tasks:
  - name: Ensure the latest yum-utils python package is available on the server
    yum:
      name: yum-utils
      state: latest
  - name: Ensure the latest python package is available on the server
    yum:
      name: python
      state: latest
  - name: Ensure the latest python-pip package is available on the server
    yum:
      name: python-pip
      state: latest
# clean all the yum data.
  - name: Clean all yum data
    command: yum clean all
    args:
      warn: yes 

Show the output:
[root@localhost ansi]# ansible-playbook webserver_playbook.yml 
 [WARNING]: provided hosts list is empty, only localhost is available. Note
that the implicit localhost does not match 'all'


PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [Ensure the latest yum-utils python package is available on the server] ***
ok: [localhost]

TASK [Ensure the latest python package is available on the server] *************
ok: [localhost]

TASK [Ensure the latest python-pip package is available on the server] *********
ok: [localhost]

TASK [Clean all yum data] ******************************************************
 [WARNING]: Consider using the yum module rather than running yum.  If you need
to use command because yum is insufficient you can add warn=False to this
command task or set command_warnings=False in ansible.cfg to get rid of this
message.

changed: [localhost]

PLAY RECAP *********************************************************************
localhost                  : ok=5    changed=1    unreachable=0    failed=0   

[root@localhost ansi]# 

Also note that you will always have "changed=1". This is due to the fact that you always run yum clean all. Even though this is not a configuration change on your system, it is detected as a change. Technically it is a change, as you clean the yum data; even in cases where you do not download a package you still clean the metadata.

Conclusion
Even though no native support for cleaning the cache is available in the Ansible yum module, you can still ensure that your system is not holding unnecessary data on the filesystem, as shown in the example above.

Friday, March 15, 2019

dash - changing the favicon

Dash is a Python framework for building analytical web applications. It can be used to very quickly develop small applications capable of running small analytical visualizations. As it is developed in Python, it is a natural fit with solutions like pandas and the like.

When getting started with Dash, developed by Plotly, you might run into the following question: how do I change the favicon to show the one I want instead of the one shipped by default?



The answer is: stop trying to code the solution. The only way (currently) is to create a directory named assets in the root of your project and add the desired favicon into this location. This should result in the desired favicon showing.
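As a minimal sketch of what such a project could look like (the file names are just an example, and this assumes the standard Dash packages at the time of writing):

# Example project layout:
#   app.py
#   assets/
#       favicon.ico   <- Dash serves this automatically as the favicon
import dash
import dash_html_components as html

app = dash.Dash(__name__)
app.layout = html.Div("Hello Dash")

if __name__ == "__main__":
    app.run_server(debug=True)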


Tuesday, March 05, 2019

Python - machine learning and clustering

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same groups are more similar to other data points in the same group than those in other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters.

Within machine learning we place clustering under unsupervised learning. Clustering is used, for example, in recommendation systems, targeted marketing and customer segmentation.

The below outline is a simple starting point showing a basic form of clustering on a relatively small dataset. The dataset we will use is displayed in the scatter chart below. The objective is to determine 3 clusters in the data shown in the scatter chart. In this example the data is just random data; however, the data can represent virtually anything. The data could for example be customers, demographic data, sensor data points or anything else.


When looking at data, humans are by default driven by apophenia to try and see patterns. Apophenia has come to imply a universal human tendency to seek patterns in random information, such as in gambling. However, even though the human mind will try to see a pattern, this is far from correct in many cases. To make a truly valid clustering we will need to base the clustering on math and not on the feeling of the human mind.

By leveraging Python code we can divide the data into 3 distinct clusters; the clusters found are shown below in different colors.


We can now see the different clusters that are within the data. Finding the members of each cluster is done using K-means clustering. K-means is a clustering algorithm that aims to partition n observations into k clusters. The main steps are:


  • Initialisation – K initial “means” (centroids) are generated at random
  • Assignment – K clusters are created by associating each observation with the nearest centroid
  • Update – The centroid of the clusters becomes the new mean

The result is that after the updates you will end up with (in our case) 3 centroids and each data point associated with the centroid to which it has the most optimal (smallest) distance.


The above scatter chart shows the centroids which form the backbone of the clustering. Normally they will be hidden, as they are not actual data points from the dataset. As you can see, we now have 3 clusters from the bigger dataset.
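A minimal sketch of how such a clustering can be produced with scikit-learn's KMeans implementation is shown below; the randomly generated data stands in for the dataset used in the charts above.

# Minimal K-means sketch on random data standing in for the dataset shown above.
import numpy as np
from sklearn.cluster import KMeans

# generate some random two-dimensional data points
np.random.seed(1)
data = np.random.rand(100, 2)

# partition the data into 3 clusters
kmeans = KMeans(n_clusters=3, random_state=1)
labels = kmeans.fit_predict(data)

# the cluster assignment per data point and the 3 resulting centroids
print(labels[:10])
print(kmeans.cluster_centers_)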

Examples of clustering can be found on my Github project containing machine learning examples. 

Wednesday, February 20, 2019

Kubernetes - Minikube start dashboard for a Web UI

For those developing solutions that should run under Kubernetes, running a local version of Kubernetes leveraging Minikube can make your life much easier. One of the questions some people have is how they can make use of the Kubernetes Web UI.

Running the Kubernetes Web UI while working with Minikube is relatively easy and you can start the Web UI with a single command of the minikube CLI. The below command showcases how to start the Web UI and also have your local browser open automatically to guide you to the correct URL.

Johans-MacBook-Pro:log jlouwers$ minikube dashboard
🔌  Enabling dashboard ...
🤔  Verifying dashboard health ...
🚀  Launching proxy ...
🤔  Verifying proxy health ...
🎉  Opening http://127.0.0.1:60438/api/v1/namespaces/kube-system/services/http:kubernetes-dashboard:/proxy/ in your default browser...

As you can see from the above example the only command needed is 'minikube dashboard'.


The above screenshot shows you the Kubernetes Web UI in the browser as started by the minikube command.

Friday, February 15, 2019

Python Pandas – consume Oracle Rest API data

When working with pandas, the most commonly known way to get data into a pandas DataFrame is to read a local csv file into the DataFrame using a read_csv() operation. In many cases the data which is encapsulated within the csv file originally came from a database. Getting from a database to a csv file on a machine where your Python code is running includes running a query, exporting the results to a csv file and transporting the csv file to a location where the Python code can read it and transform it into a pandas DataFrame.

When looking at modern systems we see that more and more persistent data stores provide REST APIs to expose data. Oracle has ORDS (Oracle REST Data Services), which provides an easy way to build REST API endpoints as part of your Oracle Database.

Instead of extracting the data from the database, building a csv file and transporting the csv file so you are able to consume it, you can also instruct your Python code to interact directly with the ORDS REST endpoint and read the JSON response directly.

The below JSON structure is an example of a very simple ORDS endpoint response message. From this message we are, in this example, only interested in the items it returns, and we want to have those in our pandas DataFrame.

{
 "items": [{
  "empno": 7369,
  "ename": "SMITH",
  "job": "CLERK",
  "mgr": 7902,
  "hiredate": "1980-12-17T00:00:00Z",
  "sal": 800,
  "comm": null,
  "deptno": 20
 }, {
  "empno": 7499,
  "ename": "ALLEN",
  "job": "SALESMAN",
  "mgr": 7698,
  "hiredate": "1981-02-20T00:00:00Z",
  "sal": 1600,
  "comm": 300,
  "deptno": 30
 }, {
  "empno": 7521,
  "ename": "WARD",
  "job": "SALESMAN",
  "mgr": 7698,
  "hiredate": "1981-02-22T00:00:00Z",
  "sal": 1250,
  "comm": 500,
  "deptno": 30
 }, {
  "empno": 7566,
  "ename": "JONES",
  "job": "MANAGER",
  "mgr": 7839,
  "hiredate": "1981-04-02T00:00:00Z",
  "sal": 2975,
  "comm": null,
  "deptno": 20
 }, {
  "empno": 7654,
  "ename": "MARTIN",
  "job": "SALESMAN",
  "mgr": 7698,
  "hiredate": "1981-09-28T00:00:00Z",
  "sal": 1250,
  "comm": 1400,
  "deptno": 30
 }, {
  "empno": 7698,
  "ename": "BLAKE",
  "job": "MANAGER",
  "mgr": 7839,
  "hiredate": "1981-05-01T00:00:00Z",
  "sal": 2850,
  "comm": null,
  "deptno": 30
 }, {
  "empno": 7782,
  "ename": "CLARK",
  "job": "MANAGER",
  "mgr": 7839,
  "hiredate": "1981-06-09T00:00:00Z",
  "sal": 2450,
  "comm": null,
  "deptno": 10
 }],
 "hasMore": true,
 "limit": 7,
 "offset": 0,
 "count": 7,
 "links": [{
  "rel": "self",
  "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees"
 }, {
  "rel": "describedby",
  "href": "http://192.168.33.10:8080/ords/pandas_test/metadata-catalog/test/item"
 }, {
  "rel": "first",
  "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees"
 }, {
  "rel": "next",
  "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees?offset=7"
 }]
}

The below code shows how to fetch the data with Python from the ORDS endpoint and normalize the JSON in a way that we will only have the information about items in our dataframe.
import json
from urllib2 import urlopen
from pandas.io.json import json_normalize

# Fetch the data from the remote ORDS endpoint
apiResponse = urlopen("http://192.168.33.10:8080/ords/pandas_test/test/employees")
apiResponseFile = apiResponse.read().decode('utf-8', 'replace')

# load the JSON data we fetched from the ORDS endpoint into a dict
jsonData = json.loads(apiResponseFile)

# load the dict containing the JSON data into a DataFrame by using json_normalized.
# do note we only use 'items'
df = json_normalize(jsonData['items'])

# show the evidence we received the data from the ORDS endpoint.
print (df.head())

Interacting with an ORDS endpoint to retrieve the data out of the Oracle Database can in many cases be much more efficient than taking the more traditional csv route. Options to use a direct connection to the database and SQL statements will be covered in another example post. You can also find the code used above in the machine learning examples project on GitHub.

Wednesday, February 13, 2019

resolved - cx_Oracle.DatabaseError: ORA-24454: client host name is not set

When developing Python code in combination with cx_Oracle on a Mac you might run into some issues, especially when configuring your Mac for the first time. One of the strange things I encountered was the ORA-24454 error when trying to connect to an Oracle database from my MacBook. ORA-24454 states that the client host name is not set.

When looking into the issue, it turns out that the combination of the Oracle Instant Client and cx_Oracle will look into /etc/hosts on a Mac to find the client hostname to use when initiating the connection from the Mac to the database.

Resolve the issue
A small disclaimer: this worked for me and I expect it will work for other Mac users as well. First you have to find the actual hostname of your system; you can do so by executing one of the following commands:

Johans-MacBook-Pro:~ root# hostname 
Johans-MacBook-Pro.local

or you can run;

Johans-MacBook-Pro:~ root# python -c 'import socket; print(socket.gethostname());'
Johans-MacBook-Pro.local

Knowing the actual hostname of your machine, you can now set it in /etc/hosts. This should make the file look something like the example below:

127.0.0.1 localhost
127.0.0.1 Johans-MacBook-Pro.local

When set, this should ensure you no longer encounter the cx_Oracle.DatabaseError: ORA-24454: client host name is not set error when running your Python code.
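To verify the fix you can run a minimal connection test; the sketch below assumes a hypothetical user, password and connection string, which you should replace with your own.

# Minimal connection test; credentials and DSN are placeholders.
import cx_Oracle

connection = cx_Oracle.connect("scott", "tiger", "dbhost.example.com:1521/orclpdb1")
cursor = connection.cursor()
cursor.execute("SELECT sysdate FROM dual")
print(cursor.fetchone())
connection.close()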

Tuesday, February 12, 2019

Python pandas – merge dataframes

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. When working with data you can load data (from multiple types of sources) into a designated DataFrame which will hold the data for future actions. A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In many cases the operations you want to do on data require data from more than one single data source. In those cases you have the option to merge (concatenate, join) multiple DataFrames into a single DataFrame for the operations you intend. In the below example, we merge two sets of data (DataFrames) from the World Bank into a single dataset (DataFrame) in one of the most basic ways.

Used datasets
For those interested in the datasets, the original data comes from data.worldbank.org; for this specific example I have modified the way the .csv file is provided originally. You can get the modified .csv files from my machine learning examples project located on GitHub.

Example code
The example we show is relatively simple and is shown in the diagram below: we load two datasets using pandas read_csv() into their individual DataFrames. When both are loaded we merge the two DataFrames into a single (new) DataFrame using merge().


The below is an outline of the code example; you can get the full code example, including the used datasets, from my machine learning examples project on GitHub.

import pandas as pd

df0 = pd.read_csv('../../data/dataset_4.csv', delimiter=";",)
print ('show the content of the first file via dataframe df0')
print (df0.head())

df1 = pd.read_csv('../../data/dataset_5.csv', delimiter=";",)
print ('show the content of the second file via dataframe df1')
print (df1.head())

df2 = pd.merge(df0, df1, on=['Country Code','Country Name'])
print ('show the content of merged dataframes as a single dataframe')
print (df2.head())

Monday, February 11, 2019

Secure Software Development - the importance of dependency manifest files

When developing code, in this specific example Python code, one thing you want to make sure of is that you do not introduce vulnerabilities. Vulnerabilities can be introduced primarily in two ways: you create them or you include them. One way of providing an extra check that you do not include vulnerabilities in your application is making sure you handle the dependency manifest files in the right way.

A dependency manifest file makes sure that all the components your application relies upon are listed in a central place. One of the advantages is that you can use this file to scan for known security issues in components you depend upon. It is very easy to do an import or include-like statement and add additional functionality to your code. However, whatever you include might have a known bug or vulnerability in a specific version.

Creating a dependency manifest file in Python
When developing Python code you can leverage pip to create a dependency manifest file, commonly named requirements.txt. The below command shows how you can create a dependency manifest file:

pip freeze > requirements.txt

If we look into the content of this file we will notice a structure like the one shown below, which lists all the dependencies and their exact versions.

altgraph==0.10.2
bdist-mpkg==0.5.0
bonjour-py==0.3
macholib==1.5.1
matplotlib==1.3.1
modulegraph==0.10.4
numpy==1.16.1
pandas==0.24.1
py2app==0.7.3
pyobjc-core==2.5.1
pyobjc-framework-Accounts==2.5.1
pyobjc-framework-AddressBook==2.5.1
pyobjc-framework-AppleScriptKit==2.5.1
pyobjc-framework-AppleScriptObjC==2.5.1
pyobjc-framework-Automator==2.5.1
pyobjc-framework-CFNetwork==2.5.1
pyobjc-framework-Cocoa==2.5.1
pyobjc-framework-Collaboration==2.5.1
pyobjc-framework-CoreData==2.5.1
pyobjc-framework-CoreLocation==2.5.1
pyobjc-framework-CoreText==2.5.1
pyobjc-framework-DictionaryServices==2.5.1
pyobjc-framework-EventKit==2.5.1
pyobjc-framework-ExceptionHandling==2.5.1
pyobjc-framework-FSEvents==2.5.1
pyobjc-framework-InputMethodKit==2.5.1
pyobjc-framework-InstallerPlugins==2.5.1
pyobjc-framework-InstantMessage==2.5.1
pyobjc-framework-LatentSemanticMapping==2.5.1
pyobjc-framework-LaunchServices==2.5.1
pyobjc-framework-Message==2.5.1
pyobjc-framework-OpenDirectory==2.5.1
pyobjc-framework-PreferencePanes==2.5.1
pyobjc-framework-PubSub==2.5.1
pyobjc-framework-QTKit==2.5.1
pyobjc-framework-Quartz==2.5.1
pyobjc-framework-ScreenSaver==2.5.1
pyobjc-framework-ScriptingBridge==2.5.1
pyobjc-framework-SearchKit==2.5.1
pyobjc-framework-ServiceManagement==2.5.1
pyobjc-framework-Social==2.5.1
pyobjc-framework-SyncServices==2.5.1
pyobjc-framework-SystemConfiguration==2.5.1
pyobjc-framework-WebKit==2.5.1
pyOpenSSL==0.13.1
pyparsing==2.0.1
python-dateutil==2.8.0
pytz==2013.7
scipy==0.13.0b1
six==1.12.0
xattr==0.6.4

Check for known security issues
One of the simplest ways to check for known security issues is to check your code in at github.com. As part of the service provided by GitHub you will get alerts, based upon the dependency manifest file, about which dependencies might have a known security issue. The below screenshot shows the result of uploading a Python dependency manifest file to GitHub.


As it turns out, somewhere in the chain of dependencies some project still has an old version of pyOpenSSL included which has a known security vulnerability. The beauty of this approach is that you have direct insight and can correct this right away.

Sunday, February 10, 2019

Python Matplotlib - showing or hiding a legend in a plot


When working with Matplotlib to visualize your data, there are situations where you want to show the legend and cases where you want to hide it. Showing or hiding the legend is very simple, as long as you know how to do it; the below example showcases both showing and hiding the legend in your plot.

The code used in this example uses pandas and matplotlib to plot the data. The full example is part of my machine learning examples repository on GitHub, where you can find this specific code and more.

Plot with legend
The below image shows the plotted data with a legend. Having a legend is in some cases very useful; however, in other cases it can be distracting in your image. Personally I think keeping a plot very clean (without a legend) is in many cases the best way of presenting a plot.

The code used for this is shown below. As you can see we use legend=True.

df.plot(kind='line',x='ds',y='y',ax=ax, legend=True)


Plot without legend
The below image shows the plotted data without a legend, which results in a cleaner plot.

The code used for this is shown below. As you can see we use legend=False.

df.plot(kind='line',x='ds',y='y',ax=ax, legend=False)
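For completeness, a minimal self-contained sketch of the surrounding code used in these snippets; it assumes a hypothetical csv file with the columns ds and y, matching the column names used above.

import pandas as pd
import matplotlib.pyplot as plt

# load a hypothetical dataset with a date column 'ds' and a value column 'y'
df = pd.read_csv('data.csv')

fig, ax = plt.subplots()

# set legend=True to show the legend, legend=False to hide it
df.plot(kind='line', x='ds', y='y', ax=ax, legend=True)

plt.show()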

Thursday, January 31, 2019

machine learning - matplotlib error in matplotlib.backends import _macosx


When trying to visualize and plot data in Python you might work with Matplotlib. If you are working on macOS and use a venv, you might in some cases run into the below error message:

RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.

The reason for this error is that Matplotlib is not able to find the correct backend. The easiest way to resolve this, in a quick and dirty way, is to add the following line to your code:

matplotlib.use('TkAgg')

In most cases this should remove the error and your code should run correctly.
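Do note that the backend has to be selected before pyplot is imported; a minimal sketch of how this could look:

import matplotlib
# select the Tk-based backend before importing pyplot
matplotlib.use('TkAgg')

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [4, 5, 6])
plt.show()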