Wednesday, September 25, 2019

Creating a training set table for machine learning in Oracle Database

When building a machine learning model you will need a learning / training set of data. To quickly create such a set you can make use of the SQL SAMPLE clause in a select statement. Using the SAMPLE clause you instruct the database to select from a random sample of rows, rather than from the entire table. This provides a very simple way of getting the random collection of records you need for training your model.

Situation
You have a large (or small) table of data in your database, in our case an Oracle Autonomous Data Warehouse, and you intend to use part of it as training data while using the remaining part for testing your model.


Assume we have a table named louwersj.loans which we want to use for both our training data as well as our test data. A simple way of splitting it in a 70/30 fashion is to use the below commands:

Step 1:
Check the total number of records in the table:

SELECT
    COUNT(1)
FROM
    louwersj.loans;

In our case this returns 614, as we have 614 records in our dataset.

Step 2:
Take roughly 70% of the total and use this to create a new table. This is where we use the SAMPLE clause in the SQL statement to get a random selection of approximately 70% of the records. By issuing the below command, the table loans_traindata will have the same structure as the original loans table while containing only a random subset of its rows.

CREATE TABLE louwersj.loans_traindata
    AS
        SELECT
            *
        FROM
            louwersj.loans SAMPLE ( 70 ) SEED ( 1 )


To validate that this gives us what we wanted we can do another count to see if we indeed get a training set of roughly 70% of the original table; in our case the below command returns 455 (the SAMPLE clause samples rows at random, so the count will be close to, but not necessarily exactly, 70%).

SELECT
    COUNT(1)
FROM
    louwersj.loans_traindata

Step 3:
In addition to the training data we need test data to validate our machine learning model after we have trained it. For this we can use the remaining 30% of the data from the original table. With the following command we create a new table containing exactly that:

CREATE TABLE louwersj.loans_testdata
    AS
        SELECT
            *
        FROM
            louwersj.loans
        MINUS
        SELECT
            *
        FROM
            louwersj.loans_traindata

Conclusion
Using the SAMPLE clause as part of a CREATE TABLE AS statement in the Oracle Database helps you to quickly create a good training set and test set for your machine learning model. There is no need to extract data from the database and re-insert it; you can do everything within the database without actually moving the data.

Tuesday, September 24, 2019

Groovy - AST Transformation

Groovy is a powerful language that gives its users the opportunity to plug into the compilation process to create what we call AST transformations, i.e. the ability to customize the Abstract Syntax Tree representing your programs before the compiler walks this tree to generate Java bytecode.

When writing a lot of Groovy code, especially as part of a wider team, it is very beneficial to take some time to look into the inner workings of AST transformations. As you can build AST transformations yourself to extend how Groovy works, they can help ensure that code written by different developers is much more consistent than it would be without them.

When you are new to Groovy AST transformations the below talk can be a good starting point.

Monday, September 23, 2019

Google Cloud Function call Oracle ADW Rest end-point

When running an Oracle Autonomous Database, for example an Oracle Autonomous Data Warehouse (ADW for short), it is very likely that multiple applications and solutions want access to the data available in the ADW. A common scenario is that a department in the enterprise has been developing an application in isolation and at one point in time requires some additional data from the data warehouse. In this case the data warehouse is the Oracle Autonomous Data Warehouse.

Call Oracle ADW from Google Cloud Functions
When developing an application in the Google Cloud you can make use of Google Cloud Functions. As Google Cloud Functions support development in Python you can write a generic function to retrieve, for example, customer details based upon a customer ID. We deployed the Oracle ADW RESTful data service in a previous blog post; in this blog post we want to call it with a GET request from the Google Cloud.

One generic function
When building an application using Google Cloud Functions which needs to interact with data in the Oracle ADW at several points, you do not want to code that interaction multiple times. A more logical way of doing things is building one function to interact with Oracle ADW to obtain the needed data.


Every time your application calls the Google Cloud Function with the proper JSON payload containing a valid customer ID, the Google Cloud Function will call the Oracle ORDS endpoint which we developed as part of Oracle ADW. The return message from Oracle ADW will be the return message from the Google Cloud Function.

By building this "interaction layer" developers will only have to build the interaction with the Oracle Cloud based Oracle ADW once and after that they can work within Google Cloud to complete their specific Google Cloud based application.

Deploy a Google Cloud Function for Oracle Database
Deploying a Google Cloud function for Oracle Database starts with the same steps as deploying any cloud function. In our case we build a Python based application. The below image showcases the initial creation of the function:



We indicate that we want to use Python 3.7 and that the function inside our code, which is the entrypoint for execution, is named getCustomer.

The code used is shown below. Do note: when developing a production solution you most likely want to add additional security and a lot more error handling than shown in this example. This is just a very (very very) not production ready example. Additionally, the full URL of the Oracle ADW has been substituted with XXXX.

import urllib.request

def getCustomerResponse(requestedCustomerId):
    """Call the Oracle ADW ORDS endpoint for the given customer ID.

    :param requestedCustomerId: the customer ID to look up
    :return: the response data returned by the ORDS endpoint, or an error message
    """
    baseUrl = "https://XXXX.oraclecloudapps.com/ords/louwersj/parties/b2b/customers/"
    fullUrl = baseUrl + requestedCustomerId
    operUrl = urllib.request.urlopen(fullUrl)

    if operUrl.getcode() == 200:
        data = operUrl.read()
    else:
        data = "Error receiving data from ADW, HTTP status: " + str(operUrl.getcode())
    return data

def getCustomer(request):
    """Entry point of the Google Cloud Function.

    :param request: the incoming request object
    :return: the response data returned to the caller
    """
    requestJson = request.get_json(silent=True)
    requestArgs = request.args

    if requestJson and 'customer_id' in requestJson:
        customerId = requestJson['customer_id']
    elif requestArgs and 'customer_id' in requestArgs:
        customerId = requestArgs['customer_id']
    else:
        customerId = 'ERROR'

    if customerId == "ERROR":
        responseData = "No customer_id provided"
    else:
        responseData = getCustomerResponse(customerId)
    return responseData

Testing the function
Upon deployment you can test the Google Cloud Function using the test functionality in the Google UI, or by calling it directly from another location. If all is working you should receive a JSON-style return message as shown in the below screenshot.



In the above screenshot the trigger event field contains our test JSON payload and the function output contains a JSON response which originates from the Oracle ADW.
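
If you prefer to test the deployed function from outside the Google console, a minimal sketch could look like the example below. Do note that the function URL shown is a hypothetical placeholder and that the requests library is assumed to be available.

import requests

# hypothetical URL of the deployed Cloud Function; replace with your own
functionUrl = "https://europe-west1-your-project.cloudfunctions.net/getCustomer"

# send the same JSON payload as used in the test tab of the Google Cloud console
response = requests.post(functionUrl, json={"customer_id": "some-customer-id"})
print(response.status_code)
print(response.text)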

Conclusion
When developing applications on multiple platforms, multiple clouds and multiple technologies, and you require access to one central source of truth, you can use multiple technologies to connect to a centrally located Oracle Autonomous Data Warehouse. However, using a REST interface is in most cases a very simple and "fit for the job" kind of solution.

A production version of a solution like this will require stricter error handling and strict authentication and authorization; however, the basic principle stands: hybrid multi-cloud applications can integrate with an Oracle Autonomous Data Warehouse in a very easy and cloud-native manner.

Create REST endpoint in Oracle Autonomous Database

Oracle provides, as part of the Oracle Cloud portfolio, an Autonomous Database solution. The Autonomous Database is provided in both an OLTP and a Data Warehouse deployment model. Without going into the technical details or the technical and operational benefits, in this article we will focus on how to build REST interfaces in conjunction with the Oracle Autonomous Database. In this example we will use an Oracle Autonomous Data Warehouse.

The example environment
For this example, we will have an Oracle Autonomous Data Warehouse, or ADW for short. As part of our example we will have a table called customers which holds a generic structure of all our global customers and the parent/child relationships between customers in our table.

In the below screenshot you can see the table definition using the Oracle APEX object browser which is provisioned as part of the ADW deployment.



The example goal
The goal we will try to achieve in this example is providing a REST endpoint for applications to connect to and get some basic information about a customer, as well as providing a REST endpoint which will enable an application to retrieve all subsidiaries of a given customer. All interactions are done based upon the customer ID, which in our case is a UUID.

Creating the first REST endpoint
The example shows the entire creation of the REST endpoint using the Oracle ADW APEX interface; however, this can also be achieved using any compatible SQL client and does not rely on the UI.

Creating a REST endpoint in Oracle ADW has to comply with a certain hierarchy of components. RESTful Data Services requires a module, which can hold one or more templates (endpoints), and each template holds one or more handlers. A handler is responsible for handling requests of a certain request type, for example a POST or a GET request.

In our example we first create a module, which in our case we name ADW.backend.parties, with a base path of /parties/.



When the module has been defined we can create the ORDS template. In this example we create a template for b2b/customers/:id, where the intention is that :id will be substituted with a customer ID. As the module has the URI /parties, the full path becomes, as an example, /parties/b2b/customers/{some-customer-id}.



Having the template without any handlers to handle an incoming request will not provide any added functionality. As we want users to be able to get information based upon a customer ID, we will create a GET request handler which is triggered on any GET request executed against the endpoint. The handler is also the location where the actual PL/SQL code is defined that is executed when a GET request is sent. The below screenshot shows this.



Trigger the first REST endpoint
Having the first REST endpoint fully deployed we can test it by executing a GET request against it from an external location. As this is a GET request we can do this from a browser, however you could use anything from cURL up to custom-written Python code to call the endpoint with a GET request; a minimal Python sketch is shown below.
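
For reference, a minimal Python sketch of such a GET request could look like the example below; the XXXX part of the hostname and the customer ID are placeholders which you would replace with your own ADW instance and data.

import urllib.request

# placeholder URL; XXXX and the customer ID need to be replaced with your own values
url = "https://XXXX.oraclecloudapps.com/ords/louwersj/parties/b2b/customers/some-customer-id"

with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))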

When providing the endpoint in a browser we get the below response:


For readability we can format the message so it becomes easier for humans to read.


Building the subsidiary endpoint
As stated, as part of this example we will also build a way to look up all subsidiaries of a given company. The previous endpoint provided the details of one company, including the ID of the parent company. However, in some cases someone would like to retrieve a list of subsidiaries.

We already have an endpoint /parties/b2b/customers/{some-customer-id} and we can expand that with /subsidiaries which would make the endpoint /parties/b2b/customers/{some-customer-id}/subsidiaries

To achieve this we build a second ORDS endpoint specifically for this and we create a GET request handler for this newly created ORDS endpoint as well. The below screenshot shows the creation of the ORDS template to provide the required endpoint:


When the ORDS template is created we can create the GET handler. The GET handler is shown in the screenshot below and reacts to the :id which is part of the URI.


We now have created our second endpoint which will provide a JSON response containing all the subsidiaries for a given customer ID.  In case we call the endpoint and format the response we will see a message as shown below:


Conclusion
When you are using an Oracle Autonomous Database you automatically get a very simple way of building RESTful data services in the form of REST endpoints. Even though the above example only scratches the surface of the possibilities, and much more complex and much more secure implementations can be built, it showcases the ease of use and how quickly you can build a comprehensive REST interface while only leveraging the Oracle Cloud based solution in the form of an Oracle Autonomous Database.

Tuesday, May 07, 2019

Fn Project - quick install guide

The Fn project is an open source project originally started within Oracle as part of the drive for more enterprise open source solutions. The Fn project is an open source serverless compute platform. With Fn, you deploy your functions to an Fn server which automatically executes and manages them. Each function is executed in a Docker container enabling the platform to provide broad support for development languages including Java, JavaScript (Node), Go, Python, Ruby, and others. The Fn client and server are simple and elegant. You can run the server locally on your laptop, or on a server in your data center or in the cloud. The Fn project has a strong enterprise focus with emphasis on security, scalability, and observability.

With companies moving more and more to cloud native solutions and requiring solutions that will not tie them to a single cloud platform, you can observe a move away from vendor-specific and proprietary serverless solutions.

The Fn project is a fully open source solution for building cloud native serverless solutions, and you are able to run it in any location. Oracle provides a managed service which allows you to consume Fn-based serverless functions, while at the same time you can run it within your own datacenter and on every cloud provider of your choice.

The fastest way to get started with the Fn project is to install it on a (virtual) machine and experience it hands-on. The below presentation gives a quick guide on how to get started with the Fn project on a local (virtual) machine.



More information can be found on the Fn project website and in other locations such as the Fn Slack channel and GitHub.

Monday, March 18, 2019

Oracle Linux & Cloud Automation - clean yum data with ansible

Building infrastructure, configuring servers and deploying software has been a manual task for many years. With the rise of cloud computing, virtual machines, containers and CI/CD processes we see more and more that those manual tasks are being diminished and fully automated. When building your infrastructure in the Oracle Cloud you can make use of a wide set of automation tools to make everything software defined and automated. Solutions like Ansible and Terraform provide some of the building blocks which can help you automate all the previously manual tasks. Ensuring you leverage solutions like this will increase speed and agility supported by Cloud computing.

In this series "Oracle Linux & Cloud Automation" we will go into more detail on several solutions you can leverage to automate large parts of your IT footprint lifecycle. Please use the tagged label to find all posts on this subject.

Ansible
Ansible is an open-source software provisioning, configuration management, and application deployment tool. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows. It includes its own declarative language to describe system configuration.

Cleaning yum
Ansible makes it easy to install (remove or update) packages using yum. As a good practice you should clean the cached yum data so it does not take up unneeded space on your local file system. Ansible provides a module as a wrapper around the yum command, which means you can use Ansible instead of interacting with the yum command directly.

Missing from the module is a command to clean the cache. You can do so by putting the following in the Ansible playbook.

# clean all the yum data.
  - name: Clean all yum data
    command: yum clean all
    args:
      warn: yes 

As you can see this is not using the yum module, it is using the command module instead. This is also the reason we have the warn flag, currently set to yes, though it is advisable to set it to no. With warn set to yes the below message is printed:

 [WARNING]: Consider using the yum module rather than running yum.  If you need to use 
command because yum is insufficient you can add warn=False to this command task or set 
command_warnings=False in ansible.cfg to get rid of this message.

The reason for the warning is that Ansible expects that all yum related commands are being done via the yum module and not the command module. As the yum module is missing the "clean all" option we have to do this via the command module and use the full command.

Oracle Linux tested
The below playbook is tested on an Oracle Linux 7 instance with Ansible 2.7.7.

Show the playbook:
[root@localhost ansi]# cat webserver_playbook.yml 
- hosts: localhost
  tasks:
  - name: Ensure the latest yum-utils python package is available on the server
    yum:
      name: yum-utils
      state: latest
  - name: Ensure the latest python package is available on the server
    yum:
      name: python
      state: latest
  - name: Ensure the latest python-pip package is available on the server
    yum:
      name: python-pip
      state: latest
# clean all the yum data.
  - name: Clean all yum data
    command: yum clean all
    args:
      warn: yes 

Show the output:
[root@localhost ansi]# ansible-playbook webserver_playbook.yml 
 [WARNING]: provided hosts list is empty, only localhost is available. Note
that the implicit localhost does not match 'all'


PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [Ensure the latest yum-utils python package is available on the server] ***
ok: [localhost]

TASK [Ensure the latest python package is available on the server] *************
ok: [localhost]

TASK [Ensure the latest python-pip package is available on the server] *********
ok: [localhost]

TASK [Clean all yum data] ******************************************************
 [WARNING]: Consider using the yum module rather than running yum.  If you need
to use command because yum is insufficient you can add warn=False to this
command task or set command_warnings=False in ansible.cfg to get rid of this
message.

changed: [localhost]

PLAY RECAP *********************************************************************
localhost                  : ok=5    changed=1    unreachable=0    failed=0   

[root@localhost ansi]# 

Also note that you always get "changed=1". This is because you always run yum clean all; even though this is not a configuration change on your system it is detected as a change. Technically it is a change, as you clean the yum data: even in cases where you do not download a package you still clean the metadata.

Conclusion
Even though the Ansible yum module has no native support for cleaning the cache, you can ensure that your system is not holding unnecessary data on the filesystem, as shown in the example above.

Friday, March 15, 2019

dash - changing the favicon

Dash is a Python framework for building analytical web applications. It can be used to very quickly develop small applications capable of running small analytical visualizations. As it is developed in Python it is a natural fit with solutions like pandas and the like.

When getting started with Dash, developed by Plotly, you might run into the following question: how do I change the favicon so it shows the one I want and not the one shipped by default?



The answer is: stop trying to code the solution. The only way (currently) is to create a directory named assets in the root of your project and add the desired favicon into this location. This should result in the desired favicon showing up.
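
As a minimal sketch, assuming you place a favicon.ico file in the assets directory, a Dash application will pick it up without any extra code:

import dash
import dash_html_components as html  # on newer Dash versions: from dash import html

# Dash automatically serves static files, including favicon.ico, from the ./assets directory
app = dash.Dash(__name__)
app.layout = html.Div("The favicon is served from ./assets/favicon.ico")

if __name__ == "__main__":
    app.run_server(debug=True)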


Tuesday, March 05, 2019

Python - machine learning and clustering

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters.

Within machine learning we place clustering under unsupervised learning; clustering is used, for example, in recommendation systems, targeted marketing and customer segmentation.

The below outline is a simple starting point showing a basic form of clustering on a relatively small dataset. The dataset we will use is displayed in the scatter chart below. Our objective is to determine 3 clusters in the data shown in the scatter chart. In this example the data is just random data; it can however represent virtually anything, for example customers, demographic data, sensor data points or anything else.


When looking at the data, humans are by default driven by apophenia to try and see patterns. Apophenia is the universal human tendency to seek patterns in random information, for example in gambling. However, even though the human mind will try to see a pattern, this is far from correct in many cases. To make a truly valid clustering we need to base the clustering on math and not on the feeling of the human mind.

By leveraging Python code we can divide the data into 3 distinct clusters; the clusters found are shown below in different colors.


We can now see the different clusters within the data. Finding the members of each cluster is done using K-means clustering. K-means is a clustering algorithm that aims to partition n observations into k clusters. The main steps are:


  • Initialisation – K initial “means” (centroids) are generated at random
  • Assignment – K clusters are created by associating each observation with the nearest centroid
  • Update – The centroid of the clusters becomes the new mean

The result is that after the updates you end up with (in our case) 3 centroids, with each data point associated with the centroid to which it has the most optimal (smallest) distance.


The above scatter chart shows the centroids which form the backbone of the clustering. Normally they will be hidden as they are not actual data points from the dataset. As you can see we now have 3 clusters from the bigger dataset.
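
As a minimal sketch of how such a clustering can be done with scikit-learn (using randomly generated points here instead of the original dataset), the code could look like this:

import numpy as np
from sklearn.cluster import KMeans

# random example data; in the real example this would be the data from the scatter chart
rng = np.random.RandomState(42)
data = rng.rand(200, 2)

# partition the data into 3 clusters, as in the example above
kmeans = KMeans(n_clusters=3, random_state=42)
clusterLabels = kmeans.fit_predict(data)

print(clusterLabels[:10])        # cluster assignment of the first 10 data points
print(kmeans.cluster_centers_)   # the 3 centroids shown in the scatter chart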

Examples of clustering can be found on my Github project containing machine learning examples. 

Wednesday, February 20, 2019

Kubernetes - Minikube start dashboard for a Web UI

For those developing solutions that should run under Kubernetes, running a local version of Kubernetes leveraging Minikube can make your life much easier. One of the questions people have is how to make use of the Kubernetes Web UI.

Running the Kubernetes Web UI while working with Minikube is relatively easy and you can start the Web UI with a single command of the minikube CLI. The below command showcases how to start the Web UI and also have your local browser open automatically to guide you to the correct URL.

Johans-MacBook-Pro:log jlouwers$ minikube dashboard
🔌  Enabling dashboard ...
🤔  Verifying dashboard health ...
🚀  Launching proxy ...
🤔  Verifying proxy health ...
🎉  Opening http://127.0.0.1:60438/api/v1/namespaces/kube-system/services/http:kubernetes-dashboard:/proxy/ in your default browser...

As you can see from the above example the only command needed is 'minikube dashboard'.


The above screenshot shows you the Kubernetes Web UI in the browser as started by the minikube command.

Friday, February 15, 2019

Python Pandas – consume Oracle Rest API data

When working with pandas the most commonly known way to get data into a pandas DataFrame is to read a local csv file into the DataFrame using a read_csv() operation. In many cases the data which is encapsulated within the csv file originally came from a database. Getting from a database to a csv file on the machine where your Python code runs involves running a query, exporting the results to a csv file and transporting the csv file to a location where the Python code can read it and transform it into a pandas DataFrame.

When looking at modern systems we see that more and more persistent data stores provide REST APIs to expose data. Oracle has ORDS (Oracle REST Data Services), which provides an easy way to build REST API endpoints as part of your Oracle Database.

Instead of extracting the data from the database, building a csv file and transporting the csv file so you are able to consume it, you can also instruct your Python code to interact with the ORDS REST endpoint and read the JSON response directly.

The below JSON structure is an example of a very simple ORDS endpoint response message. From this message we are, in this example, only interested in the items it returns and we do want to have that in our pandas DataFrame.

{
 "items": [{
  "empno": 7369,
  "ename": "SMITH",
  "job": "CLERK",
  "mgr": 7902,
  "hiredate": "1980-12-17T00:00:00Z",
  "sal": 800,
  "comm": null,
  "deptno": 20
 }, {
  "empno": 7499,
  "ename": "ALLEN",
  "job": "SALESMAN",
  "mgr": 7698,
  "hiredate": "1981-02-20T00:00:00Z",
  "sal": 1600,
  "comm": 300,
  "deptno": 30
 }, {
  "empno": 7521,
  "ename": "WARD",
  "job": "SALESMAN",
  "mgr": 7698,
  "hiredate": "1981-02-22T00:00:00Z",
  "sal": 1250,
  "comm": 500,
  "deptno": 30
 }, {
  "empno": 7566,
  "ename": "JONES",
  "job": "MANAGER",
  "mgr": 7839,
  "hiredate": "1981-04-02T00:00:00Z",
  "sal": 2975,
  "comm": null,
  "deptno": 20
 }, {
  "empno": 7654,
  "ename": "MARTIN",
  "job": "SALESMAN",
  "mgr": 7698,
  "hiredate": "1981-09-28T00:00:00Z",
  "sal": 1250,
  "comm": 1400,
  "deptno": 30
 }, {
  "empno": 7698,
  "ename": "BLAKE",
  "job": "MANAGER",
  "mgr": 7839,
  "hiredate": "1981-05-01T00:00:00Z",
  "sal": 2850,
  "comm": null,
  "deptno": 30
 }, {
  "empno": 7782,
  "ename": "CLARK",
  "job": "MANAGER",
  "mgr": 7839,
  "hiredate": "1981-06-09T00:00:00Z",
  "sal": 2450,
  "comm": null,
  "deptno": 10
 }],
 "hasMore": true,
 "limit": 7,
 "offset": 0,
 "count": 7,
 "links": [{
  "rel": "self",
  "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees"
 }, {
  "rel": "describedby",
  "href": "http://192.168.33.10:8080/ords/pandas_test/metadata-catalog/test/item"
 }, {
  "rel": "first",
  "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees"
 }, {
  "rel": "next",
  "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees?offset=7"
 }]
}

The below code shows how to fetch the data with Python from the ORDS endpoint and normalize the JSON so that only the information about the items ends up in our DataFrame.

import json
from urllib2 import urlopen
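# note: this example targets Python 2; on Python 3 use "from urllib.request import urlopen" instead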
from pandas.io.json import json_normalize

# Fetch the data from the remote ORDS endpoint
apiResponse = urlopen("http://192.168.33.10:8080/ords/pandas_test/test/employees")
apiResponseFile = apiResponse.read().decode('utf-8', 'replace')

# load the JSON data we fetched from the ORDS endpoint into a dict
jsonData = json.loads(apiResponseFile)

# load the dict containing the JSON data into a DataFrame by using json_normalized.
# do note we only use 'items'
df = json_normalize(jsonData['items'])

# show the evidence we received the data from the ORDS endpoint.
print (df.head())

Interacting with an ORDS endpoint to retrieve the data out of the Oracle Database can in many cases be much more efficient than taking the more traditional csv route. Options to use a direct connection to the database and SQL statements will be covered in another example post. You can also see the code used above in the machine learning examples project on Github.

Wednesday, February 13, 2019

resolved - cx_Oracle.DatabaseError: ORA-24454: client host name is not set

When developing Python code in combination with cx_Oracle on a Mac you might run into some issues, especially when configuring your Mac for the first time. One of the strange things I encountered was the ORA-24454 error when trying to connect to an Oracle database from my MacBook. ORA-24454 states that the client host name is not set.

When looking into the issue it turns out that the combination of the Oracle Instant Client and cx_Oracle looks in /etc/hosts on a Mac to find the client hostname to use when initiating the connection from the Mac to the database.

Resolve the issue
A small disclaimer: this worked for me and I expect it will work for other Mac users as well. First you have to find the actual hostname of your system; you can do so by executing one of the following commands:

Johans-MacBook-Pro:~ root# hostname 
Johans-MacBook-Pro.local

or you can run:

Johans-MacBook-Pro:~ root# python -c 'import socket; print(socket.gethostname());'
Johans-MacBook-Pro.local

Knowing the actual hostname of your machine you can now set it in /etc/hosts. This should make it look something like the example below:

127.0.0.1 localhost
127.0.0.1 Johans-MacBook-Pro.local

When set, this should ensure you no longer encounter the cx_Oracle.DatabaseError: ORA-24454: client host name is not set error when running your Python code.
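
Once /etc/hosts has been updated you can verify the fix with a minimal connection test such as the sketch below; the username, password and connect string are placeholders for your own database details.

import cx_Oracle

# placeholder credentials and connect string; replace with your own details
connection = cx_Oracle.connect("your_user", "your_password", "dbhost.example.com:1521/your_service")
print(connection.version)  # prints the database version if the connection succeeds
connection.close()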

Tuesday, February 12, 2019

Python pandas – merge dataframes

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. When working with data you can load data (from multiple types of sources) into a designated DataFrame which will hold the data for future actions. A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In many cases the operations you want to do on data require data from more than one single data source. In those cases you have the option to merge (concatenate, join) multiple DataFrames into a single DataFrame for the operations you intend. In the below example, we merge two sets of data (DataFrames) from the World Bank into a single dataset (DataFrame) in one of the most basic merge manners.

Used datasets
For those interested in the datasets, the original data is coming from data.worldbank.org, for this specific example I have modified the way the .csv file is provided originally. You can get the modified .csv files from my machine learning examples project located at github.

Example code
The example we show is relatively simple and is shown in the diagram below: we load two datasets into their individual DataFrames using pandas read_csv(). When both are loaded we merge the two DataFrames into a single (new) DataFrame using merge().


The below is an outline of the code example; you can get the full code, including the datasets used, from my machine learning examples project at GitHub.

import pandas as pd

df0 = pd.read_csv('../../data/dataset_4.csv', delimiter=";",)
print ('show the content of the first file via dataframe df0')
print (df0.head())

df1 = pd.read_csv('../../data/dataset_5.csv', delimiter=";",)
print ('show the content of the second file via dataframe df1')
print (df1.head())

df2 = pd.merge(df0, df1, on=['Country Code','Country Name'])
print ('show the content of merged dataframes as a single dataframe')
print (df2.head())
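
Do note that by default merge() performs an inner join, keeping only the rows that appear in both files. If you also want to keep rows that appear in only one of the two DataFrames you can pass the how argument, as in the small sketch below (assuming the same df0 and df1 as above):

# keep all rows, also those that only appear in one of the two DataFrames
df_outer = pd.merge(df0, df1, on=['Country Code', 'Country Name'], how='outer')
print(df_outer.head())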

Monday, February 11, 2019

Secure Software Development - the importance of dependency manifest files

When developing code, in this specific example Python code, one thing you want to make sure of is that you do not introduce vulnerabilities. Vulnerabilities can be introduced primarily in two ways: you create them or you include them. One way of providing an extra check that you do not include vulnerabilities in your application is making sure you handle the dependency manifest files in the right way.

A dependency manifest file makes sure that all the components your application relies upon are listed in a central place. One of the advantages is that you can use this file to scan for known security issues in components you depend upon. It is very easy to add an import or include-like statement and add additional functionality to your code; however, whatever you include might have a known bug or vulnerability in a specific version.

Creating a dependency manifest file in python
When developing Python code you can leverage pip to create a dependency manifest file, commonly named requirements.txt. The below command shows how you can create a dependency manifest file:

pip freeze > requirements.txt

If we look into the content of this file we notice a structure like the one shown below, which lists all the dependencies and their exact versions.

altgraph==0.10.2
bdist-mpkg==0.5.0
bonjour-py==0.3
macholib==1.5.1
matplotlib==1.3.1
modulegraph==0.10.4
numpy==1.16.1
pandas==0.24.1
py2app==0.7.3
pyobjc-core==2.5.1
pyobjc-framework-Accounts==2.5.1
pyobjc-framework-AddressBook==2.5.1
pyobjc-framework-AppleScriptKit==2.5.1
pyobjc-framework-AppleScriptObjC==2.5.1
pyobjc-framework-Automator==2.5.1
pyobjc-framework-CFNetwork==2.5.1
pyobjc-framework-Cocoa==2.5.1
pyobjc-framework-Collaboration==2.5.1
pyobjc-framework-CoreData==2.5.1
pyobjc-framework-CoreLocation==2.5.1
pyobjc-framework-CoreText==2.5.1
pyobjc-framework-DictionaryServices==2.5.1
pyobjc-framework-EventKit==2.5.1
pyobjc-framework-ExceptionHandling==2.5.1
pyobjc-framework-FSEvents==2.5.1
pyobjc-framework-InputMethodKit==2.5.1
pyobjc-framework-InstallerPlugins==2.5.1
pyobjc-framework-InstantMessage==2.5.1
pyobjc-framework-LatentSemanticMapping==2.5.1
pyobjc-framework-LaunchServices==2.5.1
pyobjc-framework-Message==2.5.1
pyobjc-framework-OpenDirectory==2.5.1
pyobjc-framework-PreferencePanes==2.5.1
pyobjc-framework-PubSub==2.5.1
pyobjc-framework-QTKit==2.5.1
pyobjc-framework-Quartz==2.5.1
pyobjc-framework-ScreenSaver==2.5.1
pyobjc-framework-ScriptingBridge==2.5.1
pyobjc-framework-SearchKit==2.5.1
pyobjc-framework-ServiceManagement==2.5.1
pyobjc-framework-Social==2.5.1
pyobjc-framework-SyncServices==2.5.1
pyobjc-framework-SystemConfiguration==2.5.1
pyobjc-framework-WebKit==2.5.1
pyOpenSSL==0.13.1
pyparsing==2.0.1
python-dateutil==2.8.0
pytz==2013.7
scipy==0.13.0b1
six==1.12.0
xattr==0.6.4

Check for known security issues
One of the simplest ways to check for known security issues is checking your code in at github.com. As part of the service provided by GitHub you will get alerts, based upon the dependency manifest file, about dependencies that might have a known security issue. The below screenshot shows the result of uploading a Python dependency manifest file to GitHub.


As it turns out, somewhere in the chain of dependencies some project still has an old version of pyOpenSSL included which has a known security vulnerability. The beauty of this approach is that you have direct insight and can correct this right away.

Sunday, February 10, 2019

Python Matplotlib - showing or hiding a legend in a plot


When working with Matplotlib to visualize your data there are situations where you want to show the legend and cases where you want to hide it. Showing or hiding the legend is very simple, as long as you know how to do it; the below example showcases both showing and hiding the legend in your plot.

The code used in this example uses pandas and matplotlib to plot the data. The full example of this is part of my machine learning example repository on Github where you can find this specific code and more.

Plot with legend
The below image shows the plotted data with a legend. Having a legend is in some cases very useful, however in other cases it can be distracting. Personally I think keeping a plot very clean (without a legend) is in many cases the best way of presenting it.

The code used for this is shown below. As you can see we use legend=True:

df.plot(kind='line',x='ds',y='y',ax=ax, legend=True)


Plot without legend
The below image shows the plotted data without a legend.

The code used for this is shown below. As you can see we use legend=False:

df.plot(kind='line',x='ds',y='y',ax=ax, legend=False)
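
For completeness, a self-contained sketch of the above could look like the example below; the DataFrame with the ds and y columns is generated here as a stand-in for the dataset used in the repository.

import pandas as pd
import matplotlib.pyplot as plt

# stand-in data with the same column names as used above (ds for the x-axis, y for the values)
df = pd.DataFrame({'ds': pd.date_range('2019-01-01', periods=10), 'y': range(10)})

fig, ax = plt.subplots()
df.plot(kind='line', x='ds', y='y', ax=ax, legend=True)  # set legend=False to hide the legend
plt.show()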

Thursday, January 31, 2019

machine learning - matplotlib error in matplotlib.backends import _macosx


When trying to visualize and plot data in Python you might work with Matplotlib. In case you are working on MacOS and you use a venv, in some cases you might run into the below error message:

RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.

The reason for this error is that Matplotlib is not able to select a working backend. The easiest way to resolve this, in a quick and dirty way, is to add the following line to your code:

matplotlib.use('TkAgg')

This should remove (in most cases) the error and your code should be able to run correctly. 
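
Do note that the backend generally needs to be selected before pyplot is imported; a minimal sketch:

import matplotlib
matplotlib.use('TkAgg')          # select the backend first
import matplotlib.pyplot as plt  # only then import pyplot

plt.plot([1, 2, 3], [1, 4, 9])
plt.show()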

Tuesday, January 29, 2019

Machine learning - Supervised machine learning and decision tree classifiers


When working with machine learning, and especially when you start learning machine learning, one of the first things you will encounter is supervised machine learning and writing decision tree based classifiers. A supervised classifier which leverages a decision tree to classify an object into a group bases itself on provided, already labeled data.

The data will (in most cases) consist of a number of features, all describing labeled objects in your training data. As an example, provided by Google, we could have a set of objects all having the label apple or orange. In our case we have mapped apple to the numeric value 0 and orange to the numeric value 1.

The features in our dataset are the weight of the object (it being an apple or an orange) and the type of skin. The skin of the object is either smooth (like that of an apple) or bumpy (like that of an orange). We have mapped bumpy to the value 0 and smooth to the value 1.

The below code showcases the implementation and also a prediction for a new object compared to the data we have in the learning data.

from sklearn import tree

features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

print(clf.predict([[150, 0]]))

As you can see in the above example we predict what the object with [150, 0] will be: an apple or an orange. The above example is just a couple of lines and is already a first example of a simple machine learning implementation in Python. The reason it only takes this limited number of lines is that we can leverage all the work already done by the developers of scikit-learn.

You can find the above code example and more examples on my Github project page.


[SOLVED] OSError: [Errno 2] "dot" not found in path.

Python Data Visualization
When trying to visualize data using pydot in Python you might run into an error stating that "dot" is not found in the path, even after you have ensured pydot is installed and imported in your Python code. The main reason for this is that your Python code is unable to find the dot executable. The dot executable comes from the Graphviz project, which means that even though you installed pydot you are still missing a critical component.

If we look at the pydot PyPI page you can already see a hint about this, as it tells you the following: pydot is an interface to Graphviz and can parse and dump into the DOT language used by Graphviz. pydot is written in pure Python.

To resolve this we can use yum to install Graphviz on Linux; in our case we use Oracle Linux.

yum -y install graphviz

This command will ensure that Graphviz is installed on your local Oracle Linux operating system. To check if the installation completed as expected you can use the below command to check the version:

[vagrant@localhost vagrant]$ dot -V
dot - graphviz version 2.30.1 (20180223.0356)

Now, if you run your Python code and use something like pydot.graph_from_dot_data to work with DOT data and visualize it at a later stage, you will see that you no longer face the OSError: [Errno 2] "dot" not found in path error you encountered before.
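
A quick way to verify that pydot can now find the dot executable is a small test like the sketch below; do note that in recent pydot versions graph_from_dot_data returns a list of graphs.

import pydot

# build a tiny graph from DOT data and render it; this fails with the "dot" not found
# error when Graphviz is not installed
graphs = pydot.graph_from_dot_data('digraph G { a -> b; }')
graphs[0].write_png('test_graph.png')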

Friday, January 18, 2019

Oracle JET - data-bind as text or html

When using the KnockoutJS data-bind option while developing Oracle JET applications you have the option to use data-bind in combination with a text option or an html option. In most cases developers will use the text option to bind a JavaScript view/model variable in the HTML view code. Usually variables contain a value that needs to be displayed in the context of the HTML code; however, in some cases the variable can also contain HTML code itself. When a variable contains HTML markup code you will have to handle this differently when defining the data-bind in the view.

The below example screenshot displays the same variable used in a data-bind once with the text option and once with the html option. The first line uses the text option and because of this the HTML code for the underline is not rendered. The second line uses the html option: the underline markup is rendered and the line is underlined in the view.


The below code example is showing the Oracle JET knockoutJS html view code, the code can also be seen as a gist on github.


The below code example is showing the Oracle JET knockoutJS html viewModel code, the code can also be seen as a gist on github

Thursday, January 17, 2019

Oracle JET - Knockout data-bind

When using Oracle JET for building your application you automatically make use of Knockout.js as this is a vital part of how Oracle JET works. Part of Knockout is the data-bind option. Knockout’s declarative binding system provides a concise and powerful way to link data to the UI. It’s generally easy and obvious to bind to simple data properties or to use a single binding. Understanding data-bind is one of the basic parts you need to grasp to be able to develop an application with Oracle JET.

You can bind variables defined in the viewModel (javascript) and make sure they become part of the view (HTML) code.  In the below diagram you can see the binding between the Oracle JET View and the Oracle JET View/Model


Knockout data-bind example
As you can see in the below example screenshot we have displayed the value of someVariable0. The value of someVariable0 is set in the view/model

The below example code showcases the view/model; within the IncidentsViewModel function you can see we assign the value to the variable, which has been defined outside of the IncidentsViewModel function. You can view the example code on GitHub via this Gist link.



To ensure we bind the variable defined in the view/model .js file we have to make sure we bind it in the HTML code of the view. The below example showcases how this binding is done as part of an HTML span using the data-bind option. You can view the example code on GitHub via this Gist link.



In effect, the above example showcases the most basic form of a knockout based Oracle JET binding between the view and the view/model.

Tuesday, January 15, 2019

UX as part of your Enterprise Architecture

Digitalization within enterprises is still growing rapidly; enterprises are adopting digitalization in every aspect of their daily processes and are moving to more intelligent and integrated systems. Even though a lot of work is being done on backend systems, and a lot of systems are developed and modernized to work in the new digital era, a large part of the work has to do with UX (user experience).

A large number of enterprises are still lacking in building a good and unified user experience for internal users. It has long been thought that user experience was mainly applicable to external systems such as websites, webshops and mobile applications. It is, however, equally important to have a good and clear view on the internal user experience.

Internal user experience
Internal users, your employees, will use the systems developed on a daily basis. Ensuring the systems are simple to use, do what they promise and provide an intuitive experience will add to productivity. Additionally, ensuring that systems are easy to work with and provide a good experience will ensure that your employees are more motivated and that adoption of new systems will be higher.

UX as an enterprise architecture component
In the past, it was common that every system within an enterprise would have a different experience. Menu structures, screen structures and the way a system behaved differed per application. As an employee normally interacts with multiple systems this can become overwhelming and complex. Additionally, it is relatively common that all internal enterprise user experiences are, to put it mildly, not that good. Most commonly, every system has a suboptimal interface and an interface design which is different from the rest.

An advised solution is to include standards for UX and interface design into the Enterprise Architecture repository and ensure, depending on your enterprise size, you have dedicated people to support developers and teams to include your enterprise UX blueprints within the internal applications.

When UX and interface design are part of the enterprise architecture standards, and you ensure all applications adhere to the standards, the application landscape will start to become uniform. The additional advantage is that you can have a dedicated group of people who build UX components such as stylesheets, icons, fonts, javascripts and other components to be easily adopted and included by application development teams. At the same time, if you have dependency management done correctly, a change to a central UX component will automatically be adopted by all applications.

Having a Unified Enterprise UX is, from a user experience and adoption point of view one of the most important parts to ensure your digital strategy will succeed. 

Add UX consultants to your team
Not every developer is a UX consultant and not every UX consultant is a developer. Ensuring that your enterprise has a good UX team, or at least a good UX consultant to support development teams, can be a large advantage. As per Paul Boag the eight biggest advantages of a UX consultant for your company are the following:
  1. UX Consultants Help Better Understand Customers
  2. UX Consultants Audit Websites
  3. UX Consultants Prototype and Test Better Experiences
  4. UX Consultants Will Establish Your Strategy
  5. UX Consultants Help Implement Change
  6. UX Consultants Educate and Inspire Colleagues
  7. UX Consultants Create Design Systems
  8. UX Consultants Will Help Incrementally Improve the Experience

Adopt a UX template
Building a UX strategy from scratch is complex and costly. A commonly seen approach is that enterprises adopt a template and strategy and use this as the foundation for their enterprise-specific UX strategy.

As an example of enterprise UI and UX design, Oracle provides Alta UI which is a true enterprise grade user experience which you can adopt as part of your own enterprise UI and UX strategy. An example is shown below:

The benefit of adopting a UX strategy is that, when you select a mature implementation, a lot of the work is already done for you and as an enterprise you can benefit from a well-thought-through design. Style guides and other components are ready to be adopted and will not require a lot of customization to be used within your enterprise, so you can ensure all your applications have the same design and the same user experience.


The above shown presentation from Andrejus Baranovskis showcases Oracle Alta UI Patterns for Enterprise Applications and Responsive UI Support