Johan Louwers - Tech blog: February 2019

Wednesday, February 20, 2019

Kubernetes - Minikube start dashboard for a Web UI

For those developing solutions that should run under Kubernetes, running a local version of Kubernetes with Minikube can make your life much easier. One question people often have is how to make use of the Kubernetes Web UI.

Running the Kubernetes Web UI while working with Minikube is relatively easy: you can start it with a single command of the minikube CLI. The command below starts the Web UI and also opens your local browser automatically at the correct URL.

Johans-MacBook-Pro:log jlouwers$ minikube dashboard

🔌 Enabling dashboard ...

🤔 Verifying dashboard health ...

🚀 Launching proxy ...

🤔 Verifying proxy health ...

🎉 Opening http://127.0.0.1:60438/api/v1/namespaces/kube-system/services/http:kubernetes-dashboard:/proxy/ in your default browser...

As you can see from the above example the only command needed is 'minikube dashboard'.

The above screenshot shows you the Kubernetes Web UI in the browser as started by the minikube command.

Friday, February 15, 2019

Python Pandas – consume Oracle Rest API data

When working with Pandas the most commonly known way to get data into a pandas DataFrame is to read a local csv file using a read_csv() operation. In many cases the data in the csv file originally came from a database. Getting it from the database to a csv file on the machine where your Python code runs involves running a query, exporting the results to a csv file and transporting that file to a location where the Python code can read it and transform it into a pandas DataFrame.

When looking at modern systems we see that more and more persistent data stores provide REST APIs to expose data. Oracle has ORDS (Oracle Rest Data Services), which provides an easy way to build REST API endpoints as part of your Oracle Database.

Instead of extracting the data from the database, building a csv file and transporting that file so you are able to consume it, you can also instruct your Python code to interact directly with the ORDS REST endpoint and read the JSON response directly.

The below JSON structure is an example of a very simple ORDS endpoint response message. From this message we are, in this example, only interested in the items it returns and we do want to have that in our pandas DataFrame.

{
 "items": [{
  "empno": 7369,
  "ename": "SMITH",
  "job": "CLERK",
  "mgr": 7902,
  "hiredate": "1980-12-17T00:00:00Z",
  "sal": 800,
  "comm": null,
  "deptno": 20
 }, {
  "empno": 7499,
  "ename": "ALLEN",
  "job": "SALESMAN",
  "mgr": 7698,
  "hiredate": "1981-02-20T00:00:00Z",
  "sal": 1600,
  "comm": 300,
  "deptno": 30
 }, {
  "empno": 7521,
  "ename": "WARD",
  "job": "SALESMAN",
  "mgr": 7698,
  "hiredate": "1981-02-22T00:00:00Z",
  "sal": 1250,
  "comm": 500,
  "deptno": 30
 }, {
  "empno": 7566,
  "ename": "JONES",
  "job": "MANAGER",
  "mgr": 7839,
  "hiredate": "1981-04-02T00:00:00Z",
  "sal": 2975,
  "comm": null,
  "deptno": 20
 }, {
  "empno": 7654,
  "ename": "MARTIN",
  "job": "SALESMAN",
  "mgr": 7698,
  "hiredate": "1981-09-28T00:00:00Z",
  "sal": 1250,
  "comm": 1400,
  "deptno": 30
 }, {
  "empno": 7698,
  "ename": "BLAKE",
  "job": "MANAGER",
  "mgr": 7839,
  "hiredate": "1981-05-01T00:00:00Z",
  "sal": 2850,
  "comm": null,
  "deptno": 30
 }, {
  "empno": 7782,
  "ename": "CLARK",
  "job": "MANAGER",
  "mgr": 7839,
  "hiredate": "1981-06-09T00:00:00Z",
  "sal": 2450,
  "comm": null,
  "deptno": 10
 }],
 "hasMore": true,
 "limit": 7,
 "offset": 0,
 "count": 7,
 "links": [{
  "rel": "self",
  "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees"
 }, {
  "rel": "describedby",
  "href": "http://192.168.33.10:8080/ords/pandas_test/metadata-catalog/test/item"
 }, {
  "rel": "first",
  "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees"
 }, {
  "rel": "next",
  "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees?offset=7"
 }]
}

The below code shows how to fetch the data with Python from the ORDS endpoint and normalize the JSON in a way that we will only have the information about items in our dataframe.

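import json

# urlopen lives in urllib.request on Python 3 and in urllib2 on Python 2
try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen

# json_normalize is exposed as pandas.json_normalize from pandas 1.0 onwards;
# older releases only have it under pandas.io.json
try:
    from pandas import json_normalize
except ImportError:
    from pandas.io.json import json_normalize

# Fetch the data from the remote ORDS endpoint
apiResponse = urlopen("http://192.168.33.10:8080/ords/pandas_test/test/employees")
apiResponseFile = apiResponse.read().decode('utf-8', 'replace')

# load the JSON data we fetched from the ORDS endpoint into a dict
jsonData = json.loads(apiResponseFile)

# load the dict containing the JSON data into a DataFrame by using json_normalize.
# do note we only use 'items'
df = json_normalize(jsonData['items'])

# show the evidence we received the data from the ORDS endpoint.
print(df.head())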

Interacting with an ORDS endpoint to retrieve data from the Oracle Database can in many cases be much more efficient than taking the more traditional csv route. Using a direct connection to the database with SQL statements is a topic for another example post. You can also find the code used above in the machine learning examples project on GitHub.
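One thing to be aware of: the sample response above reports "hasMore": true and carries a link with "rel": "next", so a single request only returns the first page of employees. The sketch below is one possible way to walk all pages, assuming every page has the same structure as the sample. The names next_link, fetch_all_items, fetch_page and fetch_page_http are my own for this illustration and are not part of ORDS or pandas.

```python
import json


def next_link(page):
    """Return the href of the link with rel == "next", or None if absent."""
    for link in page.get("links", []):
        if link.get("rel") == "next":
            return link.get("href")
    return None


def fetch_all_items(first_url, fetch_page, max_pages=100):
    """Collect 'items' from every page by following the 'next' links.

    fetch_page is any callable that takes a URL and returns the decoded
    JSON document as a dict (a hypothetical helper supplied by the caller).
    max_pages is a safety cap so a misbehaving endpoint cannot loop forever.
    """
    items = []
    url = first_url
    for _ in range(max_pages):
        page = fetch_page(url)
        items.extend(page.get("items", []))
        if not page.get("hasMore"):
            break
        url = next_link(page)
        if url is None:
            break
    return items


def fetch_page_http(url):
    """Fetch one page over HTTP using the Python 3 standard library."""
    from urllib.request import urlopen
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_all_items() with the employees URL from the example and fetch_page_http returns one flat list of item dicts, which can be handed to json_normalize() in the same way as jsonData['items']. Passing the fetch function in as an argument keeps the paging logic easy to try out without a running ORDS instance.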

Wednesday, February 13, 2019

resolved - cx_Oracle.DatabaseError: ORA-24454: client host name is not set

When developing Python code in combination with cx_Oracle on a Mac you might run into some issues, especially when configuring your Mac for the first time. One of the strange things I encountered was the ORA-24454 error when trying to connect to an Oracle database from my MacBook. ORA-24454 states that the client host name is not set.

When looking into the issue it turns out that the combination of the Oracle instant client and cx_Oracle looks into /etc/hosts on a Mac to find the client hostname to use when initiating the connection from the Mac to the database.

Resolve the issue
A small disclaimer: this worked for me, and I expect it will work for other Mac users as well. First you have to find the actual hostname of your system. You can do so by executing one of the following commands;

Johans-MacBook-Pro:~ root# hostname 
Johans-MacBook-Pro.local

or you can run;

Johans-MacBook-Pro:~ root# python -c 'import socket; print(socket.gethostname());'
Johans-MacBook-Pro.local

Knowing the actual hostname of your machine, you can now set it in /etc/hosts. The file should then look something like the one below;

127.0.0.1 localhost
127.0.0.1 Johans-MacBook-Pro.local

Once this is set you should no longer encounter the cx_Oracle.DatabaseError: ORA-24454: client host name is not set error when running your Python code.
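To verify the change without starting a database connection, you can check whether the hostname is actually listed in /etc/hosts. The code below is a minimal sketch using only the Python standard library. The function names are made up for this example, and it only looks at the hosts file; it does not prove that cx_Oracle will connect.

```python
import socket


def hosts_file_names(hosts_text):
    """Return the set of host names found in /etc/hosts style text."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # The first field is the IP address, the rest are names and aliases.
        names.update(name.lower() for name in fields[1:])
    return names


def hostname_is_listed(hostname, hosts_text):
    """True if hostname appears as a name or alias in the hosts text."""
    return hostname.lower() in hosts_file_names(hosts_text)


def check_local_machine(hosts_path="/etc/hosts"):
    """Check the hostname of this machine against the local hosts file."""
    with open(hosts_path) as handle:
        return hostname_is_listed(socket.gethostname(), handle.read())
```

Running check_local_machine() on the Mac should return True once the line with your hostname has been added.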

Tuesday, February 12, 2019

Python pandas – merge dataframes

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. When working with data you can load data (from multiple types of sources) into a designated DataFrame which will hold the data for future actions. A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

In many cases the operations you want to do on data require data from more than one data source. In those cases you have the option to merge (concatenate, join) multiple DataFrames into a single DataFrame for the operations you intend. In the below example, we merge two sets of data (DataFrames) from the World Bank into a single dataset (DataFrame) in one of the most basic merge manners.

Used datasets
For those interested in the datasets, the original data comes from data.worldbank.org. For this specific example I have modified the .csv files from the way they are originally provided. You can get the modified .csv files from my machine learning examples project located at GitHub.

Example code
The example we show is relatively simple and is shown in the diagram below. We load two datasets using Pandas read_csv(), each into its own DataFrame. When both are loaded we merge the two DataFrames into a single (new) DataFrame using merge().

The below is an outline of the code example. You can get the code example, including the used datasets, from my machine learning examples project at GitHub.

import pandas as pd

# load the first dataset; the modified World Bank files use ';' as delimiter
df0 = pd.read_csv('../../data/dataset_4.csv', delimiter=";")
print('show the content of the first file via dataframe df0')
print(df0.head())

# load the second dataset
df1 = pd.read_csv('../../data/dataset_5.csv', delimiter=";")
print('show the content of the second file via dataframe df1')
print(df1.head())

# merge both DataFrames on the columns they share
df2 = pd.merge(df0, df1, on=['Country Code', 'Country Name'])
print('show the content of merged dataframes as a single dataframe')
print(df2.head())
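Do note that merge() performs an inner join by default, which means a country that is present in only one of the two files silently disappears from the result. The sketch below illustrates this with two tiny in-memory DataFrames. Only the key columns 'Country Code' and 'Country Name' come from the example above; the other column names and all values are invented for illustration and are not World Bank figures.

```python
import pandas as pd

# Tiny stand-ins for the two csv files; the values are invented.
df0 = pd.DataFrame({
    "Country Name": ["Aruba", "Belgium", "Chile"],
    "Country Code": ["ABW", "BEL", "CHL"],
    "Indicator A": [1.0, 2.0, 3.0],
})
df1 = pd.DataFrame({
    "Country Name": ["Belgium", "Chile", "Denmark"],
    "Country Code": ["BEL", "CHL", "DNK"],
    "Indicator B": [20.0, 30.0, 40.0],
})

keys = ["Country Code", "Country Name"]

# Default (how="inner"): only rows whose keys appear in both frames survive.
inner = pd.merge(df0, df1, on=keys)

# how="outer" keeps every row; indicator=True adds a "_merge" column that
# records whether a row came from the left frame, the right frame or both.
outer = pd.merge(df0, df1, on=keys, how="outer", indicator=True)

print(inner)
print(outer[keys + ["_merge"]])
```

Checking the "_merge" column of an outer merge is a quick way to see which rows would be lost before you settle on the default inner merge.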

Monday, February 11, 2019

Secure Software Development - the importance of dependency manifest files

When developing code, in this specific example Python code, one thing you want to make sure of is that you do not develop vulnerabilities. Vulnerabilities can be introduced primarily in two ways; you create them or you include them. One way of providing an extra check that you do not include vulnerabilities in your application is making sure you handle the dependency manifest files in the right way.

A dependency manifest file makes sure all the components your application relies upon are listed in a central place. One of the advantages is that you can use this file to scan for known security issues in components you depend upon. It is very easy to add an import or include-like statement and pull additional functionality into your code. However, whatever you include might have a known bug or vulnerability in a specific version.

Creating a dependency manifest file in Python
When developing Python code you can leverage pip to create a dependency manifest file, commonly named requirements.txt. The below command shows how you can create a dependency manifest file.

pip freeze > requirements.txt

If we look into the content of this file we will notice a structure like the one shown below, which lists all the dependencies and their exact versions.

altgraph==0.10.2
bdist-mpkg==0.5.0
bonjour-py==0.3
macholib==1.5.1
matplotlib==1.3.1
modulegraph==0.10.4
numpy==1.16.1
pandas==0.24.1
py2app==0.7.3
pyobjc-core==2.5.1
pyobjc-framework-Accounts==2.5.1
pyobjc-framework-AddressBook==2.5.1
pyobjc-framework-AppleScriptKit==2.5.1
pyobjc-framework-AppleScriptObjC==2.5.1
pyobjc-framework-Automator==2.5.1
pyobjc-framework-CFNetwork==2.5.1
pyobjc-framework-Cocoa==2.5.1
pyobjc-framework-Collaboration==2.5.1
pyobjc-framework-CoreData==2.5.1
pyobjc-framework-CoreLocation==2.5.1
pyobjc-framework-CoreText==2.5.1
pyobjc-framework-DictionaryServices==2.5.1
pyobjc-framework-EventKit==2.5.1
pyobjc-framework-ExceptionHandling==2.5.1
pyobjc-framework-FSEvents==2.5.1
pyobjc-framework-InputMethodKit==2.5.1
pyobjc-framework-InstallerPlugins==2.5.1
pyobjc-framework-InstantMessage==2.5.1
pyobjc-framework-LatentSemanticMapping==2.5.1
pyobjc-framework-LaunchServices==2.5.1
pyobjc-framework-Message==2.5.1
pyobjc-framework-OpenDirectory==2.5.1
pyobjc-framework-PreferencePanes==2.5.1
pyobjc-framework-PubSub==2.5.1
pyobjc-framework-QTKit==2.5.1
pyobjc-framework-Quartz==2.5.1
pyobjc-framework-ScreenSaver==2.5.1
pyobjc-framework-ScriptingBridge==2.5.1
pyobjc-framework-SearchKit==2.5.1
pyobjc-framework-ServiceManagement==2.5.1
pyobjc-framework-Social==2.5.1
pyobjc-framework-SyncServices==2.5.1
pyobjc-framework-SystemConfiguration==2.5.1
pyobjc-framework-WebKit==2.5.1
pyOpenSSL==0.13.1
pyparsing==2.0.1
python-dateutil==2.8.0
pytz==2013.7
scipy==0.13.0b1
six==1.12.0
xattr==0.6.4

Check for known security issues
One of the simplest ways to check for known security issues is checking your code in at github.com. As part of the service provided by GitHub you will get alerts, based upon the dependency manifest file, about which dependencies might have a known security issue. The below screenshot shows the result of uploading a Python dependency manifest file to GitHub.

As it turns out, somewhere in the chain of dependencies some project still includes an old version of pyOpenSSL which has a known security vulnerability. The beauty of this approach is that you have direct insight and you can correct this right away.
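To make the idea of scanning a manifest a bit more tangible, the sketch below parses the name==version lines that pip freeze produces and looks each pin up in an advisory table. This is not how GitHub implements its alerts. The function names and the advisory table are hypothetical and hand-made for this example; a real check needs a maintained vulnerability source. The parser is deliberately simple and only handles plain pinned lines.

```python
def parse_pinned(requirements_text):
    """Map lower-cased package name -> version for 'name==version' lines."""
    pinned = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # skip blanks, comments and unpinned entries
        name, version = line.split("==", 1)
        pinned[name.strip().lower()] = version.strip()
    return pinned


def flag_known_issues(pinned, advisories):
    """Return (name, version, note) for every pin listed in advisories.

    advisories maps (name, version) -> note. It is a hypothetical table
    for illustration, not a real vulnerability feed.
    """
    return [
        (name, version, advisories[(name, version)])
        for name, version in sorted(pinned.items())
        if (name, version) in advisories
    ]


# A few lines taken from the requirements.txt shown earlier.
sample = "numpy==1.16.1\npandas==0.24.1\npyOpenSSL==0.13.1\n"

# Made-up advisory entry, purely to show the lookup.
example_advisories = {("pyopenssl", "0.13.1"): "example entry: upgrade pyOpenSSL"}

print(flag_known_issues(parse_pinned(sample), example_advisories))
```

Because pip freeze pins exact versions, an exact lookup on name and version is enough for this illustration. Comparing version ranges is a separate problem that I have left out on purpose.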

Sunday, February 10, 2019

Python Matplotlib - showing or hiding a legend in a plot

When working with Matplotlib to visualize your data there are situations where you want to show the legend and others where you want to hide it. Showing or hiding the legend is very simple, as long as you know how to do it. The below example showcases both showing and hiding the legend in your plot.

The code used in this example uses pandas and matplotlib to plot the data. The full example of this is part of my machine learning example repository on Github where you can find this specific code and more.

Plot with legend
The below image shows the plotted data with a legend. Having a legend is in some cases very useful, however in other cases it can be distracting in your image. Personally I think keeping a plot very clean (without a legend) is the best way of presenting a plot in many cases.

The code used for this is shown below. As you can see we use legend=True.

df.plot(kind='line',x='ds',y='y',ax=ax, legend=True)

Plot without legend
The below image shows the plotted data without a legend.

The code used for this is shown below. As you can see we use legend=False.

df.plot(kind='line',x='ds',y='y',ax=ax, legend=False)
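The two one-liners above rely on a DataFrame df and an axes object ax that are created elsewhere in the full example. The sketch below is a self-contained version that draws both variants side by side. The data is invented, and only the column names 'ds' and 'y' are taken from the snippets above. The Agg backend is selected so the code also runs on a machine without a display, which is an assumption about your environment.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample data; only the column names come from the snippets above.
df = pd.DataFrame({
    "ds": pd.date_range("2019-01-01", periods=5, freq="D"),
    "y": [3, 5, 4, 6, 8],
})

# One figure with two axes so both variants can be compared directly.
fig, (ax_with, ax_without) = plt.subplots(1, 2, figsize=(10, 4))

df.plot(kind="line", x="ds", y="y", ax=ax_with, legend=True)
ax_with.set_title("legend=True")

df.plot(kind="line", x="ds", y="y", ax=ax_without, legend=False)
ax_without.set_title("legend=False")
```

Call fig.savefig() with a file name of your choice to write the figure to disk and compare the two panels.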