Johan Louwers - Tech blog: September 2019

Wednesday, September 25, 2019

Creating a training set table for machine learning in Oracle Database

When building a machine learning model, you will require a learning / training set of data. To enable you to quickly create a set of training data you can make use of the SQL SAMPLE clause in a select statement. Using the SAMPLE clause you instruct the database to select from a random sample of data from the table, rather than from the entire table. This provides a very simple way of getting the random collection of records you require for training your model.

Situation
You do have a large (or small) table of data in your database, in our case we use an Oracle Autonomous Data Warehouse and intend to use a part of this as training data while you want to use the remaining part for testing your model.

Assume we have a table named louwersj.loans which we want to use for both our training data as well as our test data. A simple way of splitting it in a 70/30 fashion is to use the below commands:

Step 1:
Check the total number of records in the table:

SELECT
    COUNT(1)
FROM
    louwersj.loans;

In our case this will give us the result of 614 as we have 614 records in our dataset

Step 2:
Take a 70% of the total and use this to create a table, this is where we will use the sample clause in the SQL statement to ensure we get a random 70% of the records. By issuing the below command the table loans_traindata will be exactly the same as the original table loans with only a random subset of the original table.

CREATE TABLE louwersj.loans_traindata
    AS
        SELECT
            *
        FROM
            louwersj.loans SAMPLE ( 70 ) SEED ( 1 )

To validate if this gives us what we wanted we can do another count to see if we indeed get a training set which contains 70% of the original table, the below command will return 455.

SELECT
    COUNT(1)
FROM
    louwersj.loans_traindata

Step 3:
Next to the train data we need to have some test data to validate the working of our machine learning model after we have trained it. For this we can use the remaining 30% of the data from the original table. With the following command we create a new table which will contain exactly that:

CREATE TABLE louwersj.loans_testdata
    AS
        SELECT
            *
        FROM
            louwersj.loans
        MINUS
        SELECT
            *
        FROM
            louwersj.loans_traindata

Conclusion
Using the SAMPLE clause as part of a CREATE TABLE AS statement in the Oracle Database helps you to speed up creating a good training set and test for your machine learning model. No needing to extract data from the database and re-insert the data, you can do all within the database without any actual moving of the data.

Tuesday, September 24, 2019

Groovy - AST Transformation

Groovy is a powerful language that gives the opportunity to its users to plugin into the compilation process to create what we call AST transformations, ie. the ability to customize the Abstract Syntax Tree representing your programs before the compiler walks this tree to generate Java bytecode.

When writing a lot of Groovy code, especially when you write it as part of a wider team it will be very beneficial to take some time to look into the inner workings of AST. As AST transformation can be build yourself to extend how Groovy is working it can help in ensuring that your code will be much more similar between different developers than it would be without using AST transformation.

When you are new to Groovy AST transformation the below talk can be a good starting point.

Monday, September 23, 2019

Google Cloud Function call Oracle ADW Rest end-point

When running an Oracle Autonomous Database, for example an Oracle Autonomous Data Warehouse (ADW for short) it is very likely that multiple applications and solutions do want to have access to the data available in the ADW. A common scenario is that, a department in the enterprise has been developing an application in isolation and at one point in time requires some additional data from the data warehouse. In this case the data warehouse is the Oracle Autonomous Data Warehouse.

Call Oracle ADW from Google Cloud Functions
When developing an application in the Google Cloud you can make use of Google Cloud Functions. As Google Cloud Functions support development in Python you can write a generic function to retrieve, for example, customer details based upon a customer ID. We have deployed the Oracle ADW restfull data service in a previous blogpost. In this blogpost we want to call it with a GET request from the Google Cloud.

One generic function
When building an application using Google Cloud Functions which at several points need to interact with data in the Oracle ADW you do not want to code these multiple times. A more logical way of doing things is building one function to interact with Oracle ADW to obtain the needed data.

Every time your application calls the Google Cloud Function, with the propper JSON payload which contains a valid customer ID the Google Cloud function will call the Oracle ORDS endpoint which we developed as part of Oracle ADW. The return message from Oracle ADW will be the return message from the Google Cloud Function.

By building this "interaction layer" developers will only have to build the interaction with the Oracle Cloud based Oracle ADW once and after that they can work within Google Cloud to complete their specific Google Cloud based application.

Deploy a Google Cloud Function for Oracle Database
Deploying a Google Cloud function for Oracle Database starts with the same steps as deploying any cloud function. In our case we build a Python based application. The below image showcases the initial creation of the function:

We indicate that we want to use Python 3.7 and that the function inside our code, which is the entrypoint for execution, is named getCustomer.

The code used is shown below. Do note; when developing a production solution you most likely want to add additional security and a lot more error handling than shown in this example. This is just a very (very very) not production ready example. Additionally, the full URL of the Oracle ADW has been substituted with XXXX

import urllib.request

def getCustomerResponse(requistedCustomerId):
    """
    :param    :return:    """
    baseUrl = "https://XXXX.oraclecloudapps.com/ords/louwersj/parties/b2b/customers/"
    fullUrl = baseUrl + requistedCustomerId
    operUrl = urllib.request.urlopen(fullUrl)

    if(operUrl.getcode()==200):
        data = operUrl.read()
    else:
        data("Error receiving data from ADW", operUrl.getcode())
    return data

def getCustomer(request):
    """
    :param request:    :return:    """
    requestJson = request.get_json(silent=True)
    requestArgs = request.args

    if requestJson and 'customer_id' in requestJson:
        customerId = requestJson['customer_id']
    elif requestArgs and 'customer_id' in requestArgs:
        customerId = requestArgs['customer_id']
    else:
        customerId = 'ERROR'
    if customerId == "ERROR":
        responseData = "No customer_id provided"    else:

        responseData = getCustomerResponse(customerId)
    return responseData

Testing the function

Upon deployment you can test the google cloud function using the test functionality in the Google UI (or by calling it directly) from another location. If all is working you should receive a JSON style return message as shown in the below screenshot.

In the above screenshot the trigger event field contains our test JSON payload and the function output contains a JSON response which originates for the Oracle ADW.

Conclusion

When developing applications on multiple platforms, multiple clouds and multiple technologies and you require access to one central source of the truth you can use multiple technologies to connect to a centrally located Oracle Autonomous Data Warehouse. However, using a REST interface is in most cases a very simple and "fit for the job" kind of solution.

When developing a solution like this it will require more strict error handling and it will require strict authentication and authorization however the base principle stands that hybrid multi-cloud applications can integrate with an Oracle Autonomous Data Warehouse in a very easy and cloud native manner.

Create REST endpoint in Oracle Autonomous Database

Oracle provides, as part of the Oracle Cloud portfolio an Autonomous Database solution. The Autonomous Database is provided as an OLTP as well as a Data Warehouse deployment model. Without going into the technical details or the technical and operational benefits in this article we will focus on how to build REST interfaces in conjunction with oracle Autonomous Database. In this example we will use an Oracle Autonomous Data warehouse.

The example environment
For this example, we will have an Oracle Autonomous Data warehouse or ADW for short. As part of our example we will have a table called customers which will hold a generic structure of all our global customers and the parent / child relationship between customers in our table.

In the below screenshot you can see the table definition using the Oracle APEX object browser which is provisioned as part of the ADW deployment.

The example goal
The goal we will try to achieve in this example is providing a REST endpoint for applications to connect to and get some basic information about a customer as well as providing a REST endpoint which will enable an application to retrieve all subsidiaries from a given customer. All interactions are done based upon the customer ID which in our case is using a UUID.

Creating the first REST endpoint
The example is showing the entire creation of the REST endpoint by using the Oracle ADW APEX interface, however, this can also be achieved using any compatible SQL client and is not relying on the UI.

Creating a REST endpoint in Oracle ADW has to comply with a certain order of components. Restful Data Services require a module which can hold one or more templates (end points) and each template will hold one or more handlers. Handlers are responsible for handling the request for a certain request type, for example a POST or a GET request.

In our example we firstly create a module which in our case we name ADW.backend.parties with a base path called /parties/ .

When the module has been defined we can create the ORDS template, in this example we create a template for b2b/customers/:id in this annotation the intention is that :id will be substituted with a customer ID. As we have a component with a URI /parties the full path will become, as an example, /parties/b2b/customers/{some-customer-id}.

Having the template without any handlers to handle an incoming request will not provide any added functionality. As we want users to be able to get information based upon a customer id we will create a GET request handler which will be triggered on any GET request being executed against the end-point. The handler is also the location where the actual PL/SQL code will be defined to be executed when a GET request is being send. The below screenshot shows this.

Trigger the first REST endpoint.
Having the first REST endpoint fully deployed we can test the endpoint by trying to execute a GET request against if from an external location. As this is a GET request we can do this from a browser, however you could use anything from cURL up until customer written Python code to call the endpoint with a GET request.

When providing the endpoint in a browser we get the below response:

For readability purposes we can format the message so it becomes more readable for humans.

Building the subsidiary endpoint
As stated we would build, as part of this example, also a way to lookup all subsidiaries of a given company. The previous endpoint provided the details of one company with the mention of the ID of the parent company. However, in some cases someone would like to retrieve a list of subsidiaries.

We already have an endpoint /parties/b2b/customers/{some-customer-id} and we can expand that with /subsidiaries which would make the endpoint /parties/b2b/customers/{some-customer-id}/subsidiaries

To achieve this we build a second ORDS endpoint specifically for this and we create a GET request handler for this newly created ORDS endpoint as well. The below screenshot shows the creation of the ORDS template to provide the required endpoint:

When the ORDS template is created we can create the GET handler. The GET handler is shown in the screenshot below and reacts to the :id which is part of the URI.

We now have created our second endpoint which will provide a JSON response containing all the subsidiaries for a given customer ID. In case we call the endpoint and format the response we will see a message as shown below:

Conclusion
When you are using an Oracle Autonomous Database you automatically get a very simple way of building RESTfull data services in the form of REST endpoints. Even though the above example only scratches the surface of the possibilities and much more complex and much more secure implementations can be build it showcases the ease of use and showcases how quickly you can build a comprehensive REST interface while only leveraging the Oracle Cloud based solution in the form of an Oracle Autonomous Database.