When working with Pandas the most common know way to get data into a pandas Dataframe is to read a local csv file into the dataframe using a read_csv() operation. In many cases the data which is encapsulated within the csv file originally came from a database. To get from a database to a csv file on a machine where your Python code is running includes running a query, exporting the results to a csv file and transporting the csv file to a location where the Python code can read it and transform it into a pandas DataFrame.
When looking a modern systems we see that more and more persistent data stores provide REST APIs to expose data. Oracle has ORDS (Oracle Rest Data Services) which provide an easy way to build REST API endpoint as part of your Oracle Database.
Instead of extracting the data from the database, build a csv file, transport the csv file so you are able to consume it you can also instruct your python code to directly interact with the ORDS REST endpoint and read the JSON file directly.
The below JSON structure is an example of a very simple ORDS endpoint response message. From this message we are, in this example, only interested in the items it returns and we do want to have that in our pandas DataFrame.
The below code shows how to fetch the data with Python from the ORDS endpoint and normalize the JSON in a way that we will only have the information about items in our dataframe.
When looking a modern systems we see that more and more persistent data stores provide REST APIs to expose data. Oracle has ORDS (Oracle Rest Data Services) which provide an easy way to build REST API endpoint as part of your Oracle Database.
Instead of extracting the data from the database, build a csv file, transport the csv file so you are able to consume it you can also instruct your python code to directly interact with the ORDS REST endpoint and read the JSON file directly.
The below JSON structure is an example of a very simple ORDS endpoint response message. From this message we are, in this example, only interested in the items it returns and we do want to have that in our pandas DataFrame.
{ "items": [{ "empno": 7369, "ename": "SMITH", "job": "CLERK", "mgr": 7902, "hiredate": "1980-12-17T00:00:00Z", "sal": 800, "comm": null, "deptno": 20 }, { "empno": 7499, "ename": "ALLEN", "job": "SALESMAN", "mgr": 7698, "hiredate": "1981-02-20T00:00:00Z", "sal": 1600, "comm": 300, "deptno": 30 }, { "empno": 7521, "ename": "WARD", "job": "SALESMAN", "mgr": 7698, "hiredate": "1981-02-22T00:00:00Z", "sal": 1250, "comm": 500, "deptno": 30 }, { "empno": 7566, "ename": "JONES", "job": "MANAGER", "mgr": 7839, "hiredate": "1981-04-02T00:00:00Z", "sal": 2975, "comm": null, "deptno": 20 }, { "empno": 7654, "ename": "MARTIN", "job": "SALESMAN", "mgr": 7698, "hiredate": "1981-09-28T00:00:00Z", "sal": 1250, "comm": 1400, "deptno": 30 }, { "empno": 7698, "ename": "BLAKE", "job": "MANAGER", "mgr": 7839, "hiredate": "1981-05-01T00:00:00Z", "sal": 2850, "comm": null, "deptno": 30 }, { "empno": 7782, "ename": "CLARK", "job": "MANAGER", "mgr": 7839, "hiredate": "1981-06-09T00:00:00Z", "sal": 2450, "comm": null, "deptno": 10 }], "hasMore": true, "limit": 7, "offset": 0, "count": 7, "links": [{ "rel": "self", "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees" }, { "rel": "describedby", "href": "http://192.168.33.10:8080/ords/pandas_test/metadata-catalog/test/item" }, { "rel": "first", "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees" }, { "rel": "next", "href": "http://192.168.33.10:8080/ords/pandas_test/test/employees?offset=7" }] }
The below code shows how to fetch the data with Python from the ORDS endpoint and normalize the JSON in a way that we will only have the information about items in our dataframe.
import json from urllib2 import urlopen from pandas.io.json import json_normalize # Fetch the data from the remote ORDS endpoint apiResponse = urlopen("http://192.168.33.10:8080/ords/pandas_test/test/employees") apiResponseFile = apiResponse.read().decode('utf-8', 'replace') # load the JSON data we fetched from the ORDS endpoint into a dict jsonData = json.loads(apiResponseFile) # load the dict containing the JSON data into a DataFrame by using json_normalized. # do note we only use 'items' df = json_normalize(jsonData['items']) # show the evidence we received the data from the ORDS endpoint. print (df.head())Interacting with a ORDS endpoint to retrieve the data out of the Oracle Database can be in many cases be much more efficient than taking the more traditional csv route. Options to use a direct connection to the database and use SQL statements will be for another example post. You can see the code used above also in the machine learning examples project on Github.
No comments:
Post a Comment