Thursday, January 22, 2009

Oracle retrieve remote XML files

Oracle databases have the ability to read store and process remote XML files, many developers of 'traditional' PL/SQL code are not aware of this I have noticed. However, having the ability to read a remote XML file via the HTTP protocol can open lots of possibilities and can make your live a lot easier. In some cases you do not have to develop all kinds of interfaces to have the information send to a server to be processed and placed in the database. You can simply point at a location on a web server and retrieve a XML file. There are thousands of possible fields where this can be used, think about getting stock market information, scientific readings, machine information, research data, rss feeds and for example weather data.

When you need information in your database and it is available in XML and can be accessed via HTTP you can very simply read, parse and store this information in your oracle database. As a example I will show you a project of reading weather data from the National Oceanic and Atmospheric Administration's servers in a XML format into a Oracle database where it later was used to tune weather models.

The National Oceanic and Atmospheric Administration's NOAA is providing the readings of the weather stations in a XML format on this website. The request was to be able to read and store all the raw XML files in a Oracle database. The parsing part was done later. This can be a very good example for for example retrieving and storing stock market information which can also be done in a similar way. At a later stage the raw XML files in the database are processed.

First step is to make sure you have all the information about all the reading stations, for this we create a table to store the available static information:

CREATE TABLE noaa_weather_obsloc
(
location_name VARCHAR2(100) NOT NULL,
location_code varchar2(5) NOT NULL,
location_state varchar(4) NOT NULL,
location_detail_url sys.uritype,
location_xml_url sys.uritype NOT NULL,
location_rss_url sys.uritype,
location_latitude varchar2(10),
location_longitude varchar2(10),
CONSTRAINT noaa_weather_obsloc_pk PRIMARY KEY
(
location_code
)
ENABLE
);

You can see that all the columns where we will store information about URL's are in the format sys.uritype. URIType provides a means to store a URI in the database, rather than the document referenced by the URI. In the same way that the methods of the XMLType object can be used to insert or create XML document objects, methods of URIType objects can be used to insert or query URIs.

The next step is to insert data into the table which is done for a single record as a example with the statement below. The latitude and longitude are intentionally blank we can fill this information in a later stage because this information will be in the XML file we will be retrieving.

INSERT INTO noaa_weather_obsloc VALUES
('Atka Airport',
'PAAK',
'AK',
sys.UriFactory.getUri('http://weather.noaa.gov/weather/current/PAAK.html'),
sys.UriFactory.getUri('http://www.weather.gov/data/current_obs/PAAK.xml'),
sys.UriFactory.getUri('http://www.weather.gov/data/current_obs/PAAK.rss'),
null,
null
);

As you can see sys.UriFactory.getUri is used to insert the data in the URIType fiels. There are also other methods like sys.httpuritype.createHttpuri however I do prefer in this case to use the sys.UriFactory.getUri methode. According to Oracle this is the difference in use: If the URI starts with http://, then getURI() creates and returns an HTTPUri. If the URI starts with either /oradb/ or /dburi/, then getURI() creates and returns a DBUri. Otherwise, getURI() creates and returns an XDBUri.

So now we have the location data stored in a table together with the remote location of the XML file we can start retrieving the XML file to store it in the database. To store it we however first have to create a table which is capable of storing raw XML data in a XML format. We want to use the XMLTYPE because then we will be able to directly query the XML file in the table when needed. I will not go into details on that in this post, in later posts I will come back on this subject.

CREATE TABLE NOAA_WEATHER_XMLDATA
(
location_code varchar2(5) NOT NULL,
store_timestamp date,
weather_xml_data xmltype
);

Now we have the table noaa_weather_xmldata and we can start retrieving the XML data and store it in this table. The example given below is just a manual insert for a single location. As you can imagine you would like to develop a more flexible way of retrieving and storing the data. You have a list locations you want to monitor in the table noaa_weather_obsloc and you have some time intervals on which new data is retrieved. So, what I will not be handling in this blogpost is how you can automate this process, I will also not be handling the topic on how you can write some more fun queries and functions to help you do that. I do however promiss I will write some things about it in the near (very near) futures. So if you are interested keep a eye on this blog or if the post is already some days old try to search this weblog.

insert into NOAA_WEATHER_XMLDATA values
(
'PAAK',
sysdate,
(select sys.xmltype.createXML(locs.location_xml_url.getClob()) from noaa_weather_obsloc locs where location_code = 'PAAK')
);

As you can see I use a subquery to retrieve the remote XML file in combination with the data in the noaa_weather_obsloc table. the sys.xmltype.createXML is used to create and store the XML file in the XMLTYPE. By putting all those bits together you can now retrieve and store a remote XML file in your Oracle 11G database. As I already promised I will come back in later posts on how I sceduled the retrieval and how to parse and extract XML data in a Oracle database.

2 comments:

Anonymous said...

I've been looking for an example of how to do this for years. This, for me, is gold. Thank you

Anonymous said...

Fascinating. I copied your examples and ran them on my server. Amazingly, they ran on the first try.