Saturday, February 01, 2014

Push API solutions for open and linked data

The internet has been growing at a speed that most people most likely did not foresee in the beginning. When Tim Berners-Lee added HTML and HTTP to the internet mix, and by doing so opened the door to the world wide web as we know it today, he most likely did not foresee what it would become either, not even in his wildest dreams.

As we become more and more used to the daily influence of the internet and the possibilities it offers, we are also trying to find ways to improve its current state. We have had the Web 2.0 revolution, which came down to a more social and interactive internet where people could easily add to the content instead of only consuming it as readers of web pages. Many see the next big paradigm shift for the internet in the Semantic Web. The idea of the Semantic Web includes a lot of components, from open data, linked data and micro-formats to the internet of things / internet of everything and big-data concepts. It all comes down to how we think about data and the role data will play in the next version of the web.

In the video below you can see Tim Berners-Lee's vision of the next version of the internet, which he presented during one of his talks at TED.


What it comes down to is the desire to provide data in an open format. Not necessarily via a nice and flashy website or a closed website, but rather in its pure and raw form, in an open format like XML or any other usable format.

Unlocking this data directly unlocks the potential for a lot of new options. When people have access to raw data they can create new applications and come to interesting insights. Currently most data is still locked in closed silos, not accessible to the public, or not accessible in an open way that lets you build new applications around it. When the world starts adopting the open-data model and the linked-data model, a sprawl of new options will become available to the public.

Even though all the above-mentioned things are coming our way and will change the way the internet is used and seen by the public, one important architectural part is missing in my opinion, or rather, is not given the attention it deserves. The missing, or rather underappreciated, part is the role that push interfaces and APIs will play in this new revolution. If we look at many of the new solutions, models and formats, they are all based, or primarily based, upon a pull architecture.

If you draw a high-level architectural representation of the way applications are commonly developed today, you will get something similar to the drawing below.


The custom application could for example be a web application that tracks flight information from a major airport and shows it to visitors of the website. The website itself is represented as the "Presentation layer" in the above diagram. Every time someone logs in to the website, the code queries the local database for the most recent flights. However, to ensure that the local database contains this information, it needs to be filled with data that comes from the "data owner". What you commonly see is a scheduled program that downloads the basic information from the "data owner" data API and stores it in the local database. The moment a user clicks on a flight, the web application code does an "online" query to the "data owner" to get the more detailed and latest information about this specific flight.
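To make the pull model concrete, a minimal sketch of such a scheduled download program could look like the following. The endpoint URL, the field names and the polling interval are all hypothetical; a real data owner's API will differ, and the dictionary simply stands in for the local database.

```python
import json
import time
from urllib.request import urlopen

# Hypothetical data-owner endpoint; a real API and its fields will differ.
FLIGHTS_URL = "https://dataowner.example.com/api/flights"

local_db = {}  # stands in for the custom application's local database

def poll_once(fetch=lambda url: urlopen(url).read()):
    """Pull the current flight list from the data owner and cache it locally."""
    flights = json.loads(fetch(FLIGHTS_URL))
    for flight in flights:
        local_db[flight["flight_no"]] = flight  # overwrite stale entries

def poll_forever(interval_seconds=300):
    """The scheduled program: re-download the data every few minutes."""
    while True:
        poll_once()
        time.sleep(interval_seconds)
```

Note that the `fetch` callable is injectable only to make the sketch testable without a live endpoint; the essence is that the application repeatedly asks for data whether or not anything changed.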

In essence there is nothing wrong with this model and it serves its purpose. However, there are better ways to do this, which make development much easier and ensure that applications become faster, less prone to issues and less resource intensive on both the "custom application" side and the "data owner" side.

A more favorable way of doing this is by making use of a push API rather than a pull API. In the diagram below the same solution is shown, now making use of a push API initiated by the data owner.


The implementation of this model makes a lot of sense, as it removes a lot of unneeded communication between the custom application(s) and the data owner. Missing an update also becomes less likely, and new information, in our example flight information, is available in real time to all custom applications that have subscribed to the push interface of the data owner.

In the above implementation, as soon as a new flight is known at the data owner, it sends out a message to all subscribed applications. By doing so the data owner knows that all the other applications are now aware of the new flight and might, if needed, query additional information via the "direct remote data access" method against the more traditional data access API. In some cases the real-time push API will remove the need for a more traditional API; in most cases, however, there will still be a need for it.
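On the data-owner side, the push step can be as small as keeping a list of subscriber callback URLs and sending every new flight to each of them as a JSON HTTP POST. The sketch below shows this idea; the subscriber URLs and message fields are my own assumptions, not a real airport API, and the `deliver` callable is injectable so the dispatch logic can be exercised without network access.

```python
import json
from urllib.request import Request, urlopen

subscribers = []  # callback URLs registered by custom applications

def subscribe(callback_url):
    """Called when a custom application registers for push updates."""
    subscribers.append(callback_url)

def push_flight(flight, deliver=None):
    """Send a new flight to every subscriber as a JSON HTTP POST."""
    payload = json.dumps(flight).encode("utf-8")
    if deliver is None:
        def deliver(url, body):
            req = Request(url, data=body,
                          headers={"Content-Type": "application/json"})
            urlopen(req)
    delivered = []
    for url in subscribers:
        deliver(url, payload)
        delivered.append(url)
    return delivered
```

A real implementation would of course also need to handle unreachable subscribers and retries, but the core remains: the data owner initiates the communication the moment new data exists.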

The hurdle for implementing this model is the need to develop a standard for communication between the data owner and the custom applications. Ideally the communication between both parties is arranged on top of the HTTP protocol, and JSON or XML messages are exchanged between the two parties as shown in the image below.


From a technical point of view it is not hard to implement the communication on the "data owner" side of a push data API, nor is it complicated on the receiving side to create a data update receiver that will receive and process the data sent by the data owner.
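To illustrate how small such a data update receiver can be, here is a sketch using only the Python standard library. The field names are again assumptions, and the parsing is pulled into a separate `process_update` function so it can be tested without starting the server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

local_db = {}  # the custom application's local flight store

def process_update(body: bytes) -> dict:
    """Parse one pushed JSON message and merge it into the local store."""
    flight = json.loads(body)
    local_db[flight["flight_no"]] = flight
    return flight

class UpdateReceiver(BaseHTTPRequestHandler):
    """HTTP endpoint the data owner POSTs new flight messages to."""
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        process_update(self.rfile.read(length))
        self.send_response(204)  # accepted, nothing to send back
        self.end_headers()

# To run the receiver:
#   HTTPServer(("", 8080), UpdateReceiver).serve_forever()
```

The receiver does no polling at all; it simply sits and waits for the data owner to call it.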

The biggest hurdle in this process is to ensure that the construction of the message is correct, contains all the information the receivers might need, and is done in a functional format that is usable not just for a single receiving application but for a large set of external applications. If we look, for example, at the work done by GoodRelations on an ontology for annotating offerings and other aspects of e-commerce on the web, or at the work done by schema.org, you will see that developing a good standard is not something to be taken lightly. The illustration below shows the need for such standardization of message formats on both a functional level and a technical level.


Commonly a custom application, for example a web portal showing flight information, does not make use of a single source. If, for example, you would like to build a web portal showing all incoming and outgoing flights in real time or near real time for the city of London, you have to work with 11 different data providers. London has ten "normal" airports: Luton, Stansted, Northolt, Stapleford, Heathrow, London City, Biggin Hill, Blackbushe, Farnborough and Gatwick. It also has a helicopter-only airport named Battersea.

From a technical point of view it would be a great benefit if all those airports made use of the same push API interface and pushed their information in the same manner. This way the developers of the custom applications that connect to them only have to develop a single data update receiver and simply subscribe to all 11 airport push interfaces, instead of developing 11 different receivers. In addition, the data providers, in this case the airports, can make use of an already developed standard data format and can potentially decide to develop a push interface together so they can split costs.

From a functional point of view a standard makes sense because you will know exactly what data you will get and in which format. Here "format" refers not to the technical format but to the functional format. For example, if the data owner also sends out weather information per flight, it would be good if all data owners do this in the same manner and all the information is in the same functional format. Meaning: temperature is in Celsius for all data owners, or it is in Fahrenheit for all data owners who comply with the standard. This way you always know, from a functional point of view, what the data means and how to interpret it.
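The functional side of such a standard can also be enforced in code on the receiving end. As a tiny illustration, and assuming for the sake of the example that the standard picks Celsius and the field names shown here, a receiver could normalize whatever unit a data owner reports into the agreed one:

```python
def normalize_temperature(value: float, unit: str) -> float:
    """Convert a reported temperature to the standard's unit, Celsius."""
    if unit.upper() in ("C", "CELSIUS"):
        return value
    if unit.upper() in ("F", "FAHRENHEIT"):
        return (value - 32) * 5 / 9
    raise ValueError(f"unit {unit!r} not covered by the standard")

def normalize_message(message: dict) -> dict:
    """Rewrite a pushed flight message so temperature is always in Celsius."""
    normalized = dict(message)
    if "temperature" in message:
        normalized["temperature"] = round(
            normalize_temperature(message["temperature"],
                                  message.get("temperature_unit", "C")), 1)
        normalized["temperature_unit"] = "C"
    return normalized
```

Of course, the whole point of a functional standard is that this conversion step should rarely be needed, because every complying data owner already sends the agreed unit.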

In conclusion, push API implementations will, in my mind, fuel the next generation of the internet and ensure that the web becomes (even) more real-time and data rich. The flow of data will help spawn new initiatives and will help create new companies and great projects. For this, however, standards will need to be developed, not only on a technical implementation level but, even more importantly, on a functional level, where the standards define which information, in which format, will be provided to the data consumers.
