Wednesday, July 25, 2012

Oracle Web Cache routing only mode

Oracle Web Cache not only acts as a web cache server, it also gives you the option to do load balancing over multiple nodes. Having a web cache server balance load over multiple nodes is a logical thing to do, as in most cases where you use Web Cache you will be serving a high-end website which needs to be up and performing all the time.

A lesser known option of the Oracle Web Cache server is that it can also act purely as a load balancer, ignoring the web cache part of the solution. In some cases you explicitly do not want to use the web cache but do want to use the load balancing options.

To be able to set the web cache in load balance only mode you will have to change the configuration of your webcache.xml file. The .xml configuration file can be located at:

(UNIX) ORACLE_INSTANCE//config/WebCache/webcache_name
(Windows) ORACLE_INSTANCE\\config\WebCache\webcache_name

Within the configuration you will have to set the ROUTINGONLY option to YES on the entry for your cache, alongside the attributes that are already there. After setting this you will have to restart the Web Cache server to activate the new setting.

INSTANCENAME="instance_name" COMPONENTNAME="component_name"  ORACLEINSTANCE="instance" HOSTNAME="web_cache_host_name" 
ORACLEHOME="directory" NAME="web_cache_name" ROUTINGONLY="YES"
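
Before restarting, it can be useful to confirm that the edit actually landed in the file. The snippet below is a minimal sketch of such a check, not an Oracle-provided tool: it scans the text of webcache.xml for ROUTINGONLY attributes. The function names are hypothetical, and the assumption that the value is written as a quoted YES or NO is mine.

```python
import re

# Assumption: the option appears as ROUTINGONLY="YES" or ROUTINGONLY="NO".
# Only the attribute name itself is taken from the text above.
ROUTING_ONLY = re.compile(r'ROUTINGONLY\s*=\s*"([^"]*)"', re.IGNORECASE)


def routing_only_values(config_text):
    """Return every ROUTINGONLY value found in the text, upper-cased."""
    return [value.upper() for value in ROUTING_ONLY.findall(config_text)]


def is_routing_only(config_text):
    """True when at least one ROUTINGONLY attribute exists and all are YES."""
    values = routing_only_values(config_text)
    return bool(values) and all(value == "YES" for value in values)


def check_file(path):
    """Read a webcache.xml file from disk and report its routing only state."""
    with open(path, encoding="utf-8") as handle:
        return is_routing_only(handle.read())


# Small in-memory sample so the sketch runs without a real configuration file.
sample = 'HOSTNAME="web_cache_host_name" NAME="web_cache_name" ROUTINGONLY="YES"'
print("routing only:", is_routing_only(sample))
```

To use it against a real installation, call check_file with the full path to your own webcache.xml.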

When you have done so you can check in Oracle Web Cache Manager whether ROUTINGONLY is set. You should see the message below in the administration part, informing you that you are running in routing only mode.

pharmaceutical map reduce

The website is diving into the results of a report recently published by Oracle on how the pharmaceutical industry is working with data and what the current bottlenecks are.

Around 93% said their organisation is collecting and managing more business information today than two years ago, by an average of 78% more. However 29% of life sciences executives give their company a 'D' or 'F' in preparedness to manage the data deluge; only 10% of the latter group give their organisation an 'A'.

Life sciences respondents say they are unable to realise, on average, 20% of additional revenue per year, translating to about $98.5 million per year, "by not being able to fully leverage the information they collect." Of those surveyed from the pharma/biotech sector, 30% are "most frustrated with their inability to give business managers access to the information they need without relying on the IT team".

It is interesting to see that there is an increase in the amount of data collected and in the data potentially available to analysts and business users. Secondly, it is interesting to see that there is frustration within companies about needing IT to be involved whenever they want to use the collected data. What you commonly see in most companies is that the business needs a report or an analysis and turns to IT for it. The request for change is added to the list of work for the developers within IT, and they build and deliver the new reporting functionality to the user who requested it.

When there is an urgent business need for a new report, it can be frustrating that there is a lead time and that the business has to wait until the report is built. Users in general would like to have a framework in which they can quickly and easily build their own reports, so that they no longer depend on the IT department.

Such tools and platforms are available in the market; however, they are not commonly deployed, for a couple of reasons.

a) The use of such tooling is blocked by the IT department, as they are afraid that it will decrease their value within the company as a department.

b) IT claims that the reports and the resulting queries built by the business users are sub-optimal and could cause performance issues.

c) The “self build” reporting tool is considered by some people to have a steep learning curve, and because of this the tooling is not deployed.

Point C is open to discussion and will depend on the level of the employees and their affinity with IT. Whether it is true also depends on the tool(s) selected. However, points A and B can be tackled and should not hold your company back from enabling users to build their own reports.

Reason A is something that will have to be tackled in the political arena of your company. If management backing is available, IT management should be persuaded to provide the support needed to get the project started. This will inevitably lead to a decrease of work for the IT department in the form of building new reports; however, it will increase the need to support the new platforms and can open a whole new area of services for IT. This new area can also include building the more complex and challenging reports.

Reason B depends heavily on a couple of factors. One of them is how well the users understand what their questions to the system will do performance wise and how well they are trained in using the tool correctly. Secondly, it depends on the selected tool: how “smart” is the tool in creating queries based upon what the user is building with a drag and drop interface? One last factor is the size of the data you have available. Querying a couple of terabytes will be faster than querying multiple petabytes of data.

Removing reason B as an argument against deploying such tools takes more detailed thought and planning. It depends partially on the tool selection; however, it also depends on how you organize your data. When we look at the rate at which companies are gathering data, you can state that for a large number of companies it would be beneficial to look at solutions in the field of big data. Solutions developed and deployed in the field of big data take a different, more distributed approach to storing and handling data. If you take the design of your hardware and the way you access and compute data into consideration, you can deploy a platform which is ideal for users who run their own reports and queries.

In a traditional setup as shown below you store all your data in a single data source and you have a single node which takes care of computing the results and communicating with the users. For small sets of data this will work; however, when working with large sets of data it can become problematic, as the resources available to the computing node can become a bottleneck. When lots of users run their custom written queries on this setup, performance can drop to a level that is no longer acceptable. Due to the nature of the setup, scaling out horizontally is not an option and you can only scale vertically by adding more CPUs to your computing node.

In a more parallel way of doing things, in line with the thinking on how to handle big data, you can create a cluster of smaller sub-sets of your data and dedicate a computing node to each set of data. When a user starts a request, all nodes work for a small amount of time on this request and send their results back to a node which collects all the answers and provides them in a consolidated way to the end user. This way of working gives you a faster way of computing your results and at the same time provides the option to scale horizontally by adding more computing and data nodes when your data grows or when the need for more performance arises.

A popular way of deploying such a strategy is to deploy an implementation of the map/reduce programming paradigm. Companies such as Oracle and Pentaho are adopting the map/reduce paradigm by implementing hooks to the Hadoop framework, which will do this for you.
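
To make the idea a bit more concrete, here is a minimal sketch of the map/reduce pattern in plain Python. It is not Hadoop and not how Oracle or Pentaho implement it. The sample data, the function names and the use of threads as stand-ins for computing nodes are all assumptions made purely for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data: each inner list is the sub-set of rows held by one data node.
PARTITIONS = [
    [("aspirin", 10), ("ibuprofen", 5)],
    [("aspirin", 7), ("paracetamol", 3)],
    [("ibuprofen", 2), ("paracetamol", 8)],
]


def map_partition(rows):
    """Map step: one node totals the rows of its own partition only."""
    totals = Counter()
    for product, quantity in rows:
        totals[product] += quantity
    return totals


def reduce_totals(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def run_query(partitions):
    """Fan the request out to all partitions, then consolidate the answers."""
    # Threads stand in for separate computing nodes in this sketch.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(map_partition, partitions))
    return dict(reduce(reduce_totals, partials, Counter()))


result = run_query(PARTITIONS)
print(result)
```

Adding a node in this picture simply means adding another partition to the list: the map step for each partition stays equally small, and only the final merge sees one more partial result.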

When selecting a tool that will enable your users to build their own reports and queries, it is advisable to look at how this tool uses the map/reduce programming paradigm and how well it scales with data growth. By taking this into consideration you can safeguard the usability of the tooling for the future, when the data is growing and the demand on the system is growing.

Friday, July 20, 2012

Infection pattern analysis in Oracle

Quite recently some customers have asked me a somewhat similar question. They all have applications residing in an Oracle database. The application is a standard application built by a third party. During the past years they have been building additions to those applications. Now they have come to the point where the third party vendor is providing upgrades to the software. The upgrade will involve a whole set of database objects that are either newly developed or already existed but have been changed.

Over the past couple of years they have been building extensions which have caused a version lock-in. The question that is now popping up is how intertwined the custom built extensions are with the standard code and with the parts of the code that will change. To give a clear answer to this you will have to check the dependencies within the application (including the extensions).

When unraveling such a puzzle you will have to look at infection path analysis. For example, we have the below application which includes an extension. A, B, C, D & E are the standard components of the application. 1, 2, 3 & 4 are custom objects built as an extension. You can see the dependencies on standard components visualized by the lines running from the custom objects to the standard objects.

In the above example all the red objects will change during an upgrade. This means that the customized objects you should look at, based upon this view, are object 1 (depending on changing object A) and object 3 (depending on changing objects A & D).

This is a first generation infection path which only shows you the direct relations between custom objects and changing objects. You should take this a step deeper. In the below example we have gone a dependency level deeper and you can see that 4 is depending on 3 and 1 is depending on 2.

As we potentially have to change 3 to cope with the changes in A & D we also have to look at the potential code change of object 3. And if 3 is changed this might affect object 4.

Object 1 depends on object 2, and at this level of the infection 2 is not changed, so this does not change anything on the list of objects to check.

With every level you go deeper into an infection pattern you will see that more objects are potentially “infected” by the change and should be looked at properly by a developer. You can also find “resistant islands” which are in no way affected by the change. Potentially you can have your entire database analyzed with a proper infection pattern algorithm. Whether this is wise is up for debate, because it can cloud the usability and the correctness of your outcome. In general I tend to think that an infection pattern analysis 3 or 4 levels deep is appropriate for use within Oracle databases.
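
The level-by-level reasoning above can be sketched as a breadth-first walk over the dependency graph with a depth limit. The code below is only an illustration, not a finished algorithm: it contains just the dependencies explicitly named in the example, it assumes A and D are the changing objects, and the function name infection_levels is hypothetical.

```python
from collections import deque

# Edges stated in the example: each key depends on the objects in its set.
# Dependencies that the text does not mention are left out.
DEPENDS_ON = {
    "1": {"A", "2"},
    "3": {"A", "D"},
    "4": {"3"},
}
CHANGED = {"A", "D"}  # assumption: the changing objects named in the text


def infection_levels(depends_on, changed, max_depth):
    """Return {object: level} for every object infected within max_depth levels."""
    # Invert the edges so we can ask: which objects depend on this one?
    dependents = {}
    for obj, parents in depends_on.items():
        for parent in parents:
            dependents.setdefault(parent, set()).add(obj)

    levels = {}
    seen = set(changed)
    queue = deque((obj, 0) for obj in sorted(changed))
    while queue:
        obj, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look beyond the requested level
        for child in sorted(dependents.get(obj, ())):
            if child not in seen:
                seen.add(child)
                levels[child] = depth + 1
                queue.append((child, depth + 1))
    return levels


print(infection_levels(DEPENDS_ON, CHANGED, max_depth=1))
print(infection_levels(DEPENDS_ON, CHANGED, max_depth=2))
```

With max_depth set to 1 only objects 1 and 3 are reported. With max_depth set to 2 object 4 is added, while object 2 never shows up because nothing it depends on is changing.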

When you are trying to develop your own infection pattern algorithm for a database it is good to have a look at a couple of things.

Within the database, dependencies are stored in sys.dependency$ and more information about the objects is stored in dba_objects. Combining the two in a query will give you a head start in building your algorithm. As a simple example, if I wanted to know which objects depend on object 123321 I could fire off the query:

-- list all non-SYS, non-synonym objects that depend on object 123321
select
      objects2.owner as dep_owner,
      objects2.object_name as dep_object_name,
      objects2.object_type as dep_object_type
from
      sys.dependency$ depen,
      dba_objects objects2
where
      depen.p_obj# = 123321
      and objects2.owner not like 'SYS'
      and objects2.object_type not in ('SYNONYM')
      and objects2.object_id = depen.d_obj#
order by 1, 2, 3;

If you build this into a more elaborate PL/SQL script and add a list of changing components to start with, you can create a dependency list. The easiest way is to output it to a flat file and provide it to your developers and consultants as a reference of things to look into, as these are the objects most likely to be hit by the upgrade.

However, as we are humans, and humans are somewhat visually oriented and like to see things in pictures, a great way to do this is not only to output a flat text file but also to build, as a secondary output, a file that you can render with DOT. DOT is a graph description language used to plot relation diagrams like the ones used in the examples above. As DOT is a free and open source way of doing this, and it saves you hours and hours of building diagrams in MS Visio, I do think it is worth looking into the DOT language documentation.
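
As a rough idea of what such a secondary output could look like, the sketch below turns a set of dependency edges into DOT text. The edges are the ones named in the earlier example. The function name to_dot and the choice to fill changing objects in red are assumptions for illustration. In practice the edges would come from your dependency query rather than from a hard-coded dictionary.

```python
# Edges from the earlier example: each key depends on the objects in its set.
DEPENDS_ON = {
    "1": {"A", "2"},
    "3": {"A", "D"},
    "4": {"3"},
}
CHANGED = {"A", "D"}  # assumption: objects that change in the upgrade


def to_dot(depends_on, changed, graph_name="dependencies"):
    """Render dependency edges as DOT text; changed objects are filled red."""
    lines = [f"digraph {graph_name} {{"]
    # Collect every object that appears as a source, a target or a changed item.
    nodes = set(depends_on) | set(changed)
    for parents in depends_on.values():
        nodes |= set(parents)
    for node in sorted(nodes):
        if node in changed:
            lines.append(f'    "{node}" [style=filled, fillcolor=red];')
        else:
            lines.append(f'    "{node}";')
    # One arrow per dependency, pointing from the dependent object to its parent.
    for obj in sorted(depends_on):
        for parent in sorted(depends_on[obj]):
            lines.append(f'    "{obj}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(DEPENDS_ON, CHANGED)
print(dot_text)
```

If the output is saved as deps.dot, Graphviz can render it with a command such as dot -Tpng deps.dot -o deps.png.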