Wednesday, January 30, 2013

Big Data is sometimes Fast Data

One of the current hot topics within IT is big data. When we look at big data normally it is shown as the diagram below where we talk about volume, velocity, variety and value.  

When talking about big data the most common areas are volume, variety and the potential value it can have for a business. The thing that is not been discussed that much as it should be is velocity. We do speak about the speed which is needed to process a high volume of data, what is not discussed is the velocity in which data is created and is coming in to your application.

When we talk about velocity we also have to talk about the data lifecycle and the lifetime of data value. If we take for example twitter, twitter is generating a lot of data, when you connect to the twitter firehose you will get a lot of data into your application.

Storing the data is within your application is volume, however when we talk about the capability to catch up with the stream coming in we are talking about velocity.  When talking about velocity we are not talking about big data as such… the new term some people like to introduce is fast data. Meaning, when you hear the term fast data we are talking about the velocity part of the big data way of thinking. 

Monday, January 28, 2013

Oracle Active Session History

Within Oracle Enterprise Manager you did have an option to monitor your database top activities via a graphs. Oracle has renamed the "top activities" into Active Session History within the new release of Oracle Enterprise Manager. Also some enhancements are made to the new ASH in comparison with the "top activities" options.

The Active Session History for Oracle databases is providing you with a great tool to dive into the details of your database performance and will turn out to be a great tool to investigate user claims of bad performance. You can drill into all details for a certain moment in time to locate the reason for the bad performance of your database system.

In the below video you can get a quick overview of what the options are and this should give you a starting point to look for yourself in your own Oracle Enterprise Manager installation to find the added value of ASH.

Saturday, January 26, 2013

Oracle LRM-00123: invalid character 0 found in the input file

Normally when you want to start your oracle database you simply use the startup command. However in some cases you do want to start your database with some specific parameters set and you are stating them in your pfile. Meaning you initiate some of your database parameters from your pfile.

When starting your database with a pfile you can use the following command while connected via sqlplus as sysdba:

startup pfile=/u01/app/oracle/product/11.2.0/db_1/dbs/initDB11G.ora

this will result in something like below

SQL> startup pfile=/u01/app/oracle/product/11.2.0/db_1/dbs/initDB11G.ora
ORACLE instance started.

Total System Global Area 2142715904 bytes
Fixed Size 1337688 bytes
Variable Size 671090344 bytes
Database Buffers 1459617792 bytes
Redo Buffers 10670080 bytes
Database mounted.
Database opened.

In some cases you might run into an error when trying to do so. Some of the common error messages (error message combinations are below)

LRM-00123: invalid character 0 found in the input file
ORA-01078: failure in processing system parameters

The combination of a LRM-00123 and the ORA-01078 indicates most likely that you have specified the spfile in your startup command instead of your pfile. 

LRM-00109: could not open parameter file '/u01/app/oracle/.........'
ORA-01078: failure in processing system parameters

Combination of LRM-00109 and ORA-01078 indicates that you most likely have made a typo in the path to your pfile.

Wednesday, January 23, 2013

[INS-06101] IP address of localhost could not be determined.

When installing an Oracle database you might come across the following error; "". This error message looks quite simple to resolve and when you do a search on the web you will find numerous posts and discussions that go into detail on this subject. Most of the discussions go around the fact that you will have to check your /etc/hosts file for an entry of localhost.

However, in some cases there are reports of people who do have a line stating localhost and they can resolve localhost. The issue with this error message is that it is not precise. What it is doing is that it is looking to resolve your localhost name and NOT localhost. Meaning, if you named your machine for example testserver the installation process is not trying to resolve localhost it is trying to resolve testserver.

meaning, if you try to debug it and do a ping localhost you will get a positive result, however if you do a ping testserver you most likely will not get a positive result. When encountering a INS-06101 error this might be the first thing to check,

Tuesday, January 22, 2013

Number to binary conversion Oracle

In some cases when developing PL/SQL code within an Oracle database (and within a lot of other programming languages) you might want to refer to a programming style in which you use the binary number format (base-2).

In mathematics and computer science, the binary numeral system, or base-2 numeral system, represents numeric values using two symbols: 0 and 1. More specifically, the usual base-2 system is a positional notation with a radix of 2. Numbers represented in this system are commonly called binary numbers. Because of its straightforward implementation in digital electronic circuitry using logic gates, the binary system is used internally by almost all modern computers and computer-based devices such as mobile phones.

When working this way it can be handy to have a small function ready which converts a number to the binary numeral system. Some years ago Tom Kyte has developed this small function that can come in handy.

create or replace
FUNCTION num_to_bin( i_num IN PLS_INTEGER )
       l_Num      PLS_INTEGER;
       l_bit      PLS_INTEGER;
       l_binary   VARCHAR2(128);
       l_num := i_num;
      WHILE l_num > 1 LOOP
         l_bit := MOD(l_num,2);
         l_binary := TO_CHAR(l_bit)||l_binary;
         l_num := FLOOR(l_num / 2);
      END LOOP;
      IF l_num = 1 THEN
          l_binary := '1'||l_binary;
      END IF;
      RETURN l_binary;
   END num_to_bin;

Tuesday, January 15, 2013

Oracle Database login audit

In some cases you are required as an Oracle database administrator to provide audit information on the usage of your database. This can include providing information on who logged on to the database both successful and unsuccessful. Also in some cases you might be required to help users to find out why they are unable to connect to your database. In those cases it is good to know that Oracle is creating an audit trail for all login attempts both successful and unsuccessful. You can query this audit trail via DBA_AUDIT_SESSION.

when you query DBA_AUDIT_SESSION you will note that you can view all moments a session is doing a LOGON or LOGOFF action. This can be seen in the action_name column. In most cases you are not that interested however in all connections that where successful, commonly the unsuccessful attempts are much more of interest to you.

Every unsuccessful attempt has a return code which is not equal to 0. the return code 0 is a successful attempt to create a connection. If you wan to query all unsuccessful ones the query below might be of interest to you. This is also providing you a not more information then only the returncode, it is telling you the meaning of the code. Next to this it is showing you the username used, the username of the person who is attempting to connect on his local workstation, the name of the local workstation used to try and establish a connection and a timestamp (of the database server)

      decode( returncode
            , 00911, 'Invalid Character'
            , 00988, 'Missing or invalid password(s).'
            , 01004, 'Logon denied.'
            , 01005, 'Null Password'
            , 01017, 'Invalid username/password.'
            , 01031, 'No Privilege'
            , 01045, 'User string lacks CREATE SESSION privilege; logon denied.'
            , 01918, 'No Such UserID'
            , 01920, 'No Such Role'
            , 09911, 'Incorrect user password.'
            , 28000, 'The account is locked.'
            , 28001, 'Your password has expired.'
            , 28002, 'Your account will expire soon; change your password now.'
            , 28003, 'The password is not complex enough.'
            , 28007, 'Password cannot be reused.'
            , 28008, 'Invalid old password.'
            , 28009, 'Connection to sys should be as sysdba or sysoper.'
            , 28011, 'Your account will expire soon; change your password now.'
            , 28221, 'The original password was not supplied.') as returncode
     returncode != 0 
        extended_timestamp desc

Saturday, January 12, 2013

When machine learning goes wrong

Human emotions and understanding human emotions in text and speech is one of the most challenging fields in computer science at the moment. The ability to understand what a human means by a certain text or a certain sentence is challenging. Humans are able to pick up the subtle things like sarcasm or humor while developing an algorithm to understand this has proven to be a challenge. Being able to do so, having a system that understands this will provide the option to have a much more human like interaction with technology and will provide options to have machines make decisions based upon feelings of real people.

One other part that is difficult is slang, street language, the language used by certain groups of people and the meaning that it has. This was already adressed for a bit in the video from Michael Karasick, VP and Director at IBM research, in my post big data sentiment analysis.

Now IBM has tried to have Watson understand slang language. Watson is an artificial intelligence computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM's first president, Thomas J. Watson. The machine was specifically developed to answer questions on the quiz show Jeopardy!. In 2011, Watson competed on Jeopardy against former winners Brad Rutter, and Ken Jennings. Watson received the first prize of $1 million.

"In order to boost Watson’s aptitude with everyday lingo, its software engineers began teaching it the Urban Dictionary, that massive online depository of current slang. There was just one small problem: Watson couldn’t differentiate “clean” terms from profanity. It’s one thing when a supercomputer uses “OMG,” but quite another when it starts cursing like a Quentin Tarantino character—hopefully not in the middle of a “Jeopardy” episode, although that could prove memorable for everyone involved." From slashdot

This shows that understanding slang and everyday lingo is difficult however even more difficult is having a algorithm to understand when it is appropriate to use it and when not. This research field will be very interesting in the upcoming years and will prove to be challenging however, when mastered it can be a big addition to how we communicate with technology. 

Friday, January 11, 2013

Oracle SQL EAN-13 check digit

When working with products in your company it is of importance that in your database you have the ability to uniquely identify a product by a product ID. A lot of companies do maintain a product ID specially used within that company. The products ID's are used in warehousing, sales, purchasing and finance. Meaning, the ID is used for a wide range of actions in your systems and  business processes. What you see is that a lot of companies do maintain their own product ID however link them to the standard product numbering system EAN. Reason for this is that EAN numbers are used across companies to identify a product uniquely.

An EAN-13 barcode (originally European Article Number, but now renamed International Article Number even though the abbreviation EAN has been retained) is a 13 digit (12 data and 1 check) barcoding standard which is a superset of the original 12-digit Universal Product Code (UPC) system developed in the United States. The EAN-13 barcode is defined by the standards organization GS1.

The EAN-13 barcodes are used worldwide for marking products often sold at retail point of sale. The numbers encoded in EAN-13 bar codes are product identification numbers, which are also called Japanese Article Number (JAN) in Japan. All the numbers encoded in UPC and EAN barcodes are known as Global Trade Item Numbers (GTIN), and they can be encoded in other GS1 barcodes.

The less commonly used EAN-8 barcodes are used also for marking retail goods; however, they are usually reserved for smaller items, for example confectionery.

2-digit (EAN 2) and 5-digit (EAN 5) supplemental barcodes may be added for a total of 14 or 17 data digits. These are generally used for periodicals (to indicate the current year's issue number), or books and weighed products like food (to indicate the manufacturer suggested retail price or MSRP), respectively.

Making sure that the EAN number you entered is a correct EAN number is vital to ensure that your database will only hold valid EAN numbers and no mistakes are made during the process of entering the number. The EAN numbering holds a mechanism to check if the number is valid. A EAN-13 number consists out of 12 data and 1 check digits. You can use the check digit to verify that the number entered is a valid EAN-13 number by using the check-digit algorithm.

To do the check you have to do a "matrix" calculation on the data digits from the EAN-13 number which are the first 12 digits. In this example we take the EAN-13 number for a MacBook Air which is 0885909592449.

Every position in the EAN-13 number has a weight associated with it, either a 1 or a 3. You will have to multiply every value of the position with the weight of the position and sum all the results up to one value.

In our case this is the final value 111. Take the nearest (upper) multiple of 10 that is equal or higher than the sum, the sum is in our case 111, meaning the nearest (upper) multiple of 10 is 120. Substract the sum from the nearest (upper) multiple of 10 and this should be equal to your check digit which in our case is 9. 120 - 111 = 9. This means your EAN-13 number you have entered is a valid EAN-13 number.

Now that we understand this logic there might be the need to incorporate this into your database so you can use it as a check. Thanks to Asif Momen and the discussion on you can use the function below to do so:

  FUNCTION ean_13(
      P_Str IN VARCHAR2)
    l_Sum      NUMBER;
    l_Multiple NUMBER;
    FOR I IN 1..12
      IF mod(i,2)   = 0 THEN
        l_Multiple := 3;
        l_Multiple := 1;
      END IF;
      l_Sum := NVL(l_Sum, 0) + SUBSTR(P_Str, i, 1) * l_Multiple;
    IF 10 - Mod(l_Sum, 10) = SUBSTR(P_Str, 13) THEN
    END IF;

Wednesday, January 09, 2013

Oracle SQL automatic ID generation

When developing an application at one point you most likely will come to the subject of providing ID's. When you store a record in the database you most likely will like to have an ID for this record so you can uniquely identify it and make relations to records in other tables based upon this ID.

When creating the design for your application and associated database model you can select 2 locations to generate the ID used for a specific record. One option is to have your application query the next number in a sequence in the database and use this the second is to have the database populate this when you enter the record. Depending on some criteria you can select both options however I am personally in factor of having your database handle this. As in most cases an ID is directly influencing your data model integrity I do favour having this done here however in some cases there is a good reason to deviate from this.

Considering that you will use the database to populate your ID's for you, a coupe of things are needed: first we need the table where we will store the data, secondly we need a sequence to give you the next number and we need a trigger on the database table. The trigger will be used to query the next number from the sequence and use this number during the insert in the table. When you are using Oracle SQL Developer you can use the "Column Sequences" option to have this all done for you. When using this option a trigger and a sequence will be created and associated with the column you want to populate with a ID. This is shown in the example below:

Looking at the code generated you will notice that a trigger and a sequence are generated. In the case of the example above we have a new trigger named AUTO_ID_EXAMPLE_0_TRG
 and a sequence named AUTO_ID_EXAMPLE_0_SEQ

Code for the trigger:
create or replace
    END IF;

Code for the sequence:
MINVALUE 1 MAXVALUE 9999999999999999999999999999 

Now we can test to see if there is indeed an automatic ID generated when we insert records into the table. We insert the following records as a test, do note we only insert data in the example_text column and not in the example_id column.
insert into auto_id_example_0 (example_text) VALUES ('TEST1');
insert into auto_id_example_0 (example_text) VALUES ('TEST2');
insert into auto_id_example_0 (example_text) VALUES ('TEST3');

When we query the table we do get the following result:
SELECT * FROM auto_id_example_0;

EXAMPLE_ID           EXAMPLE_TEXT       
-------------------- --------------------
1                    TEST1                
2                    TEST2                
3                    TEST3    

having a meaningless number as an ID can be very good and is commonly used as a database internal reference however when we are building an application where the number is used as a reference towards humans commonly the number is more meaningful and/or it is changed to be more human friendly. For example, if you check order numbers or ticket numbers it is commonly seen that you have some padding on the left side. Meaning, you will not see the number 1024 for example, you will see more likely on your order the number 00001024 or even more meaning where you have for example ORD_UTR_00001024 where for example ORD is showing we are talking about an order and UTR is showing that the order is generated in a shop with the code UTR.. An automatically generated ID can be anything and is not limited to a number, you can code all meaning you like in the trigger.

As an example for the 00001024 example we could construct the following trigger instead;
create or replace
    END IF;

Within the trigger we do use the LPAD function from Oracle SQL to left-pad the number generated by the sequence with a 0 until we reach a length of 8 characters. If we now insert a new record the next example_id will be something like 00000004 (or 00001024).

Now we have the situation where we would like to give some more meaning to the number and we would like to have something like ORD_UTR_00001024. In the below example we have added some additions to the trigger. A subquery is introduced to query the UTR part and we have added ORD_ in front. In the below example the subquery is not making any sense because it will always return UTR however you can introduce your own business logic to it.
create or replace
      SELECT 'ORD_'
            || lpad(AUTO_ID_EXAMPLE_0_SEQ.NEXTVAL,8,0)
      FROM dual;
    END IF;

By using a more sophisticated trigger you can ensure that your automatically generated number by the sequence is more meaningful then only a number. As you will notice when developing applications it is quite a common request from business owners that ID's like order numbers or ticket numbers are not simply numbers however have some more meaning. When looking for a way to build this you can opt to build this into you application side however as you can see there are a lot of options in the database and to my personal believe this is the place where this should be done by default with the exception if there is a very good reason not to do this.

Monday, January 07, 2013

Oracle fusion IDM domain architecture

Oracle fusion has a complete new architecture which differs from previous Oracle products on quite some ways. One of the things that is different to products like Oracle e-Business suite is for example the way security is handled. With Oracle e-Business Suite you would have the option to add identity management and single sign on solutions to your solution stack. Within the Fusion stack this is no longer an option and is this a integrated part of the solution stack itself.

If we take for example Oracle fusion CRM a good place to start understanding the architecture of the solution stack is the Oracle document; Oracle Fusion Applications Customer Relationship Management Enterprise Deployment Guide. This document is offering some blueprints on how you should implement Oracle Fusion CRM. Looking at the guide you will note that the stack fusion stack is split up into several domains, namely the CRM domain, Common domain, IC domain, SCM domain, HCM domain, FIN domain and the BI domain.

Not noted in this list nor in the image above is the IDM (Identity Management) domain which is a pre-req for getting the domains implemented. As you can see in the image below you have a CRM domain which consists out of 2 webhost servers, 2 crmshost servers and a 2 node Oracle Database RAC cluster. However, on the left side you can note that we have a IDM domain which also needs to be connected to be able to use the Oracle Fusion CRM part.

The IDM domain holds a couple of primary components as can be seen in the diagram below and is broken down into webhost, OAM host, OID host and a database server. Best practice states that you have 3 firewalls in place. Firewall 1 provides security against the outside world secondly we have 2 firewalls to create 2 DMZ's one DMZ holds the webhost and one DMZ holds the OAM host.

The webhost in Oracle fusion holds the Oracle HTTP server (OHS). The Oracle HTTP server is based upon the Apache web server. This installation of the Oracle HTTP server holds webgate for OAM (Oracle Access manager) and the MOD_WL_OHS module for the Oracle HTTP server.

A WebGate is a Web server plug-in that is shipped out-of-the-box with Oracle Access Manager. The WebGate intercepts HTTP requests from users for Web resources and forwards them to the Access Server for authentication and authorization. An AccessGate is an Oracle Access Manager access client that processes requests for Web and non-Web resources and is developed using the Software Developer Kit. The terms AccessGate and WebGate may be used interchangeably. Before you can install a WebGate, you must associate it with an Access Server.

MOD_WL_OHS is a module in Oracle HTTP Server 11g which allows requests to be proxied from Oracle HTTP Server (OHS) to Oracle WebLogic Server.

Communication between the webhost and the OAMhost is primarily done via the HTTP protocol and 2 specific Oracle protocols, the OAP and OIP protocol. The OAP Oracle Access Protocol enables communication between Access System components (for example, Access Manager server, WebGate) during user authentication and authorization. This protocol was formerly known as NetPoint Access Protocol (NAP) or COREid Access Protocol. Also the OIP, Oracle Identity Protocol is used for communication between the webhost and OAMhost.

The OAMHost is primarily used to host the Access Server supported by JRF/OPSS. The Access Server is a stand-alone component that provides dynamic policy evaluation services for both Web-based and non-Web resources and applications. The Access Server receives requests from an access client, either a WebGate or a custom AccessGate; queries your LDAP directory for authentication, authorization, and auditing rules; and validates credentials, authorizes users, and manages user sessions for Oracle Access Manager.

Oracle Platform Security Services comprises Oracle WebLogic Server's internal security framework and Oracle's security framework (referred to as Oracle Platform Security). OPSS delivers security as a service within a comprehensive, standards-based security framework. OPSS provides an abstraction layer in the form of standards-based application programming interfaces (APIs) that insulate developers from security and identity management implementation details. With OPSS, developers do not have to know the details of cryptographic key management or interfaces with user repositories and other identity management infrastructures. By leveraging OPSS, in-house developed applications, third-party applications, and integrated applications all benefit from the same uniform security, identity management, and audit services across the enterprise.

The OAMHost communicates with the OIDHost via OID (Oracle Internet Directory) and OVD (Oracle Virtual Directory) to request the information needed for authentication and authorization.

The OIDHost runs an instance of Oracle Internet Directory. Oracle Internet Directory is a system component. That is, it is a manageable process that is not an Oracle WebLogic Server. System components can use the WebLogic Administrative Domain for management services, including Oracle Enterprise Manager Fusion Middleware Control, Audit Framework, configuration management through MBeans and Secure Sockets Layer and Wallet Management. The Oracle WebLogic Server Administration Server controls Oracle Internet Directory and other system components through OPMN. Oracle Internet Directory itself is a C-based process. Its only run time dependency is the Oracle Database. To be managed by the Oracle Fusion Middleware management framework, Oracle Internet Directory must register itself with a local or a remote Oracle WebLogic Server administration domain during installation or from the command line after installation. Therefore, an Oracle Internet Directory 11g installation requires either a local or a remote installation of Oracle WebLogic Server. Also, the Directory Management user interface, ODSM, is a Java component deployed on Oracle WebLogic Server. If you must manage Oracle Internet Directory in your deployment using only command-line tools and a remote ODSM, there is also an option to install and configure Oracle Internet Directory without registering with a Oracle WebLogic Server Domain.

the DBHost holds the data structure used by OID and is nothing more then a datastore. You can create a Oracle database RAC cluster to ensure a more high-available architecture however this is not a strict requirement. 

Saturday, January 05, 2013

Oracle view usage for multi layer application security

When designing a solution which includes a database you commonly have a couple of expertise's involved in the architecture of the solution. Sometimes combined into one person and sometimes spread accords multiple people. You will commonly have an application architect and a data-model architect. The application architect will be responsible for creation the application side of things and the data-model architect will be responsible for creating the database side of the solution. In small projects you will most likely combine the roles and in more complex projects you like to have them separated. Both architects will have to consider security and will have to provide a solution that is as secure as possible. Having security implemented on several levels, both the application and the database design in good practice and will strengthen your total design.

To be able to create a secure solution both architects will have to work together and will have to follow some guidelines. One of the guidelines that can be considered good practice is the use of views where possible. Where possible is depending on the obvious fact that the application will only have to be able to read from the data presented by the view and there is a performance aspect to it. To find out if there are performance differences that are critical you will have to do some benchmark testing with a view based and a direct query based approach. The results will have to be taken into account when deciding to use a direct query on a table or to use a view. Not taking the performance into account the following approach might be a good approach to harden your application.

for example, lets state you have a table ORDER_ENTRY_B2C and you have a table ORDERS_SHIP_COMPLETE. The application you are developing is used by your salesforce to enter new orders when the make a sale in the business to consumer market segment. Secondly they will have to have the ability to check if a order has been shipped already. Two things are to be noted here. The application used will never update or insert anything in the table ORDERS_SHIP_COMPLETE and this table contains both B2C as well as B2B orders.

The most easy to implement way is to provide access to the entire database for this application as shown below. This will ensure that the application can read and write to all tables possibly limiting the user database account from reading and writing to the table for B2B orders. This example is shown below.

This will work however there are some flaws with it and some things to keep in mind.  You will have to only grant read access for the application user to ORDERS_SHIP_COMPLETE table. The flaw in this design is that the user is only limited by the application layer from reading information about B2B orders, the application is intended only for B2C orders however is on a database level not limited from reading the B2B orders. A better and more secure model is to ensure that the application will only read the shipment information from a view which is limited to only show the orders for B2C orders. Shown below you can see how we have segregated the data model in 3 sections. One section for the B2B, one for B2C and one to hold the information about completed shipments. By defining your view in such a way that it is filtering only the B2B or B2C orders for the specific section you can ensure that not only access to information is secured on a application level however that it is also secured on a database level.

By having checks and limitations on multiple levels you are taking action against the possibility that someone is gaining access to data they are not privilege to. It is however very often seen that such a security model is only applied in the application. As soon as someone is able to retrieve the information used to login to the database they are able to view a lot more information than they are supposed to. Having a multi layer security approach is always the best option to go for even if this includes setting a lot more grants on database objects and building additional views. What you have to keep in mind is possible performance issue that might arise however commonly this is not an issue or it can be resolved by using a different type of view.