Monday, April 14, 2014

Oracle Big Data Appliance node layout

The Oracle Big Data Appliance ships in a full rack configuration with 18 compute nodes that you can use. Each node is a Sun X4-2L or an X3-2L server, depending on whether you purchased an X4-2 or an X3-2 Big Data Appliance. Both models provide 18 compute nodes. The image below shows the rack layout of both the X3-2 and X4-2 rack. It is important to note that the servers are numbered bottom up. A starter rack has only 6 compute nodes, and you can expand the rack with in-rack expansions of 6 nodes, meaning you can grow from 6 to 12 to a full rack as your capacity needs grow.

In every setup, whether you have a starter rack, a full rack or a rack extended with a single in-rack expansion of 6 nodes (making it a 12 node cluster), nodes 1, 2 and 3 have a special role. Going bottom up and starting with node 1, the following software runs on the nodes:

Node 1:
First NameNode
Failover controller
Puppet master
Puppet agent
NoSQL database

Node 2:
Second NameNode
Failover controller
MySQL backup server
NoSQL Admin
Puppet agent

Node 3:
Job Tracker
ODI Agent
MySQL primary server
Puppet agent

Nodes 4 through 6, 12 or 18:
Cloudera Manager agent
Puppet agent

Understanding what runs where is of vital importance when you are working with an Oracle Big Data Appliance. It helps you understand which parts can be brought down without too much effect on the system and which parts you should be more careful about. As you can see from the above list, some parts are made highly available, while other parts will cause a loss of service when brought down for maintenance.
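To make the list above a bit easier to query, here is a minimal sketch in Python that stores the layout as a lookup table. The function names and the idea of searching by keyword are my own illustration and not something that ships with the appliance; the service names are simply copied from the list in this post.

```python
# Service placement as listed in this post; node numbers count bottom up.
NODE_SERVICES = {
    1: ["First NameNode", "Failover controller", "Puppet master",
        "Puppet agent", "NoSQL database"],
    2: ["Second NameNode", "Failover controller", "MySQL backup server",
        "NoSQL Admin", "Puppet agent"],
    3: ["Job Tracker", "ODI Agent", "MySQL primary server", "Puppet agent"],
}
# Nodes 4 and up all run the same pair of agents.
WORKER_SERVICES = ["Cloudera Manager agent", "Puppet agent"]


def services_on(node, rack_size=18):
    """Return the services listed for a node in a 6, 12 or 18 node rack."""
    if rack_size not in (6, 12, 18):
        raise ValueError("rack size must be 6, 12 or 18")
    if not 1 <= node <= rack_size:
        raise ValueError("node number outside the rack")
    return NODE_SERVICES.get(node, WORKER_SERVICES)


def nodes_running(keyword, rack_size=18):
    """Return the node numbers whose service list mentions the keyword."""
    keyword = keyword.lower()
    return [n for n in range(1, rack_size + 1)
            if any(keyword in s.lower() for s in services_on(n, rack_size))]


print(nodes_running("NameNode"))      # the two NameNode hosts
print(nodes_running("Job Tracker"))   # the single Job Tracker host
```

A service that shows up on only one node, such as the Job Tracker, is a candidate for extra care during maintenance.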

Friday, April 04, 2014

Oracle ZFS storage appliance configuration

Oracle is incorporating its ZFS storage appliances in more and more engineered systems. Even if you are not a pure storage administrator or consultant and are more into Oracle software and engineered systems, it is good to have a basic understanding of how a ZFS storage appliance works and what you can do with it to make your solution perform better and easier to maintain.

The issue with hardware-based solutions is commonly that you cannot just play with them without ordering the device. This holds a lot of people back from gaining experience before they get involved in a project where this specific hardware is used. The Oracle ZFS storage appliance is a bit different, because Oracle has decided to create a virtual appliance you can use to play with the solution. The virtual appliance, delivered as an Oracle VirtualBox image, gives you all the options to test and work with the storage appliance in the same manner as you would on the real physical hardware.

Oracle ZFS storage appliance

The virtual Oracle ZFS storage appliance can be downloaded from the Oracle site. After unpacking and importing it into Oracle VirtualBox you will be up and running in a matter of minutes. One thing to keep in mind: this is a system to play around with, and it is not intended for any serious use beyond playing and testing. When the initial boot has completed, the welcome screen of the host tells you where to point your browser.

A minimal setup is done during the initial boot process; the full configuration and setup is done via the browser. This is exactly the same as with a real physical ZFS appliance in your datacenter. The primary things you need to complete during the initial setup are:
  • Host Name
  • DNS Domain
  • Default Router
  • DNS server
  • Password

After completing those steps you will be pointed to a https://{ip}:215 address which will be the main URL for maintaining the ZFS storage appliance, or rather the ZFS storage appliance simulator in this case.

Oracle ZFS configuration STEP 1:
Before you can configure the machine you have to log in. For this you can use the root account in combination with the password you entered during the initial CLI configuration.

After logging in you will be shown the welcome screen, which again tells you that this is only to be used for demonstration purposes. You can also use it for some very small-scale testing, but remember that this system is not a solution for a storage need and is just to play with.

Oracle ZFS configuration STEP 2:
The next step is to ensure you have all the correct networking in place to be able to use your ZFS appliance in the right manner within your corporate infrastructure.

Oracle ZFS appliance network configuration

As you can see from the above screenshot, a datalink and an interface are already configured, but both still state "untitled", which hints that you need to configure them before they become usable. By clicking the pencil icon you can edit the details of both the datalinks and the interfaces, as shown below.

Oracle ZFS storage configuration

After configuring the ZFS storage appliance interfaces and datalinks you will be asked to configure the routing tables, DNS and NTP.

With this, the pure network configuration steps are done. Optionally, you can now select how you will embed the new storage into your corporate authentication and authorization solution. You can use solutions like NIS, LDAP or an Active Directory solution you might already have in place within your corporate IT infrastructure.

More information on how to connect a new ZFS appliance to an already existing Microsoft Active Directory can be found in the Oracle documentation.

Oracle ZFS configuration STEP 3:
In step 3 the actual storage configuration will be done. Here you will have to select how you will use the disks and what type of data profile you will be using. All previous steps were concerned with how you fit the appliance into your existing IT infrastructure. This step concerns how you will actually configure and use the appliance on a storage level. It is advisable to give this some thorough thought before you do the actual implementation of the appliance.

The first decision you will have to make is to decide how many storage pools your device will have (initially).

During this implementation we will only be using a single storage pool. The next important decision is what kind of storage profile you will use within your pool or pools. You can have different storage profiles per pool. The following storage profiles are available:

Double parity
RAID in which each stripe contains two parity disks. This yields high capacity and high availability, as data remains available even with the failure of any two disks. The capacity and availability come at some cost to performance: parity needs to be calculated on writes (costing both CPU and I/O bandwidth) and many concurrent I/Os need to be performed to access a single block (reducing available I/O operations). The performance effects on read operations are often greatly diminished when cache is available.

Mirrored
Data is mirrored, reducing capacity by half, but yielding a highly reliable and high-performing system. Recommended when space is considered ample, but performance is at a premium (for example, database storage).

Single Parity, Narrow stripes
RAID in which each stripe is kept to three data disks and a single parity disk. At normal stripe widths, single parity RAID offers few advantages over double parity RAID -- and has the major disadvantage of only being able to survive a single disk failure. However, at narrow stripe widths, this single parity RAID configuration can fill a gap between mirroring and double parity RAID: its narrow width offers better random read performance than the wider stripe double parity configuration, but it does not have quite the capacity cost of a mirrored configuration. While this configuration may be an appropriate compromise in some situations, it is generally not recommended unless capacity and random read performance must be carefully balanced: those who need more capacity are encouraged to opt for a wider, double-parity configuration; those for whom random read performance is of paramount importance are encouraged to consider either a mirrored configuration or (if the workload is amenable to it) a double parity RAID configuration with sufficient memory and dedicated cache devices to service the workload without requiring disk-based I/O.

Striped
Data is striped across disks, with no redundancy whatsoever. While this maximizes both performance and capacity, it comes at great cost: a single disk failure will result in data loss. This configuration is not recommended, and should only be used when data loss is considered to be an acceptable trade off for marginal gains in capacity and performance.

Triple mirrored
Data is triply mirrored, reducing capacity to one third, but yielding a very highly reliable and high-performing system. This configuration is intended for situations in which maximum performance and availability are required while capacity is much less important (for example, database storage). Compared with a two-way mirror, a three-way mirror adds additional protection against disk failures and latent disk failures in particular during reconstruction for a previous failure.

Triple parity, wide stripes
RAID in which each stripe has three disks for parity, and for which wide stripes are configured to maximize for capacity. Wide stripes can exacerbate the performance effects of double parity RAID: while bandwidth will be acceptable, the number of I/O operations that the entire system can perform will be greatly diminished. Resilvering data after one or more drive failures can take significantly longer due to the wide stripes and low random I/O performance. As with other RAID configurations, the presence of cache can mitigate the effects on read performance.

Which profile to apply depends on a number of variables, such as the type of performance you need and how "secure" your data should be against data loss and hardware failure. The decision you make has a direct impact on performance as well as on the usable storage on your appliance. It is of the highest importance that, before you do the installation, you have discussed the options with the consumers of your storage. These can be, for example, database and application administrators or even the business.
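To get a feeling for the capacity side of this trade-off, the rough Python sketch below computes which fraction of the raw disk capacity is left for data under each profile. It is a simplification of my own: it ignores spares, metadata overhead and cache devices, and the stripe width used for the parity profiles is a hypothetical input, since the appliance chooses the actual width based on the disks available.

```python
from fractions import Fraction


def usable_fraction(profile, stripe_width=None):
    """Fraction of raw disk capacity left for data under a given profile.

    stripe_width is the total number of disks per stripe (data plus parity)
    and is only needed for the double and triple parity profiles.
    """
    if profile == "striped":
        return Fraction(1)             # no redundancy at all
    if profile == "mirrored":
        return Fraction(1, 2)          # two copies of every block
    if profile == "triple mirrored":
        return Fraction(1, 3)          # three copies of every block
    if profile == "single parity, narrow stripes":
        return Fraction(3, 4)          # three data disks plus one parity disk
    parity_disks = {"double parity": 2, "triple parity, wide stripes": 3}
    if profile not in parity_disks:
        raise ValueError(f"unknown profile: {profile}")
    if stripe_width is None or stripe_width <= parity_disks[profile]:
        raise ValueError("stripe_width must exceed the number of parity disks")
    return Fraction(stripe_width - parity_disks[profile], stripe_width)


# Hypothetical example: compare the profiles, using a 10 disk double parity stripe.
for profile, width in [("mirrored", None), ("triple mirrored", None),
                       ("single parity, narrow stripes", None),
                       ("double parity", 10)]:
    print(profile, float(usable_fraction(profile, width)))
```

Numbers like these are a useful starting point for the discussion with the consumers of your storage, next to the performance characteristics described above.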

After completing this section of the setup you should have a situation similar to the one shown below.

This completes the primary initial setup, and you can now start distributing the storage to the servers and users who will make use of the new ZFS appliance within your corporate IT infrastructure.

Thursday, March 20, 2014

Oracle Enterprise Manager plugin development PortARUId

Oracle promotes Oracle Enterprise Manager as its default monitoring and maintenance solution and has provided monitoring solutions for a growing number of systems. However, because Oracle will not be able to create functionality for every system that might be a candidate for monitoring and maintenance by Oracle Enterprise Manager, there is an extensibility option. Oracle provides the option to develop your own plugins for Oracle Enterprise Manager and by doing so extend its capabilities for your own company or create a commercial product with it.

In essence Oracle Enterprise Manager provides capabilities to monitor targets on a diverse number of operating systems. When developing a custom plugin you might develop this for a specific platform. For example, the plugin you develop is only applicable on Linux systems and not on Windows systems. To ensure that your plugin will only be used on those platforms you can limit the deployment options during the development phase.

One of the base files in your plugin creation is the plugin.xml file, which provides the general information about your plugin and tells Oracle Enterprise Manager how to handle it. In this file you can also limit the operating systems on which the plugin can be hosted, on which targets can be discovered and on which the plugin can be deployed. This is of special interest when developing a plugin that is not usable on all platforms.

Within the plugin.xml you can define a number of things. Among them are the type of operating system where the plugin can be hosted (on the Oracle Enterprise Manager Management Server) and which deployed agents can use the plugin, based upon the target operating system. You can also control which targets should be discovered as potential deployment targets for this plugin.

PluginOMSOSAruId: the PluginOMSOSAruId element within plugin.xml states which Oracle Enterprise Manager Management Servers are capable of running this plugin. In almost all cases this applies to all operating systems, as the extensibility framework on the management server protects you (up to a certain level) from making decisions that would limit this. Because of this the value is commonly set to 2000, which refers to “all”.

<PluginOMSOSAruId value="2000" />

Within the certification section of the plugin.xml file you can define the applicable operating systems on both agent and discovery. Both are defined as a component type as can be seen in the below example:

<Component type="Agent">
  <PortARUId value="46" />
  <PortARUId value="226" />
</Component>
<Component type="Discovery">
  <PortARUId value="46" />
  <PortARUId value="226" />
</Component>

Correct values for component type are:
Agent (Management Agent component)
Discovery (Discovery component)

Correct values for PortARUId are:
46 (Linux x86 (32-bit))
212 (AIX 5L and 6.1 (64-bit))
226 (Linux x86-64 (64-bit))
23 (Solaris Sparc (64-bit))
267 (Solaris x86-64 (64-bit))
233 (Microsoft Windows x86-64 (64-bit))
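If you want to double check which platforms a plugin.xml certifies, a small script can translate the PortARUId values back into platform names. The Python sketch below is my own illustration and not an Oracle tool. The XML fragment is a hypothetical, cut-down example: the wrapping Certification element is an assumption based on the "certification section" mentioned above, and a real plugin.xml contains more elements around it.

```python
import xml.etree.ElementTree as ET

# PortARUId values as listed in this post.
PORT_ARU_IDS = {
    "46": "Linux x86 (32-bit)",
    "212": "AIX 5L and 6.1 (64-bit)",
    "226": "Linux x86-64 (64-bit)",
    "23": "Solaris Sparc (64-bit)",
    "267": "Solaris x86-64 (64-bit)",
    "233": "Microsoft Windows x86-64 (64-bit)",
}

# Hypothetical fragment; the Certification wrapper element is an assumption.
FRAGMENT = """
<Certification>
  <Component type="Agent">
    <PortARUId value="46" />
    <PortARUId value="226" />
  </Component>
  <Component type="Discovery">
    <PortARUId value="46" />
    <PortARUId value="226" />
  </Component>
</Certification>
"""


def certified_platforms(xml_text):
    """Map each component type to the platform names of its PortARUId values."""
    root = ET.fromstring(xml_text)
    result = {}
    for component in root.iter("Component"):
        ids = [port.get("value") for port in component.iter("PortARUId")]
        result[component.get("type")] = [
            PORT_ARU_IDS.get(i, f"unknown ({i})") for i in ids
        ]
    return result


for component, platforms in certified_platforms(FRAGMENT).items():
    print(component, "->", ", ".join(platforms))
```

Values that are not in the table are reported as unknown rather than silently dropped, which makes typos in the file easy to spot.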

Saturday, March 15, 2014

The Capgemini perspectives on big data

Capgemini share their perspectives on big data and analytics. Perspectives include defining big data, good governance in a big data world, finding the value behind the big data hype, and what building blocks organizations will need to set in place to make it work for them.

It looks at the people aspects, the skills you need, the move towards data-driven decision making, digital transformation and the impact on the customer experience. Big data is typically very specific to an industry, so although there are common technologies and some common information sets, ultimately each industry sector has many different new data sources and different business issues.

Therefore a key part of this series is looking at how big data is affecting sectors and the associated opportunities it presents. Visit the Capgemini expert pages or the Capgemini Big Data page for more information.

Wednesday, March 12, 2014

Big data in the oil and gas industry

The oil and gas industry is in general a sector which collects large amounts of data. This applies not only to companies that are active in upstream but also to companies that are active in midstream and downstream. Especially companies that work in all sectors of the industry have large amounts of data, ranging from seismic data to data on how the distribution chain is performing and much more. To succeed in an industry like oil and gas and to achieve compelling advantages over competitors, it can be very beneficial to combine all this raw and unstructured data to help the organisation. Currently most of the data is siloed and not accessible or usable for analysis over the whole chain. It is also commonly impossible to use this data in a manner that provides results that are usable within business planning.

For those cases, and for companies related to the oil and gas industry, it can be very beneficial to start thinking about a big data strategy. Implementing a big data strategy goes way beyond "only" implementing a Hadoop cluster and giving it to a number of tech people. Hortonworks recently published an article in which they provide a first insight into how Hadoop and a big data strategy can be used in the oil and gas industry. A good starting point for thinking about a big data strategy is the image below.

The above image shows it from a Hortonworks perspective; however, the same can be achieved with other Hadoop implementation vendors, for example Oracle. What is interesting about this image as a starting point is that it gives a first impression of the sources within the oil and gas industry that could potentially be used within a big data strategy.

Monitor your network connections on Linux

In some cases and in some environments you want to keep track of all the network connections a Linux server or workstation is making. For example, if you are planning to control your local network in a better way and are thinking about implementing stricter firewall rules, it is good to investigate what users are trying to access. In general, external connections to webservers are common and should be allowed, and most likely you also know which servers in your local network are likely to be accessed by other local servers and workstations. However, there can be a lot of hidden network traffic which you are not aware of, and when closing a lot of ports in your network you might start hindering daily operations.

In those cases it is good to start monitoring which traffic is executed so you can investigate this and make a network connection diagram. For this you can use logging on network switches, routers and firewalls. However, an easier way in my opinion is to ensure all your workstations have a running copy of tcpspy, which will collect data for some time and report it back to a central location.

tcpspy is a little program that will log all connections the moment they connect or disconnect. By default tcpspy installs in such a way that it automatically starts as a daemon, captures everything and writes all information to /var/log/syslog. You can however create rules for what tcpspy needs to capture by editing the file /etc/tcpspy.rules or by entering a new rule with the tcpspy -e option.

Before implementing stricter local firewall rules on the workstations on my private home network, I first had tcpspy running for a couple of weeks, extracted all information from /var/log/syslog to a central location and visualized it with a small implementation of D3.js. This showed that a number of unexpected but valid network connections were made on a regular basis which I was unaware of.
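As a rough idea of what the extraction step can look like, here is a small Python sketch that turns collected syslog lines into a list of links that a D3.js graph could load. Please note the assumptions: the log line layout in the pattern and the sample lines are made up for illustration and are not taken from the tcpspy documentation, so check the real entries in your own /var/log/syslog and adjust the pattern. The source/target/value field names are simply a common shape for D3.js examples, not a requirement.

```python
import json
import re
from collections import Counter

# ASSUMPTION: illustrative line layout only; adjust to your real tcpspy entries.
LINE_PATTERN = re.compile(
    r"^\S+\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+tcpspy\[\d+\]:\s+connect:.*?"
    r"user (?P<user>\S+), local \S+, remote (?P<remote>\S+):(?P<rport>\d+)"
)

# Hypothetical sample lines collected from two workstations.
SAMPLE_LINES = [
    "Mar 12 10:01:02 ws1 tcpspy[811]: connect: user alice, local 192.168.1.10:51234, remote 192.168.1.2:22",
    "Mar 12 10:05:40 ws1 tcpspy[811]: connect: user alice, local 192.168.1.10:51301, remote 192.168.1.2:22",
    "Mar 12 10:06:11 ws1 tcpspy[811]: disconnect: user alice, local 192.168.1.10:51234, remote 192.168.1.2:22",
    "Mar 12 10:07:00 ws2 tcpspy[790]: connect: user bob, local 192.168.1.11:40022, remote 203.0.113.5:80",
    "Mar 12 10:08:00 ws2 CRON[1200]: (root) CMD (run-parts /etc/cron.hourly)",
]


def build_links(lines):
    """Count connect events per (workstation, remote endpoint) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match:  # disconnects and unrelated syslog lines are skipped
            target = f"{match['remote']}:{match['rport']}"
            counts[(match["host"], target)] += 1
    return [{"source": source, "target": target, "value": count}
            for (source, target), count in sorted(counts.items())]


print(json.dumps(build_links(SAMPLE_LINES), indent=2))
```

The resulting JSON can be written to a file and loaded by whatever D3.js layout you prefer.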

Implementing this at your local home network is something that could be considered not that difficult, especially if you have some scripted way of implementing tooling on all workstations in an automated manner. Also it might look a bit overdone in a home environment, however, as this can be considered a testdrive for preparing a blueprint to be implemented in a more business like environment it shows the value of being able to quickly visualize all internal and external network traffic.

When you are looking for ways to log all internal and external network connections made by a server or workstation, it might be a good move to give tcpspy a look, and when you are looking for ways to visualize the data you receive, you might be interested in the options provided by D3.js.

Saturday, March 08, 2014

Managing Oracle Big Data Appliance with Oracle Enterprise Manager

For customers who need to implement Hadoop-based solutions, Oracle provides the Oracle Big Data Appliance as an engineered system. As with most Oracle products, both hardware and software, Oracle uses Oracle Enterprise Manager as the primary monitoring and maintenance solution. For the Big Data Appliance Oracle provides the "Oracle Enterprise Manager System Monitoring Plug-in for Oracle Big Data Appliance" in the form of an OEM plugin that can be added to Oracle Enterprise Manager the moment you add a Big Data Appliance to your IT landscape.

Even though managing and tuning Hadoop is still work for skilled people, Oracle makes life a lot easier by providing a ready-built rack with software and hardware that are tuned to work together. For example, Mammoth helps you a great deal during the initial setup of your Big Data Appliance, and during operation Oracle Enterprise Manager and the plugin help you monitor and manage a large part of the engineered system.

Because most of the parts in the Oracle Big Data Appliance are built by Oracle, and the parts that are not manufactured by Oracle are also used in most of the other engineered systems, it is possible to show you virtually everything on both hardware and software level from within a single tool. As you can see from the screenshot below, there is a visually recognizable representation of your hardware within the tool, from where you can drill down to most of the components via the left pane menu.

Oracle Enterprise Manager for Oracle Big Data Appliance

It also provides a single entry point and screen to see which component is located on which node in your cluster and what its current status is. In the screenshot below you can see an overview of a cluster within the Oracle Big Data Appliance.

Oracle Enterprise Manager for Oracle Big Data Appliance

As you can see from the above screenshot it is providing you a software component overview broken up per server node. Each server node, represented as a record in the table, shows the status (if it is on this particular server node) of the namenode, failover node, journal node, data node, job tracker, task tracker or where ZooKeeper is.

All this can also be achieved by scripting together your own tooling. However, because Oracle has been able to combine the hardware and the software, and largely uses technology that is already widely used for monitoring and maintenance, you can be up and running in a fraction of the time you would need to design and develop this yourself.

Thursday, March 06, 2014

Oracle Smart Flash Cache patching requirements

When using a PCI Express flash card in your server to enable the Oracle database to make use of the Oracle Smart Flash Cache options there are a number of things to consider. First of all, this only works when you are using Oracle Linux or Oracle Solaris as an operating system. This is commonly known, however, less commonly known is that you will have to ensure that your database is on a certain version.

Popular belief is that Oracle Database Smart Flash Cache is available and working with all Oracle database versions in an out of the box manner. However, when using Oracle database or less you will have to apply database patch 8974084. To be able to apply patch 8974084 you will at least have to have applied patch 9654983.

When you are trying to get Oracle Database Smart Flash Cache up and running on an Oracle Database or an earlier version you have to apply those patches before you start configuring this option. If the patches are not applied it will not work.

Wednesday, February 12, 2014

Oracle Exadata default passwords

When your Exadata is deployed it is by default equipped with a number of standard usernames and passwords. By default all root SSH keys and user accounts will be disabled, however, a number of accounts will be open and will have the standard passwords. Good practice dictates that all standard passwords should be changed directly to ensure that nobody can misuse this and make use of the default passwords. As a quick checklist you can find the default accounts and passwords that will be enabled below and you should ensure they are closed.

Database Server:
  • root/welcome1
  • oracle/welcome1
  • grid/welcome1
  • grub/sos1Exadata
Exadata Storage Servers:
  • root/welcome1
  • celladmin/welcome1
  • cellmonitor/welcome1
InfiniBand switches:
  • root/welcome1
  • nm2user/changeme
Ethernet switches:
  • admin/welcome1
Power distribution units (PDUs):
  • admin/welcome1
  • root/welcome1
Database server ILOMs:
  • root/welcome1
Exadata Storage Server ILOMs:
  • root/welcome1
InfiniBand ILOMs:
  • ilom-admin/ilom-admin
  • ilom-operator/ilom-operator
Keyboard, video, mouse (KVM):
  • admin/welcome1
Keeping the default passwords in use is, from a security point of view, a very unwise decision, and they should be changed as soon as possible. If this is not done, the chances that an attacker can gain access to your Exadata machine increase enormously. In many companies a default process for resetting passwords is in place for more common servers; however, Exadata servers are not implemented by the hundreds a year in a single company, so processes might not always include them. Because of this it is an extra point of attention for administrators and security officers.
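If you want to track this as an actual checklist, a small script can help. The Python sketch below is only an illustration of my own: it keeps the default accounts from the list above in a table and reports which ones have not yet been marked as changed. It does not connect to any Exadata component, and the set of changed accounts is something you maintain by hand.

```python
# Default accounts per component type, as listed in this post.
DEFAULT_ACCOUNTS = {
    "Database Server": ["root", "oracle", "grid", "grub"],
    "Exadata Storage Servers": ["root", "celladmin", "cellmonitor"],
    "InfiniBand switches": ["root", "nm2user"],
    "Ethernet switches": ["admin"],
    "Power distribution units (PDUs)": ["admin", "root"],
    "Database server ILOMs": ["root"],
    "Exadata Storage Server ILOMs": ["root"],
    "InfiniBand ILOMs": ["ilom-admin", "ilom-operator"],
    "Keyboard, video, mouse (KVM)": ["admin"],
}


def remaining(changed):
    """Return (component, account) pairs not yet marked as changed."""
    return [(component, account)
            for component, accounts in DEFAULT_ACCOUNTS.items()
            for account in accounts
            if (component, account) not in changed]


# Hypothetical progress: only two passwords have been changed so far.
changed_so_far = {
    ("Database Server", "root"),
    ("Ethernet switches", "admin"),
}
for component, account in remaining(changed_so_far):
    print(f"still on default password: {component} / {account}")
```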

Oracle Database Smart Flash Cache

Oracle Database has had the option to use Smart Flash Cache since release 11g. Smart Flash Cache enables you to extend the SGA buffer cache of your database without adding memory to your server, by using a level 2 cache instead. For this you can use, for example, PCIe flash cards like the Sun Flash Accelerator F20 PCIe Card which is shipping from Oracle. However, other vendors also manufacture cards that can be used for this.

The below image shows what happens in essence when you are using the Oracle database Smart Flash Cache options.

  1. When a block is retrieved from the storage it is stored within the buffer cache of the system global area. 
  2. When using Smart Flash Cache, a block that ages out of the buffer cache is not simply discarded but is evicted to the flash cache instead.
  3. When a block is needed again and is not available in the buffer cache in the SGA it is retrieved (if available) from the flash cache instead.
By implementing this you can avoid, up to a certain level, recurring calls to your storage device. Requesting a block from storage is slow, and if this can be avoided the benefits to the performance of your database are directly visible. Next to this, and often overlooked, there is also a positive effect for other databases and applications that make use of the same shared storage device. Because you lower the number of I/O operations on your shared storage, there is more room to handle the requests from other applications, which can give them an indirect performance gain as well.
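The three steps above can be mimicked with a toy simulation. The Python sketch below is a simplified model of my own and not how the Oracle database is implemented: it assumes a plain least-recently-used policy for both caches, and the class and attribute names are made up. It only serves to show how a second cache level lowers the number of reads that reach the storage device.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy model: buffer cache (level 1) backed by a flash cache (level 2)."""

    def __init__(self, buffer_size, flash_size, storage):
        self.buffer = OrderedDict()   # stands in for the SGA buffer cache
        self.flash = OrderedDict()    # stands in for the flash cache
        self.buffer_size = buffer_size
        self.flash_size = flash_size
        self.storage = storage        # dict standing in for the slow storage
        self.storage_reads = 0

    def _put_in_buffer(self, block_id, data):
        self.buffer[block_id] = data
        if len(self.buffer) > self.buffer_size:
            old_id, old_data = self.buffer.popitem(last=False)
            # Step 2: the aged-out block goes to flash instead of being dropped.
            self.flash[old_id] = old_data
            if len(self.flash) > self.flash_size:
                self.flash.popitem(last=False)

    def read(self, block_id):
        if block_id in self.buffer:
            self.buffer.move_to_end(block_id)
            return self.buffer[block_id]
        if block_id in self.flash:
            # Step 3: served from flash, no call to the storage device.
            data = self.flash.pop(block_id)
        else:
            # Step 1: fetched from storage and placed in the buffer cache.
            self.storage_reads += 1
            data = self.storage[block_id]
        self._put_in_buffer(block_id, data)
        return data


storage = {i: f"block-{i}" for i in range(10)}
access_pattern = [0, 1, 2, 3, 0, 1]

with_flash = TwoLevelCache(buffer_size=2, flash_size=4, storage=storage)
without_flash = TwoLevelCache(buffer_size=2, flash_size=0, storage=storage)
for block in access_pattern:
    with_flash.read(block)
    without_flash.read(block)

print("storage reads with flash cache:   ", with_flash.storage_reads)
print("storage reads without flash cache:", without_flash.storage_reads)
```

With the same buffer cache size, the run with a flash cache needs fewer reads from storage because blocks 0 and 1 are found in flash when they are requested again.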

One good thing to keep in mind is that this option is only working when your database is deployed on Oracle Linux or on Oracle Solaris.

For more information please do have a look at the presentation on this subject which is embedded below: 

Tuesday, February 11, 2014

Oracle Exadata smart PDU

The standard Oracle Exadata cabinet that holds the parts of your Exadata is equipped with two independent power distribution units, or PDUs for short. This is regardless of the size of your Exadata: every size Exadata contains two independent PDUs. The PDUs are not specifically designed for the Exadata; they are standard Sun Rack II Power Distribution Units, which are generally available. In general the PDUs are not given much attention by administrators and are something that is “just there”. There are however a number of important things to note about the PDUs.

Oracle Exadata Power Distribution Unit

The first important thing to note about the Exadata PDUs is that they are more intelligent than your average power distribution unit. During a standard deployment the PDUs are connected to the Exadata management network. The Exadata management network is used to connect all Exadata components to your existing management network. For example, your ILOMs, switches and also your PDUs are connected to the management network.

This means that you have a power distribution unit that is connected to your network and is not a pure “dumb” device. You can actually connect to the device. This however introduces the “issue” that it can also be misused. An often overlooked point after the initial deployment is that the PDU is equipped with a default login, and for security reasons you have to change this. In line with the Oracle documentation, you need to take the following steps to change the PDU password and, if needed, add additional user accounts for the PDU, up to a maximum of five users. The original instructions can be found in the Oracle Exadata Database Machine Owner's Guide 12c Release 1 (E50467-01).

The default account user for the power distribution unit (PDU) is admin. The following procedure describes how to change the password for the PDU:

  1. Use a Web browser to access the PDU metering unit by entering the IP address for the unit in the address line of the browser. The Current Measurement page appears.
  2. Click Network Configuration in the upper left of the page.
  3. Log in as the admin user on the PDU metering unit.
  4. Locate the Admin/User fields. Only letters and numbers are allowed for user names and passwords.
  5. Enter up to five users and passwords in the Admin/Users fields.
  6. Designate each user to be either an administrator or user.
  7. Click Submit to set the users and passwords.
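Before you start typing in the web interface, it can help to check your planned accounts against the two restrictions mentioned in the steps: only letters and numbers, and no more than five users. The Python sketch below is a small helper of my own for that purpose and is not part of the PDU software. The role names are an assumption based on step 6, and the accounts shown are hypothetical.

```python
import re

MAX_USERS = 5
LETTERS_AND_NUMBERS = re.compile(r"[A-Za-z0-9]+")
ROLES = ("administrator", "user")   # assumed labels for the two user types


def validate_pdu_users(users):
    """Check (name, password, role) tuples and return a list of problems."""
    problems = []
    if len(users) > MAX_USERS:
        problems.append(f"{len(users)} users given, at most {MAX_USERS} allowed")
    for name, password, role in users:
        if not LETTERS_AND_NUMBERS.fullmatch(name):
            problems.append(f"user name {name!r} may only contain letters and numbers")
        if not LETTERS_AND_NUMBERS.fullmatch(password):
            problems.append(f"password for {name!r} may only contain letters and numbers")
        if role not in ROLES:
            problems.append(f"role for {name!r} must be one of {ROLES}")
    return problems


# Hypothetical planned accounts; the second one breaks the naming rule.
planned = [
    ("pduadmin", "Ch4ngedPassw0rd", "administrator"),
    ("ops-team", "An0therPassw0rd", "user"),
]
for problem in validate_pdu_users(planned):
    print(problem)
```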

As stated, the PDU is connected to the Exadata management network and you can actively connect to it. You can use SNMP for this or you can use the web interface. An example of such a web interface page from a standard PDU can be seen below.

Oracle Exadata Power Distribution Unit monitoring screen

However, as it is in the strategy of Oracle to use Oracle Enterprise Manager to be the central management and monitoring tool for all parts of a landscape you can also add a PDU monitoring plugin to Oracle Enterprise Manager to monitor your Exadata Power Distribution Unit.

Monday, February 10, 2014

Download Oracle Exadata documentation

Need to check something in the Oracle Exadata documentation? Not an issue, you can find all the documentation on the machine itself. Connect to the storage cell and check /usr/share/doc/oracle/Exadata. This works fine, with the minor disadvantage that you need to purchase an Exadata machine to satisfy the prerequisite. Purchasing an Exadata to be able to read some documentation might not be in everyone's budget; however, there is a little escape that still lets you download the documentation.

You can log in to metalink and download patch 10386736. Patch 10386736 contains the full set of documentation. The current release of the patch, dated 08-JAN-2014, contains the following Exadata documentation:
  • Exadata X3-8b Hardware documentation
  • Exadata X4-2 Eighth Rack documentation
  • Exadata X4-2 Full Rack documentation
  • Exadata X4-2 Half Rack documentation
  • Exadata X4-2 Hardware documentation
  • Exadata X4-2 Quarter Rack documentation
Oracle Exadata Documentation Download

Meaning, if you need to read up on the documentation and you have not downloaded it from the Exadata storage cell to your workstation, patch 10386736 will be your solution and will provide you with the latest version of all documentation.

Sunday, February 09, 2014

Understanding Oracle Exadata function shipping principle

The Oracle Exadata architecture shows a couple of things that ensure that the Exadata Database Machine can provide the performance it is known for. Numerous design decisions are noteworthy; however, function shipping is most likely the most ingenious one, and it is currently only available on Oracle hardware and cannot be reproduced on other hardware platforms. We do see companies other than Oracle building hardware platforms that are capable of providing extreme performance; however, none of them are able to implement function shipping in the manner that Oracle is capable of doing.

The main reason for this is that Oracle owns the proprietary rights to both the software and the hardware in the Exadata platform. Function shipping is, simply put, moving SQL instructions from the database nodes to the storage nodes. The diagram below shows a common way of deploying a database in combination with storage. In this case the storage is a storage appliance, but it could also be local storage on the database server.

If we, for example, need to execute a select statement on the table "TBL_SALE" and retrieve all records where "country_of_sale" is "NL", the database will retrieve all the data in "TBL_SALE" from the storage. Only after it has received this data will it filter on "country_of_sale" and finally show the records where this column contains "NL".

In essence there is nothing wrong with this mechanism; the issue, however, is twofold. The first issue is that if your table is large (multi-terabyte), all this data has to be moved from your storage appliance to your database instance. This can result in a serious performance loss. The second issue is that the CPUs that are busy with (A) handling all the incoming data and (B) checking this data to verify whether "country_of_sale" equals "NL" are unable to do any other tasks.
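To get a feeling for the first issue, a small back-of-the-envelope calculation helps. The sketch below is only an illustration: the table size, the link speed and the share of matching rows are hypothetical numbers chosen for the example, not Exadata or storage appliance specifications.

```python
def full_scan_transfer_seconds(table_bytes: int, link_bytes_per_second: int) -> float:
    """Time needed to move a given number of bytes over the storage link."""
    return table_bytes / link_bytes_per_second


TB = 10**12
GB = 10**9

# Hypothetical figures, chosen only for illustration.
table_size = 2 * TB          # size of TBL_SALE
link_speed = 1 * GB          # sustained bytes per second between storage and database
matching_percent = 1         # share of the table where country_of_sale = 'NL'

# Traditional setup: the whole table crosses the link before filtering.
full_seconds = full_scan_transfer_seconds(table_size, link_speed)

# The data the query actually needs is only a small part of that.
useful_bytes = table_size * matching_percent // 100
useful_seconds = full_scan_transfer_seconds(useful_bytes, link_speed)

print(f"Whole table:        {full_seconds:.0f} seconds")
print(f"Matching rows only: {useful_seconds:.0f} seconds")
```

With these assumed numbers almost all of the transfer time is spent on rows that the database throws away right after receiving them.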

The Exadata machine uses the model shown below, which mitigates the performance loss described above.

In the above diagram we execute the same query as on the traditional database setup, but now function shipping comes into play. Instead of requesting all data from the table to be sent to your database engine, Oracle makes use of the function shipping principle to send the majority of the SQL execution to the Exadata storage cell. The major benefit of this is that, in our example, the entire multi-terabyte table no longer needs to be shipped from the storage layer to the database layer. The parts of the SQL statement that can be executed on the storage cell are executed right there, and only the rows that satisfy the SQL select statement are returned to the database.

With function shipping the system does not provide traditional block serving services to the database. Instead, in a smart way, it transfers only the applicable rows from the storage to the database. This limits the load on the CPUs of the database server and limits the bandwidth usage between the database server and the storage.
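The difference between the two models can be sketched in a few lines of Python. This is a toy model and not how the Exadata Storage Server Software is implemented: the StorageCell class and its read_all and scan methods are names made up for this example, and the rows are sample data.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


class StorageCell:
    """Toy stand-in for a storage server that holds the rows of TBL_SALE."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def read_all(self) -> List[Row]:
        # Traditional block serving: every row is sent to the database.
        return list(self._rows)

    def scan(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Function shipping: the filter runs on the storage side,
        # so only the matching rows are sent back.
        return [row for row in self._rows if predicate(row)]


def is_nl(row: Row) -> bool:
    return row["country_of_sale"] == "NL"


countries = ["NL", "DE", "NL", "US", "FR", "BE"]
cell = StorageCell(
    [{"sale_id": str(i), "country_of_sale": c} for i, c in enumerate(countries)]
)

# Traditional model: ship everything, filter on the database server.
shipped_traditional = cell.read_all()
result_traditional = [row for row in shipped_traditional if is_nl(row)]

# Function shipping model: ship the predicate, receive only the matches.
shipped_offloaded = cell.scan(is_nl)

print("rows shipped, traditional:     ", len(shipped_traditional))
print("rows shipped, function shipping:", len(shipped_offloaded))
```

Both approaches produce the same result set. The only thing that changes is where the filter runs and, as a consequence, how many rows have to travel between storage and database.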

The reason this is only available on Oracle hardware in combination with the Oracle database is that Oracle builds both the hardware and the software. Because of this, Oracle is able to place proprietary code on the storage cell layer which can communicate with the database instance.

To achieve this, the database servers and Exadata Storage Server Software communicate using the iDB (Intelligent DataBase) protocol. iDB is implemented in the database kernel and transparently maps database operations to Exadata-enhanced operations. iDB implements a function shipping architecture in addition to the traditional data block shipping provided by the database. iDB is used to ship SQL operations down to the Exadata cells for execution and to return query result sets to the database kernel. Instead of returning database blocks, Exadata cells return only the rows and columns that satisfy the SQL query. Like existing I/O protocols, iDB can also directly read and write ranges of bytes to and from disk, so when offload processing is not possible Exadata operates like a traditional storage device for the Oracle Database. But when feasible, the intelligence in the database kernel enables, for example, table scans to be passed down to execute on the Exadata Storage Server so only requested data is returned to the database server.
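The two modes described here, offloaded execution when possible and plain block shipping as a fallback, can be illustrated with a small sketch. None of the names below come from iDB or any Oracle API. ToyCell, offloaded_scan, read_blocks and run_query are hypothetical and exist only to show the idea that both paths must return the same rows and columns.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


class ToyCell:
    """Hypothetical storage cell that stores rows grouped into blocks."""

    def __init__(self, blocks: List[List[Row]]) -> None:
        self._blocks = blocks

    def read_blocks(self) -> List[List[Row]]:
        # Block shipping: hand back the raw blocks, unfiltered.
        return [list(block) for block in self._blocks]

    def offloaded_scan(
        self, predicate: Callable[[Row], bool], columns: List[str]
    ) -> List[Row]:
        # Function shipping: filter rows and project columns on the cell.
        return [
            {col: row[col] for col in columns}
            for block in self._blocks
            for row in block
            if predicate(row)
        ]


def run_query(
    cell: ToyCell,
    predicate: Callable[[Row], bool],
    columns: List[str],
    offload_possible: bool,
) -> List[Row]:
    if offload_possible:
        return cell.offloaded_scan(predicate, columns)
    # Fallback: treat the cell as a traditional storage device and do
    # the filtering and projection on the database side.
    rows = [row for block in cell.read_blocks() for row in block]
    return [{col: row[col] for col in columns} for row in rows if predicate(row)]


cell = ToyCell(
    [
        [
            {"sale_id": "1", "country_of_sale": "NL", "amount": "10"},
            {"sale_id": "2", "country_of_sale": "DE", "amount": "25"},
        ],
        [{"sale_id": "3", "country_of_sale": "NL", "amount": "40"}],
    ]
)


def wanted(row: Row) -> bool:
    return row["country_of_sale"] == "NL"


offloaded = run_query(cell, wanted, ["sale_id"], offload_possible=True)
fallback = run_query(cell, wanted, ["sale_id"], offload_possible=False)
print(offloaded)
print(fallback)
```

The query result is identical on both paths. The fallback only moves the filtering work, and all the blocks, back to the database server.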

Saturday, February 08, 2014

Oracle Exadata X4-2 hardware accelerated cryptographic features

The Oracle Exadata database machine is known for a lot of features. One of the more overlooked ones is the ability to make use of the hardware accelerated cryptographic features that come with the Intel processors in the Exadata storage cells. The Exadata X4-2 is equipped with Intel Xeon E5-2697 v2 processors, which have the AES-NI Intel data protection technology built into them.

"Advanced Encryption Standard New Instructions (AES-NI) are a set of instructions that enable fast and secure data encryption and decryption. AES-NI are valuable for a wide range of cryptographic applications, for example: applications that perform bulk encryption/decryption, authentication, random number generation, and authenticated encryption."

When there is a need to ensure encryption of data on an Exadata platform you can make use of the Oracle Advanced Security feature Transparent Data Encryption. Transparent Data Encryption is also extended to the Smart Flash Cache and Hybrid Columnar Compression parts of the Exadata to ensure that your data is stored fully encrypted. When encryption is done in a standard way, on processors that cannot handle cryptography in the processor itself, it can cause a performance degradation. Because the Exadata platform makes use of processors that have AES-NI built in, this is no longer an issue.
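On a Linux host you can check whether the processor advertises AES-NI by looking for the "aes" flag in /proc/cpuinfo. The snippet below is a minimal sketch of such a check. The function name has_aes_flag and the sample text are made up for this example, and the check says nothing about whether a given piece of software actually uses the instructions.

```python
def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


# Shortened, hand-written sample in the /proc/cpuinfo layout.
sample = "processor\t: 0\nmodel name\t: Example CPU\nflags\t\t: fpu vme sse2 aes avx\n"
print(has_aes_flag(sample))

# On a real Linux machine the same function can be fed the actual file:
# with open("/proc/cpuinfo") as cpuinfo:
#     print(has_aes_flag(cpuinfo.read()))
```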

What Oracle Transparent Data Encryption does is protect your data against theft at the storage level. Your data files have to be stored somewhere on a storage device; in the case of an Exadata these are the storage cells. A commonly overlooked fact is that someone could potentially steal those files directly from storage, or when they are moved to a backup device.

Transparent Data Encryption stores all data on storage in an encrypted manner. This, however, can carry a huge performance penalty when done on non-optimized processors. Because the AES-NI Intel data protection technology is built directly into the processors, a number of additional layers are removed from the processing that would otherwise be needed to encrypt and decrypt the data coming from disk before it can be used by the database.

Tuesday, February 04, 2014

Replicate data to Oracle database using Attunity replicate

When using a database to store data, it is in some cases needed or desired to replicate the full set of data, or a subset of it, to another database. In some cases this is not the same type of database, which can complicate things. Many companies decide to create a custom-built export, move and import routine. Even though this works in general, there are numerous solutions that will do this task for you.
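To make clear what such a custom export, move and import routine typically looks like, here is a minimal sketch. It is not how any replication product works. Two in-memory SQLite databases stand in for the source and target systems, the table tbl_sale and its columns are sample names, and the "move" step is simply passing a CSV string where a real routine would transfer a file.

```python
import csv
import io
import sqlite3


def export_sales(source: sqlite3.Connection) -> str:
    """Export step: dump the source table to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerows(
        source.execute(
            "SELECT sale_id, country_of_sale FROM tbl_sale ORDER BY sale_id"
        )
    )
    return buffer.getvalue()


def import_sales(target: sqlite3.Connection, payload: str) -> None:
    """Import step: load the CSV text into the target table."""
    rows = list(csv.reader(io.StringIO(payload)))
    target.executemany("INSERT INTO tbl_sale VALUES (?, ?)", rows)
    target.commit()


# Stand-ins for two separate database systems.
source_db = sqlite3.connect(":memory:")
target_db = sqlite3.connect(":memory:")
for db in (source_db, target_db):
    db.execute("CREATE TABLE tbl_sale (sale_id INTEGER, country_of_sale TEXT)")

source_db.executemany(
    "INSERT INTO tbl_sale VALUES (?, ?)", [(1, "NL"), (2, "DE"), (3, "NL")]
)
source_db.commit()

# Export, "move" (here just a variable), import.
payload = export_sales(source_db)
import_sales(target_db, payload)

replicated = target_db.execute(
    "SELECT sale_id, country_of_sale FROM tbl_sale ORDER BY sale_id"
).fetchall()
print(replicated)
```

Even this tiny version leaves out everything that makes replication hard in practice, such as incremental changes, data type differences between vendors and error handling.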

Several vendors provide solutions to synchronize data between databases from different vendors. For architects familiar with the Oracle portfolio, a well-known solution is Oracle GoldenGate. Less known are the solutions from Attunity. Attunity Replicate is a solution for synchronizing data from a source to a target database, which do not have to be from the same vendor. One of the big advantages of Attunity Replicate is that it does not need a direct SQL connection between the source and the target system. This can be extremely valuable in cases where you experience a lot of latency and the replication does not have to be (near) real-time.

The video below gives a short demo of how to set up replication of data between a Microsoft SQL Server and an Oracle database using Attunity Replicate.