Johan Louwers - Tech blog: October 2013

Tuesday, October 29, 2013

Big data receivers

When people talk about big data, the initial thought is commonly that big data comes from social media. The most used example of big data is tracking what customers think and feel about a company or a product when they write something on social media. This is indeed a very good example and a very usable implementation target for the technical parts that form the basis of handling big data.

However, big data is, to be honest, much more than social media generated content that can be processed and analyzed. Big data is generally described by the four V's. Oracle has identified the four V's as Volume, Velocity, Variety and Value, while IBM gives the four V's another meaning.

Other sources of big data can very well be the output of sensors in your factory, sensors in a city, or a large number of other sources from within or outside your company. Big data can also be the output of devices from the internet of things, as described in this blogpost.

Whatever the source of your "big" data, the common factors are in most cases that there is a large volume of it, it comes in a variety of formats and it comes at you fast and continuously. When working with big data the following three steps are common:

- Acquire
There is a need to acquire the data from a number of sources. Within the acquire process there is also the need to store the data. Acquiring is not necessarily capturing the data. In the example of sensor data, the sensor captures the data and sends it to the acquire process.

- Process
When data is acquired and stored it needs to be processed. In most cases this means organizing the data so it can be used in analysis; however, it can also very well be processed for purposes other than analysis.

- Use
Using the processed data in many cases means analyzing, or further analyzing, the data that comes out of the process step. However, it can also be input for other, non-analytic business processes.
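
To make the three steps a bit more tangible, here is a minimal sketch in Python. It is only an illustration under assumptions: the sensor names, the JSON message format and the function names are made up for this example, and an in-memory list stands in for a real store such as HDFS or a NoSQL database.

```python
import json
from statistics import mean


def acquire(raw_messages):
    """Acquire: accept raw messages from producers and store them unchanged."""
    store = []
    for message in raw_messages:
        store.append(message)  # a real system would write to durable storage here
    return store


def process(store):
    """Process: organize the stored messages so they can be analyzed."""
    readings = {}
    for message in store:
        record = json.loads(message)
        readings.setdefault(record["sensor"], []).append(record["value"])
    return readings


def use(readings):
    """Use: a very simple analysis, the average value per sensor."""
    return {sensor: mean(values) for sensor, values in readings.items()}


# Hypothetical sensor messages, as they might arrive at the acquire step.
raw = [
    '{"sensor": "temp-1", "value": 20.0}',
    '{"sensor": "temp-1", "value": 22.0}',
    '{"sensor": "temp-2", "value": 18.0}',
]

averages = use(process(acquire(raw)))
print(averages)
```

Note that the acquire step stores the messages untouched; parsing and organizing is deliberately left to the process step.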

Below you can see Oracle's vision on those steps, in which they acquire, organize and analyze data.

Even though the above gives a good example, a step is missing from the visual representation. As shown now, the acquire phase is displayed as stored data in HDFS, NoSQL or a transactional database. A good big data strategy should also cover the step before the data is stored, which is receiving the data. A lot of time will have to be dedicated to creating a good technological capture strategy. You will need "listeners" to which data producers can talk and which write the data to the storage.

If you take, for example, the internet of things, you will have a lot of devices that send out data. You will need receivers to which those devices can talk. As we have stated that big data is characterized by volume, variety and velocity, this means that you will have to build a number of receivers and that those receivers should be able to handle a lot of messages and data. Your receivers should, for example, be able to balance the load and work in parallel to cope with the load of all devices that send data to them. You will also have to ensure that the speed at which the receivers can write data to storage is in line with the rate at which data is sent to the receivers.
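
A minimal sketch of this idea, using only the Python standard library, could look as follows. It is an illustration and not a production design: the number of receivers, the queue size and the in-memory list that stands in for the storage are all assumptions. The bounded queue makes producers wait when the receivers cannot write fast enough, which is one simple way to keep the supply of data in line with the write speed.

```python
import queue
import threading

NUM_RECEIVERS = 4                  # assumed number of parallel receivers
inbox = queue.Queue(maxsize=100)   # bounded: put() blocks when receivers fall behind
storage = []                       # stand-in for HDFS, NoSQL or a database
storage_lock = threading.Lock()


def receiver():
    """Take messages from the shared inbox and write them to storage."""
    while True:
        message = inbox.get()
        if message is None:        # sentinel value: no more data, stop this receiver
            break
        with storage_lock:
            storage.append(message)


workers = [threading.Thread(target=receiver) for _ in range(NUM_RECEIVERS)]
for worker in workers:
    worker.start()

# Simulate 10 hypothetical devices that each send 50 messages.
for device_id in range(10):
    for sequence in range(50):
        inbox.put((device_id, sequence))

# One sentinel per receiver, then wait until all receivers have finished.
for _ in workers:
    inbox.put(None)
for worker in workers:
    worker.join()

print(len(storage))
```

Because all receivers read from one shared queue, the load is spread over whichever receiver is free at that moment. In a real deployment the receivers would typically be separate processes or machines behind a load balancer.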

An example of where such topics were an issue and were handled correctly is the CMS DAQ system developed by the TriDAS project team at CERN when developing data capture triggers for the Large Hadron Collider project. In the video below Tulika Bose, an assistant professor at Boston University, gives a short introduction to this.

Saturday, October 26, 2013

Vine will push the VCA industry

As it turns out, Vine is currently the fastest growing app in the world. Vine is an app developed by the people behind Twitter and it lets you record a video of only a couple of seconds and share it directly on Facebook or via Twitter. In some sense it can be seen as the Instagram for video content. Previously people shared their daily life via social media primarily via text; added to this are now platforms that enable you to share pictures you take with your mobile phone. With Vine those options are expanding to video. Other social media companies are already implementing video capabilities. For example, Instagram is adding video capabilities to its app. Instagram, a Facebook company, is trying hard to push Vine out of the market, as you can also read in this BGR.com article.

A "downside" of this new option, however, is that analyzing and mining meaning out of a video is much harder than doing this with text. Companies are already implementing systems in which they harvest, for example, Twitter messages to find people who talk about their product and brand, or who talk about products and services that are also offered by their company.

One part of the newly upcoming technologies has been looking for a place to fit in. The VCA, or Video Content Analysis, segment of the market has been focusing on security and surveillance solutions, where it is primarily used for analysing surveillance camera feeds. As video content is getting more and more of a place in the social media arena, the demand for software that can analyse the content of those videos will also grow. This will be a field of expertise where companies who have been working on security VCA solutions can most likely step into the social analytics game.

The major downside of VCA is that it is resource intensive and currently still produces a high level of faulty outcomes. Although it is far from mature, this sector of the IT industry could grow to great heights as we see the number of video content enablers grow.

What to keep in mind when reading a best practice

When developing a technical deployment architecture for a company it is always good to base your architecture on best practices. Best practices are offered by a number of sources, and when reading them it is good to understand where they are coming from and whether there is a "second agenda" in the best practices that are offered.

In general the best best practices are to be found at independent organizations that provide industry best practices. Next to this, a good source of best practices can be the numerous blogs that can be found online. A lot of software vendors provide best practices, and implementation companies provide them as well. One thing you have to remember, though, is that there is always the chance that the people from the software vendor and/or an implementation partner write a best practice without keeping a specific business case in mind.

In real world deployments every additional piece of software and hardware is an additional cost on the budget. This additional cost should be accompanied by a valid business reason. When someone writes without keeping that in mind, or even with the intention to add additional and costly features to the design, the costs of your deployment architecture might not be justified by the business benefits and needs.

When reviewing a best practice always try to keep the following things in mind:
- Is the goal used in the best practice in line with my business need?
- Is this best practice possibly written with a second agenda?
- Does this best practice include higher demands than my situation requires?
- Is this best practice only focusing on a single vendor solution stack, and are there options to include solutions from other vendors?

If you keep the above points in the back of your mind, the best practices you download from software vendors are very usable.

Tuesday, October 15, 2013

Oracle ZFS analytics capabilities

When Oracle acquired Sun Microsystems a couple of years ago, one of the acquired technologies was the ZFS filesystem. ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs. The ZFS filesystem is (theoretically) capable of holding a maximum of 16 exbibytes of data.
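
To put the 16 exbibyte figure in perspective: it is exactly the number of bytes that can be addressed with a 64-bit offset. The small check below is plain arithmetic in Python and makes no claim about which specific ZFS limit the figure refers to.

```python
EXBIBYTE = 2 ** 60                 # 1 EiB in bytes (binary prefix)

limit_bytes = 16 * EXBIBYTE
print(limit_bytes)                 # 18446744073709551616
print(limit_bytes == 2 ** 64)      # True: the range of a 64-bit byte offset

# The same amount expressed in decimal exabytes (10**18 bytes).
decimal_exabytes = limit_bytes / 10 ** 18
print(round(decimal_exabytes, 2))  # 18.45
```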

As we can see in the Oracle storage strategy, Oracle is at this moment building and shipping a number of storage and backup appliances based on the ZFS technology. Currently it ships the ZFS storage appliances ZS3-2, ZS3-4, 7120, 7320 and 7420, and it also ships the Sun ZFS Backup Appliance.

A lot of exciting technologies are included in the ZFS storage appliances, both on hardware level and on software level, which can help you get performance gains, especially when used in combination with Oracle databases. However, it is often forgotten that there is a management suite to manage and monitor your storage appliances.

The strategic product roadmap of Oracle, which is not officially communicated, shows that all management and monitoring solutions for both hardware and software should be integrated within the Oracle Enterprise Manager solution or at least (for now) interact with it. For the ZFS storage appliances a plugin is available to include functionality for managing and monitoring ZFS appliances from Oracle Enterprise Manager. You can download the plugin from the Oracle website.

However, next to the integrated way you have a standalone solution for managing and monitoring your ZFS appliances. This solution holds the ZFS storage appliance analytics, which help you tune your storage to an optimum. The entire analytics solution is based on the DTrace capabilities, which means that a deep core analysis can be done.

In the above video you can see a bit more about the capabilities of the analytics that you are able to pull out of a ZFS storage appliance and how they can help you tune your storage in a more efficient way.

The common analytics that are provided are:
- CPU: Percent utilization
- Cache: ARC accesses
- Cache: L2ARC I/O bytes
- Cache: L2ARC accesses
- Capacity: Capacity bytes used
- Capacity: Capacity percent used
- Capacity: System pool bytes used
- Capacity: System pool percent used
- Data Movement: Shadow migration bytes
- Data Movement: Shadow migration ops
- Data Movement: Shadow migration requests
- Data Movement: NDMP bytes statistics
- Data Movement: NDMP operations statistics
- Data Movement: Replication bytes
- Data Movement: Replication operations
- Disk: Disks
- Disk: I/O bytes
- Disk: I/O operations
- Network: Device bytes
- Network: Interface bytes
- Protocol: SMB operations
- Protocol: Fibre Channel bytes
- Protocol: Fibre Channel operations
- Protocol: FTP bytes
- Protocol: HTTP/WebDAV requests
- Protocol: iSCSI bytes
- Protocol: iSCSI operations
- Protocol: NFSv bytes
- Protocol: NFSv operations
- Protocol: SFTP bytes
- Protocol: SRP bytes
- Protocol: SRP operations

Next to the common analytics there are also a number of areas where you can get more detailed and more advanced analytics:
- CPU: CPUs
- CPU: Kernel spins
- Cache: ARC adaptive parameter
- Cache: ARC evicted bytes
- Cache: ARC size
- Cache: ARC target size
- Cache: DNLC accesses
- Cache: DNLC entries
- Cache: L2ARC errors
- Cache: L2ARC size
- Data Movement: NDMP bytes transferred to/from disk
- Data Movement: NDMP bytes transferred to/from tape
- Data Movement: NDMP file system operations
- Data Movement: NDMP jobs
- Data Movement: Replication latencies
- Disk: Percent utilization
- Disk: ZFS DMU operations
- Disk: ZFS logical I/O bytes
- Disk: ZFS logical I/O operations
- Memory: Dynamic memory usage
- Memory: Kernel memory
- Memory: Kernel memory in use
- Memory: Kernel memory lost to fragmentation
- Network: IP bytes
- Network: IP packets
- Network: TCP bytes
- Network: TCP packets
- System: NSCD backend requests
- System: NSCD operations

Getting all those analytics can be done via the GUI that is provided by Oracle. The mentioned analytics can help you tune your appliance and the way applications interact with it. One thing, however, is of vital importance: that you have a deep understanding of what the figures mean. A good starting guide is the analytics guide from Oracle. However, this alone will not be sufficient. When running a mission critical system which is based upon a ZFS storage appliance and you have to deliver optimal performance, a deep knowledge of ZFS and storage solutions will be needed.
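
As a small illustration of what "understanding the figures" can mean, the sketch below derives a cache hit ratio from raw hit and miss counts. It assumes that you have exported the "Cache: ARC accesses" statistic broken down into hits and misses; the numbers and the function name are made up for this example and it does not use any appliance API.

```python
def hit_ratio(hits, misses):
    """Return the fraction of cache accesses served from cache, or None if idle."""
    total = hits + misses
    if total == 0:
        return None                # no accesses in this interval, ratio is undefined
    return hits / total


# Hypothetical (hits, misses) samples for three measurement intervals.
samples = [(9500, 500), (7000, 3000), (0, 0)]

for hits, misses in samples:
    ratio = hit_ratio(hits, misses)
    if ratio is None:
        print("no cache accesses in this interval")
    else:
        print(f"hit ratio: {ratio:.0%} ({misses} accesses went to slower storage)")
```

A high raw number of ARC accesses says little on its own. It is the ratio, and how it changes under load, that tells you whether your working set fits in cache.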

Thursday, October 03, 2013

Oracle Software Defined Datacenter enabling strategy

When using a cloud service, fewer and fewer people are thinking about how things work "under the cloud". The cloud is taken as a given, without thinking about how a cloud vendor ensures that everything is working and is capable of providing the scalability and flexibility that comes with a true cloud solution. In many cases there is also no need to think about this, unless you are the one who is building the cloud solution and/or is responsible for maintaining it.

As already stated by Pat Gelsinger, the VMware CEO, we are entering the third wave of IT, which is the mobile-cloud wave. This third wave is making life much simpler for a number of people: when you need an environment you can simply request one from your infrastructure-as-a-service provider and most things will be arranged. When you, for example, request a new instance at Amazon Web Services you can simply click your network components together and magically everything is working.

The complicating factor that comes into play, which was not (that much) the case in the client-server era, is that more and more components need to be virtualised and should be controllable from a central software-based portal. This is where SDDC comes into play. SDDC stands for Software-Defined Data Center and is an architectural approach in which the entire IT infrastructure extends the virtualisation concept. Within this concept all infrastructure components are delivered as if they were software components. In general the three main components of an SDDC architecture are:

Compute virtualisation, which is a software implementation of a computer.

Network and security virtualization. Network virtualization, sometimes referred to as software-defined networking, is the process of merging hardware and software resources and networking functionality into a software-based virtual network. The network and security virtualization layer untethers the software-defined data center from the underlying physical network and firewall architecture.

Software-defined storage, or storage virtualization, enables data center administrators to manage multiple storage types and brands from a single software interface. High availability, which is unbundled from the actual storage hardware, allows for the addition of any storage arrays as needed.

When we take a look at the Oracle portfolio we do see a tendency towards software-defined data center solutions. As Oracle is adopting the cloud thinking and is not only providing a cloud platform but is also providing the building blocks for customers to build their own (internal) clouds, it is only logical that we find SDDC supporting solutions.

Oracle Compute virtualisation;

It is without any doubt that Oracle is working on a number of virtualisation technologies, of which Oracle VM is the most noteworthy and most used. Next to this, Oracle is working on a Solaris containers approach; however, for the x86 platforms the common standard is becoming Oracle VM, which is based on the Xen hypervisor.

Software defined networking;

In this field Oracle is taking some great steps. Oracle SDN (software defined Network) has been launched some time ago. Oracle SDN boosts application performance and management flexibility by dynamically connecting virtual machines (VMs) and servers to any resource in your data center fabric. Oracle SDN redefines server connectivity by employing the basic concepts of virtualisation. Unlike legacy port- and switch-based networking, which defines connectivity via complex LAN configurations, Oracle SDN defines connectivity entirely in software, using a supremely elegant resource: the private virtual interconnect. A private virtual interconnect is a software defined link between two resources. It enables you to connect any virtual machine or server to any other resource including virtual machines, virtual appliances, bare metal servers, networks, and storage devices anywhere in the data center.

The SDN solution from Oracle provides a great set of management and monitoring tools which enable administrators and architects to manage the virtual network in a more efficient way and also tie this into a flexible cloud solution which is architected from the ground up and is fully automated.

Software defined storage;

Within the field of software defined storage Oracle is, at this moment, not providing a clear path to the future. However, when searching the Oracle website you can find some reports that hint at or talk about the subject.

There is an IDC report on the Oracle website in which IDC poses the following question without answering it directly: "Will Oracle leverage ZFS or ZFS/OpenStack for a software-only, software-defined storage solution for hyperscale cloud builders? Given that Oracle does not have a material storage hardware business to protect and it has an excellent software stack with ZFS (and more enhancements coming), Oracle could really become a strategic supplier to next-generation cloud builders."

In my opinion this is a bit off the mark, as Oracle has a storage hardware department which builds and sells storage appliances. It is true, however, that this is not the main focus of the company, although it can become a more and more valuable part of the company in the upcoming time.

Next to this there is a report from Dragon Slayer Consulting, which can be found on the Oracle website, that also talks a bit about software defined storage and gives some hints on how ZFS appliances can be used in combination with Oracle Enterprise Manager in a software defined storage solution.

Even though there are a lot of options to "trick" components into acting like software defined storage solutions, and a lot can be done by using Oracle Enterprise Manager, there is no really good definition and no clear path coming from Oracle on what role they will play with regard to software defined storage in the future.

Oracle Enterprise Manager;

We do see a trend that Oracle is integrating the monitoring and management options into Oracle Enterprise Manager and making this the central location for all management tasks. Oracle also announced that it will be integrating with OpenStack and will provide OpenStack Swift APIs. By extending the Oracle Enterprise Manager capabilities with OpenStack APIs and making more and more components "software defined", Oracle is building a portfolio that is able to form the basis for a full Oracle Red Stack private cloud solution, not only for "small" enterprises but also for large cloud vendors who are willing to provide large scale cloud solutions to a large number of (internal or external) customers.