When your Exadata is deployed it is by default equipped with a number of standard usernames and passwords. By default all root SSH keys and user accounts will be disabled; however, a number of accounts will be open and will have the standard passwords. Good practice dictates that all standard passwords should be changed immediately so that nobody can misuse the default credentials. As a quick checklist you can find below the default accounts and passwords that will be enabled, and you should ensure their passwords are changed.
Database Server:
root/welcome1
oracle/welcome1
grid/welcome1
grub/sos1Exadata
Exadata Storage Servers:
root/welcome1
celladmin/welcome1
cellmonitor/welcome1
InfiniBand switches:
root/welcome1
nm2user/changeme
Ethernet switches:
admin/welcome1
Power distribution units (PDUs):
admin/welcome1
root/welcome1
Database server ILOMs:
root/welcome1
Exadata Storage Server ILOMs:
root/welcome1
InfiniBand ILOMs:
ilom-admin/ilom-admin
ilom-operator/ilom-operator
Keyboard, video, mouse (KVM):
admin/welcome1
Keeping the default passwords in use is, from a security point of view, a very unwise decision and they should be changed as soon as possible. When this is not done, the chances that an attacker can gain access to your Exadata machine increase enormously. In many companies a default process for resetting passwords is in place for more common servers; however, Exadata servers are not implemented by the hundreds a year in a single company, so processes might not always include them. Due to this it is an extra point of attention for administrators and security officers.
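As a minimal sketch of how the operating system accounts on the database and storage servers could be changed in one pass, the snippet below uses the paramiko SSH library and the standard Linux chpasswd utility. The host names and the new password are placeholders for this example, and in practice the ILOM, switch and PDU accounts still have to be changed through their own interfaces.

import paramiko

# Hypothetical list of database and storage server hostnames; adjust to your environment.
HOSTS = ["exadb01", "exadb02", "exacel01", "exacel02"]
NEW_PASSWORD = "S0me-Str0ng-Passw0rd"   # placeholder only; use a proper secret store

for host in HOSTS:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Connect with the current (default) credentials that are about to be replaced.
    client.connect(host, username="root", password="welcome1")
    # chpasswd reads "user:password" pairs from stdin and updates the account.
    stdin, stdout, stderr = client.exec_command("chpasswd")
    stdin.write(f"root:{NEW_PASSWORD}\n")
    stdin.channel.shutdown_write()
    print(host, stdout.read().decode(), stderr.read().decode())
    client.close()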
The Oracle database has, since release 11g, the option to use Smart Flash Cache. Smart Flash Cache enables you to extend the SGA buffer cache of your database without the need to extend the memory in your server, by using a level 2 cache instead. For this you can use, for example, PCIe flash cards like the Sun Flash Accelerator F20 PCIe Card which is shipped by Oracle. However, other vendors also manufacture cards that can be used for this.
The below image shows what happens in essence when you are using the Oracle database Smart Flash Cache options.
When a block is retrieved from the storage it is stored within the buffer cache of the system global area.
When using Smart Flash Cache a block is not simply removed from the buffer cache but is evicted to the flash cache instead.
When a block is needed again and is not available in the buffer cache in the SGA it is retrieved (if available) from the flash cache instead.
By implementing this you can avoid, up to a certain level, recurring calls to your storage device. Requesting a block from storage is slow, and if this can be avoided the benefits to the performance of your database are directly visible. Next to this, and often overlooked, is the fact that this also has a positive effect on other databases and applications that make use of the same shared storage device. Because you lower the I/O operations on your shared storage, there is more room to handle the requests from other applications, and this indirect relation between the components can yield an additional performance gain.
One good thing to keep in mind is that this option only works when your database is deployed on Oracle Linux or Oracle Solaris.
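As an illustration, the sketch below shows how the two documented initialization parameters behind this feature, db_flash_cache_file and db_flash_cache_size, could be set from Python with the cx_Oracle driver. The connection details and the flash device path are assumptions for the example, and the instance needs a restart for the spfile change to take effect.

import cx_Oracle  # assumes the Oracle client libraries are installed

# Hypothetical SYSDBA connection; adjust DSN and credentials to your environment.
conn = cx_Oracle.connect("sys", "password", "dbhost/orcl", mode=cx_Oracle.SYSDBA)
cur = conn.cursor()

# Point the flash cache at the flash device and size it (init parameters
# db_flash_cache_file / db_flash_cache_size); the device path is a placeholder.
cur.execute("ALTER SYSTEM SET db_flash_cache_file = '/dev/fioa' SCOPE=SPFILE")
cur.execute("ALTER SYSTEM SET db_flash_cache_size = 64G SCOPE=SPFILE")
conn.close()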
For more information please do have a look at the presentation on this subject which is embedded below:
The standard Oracle Exadata cabinet that holds the parts of your Exadata is equipped with two independent power distribution units, or PDUs for short. This is regardless of the size of your Exadata; every size Exadata contains two independent PDUs. The PDUs are not specifically designed for the Exadata; they are standard Sun Rack II Power Distribution Units which are generally available. In general the PDUs are not given that much attention by administrators and are seen as something that is "just there". There are, however, a number of important things to note about the PDUs.
The first important thing to note about the Exadata PDUs is that they are more intelligent than your average power distribution unit. During a standard deployment the PDUs are connected to the Exadata management network, which is used to connect all Exadata components, for example the ILOMs, the switches and also the PDUs, to your existing management network.
This means that you have a power distribution unit that is connected to your network and that it is not a purely "dumb" device: you can actually connect to it. This, however, introduces an "issue" in that it can also be misused. An often overlooked part after the initial deployment is that the PDU is equipped with a default login, and for security reasons you have to change this. In line with the Oracle documentation you need to undertake the following steps to change the PDU password and, potentially, add additional user accounts for the PDU, up to a maximum of five users. The original instructions can be found in the Oracle Exadata Database Machine Owner's Guide 12c Release 1 (E50467-01).
The default account user for the power distribution unit (PDU) is admin. The following procedure describes how to change the password for the PDU:
1. Use a Web browser to access the PDU metering unit by entering the IP address for the unit in the address line of the browser. The Current Measurement page appears.
2. Click Network Configuration in the upper left of the page.
3. Log in as the admin user on the PDU metering unit.
4. Locate the Admin/User fields. Only letters and numbers are allowed for user names and passwords.
5. Enter up to five users and passwords in the Admin/Users fields.
6. Designate each user to be either an administrator or a user.
7. Click Submit to set the users and passwords.
As stated, the PDU is connected to the Exadata management network and you can actively connect to it, either via SNMP or via the web interface. An example of such a web interface page from a standard PDU can be seen below.
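To give an idea of the SNMP route, the sketch below queries the standard sysDescr object from the PDU with the pysnmp library. The IP address and community string are assumptions for this example; the PDU-specific objects for current and power readings can be found in the PDU's own MIB.

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

# Hypothetical PDU management IP and community string; adjust to your environment.
PDU_HOST = "192.168.1.210"
COMMUNITY = "public"

# Query the standard sysDescr object (1.3.6.1.2.1.1.1.0) as a connectivity check;
# the PDU-specific load and power objects live in the vendor MIB.
error_indication, error_status, error_index, var_binds = next(
    getCmd(SnmpEngine(),
           CommunityData(COMMUNITY, mpModel=1),        # SNMP v2c
           UdpTransportTarget((PDU_HOST, 161)),
           ContextData(),
           ObjectType(ObjectIdentity("1.3.6.1.2.1.1.1.0"))))

if error_indication:
    print("SNMP error:", error_indication)
else:
    for name, value in var_binds:
        print(f"{name} = {value}")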
However, as it is Oracle's strategy to use Oracle Enterprise Manager as the central management and monitoring tool for all parts of a landscape, you can also add a PDU monitoring plugin to Oracle Enterprise Manager to monitor your Exadata power distribution units.
Need to check something in the Oracle Exadata documentation? Not a problem, you can find all the documentation on the machine itself: connect to a storage cell and check /usr/share/doc/oracle/Exadata. This works fine, with the minor disadvantage that you need to purchase an Exadata machine to satisfy the prerequisite. Purchasing an Exadata just to be able to read some documentation might not fit everyone's budget; however, there is a little escape to still be able to download the documentation.
You can log in to Metalink and download patch 10386736, which contains the full set of documentation. The current release of the patch, dated 08-JAN-2014, contains the following Exadata documentation:
Exadata X3-8b Hardware documentation
Exadata X4-2 Eighth Rack documentation
Exadata X4-2 Full Rack documentation
Exadata X4-2 Half Rack documentation
Exadata X4-2 Hardware documentation
Exadata X4-2 Quarter Rack documentation
Meaning, if you need to read up on the documentation and you have not downloaded it from the Exadata storage cell to your workstation, patch 10386736 will be your solution and will provide you with the latest version of all documentation.
When looking at the Oracle Exadata architecture, a couple of things stand out that ensure the Exadata Database Machine can provide the performance it is known for. Numerous design decisions are noteworthy; however, function shipping is most likely the most ingenious one, and it is currently only available on Oracle hardware and cannot be reproduced on other hardware platforms. We do see companies other than Oracle building hardware platforms that are capable of providing extreme performance, however none of them are able to implement function shipping in the manner that Oracle is capable of doing.
The main reason for this is that Oracle owns the proprietary rights on both the software and the hardware included in the Exadata platform. Function shipping is, simply put, moving SQL processing from the database nodes to the storage nodes. In the below diagram you see a common way of deploying a database in combination with storage, which in this case is a storage appliance, although this could also be local storage on the database server.
If we, for example, need to execute a select statement on the table "TBL_SALE" and retrieve all records where the "country_of_sale" is "NL", the database will retrieve all data in "TBL_SALE" from the storage and, as it retrieves this, it will filter on "country_of_sale" and finally only show the records where this equals "NL".
In essence there is nothing wrong with this mechanism; the issue, however, is twofold. The first issue is that if your table is large (multiple terabytes) all this data has to be moved from your storage appliance to your database instance, which can result in a serious performance loss. The second issue is that the CPUs that are busy with (A) handling all the incoming data and (B) filtering this data to verify whether "country_of_sale" equals "NL" are unable to do any other tasks.
When we look at the way the Exadata machine is developed, we have a model as shown below which mitigates the performance loss described above.
In the above diagram, as we execute the same query as we did on the traditional database setup, function shipping comes into play. Instead of requesting all data from the table to be sent to your database engine, Oracle makes use of the function shipping principle to send the majority of the SQL execution to the Exadata storage cells. The major benefit of this is that, in our example case, the entire multi-terabyte table does not need to be shipped from the storage layer to the database layer. The parts of the SQL statement that can be executed on the storage cell will be executed right there, and only the rows that satisfy the SQL select statement are returned.
By using function shipping the system is not providing traditional block serving services to the database but rather, in a smart way, transferring only the applicable rows and columns from the storage to the database. This limits the load on the CPUs of the database server and limits the bandwidth usage between the database server and the storage appliance.
The reason that this is only available on Oracle hardware in combination with the Oracle database is that Oracle builds both the hardware and the software. Due to this, Oracle is able to place proprietary code on the storage cell layer which can communicate with the database instance.
To achieve this, the database servers and the Exadata Storage Server Software communicate using the iDB (Intelligent DataBase) protocol. iDB is implemented in the database kernel and transparently maps database operations to Exadata-enhanced operations. iDB implements a function shipping architecture in addition to the traditional data block shipping provided by the database. iDB is used to ship SQL operations down to the Exadata cells for execution and to return query result sets to the database kernel. Instead of returning database blocks, Exadata cells return only the rows and columns that satisfy the SQL query. Like existing I/O protocols, iDB can also directly read and write ranges of bytes to and from disk, so when offload processing is not possible Exadata operates like a traditional storage device for the Oracle Database. But when feasible, the intelligence in the database kernel enables, for example, table scans to be passed down to execute on the Exadata Storage Server so only the requested data is returned to the database server.
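A quick way to see whether this offloading is actually happening is to look at the cell-related statistics the database exposes. The sketch below, using the cx_Oracle driver with placeholder credentials, reads two smart-scan statistics from v$sysstat; the statistic names are the ones documented for Exadata-enabled databases, but verify them against your release.

import cx_Oracle  # assumes the Oracle client libraries are installed

# Hypothetical connection details; adjust to your environment.
conn = cx_Oracle.connect("system", "password", "dbhost/orcl")
cur = conn.cursor()

# v$sysstat exposes cell/smart scan statistics when running on Exadata storage.
cur.execute("""
    SELECT name, value
      FROM v$sysstat
     WHERE name IN ('cell physical IO bytes eligible for predicate offload',
                    'cell physical IO interconnect bytes returned by smart scan')
""")
for name, value in cur:
    print(f"{name}: {value}")
conn.close()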
The Oracle Exadata Database Machine is known for a lot of features; however, one of the more overlooked ones is the ability to make use of the hardware accelerated cryptographic features that come with the Intel processors in the Exadata. The Exadata X4-2 is equipped with Intel Xeon E5-2697 v2 processors which have the AES-NI Intel data protection technology built into them.
"Advanced Encryption Standard New Instructions (AES-NI) are a set of instructions that enable fast and secure data encryption and decryption. AES-NI are valuable for a wide range of cryptographic applications, for example: applications that perform bulk encryption/decryption, authentication, random number generation, and authenticated encryption."
When there is a need to encrypt data on an Exadata platform you can make use of the Oracle Advanced Security feature Transparent Data Encryption. Transparent Data Encryption also extends to the Smart Flash Cache and Hybrid Columnar Compression parts of the Exadata, ensuring fully encrypted storage of your data. When encryption is done on processors that cannot handle cryptography in the processor itself it may cause performance degradation; because the Exadata platform uses processors that have AES-NI built in, this is no longer an issue.
What Oracle Transparent Data Encryption does is protect your data against data theft at the storage level. Your data files have to be stored somewhere on a storage device, which in the case of an Exadata are the storage cells. A commonly overlooked fact is that someone could potentially steal those files directly from storage or when they are moved to a backup device.
Transparent Data Encryption stores all data on storage in an encrypted manner. This, however, can be a huge performance penalty when done on non-optimized processors. Because Intel has built the AES-NI data protection technology directly into the processors, a number of additional layers are removed from the processes that would otherwise be used to encrypt and decrypt the data coming from disk before it can be used by the database.
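As a small, hedged example of what this looks like from the database side, the sketch below creates a TDE encrypted tablespace from Python with the cx_Oracle driver. The connection details and the ASM disk group name are placeholders, and it assumes the Oracle wallet holding the master key is already configured and open.

import cx_Oracle  # assumes the Oracle client libraries are installed

# Hypothetical connection details; adjust to your environment.
conn = cx_Oracle.connect("system", "password", "dbhost/orcl")
cur = conn.cursor()

# With the wallet open, every block written to this tablespace is stored
# encrypted on the storage cells; AES-NI accelerates the AES256 work.
cur.execute("""
    CREATE TABLESPACE secure_data
    DATAFILE '+DATA' SIZE 10G
    ENCRYPTION USING 'AES256'
    DEFAULT STORAGE (ENCRYPT)
""")
conn.close()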
When using a database to store data it is in some cases needed or desired to replicate the full set of data, or a subset of it, to another database. In some cases this is not the same type of database, which can complicate things. Many companies decide to create a custom-built export, move and import routine. Even though this works in general, there are numerous solutions that will do this task for you.
Several vendors provide solutions to synchronize data between databases from different vendors. For architects who know the Oracle portfolio, a well-known solution is Oracle GoldenGate. Less known are the solutions from Attunity. A solution for synchronizing data from a source to a target database, which do not have to be from the same vendor, is provided in the form of Attunity Replicate. One of the big advantages of Attunity Replicate is that a direct SQL connection between the source and the target system is not needed, which can be extremely valuable in cases where you experience a lot of latency and the replication does not have to be (near) real-time.
The below video gives a short demo of how to set up replication of data between a Microsoft SQL Server and an Oracle database by making use of Attunity Replicate.
While social media is just a fun way of interacting for some people, it is big business for a lot of companies. For some it is big business because they are in the business of mining and interpreting data so it can be used for other purposes within companies. For other companies it is one of the prime channels for exposure and brand building. While social media is relatively new, it has extremely quickly become of vital importance and is not something that can be ignored by a large portion of enterprises and even small and medium businesses. One of the things to keep in mind, however, is that not every social media platform might be the right one for your business.
I have already shared some blogposts which tap into the differences per social media platform and also what kind of content is used on which platform in the best manner to grow your brand. Recently Hassan Bawab from Magic Logix released the below infographic on the do's and don'ts per social media platform and what it can mean for your brand exposure and building your digital marketing strategy.
To put things in a slightly different fashion I have adapted the information from the infographic and included it in the below table:
Circles & communities allow you to target your messages to groups of prospective clients & customers, rather than having them generalized for the entire online market.
A reported 105 million unique visitors per month equates to a large possibility of exposure for your brand.
Studies show that traffic from Google+ to sites that have a +1 button is 3.5x greater than traffic from Google+ to sites without the +1 button.
Controversy exists over whether or not +1's affect search rankings, but overall this network shows up high on SERPs and provides valuable backlinks as well.
Better for informational / educational purposes; however, it can be used to reply to queries about specific customer messages.
The largest PowerPoint sharing community; good presentations can equal broad exposure for your brand.
With 60 million monthly visitors, a PageRank of 8 and 130M page views, there is plenty of potential to draw people to your site.
Search engines index the "notes" text, so properly optimized presentations are huge. Also great for leads and referral traffic.
The above information might help you in creating a social media strategy and deciding which social media you will focus on the most. When developing a strategy to promote your company it is good to keep the above in mind and also take into account which target group is using which social network. If you, for example, are into car parts then Pinterest might not be your first choice, as this social network is dominated by women rather than the men who are most likely your target audience. However, if you resell fashion, then Pinterest might be very interesting to you.
In the era of big data and the massive collection of data by corporations there is a growing need for scalable and affordable ways of storing and processing data on a large scale. For some of this data, which might not be that mission critical to your operations or not that confidential, a solution might be available in the cloud. More and more cloud companies provide the option to use cloud based databases for massive data storage, or, in the case of Amazon Redshift, a complete data warehouse service.
What Amazon Redshift in essence provides is a large scale cloud data platform based upon a columnar storage principle, accessible via standard JDBC and ODBC, on which you can execute standard SQL commands. Amazon Redshift is a tailored implementation of the ParAccel platform. This means you can connect your standard business intelligence tooling to it and work with Amazon Redshift directly.
For initial loading Amazon offers a number of options; for example, you can load data from Amazon S3, Amazon DynamoDB or AWS Data Pipeline. Next to this, numerous other ways, including SQL style insert batches, can be used to load your initial data into Amazon Redshift.
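Because Redshift speaks the PostgreSQL wire protocol, a standard PostgreSQL driver is enough to connect and run a bulk load. The sketch below uses psycopg2 and the Redshift COPY command to load from S3; the cluster endpoint, database name, credentials and bucket are all placeholders for this example.

import psycopg2  # Redshift is reachable with a standard PostgreSQL driver

# Hypothetical cluster endpoint, database and credentials.
conn = psycopg2.connect(
    host="mycluster.abc123xyz.eu-west-1.redshift.amazonaws.com",
    port=5439, dbname="sales", user="admin", password="secret")
cur = conn.cursor()

# COPY is Redshift's bulk-load path; here the data is pulled in parallel from S3.
# The bucket, prefix and AWS credentials below are placeholders.
cur.execute("""
    COPY tbl_sale
    FROM 's3://my-bucket/sales/'
    CREDENTIALS 'aws_access_key_id=MY_ACCESS_KEY;aws_secret_access_key=MY_SECRET_KEY'
    DELIMITER ',' GZIP
""")
conn.commit()
conn.close()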
Even though I do not think it is a good idea to have a cloud-only and cloud-by-default strategy, as some companies try to implement, it can be a good strategy for some of your systems and some of your data. Next to the fact that you have to think about how to manage data and systems in the cloud, and have to realize that there are some implications that are good and some that are bad, there is also the legal side of storing your data in the cloud. This legal side might face some added complexity with the upcoming introduction of the European Union data protection reform, which will become active somewhere in 2014.
However, in the cases where you have the option to store your data in the cloud and you are comfortable with it, Amazon Redshift is a very valid choice for massive data storage for data warehousing. Amazon has provided some fairly acceptable security measures for non-confidential data. For data security Amazon currently encrypts every data block with AES-256, which is considered fairly secure even though a cryptanalysis related-key attack vulnerability has been discovered by Alex Biryukov and Dmitry Khovratovich. To speed up the encryption and decryption process Amazon uses hardware accelerated encryption to ensure this layer of security does not impact performance (too much).
On the Amazon website a number of BI solutions have been listed as compatible with Amazon Redshift and should be able to connect to it to provide BI capabilities. The products currently listed are Actian, Actuate, Birst, Chartio, Dundas, Infor, Jaspersoft, JReport, Logi Analytics, Looker, MicroStrategy, Pentaho, Redrock BI, SiSense and Tableau. All these companies provide software that is capable of using the Amazon Redshift platform to do business intelligence on.
As a consultancy and implementation partner, Amazon states Capgemini as a preferred partner, with the following quotation from Capgemini on the Amazon website: "We're impressed with Amazon Redshift's ease of implementation, scalability to meet virtually any price point, and ability to handle the toughest big data and predictive analytics demands of our clients. In the era of Big Data and advanced analytics, we see Amazon Redshift and other AWS services, like Amazon Elastic MapReduce, as vital additions to the solutions and service offerings we provide".
A less known, or rather overlooked, fact is that next to the above mentioned products you can essentially connect any BI tool to Amazon Redshift, keeping your BI processing platform on premise or somewhere in the cloud, and you are not bound to the BI vendors that Amazon mentions. For example, it is very well possible to connect your Oracle OBIEE implementation to Amazon Redshift.
The above image comes from the rittmanmead.com website, which has some interesting articles on OBIEE and how you can connect it to numerous data sources, including Amazon Redshift. With Oracle OBIEE you are not bound to use an Oracle database and you can still use all the features of OBIEE.
The internet has been growing at a rapid speed which was most likely not foreseen by a lot of people in the beginning. When Tim Berners-Lee added HTML and HTTP to the internet mix, and by doing so opened the possibilities for the world wide web as we know it today, he most likely did not foresee what this would become, not even in his wildest dreams.
As we are becoming more and more used to the daily influence of the internet and the possibilities that it offers us, we are also trying to find ways to improve its current state. We have had the web 2.0 revolution, which came down to a more social and more interactive internet where people could more easily add to the content instead of only consuming it as readers of web pages. Many see the next big paradigm shift for the internet in the Semantic Web. The idea of the Semantic Web includes a lot of components, from open data, linked data, microformats and the internet of things / the internet of everything, to big-data concepts. It all comes down to how we think about data and how data will play a role in the next version of the web.
In the below video you see the vision of Tim Berners-Lee on the next version of the internet, which he presented during one of his talks at TED.
What it comes down to is the desire to provide data in an open format: not necessarily in a nice and flashy website, or a closed website, but rather in its pure and raw form, in an open format like XML or any other usable format.
Unlocking this data directly unlocks the potential for a lot of new options. When people have access to raw data, they can create new applications and come to interesting insights. Currently most data is still in closed silos and not accessible to the public, or not accessible in an open way that lets you build new applications around it. When the world starts adopting the open-data model and the linked data model, there will be a sprawl of new options coming to the public.
Even though all the above mentioned things are coming our way and will change the way the internet is used and seen by the public, one important architectural part is missing in my opinion, or rather, is not given the attention it deserves. The missing, or rather underappreciated, part is the role that push interfaces and APIs will play in this new revolution. If we look at many of the new solutions, models and formats, they are all based, or primarily based, upon a pull architecture.
If you draw a high level architectural representation of the current state in which applications are commonly developed, you will get something similar to the drawing below.
The custom application could for example be a web application that tracks flight information from a major airport and shows this to visitors of the website. The website itself is represented as the "presentation layer" in the above diagram. Every time someone logs in to the website, the code will query the local database for the most recent flights. However, to ensure that the local database contains this information, it needs to be filled with data that comes from the "data owner". What you commonly see is a scheduled program that downloads the basic information from the "data owner" data API and stores it in the local database. The moment a user clicks on a flight, the web application code will do an "online" query to the "data owner" to get the more detailed and latest information about this specific flight.
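A minimal sketch of this pull pattern is shown below, assuming a hypothetical REST endpoint at the data owner and a small SQLite database standing in for the local database; the URL, schedule and field names are placeholders.

import sqlite3
import time
import requests

# Hypothetical data-owner endpoint and a local SQLite file standing in for
# the "local database" from the diagram.
DATA_OWNER_URL = "https://api.example-airport.test/flights"
db = sqlite3.connect("flights_cache.db")
db.execute("CREATE TABLE IF NOT EXISTS flights (flight_id TEXT PRIMARY KEY, payload TEXT)")

def refresh_cache():
    """The scheduled job: pull the full flight list and store it locally."""
    for flight in requests.get(DATA_OWNER_URL, timeout=30).json():
        db.execute("INSERT OR REPLACE INTO flights VALUES (?, ?)",
                   (flight["flight_id"], str(flight)))
    db.commit()

if __name__ == "__main__":
    while True:               # in practice this would be a cron job
        refresh_cache()
        time.sleep(900)       # poll every 15 minutes; new data waits until the next run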
In essence there is nothing wrong with this model and it serves its purpose. However, there are better ways to do this, which make development much easier and ensure that applications become faster, less prone to issues and less resource intensive on both the "custom application" side and the "data owner" side.
A more favorable way of doing this is by making use of a push API rather than a pull API. In the below diagram the same solution is shown, now making use of a push API initiated by the data owner.
The implementation of this model makes a lot of sense as it removes a lot of unneeded communication between the custom application(s) and the data owner. Also, missing an update is less likely to happen and new information, in our example flight information, is available in real time to all custom applications that have subscribed to the push interface of the data owner.
In the above implementation, as soon as a new flight is known at the data owner it will send out a message to all the subscribed applications. By doing so the data owner knows that all the other applications are now aware of the new flight and can, if needed, query additional information via the "direct remote data access" method against the more traditional data access API. In some cases the real-time push API will remove the need for a more traditional API, however in most cases there will still be a need for one.
The hurdle for implementing this model is the need to develop standardization for the communication between the data owner and the custom applications. Ideally the communication between both parties is arranged over the HTTP protocol, with JSON or XML messages exchanged between the two parties as shown in the image below.
From a technical point of view it is not hard to implement the communication on the "data owner" side of a push data API, nor is it complicated for the receiving side to create a data update receiver which will receive and process the data sent by the data owner.
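To make the receiving side concrete, a minimal sketch of such a data update receiver is shown below, using Flask and assuming the data owner pushes JSON flight messages to a hypothetical /flight-updates endpoint that carries a flight_id field.

from flask import Flask, jsonify, request

app = Flask(__name__)
flights = {}  # in-memory store standing in for the local database

@app.route("/flight-updates", methods=["POST"])
def receive_flight_update():
    """Endpoint the data owner pushes JSON flight updates to."""
    update = request.get_json(force=True)
    # Assume the agreed message format carries a unique flight identifier.
    flights[update["flight_id"]] = update
    return jsonify({"status": "received"}), 200

if __name__ == "__main__":
    app.run(port=8080)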
The biggest hurdle in this process is to ensure that the construction of the message is correct and contains all the information that the receivers might need, and that this is done in a functional format that is usable not only for a single receiving application but for a large set of external applications. If we take, for example, a look at the work done in the field of ontology for annotating offerings and other aspects of e-commerce on the Web by GoodRelations, or the work done by schema.org, you will see that developing a good standard is not something to be taken lightly. However, the below illustration might show the need for such standardization of message formats on both a functional level and a technical level.
Commonly a custom application, for example a web portal showing flight information, does not make use of a single source. If, for example, you would like to build a web portal showing all the incoming and outgoing flights in real time or near real time for the city of London, you have to work with 11 different data providers. London has ten "normal" airports: Luton, Stansted, Northolt, Stapleford, Heathrow, London City, Biggin Hill, Blackbushe, Farnborough and Gatwick. It also has a helicopter-only airport named Battersea.
From a technical point of view it would be a great benefit if all those airports provided a push API interface in the same manner and pushed the information in the same format. This way the developers of the custom applications that connect to it only have to develop a single data update receiver and simply subscribe to all 11 airport push interfaces, instead of developing 11 different receivers. Next to this, the data providers, in this case the airports, can make use of an already developed standard data format and can potentially decide to develop a push interface together so they can split costs.
From a functional point of view a standard makes sense because you will know exactly what data you will get and in which format. Here this is not the technical format but rather the functional format. For example, if the data owner also sends out weather information per flight, it would be good if all data owners do this in the same manner and all the information is in the same functional format. Meaning, temperature is in Celsius for all data owners, or it is in Fahrenheit for all data owners who comply with the standard. This way you always know, from a functional point of view, what the data means and how to interpret it.
In conclusion, push API implementations will, in my mind, fuel the next generation of the internet and will ensure that the web becomes (even) more real-time and data rich. The flow of data will help spawn new initiatives and will help create new companies and great projects. For this, however, standards will need to be developed, not only on a technical implementation level but, even more importantly, on a functional level, where the standards define which information in which format will be provided to the data consumers.