Monday, March 18, 2019

Oracle Linux & Cloud Automation - clean yum data with ansible

Building infrastructure, configuring servers and deploying software have been manual tasks for many years. With the rise of cloud computing, virtual machines, containers and CI/CD processes, those manual tasks are increasingly being reduced and fully automated. When building your infrastructure in the Oracle Cloud you can use a wide set of automation tools to make everything software defined and automated. Solutions like Ansible and Terraform provide building blocks that help you automate tasks that were previously done by hand. Leveraging solutions like these increases the speed and agility that cloud computing makes possible.

In this series, "Oracle Linux & Cloud Automation", we go into more detail on several solutions you can leverage to automate large parts of your IT footprint lifecycle. Please use the tagged label to find all posts on this subject.

Ansible
Ansible is an open-source software provisioning, configuration management, and application deployment tool. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows. It includes its own declarative language to describe system configuration.

Cleaning yum
Ansible makes it easy to install, remove or update packages using yum. As a good practice you should clean cached yum data so it does not take up unneeded space on your local file system. Ansible provides a module that wraps the yum command, which means you can use Ansible instead of interacting with the yum command directly.

Missing from the module is a command to clean the cache. You can do so by putting the following in the Ansible playbook.

# clean all the yum data.
  - name: Clean all yum data
    command: yum clean all
    args:
      warn: yes 

As you can see, this task does not use the yum module but the command module instead. This is also why the warn flag is present: it is currently set to yes, although setting it to no is advisable. With warn set to yes, the following message is printed:

 [WARNING]: Consider using the yum module rather than running yum.  If you need to use 
command because yum is insufficient you can add warn=False to this command task or set 
command_warnings=False in ansible.cfg to get rid of this message.

The warning appears because Ansible expects all yum-related commands to go through the yum module rather than the command module. As the yum module lacks a "clean all" option, we have to use the command module and spell out the full command.

Oracle Linux tested
The below playbook is tested on an Oracle Linux 7 instance with Ansible 2.7.7.

Show the playbook:
[root@localhost ansi]# cat webserver_playbook.yml 
- hosts: localhost
  tasks:
  - name: Ensure the latest yum-utils python package is available on the server
    yum:
      name: yum-utils
      state: latest
  - name: Ensure the latest python package is available on the server
    yum:
      name: python
      state: latest
  - name: Ensure the latest python-pip package is available on the server
    yum:
      name: python-pip
      state: latest
# clean all the yum data.
  - name: Clean all yum data
    command: yum clean all
    args:
      warn: yes

Show the output:
[root@localhost ansi]# ansible-playbook webserver_playbook.yml 
 [WARNING]: provided hosts list is empty, only localhost is available. Note
that the implicit localhost does not match 'all'


PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [Ensure the latest yum-utils python package is available on the server] ***
ok: [localhost]

TASK [Ensure the latest python package is available on the server] *************
ok: [localhost]

TASK [Ensure the latest python-pip package is available on the server] *********
ok: [localhost]

TASK [Clean all yum data] ******************************************************
 [WARNING]: Consider using the yum module rather than running yum.  If you need
to use command because yum is insufficient you can add warn=False to this
command task or set command_warnings=False in ansible.cfg to get rid of this
message.

changed: [localhost]

PLAY RECAP *********************************************************************
localhost                  : ok=5    changed=1    unreachable=0    failed=0   

[root@localhost ansi]# 

Note also that this task always reports "changed=1". This is because yum clean all runs on every execution. Even though it is not a configuration change on your system, Ansible detects it as a change. Technically it is one: even when no package is downloaded, the metadata is still cleaned.
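If you want to see how much space the clean actually frees, you can measure the cache before and after running the playbook. The Python sketch below uses a hypothetical helper (directory_size, not part of Ansible or yum) that adds up the sizes of the files under a directory. It assumes the default cache location /var/cache/yum used on Oracle Linux 7. If your system differs, check the cachedir setting in /etc/yum.conf.

```python
import os
from pathlib import Path


def directory_size(path):
    """Return the total size in bytes of all regular files under path.

    Symbolic links are skipped so nothing outside the tree is counted.
    A path that does not exist simply yields 0.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = Path(root) / name
            if full.is_file() and not full.is_symlink():
                total += full.stat().st_size
    return total


if __name__ == "__main__":
    # Assumption: default yum cache location on Oracle Linux 7.
    cache = "/var/cache/yum"
    print(f"{cache}: {directory_size(cache) / 1024 / 1024:.1f} MiB")
```

Run it once before and once after the playbook. The difference is roughly what the yum clean all task removed.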

Conclusion
Even though the Ansible yum module has no native support for cleaning the cache, you can still make sure your system does not hold unnecessary data on the filesystem, as shown in the example above.

Friday, March 15, 2019

dash - changing the favicon

Dash is a Python framework for building analytical web applications. It lets you develop small applications with analytical visualizations very quickly. As it is developed in Python, it is a natural fit with libraries such as Pandas.

When getting started with Dash, developed by Plotly, you might run into the following question: how do I change the favicon to show my own instead of the one shipped by default?



The answer is: stop trying to code the solution. The only way (currently) is to create a directory named assets in the root of your project and place the desired favicon, named favicon.ico, in this location. Dash then shows that favicon instead of the default one.
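If you set up several Dash projects, you may want to script this step. The sketch below uses a hypothetical helper, install_favicon (not part of Dash). It creates the assets directory when it is missing and copies your icon into it as favicon.ico. It assumes the app is created in the usual way with dash.Dash(__name__), so that Dash looks for the default assets folder next to the app module.

```python
import shutil
from pathlib import Path


def install_favicon(icon_file, project_root="."):
    """Copy icon_file to <project_root>/assets/favicon.ico and return the target path."""
    assets = Path(project_root) / "assets"
    assets.mkdir(parents=True, exist_ok=True)  # create assets/ only if it is missing
    target = assets / "favicon.ico"
    shutil.copyfile(icon_file, target)  # overwrites any favicon already there
    return target
```

After running it, restart the app. Browsers cache favicons aggressively, so a hard refresh may be needed before the new icon shows up.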


Tuesday, March 05, 2019

Python - machine learning and clustering

Clustering is the task of dividing a population or set of data points into a number of groups, such that data points in the same group are more similar to each other than to data points in other groups. In simple words, the aim is to separate groups with similar traits and assign them to clusters.

Within machine learning, clustering falls under unsupervised learning. It is used, for example, in recommendation systems, targeted marketing and customer segmentation.

The outline below is a simple starting point showing a basic form of clustering on a relatively small dataset. The dataset we will use is displayed in the scatter chart below. Our objective is to find 3 clusters in the data shown in the scatter chart. In this example the data is just random, but it could represent virtually anything: customers, demographic data, sensor data points or anything else.
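The original dataset is not included in the post. As an assumption, you can generate a comparable set of random two-dimensional points with Python's standard random module. The sample size of 100, the 0 to 10 range and the seed are illustrative choices, not the values behind the chart.

```python
import random


def make_random_points(n, low=0.0, high=10.0, seed=None):
    """Return n (x, y) tuples drawn uniformly at random from [low, high]."""
    rng = random.Random(seed)  # a seeded generator makes the dataset repeatable
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]


points = make_random_points(100, seed=42)
print(len(points), points[0])
```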


When looking at the data, humans are naturally driven by apophenia to try to see patterns. Apophenia has come to mean a universal human tendency to seek patterns in random information, as happens in gambling. However, even though the human mind will try to see a pattern, it is often wrong. To make a valid clustering we need to base it on math rather than on the intuition of the human mind.

By leveraging Python code we can divide the data into 3 distinct clusters. The clusters found are shown below in different colors.


We can now see the different clusters within the data. The members of each cluster are found using K-means clustering, an algorithm that aims to partition n observations into k clusters. The main steps are:


  • Initialisation – K initial “means” (centroids) are generated at random
  • Assignment – K clusters are created by associating each observation with the nearest centroid
  • Update – The centroid of each cluster is moved to the mean of the observations assigned to it
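The steps above can be sketched in plain Python. This is a minimal illustration, not the code behind the charts, which may well use a library such as scikit-learn. For initialisation it picks k data points at random as the starting centroids, which is one common variant. It then alternates assignment and update until the centroids stop moving. The names kmeans and nearest are hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, max_iter=100, seed=None):
    """Plain k-means on a list of (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialisation: k randomly chosen data points serve as starting centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    # Final assignment so the labels match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


data = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.4),
        (10.0, 10.0), (10.3, 9.8), (9.7, 10.2),
        (0.0, 10.0), (0.4, 9.6), (-0.2, 10.1)]
centroids, labels = kmeans(data, 3, seed=0)
print(centroids, labels)
```

Because the initialisation is random, a single run can end in a local optimum, for example two centroids sharing one group. In practice k-means is usually run several times with different seeds, and the result with the smallest total distance to the centroids is kept.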

Assignment and update are repeated until the centroids no longer change. The result is that (in our case) you end up with 3 centroids, and each data point is associated with the centroid it is closest to (the smallest distance).


The scatter chart above shows the centroids, which form the backbone of the clustering. Normally they are hidden, as they are not actual data points from the dataset. As you can see, we now have 3 clusters in the larger dataset.

Examples of clustering can be found in my GitHub project containing machine learning examples.