Tuesday, February 27, 2018

Oracle Linux - digging into YUM repository XML data

Anyone using Oracle Linux will, at some point, have used the yum command to install additional tooling. For most people that simply means using yum to install new packages or to update the distribution while connected to the public YUM repository from Oracle. While doing so, you will have seen yum fetch new data to make sure it has accurate information about the latest available versions. If you wonder what is inside the data being fetched, you can simply take a curious look at the XML files your machine is gathering.

To keep things simple, yum uses XML metadata that is stored on the yum server, and it consults this metadata whenever you request an update or want to do an installation. There is, however, a wealth of information in these files that you might not be aware of.

Even though there are commands that will help you get this information out in a clean format, it is good to also have a look at the more raw form in case you want to build your own logic around it. As an example, we will be downloading the other.xml.gz file for Oracle Linux 7 from the Oracle Public YUM repository server.
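As a minimal sketch of that download step: the file name of other.xml.gz normally carries a checksum prefix, so it is easiest to look up its location in repomd.xml first. The base URL below is an assumption about the layout of the Oracle public yum server for the Oracle Linux 7 "latest" channel, and the function names are made up for this example; adjust both to your own situation.

```python
import gzip
import urllib.request
import xml.etree.ElementTree as ET

# Assumed base URL of the Oracle Linux 7 "latest" channel; change it if
# your repository lives somewhere else.
BASE_URL = "https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64/"

# XML namespace used by repomd.xml.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def find_metadata_href(repomd_xml, data_type="other"):
    """Return the relative location of one metadata file listed in repomd.xml."""
    root = ET.fromstring(repomd_xml)
    for data in root.findall("repo:data", REPO_NS):
        if data.get("type") == data_type:
            location = data.find("repo:location", REPO_NS)
            if location is not None:
                return location.get("href")
    return None


def download_other_xml(base_url=BASE_URL):
    """Fetch repomd.xml, locate other.xml.gz and return its decompressed bytes."""
    with urllib.request.urlopen(base_url + "repodata/repomd.xml") as resp:
        href = find_metadata_href(resp.read())
    if href is None:
        raise RuntimeError("no 'other' entry found in repomd.xml")
    with urllib.request.urlopen(base_url + href) as resp:
        return gzip.decompress(resp.read())
```

Calling download_other_xml() needs network access and returns the full, uncompressed XML document, which can be fairly large.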

If we extract the file and start exploring its content, we see that a lot of information is available. The screenshot below shows the information for one package, in our case the source package for jing-trang. We use jing-trang purely as an example, for no specific reason; it is a schema validation and conversion tool based on RELAX NG.

Exploring the entry for this specific package in the other.xml.gz file we downloaded from the Oracle Linux YUM repository shows how much detail is recorded per package.

Interestingly, we see for example that all the changelog information is available and that we can see who the developer is. Having said that, developer, or rather author, is a somewhat misleading term: it refers to the author of the package, who is not per se the author of the code. In any case, it helps you find out who is behind a certain piece of code and gives you more insight into what has changed per version.
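To pull the changelog out of the raw XML yourself, something along these lines could work. It is a sketch that assumes the usual other.xml layout, in which every package element holds changelog elements with an author attribute and a date attribute stored as a Unix timestamp. The function name changelog_for is made up for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# XML namespace used by other.xml.
OTHER_NS = {"o": "http://linux.duke.edu/metadata/other"}


def changelog_for(other_xml, package_name):
    """Return (date, author, text) tuples for each changelog entry of a package."""
    root = ET.fromstring(other_xml)
    entries = []
    for pkg in root.findall("o:package", OTHER_NS):
        if pkg.get("name") != package_name:
            continue
        for log in pkg.findall("o:changelog", OTHER_NS):
            # The date attribute is assumed to be seconds since the Unix epoch.
            when = datetime.fromtimestamp(int(log.get("date")), tz=timezone.utc)
            text = (log.text or "").strip()
            entries.append((when.date().isoformat(), log.get("author"), text))
    return entries
```

Passing the extracted content of other.xml.gz together with the name "jing-trang" would return the changelog entries for that package. Keep in mind that the author value is whatever the packager wrote in the changelog header, so it often contains an e-mail address and a version string as well.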

Even though this should be in the release notes, it can be an additional source of information. One reason you might be interested in the more raw form is that you want to collect and visualize insights into changes to the packages your enterprise uses (which was the reason I started to look into this).
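A possible starting point for collecting such insights is to count changelog entries per package per year and feed the result into whatever charting tool you prefer. This is only a sketch: the function name changes_per_year and the idea of passing a set of package names are assumptions made for this example, and the XML layout is assumed to be the usual other.xml structure.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime, timezone

# XML namespace used by other.xml.
OTHER_NS = {"o": "http://linux.duke.edu/metadata/other"}


def changes_per_year(other_xml, wanted):
    """Count changelog entries per (package name, year) for the names in wanted."""
    counts = Counter()
    root = ET.fromstring(other_xml)
    for pkg in root.findall("o:package", OTHER_NS):
        name = pkg.get("name")
        if name not in wanted:
            continue
        for log in pkg.findall("o:changelog", OTHER_NS):
            # The date attribute is assumed to be seconds since the Unix epoch.
            stamp = datetime.fromtimestamp(int(log.get("date")), tz=timezone.utc)
            counts[(name, stamp.year)] += 1
    return counts
```

A repository can list the same package name more than once, for example for several versions, in which case overlapping changelog entries are counted for each listing. You may want to filter on the version element first if that matters for your overview.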
