Visualising Drupal Security Advisory Data – Hacker Noon

Drupalgeddon 2.0 brought a lot of focus onto the Drupal security initiative and its practices. The security team's proactive approach to disclosure, and the way it communicated with developers, the community and the press, was commendable. On top of that, the communication was continuous.

The vulnerability, which started off with a risk score of 21/25 on March 28th, was upgraded to 22/25 on April 13th and finally marked as 24/25 on April 14th. If you are interested in what changed over these days to make the score vary, you can check out the revisions and compare them yourself here.

One thing we observed was that, in spite of all the communication, not many people were aware of the details and terminology. Security risk levels are well defined on the Security Risk Level Definitions page, but it looks like not many people read it in detail. This intuition is based on a sample survey of around 100 developers from three different cities in India. While the sample may not be representative, it still highlights a problem that needs to be addressed. It would be great if the Drupal Association added this question to one of its surveys: "Have you read https://www.drupal.org/drupal-security-team/security-risk-levels-defined?" with Yes and No as the options. That should give us more insight. If developers themselves are not aware of these details, I think it is too much of an ask for site builders to know about them and take corrective action.

So we created a crude static page that makes it easy for humans to understand what a security string like 24∕25 AC:None/A:None/CI:All/II:All/E:Exploit/TD:Default means. While the score itself is pretty much self-explanatory, it is the second part that generally stumps people.

You can check it out at https://nkgokul.github.io/drupalsecurity/.

Enter a security string such as "24∕25 AC:None/A:None/CI:All/II:All/E:Exploit/TD:Default" and it returns a description that humans can understand. It is a very crude version without any validation, so it would be great if somebody could clean it up.
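For anyone who wants to clean it up, the decoding step could look roughly like the Python sketch below, which adds the basic validation the page currently lacks. This is not the code the page actually uses: `decode` and `METRICS` are hypothetical names, the short value descriptions are our own paraphrases of the Security Risk Level Definitions page, and the parser assumes the score is written out of 25 with either a plain slash or the division slash (∕) that appears in the advisories.

```python
import re

# Hypothetical lookup table: abbreviation -> (metric name, {value: description}).
# The descriptions are paraphrases, not text copied from drupal.org.
METRICS = {
    "AC": ("Access complexity", {
        "None": "no special action is needed to exploit it",
        "Basic": "the attacker must perform certain actions",
        "Complex": "the attacker must perform an involved set of steps",
    }),
    "A": ("Authentication", {
        "None": "anonymous users can exploit it",
        "User": "a user with basic permissions is required",
        "Admin": "administrator-level permissions are required",
    }),
    "CI": ("Confidentiality impact", {
        "All": "all non-public data could be exposed",
        "Some": "some non-public data could be exposed",
        "None": "no non-public data is exposed",
    }),
    "II": ("Integrity impact", {
        "All": "all data could be modified or deleted",
        "Some": "some data could be modified or deleted",
        "None": "no data can be modified",
    }),
    "E": ("Exploit", {
        "Exploit": "a working exploit is known to exist",
        "Proof": "a proof of concept exists",
        "Theoretical": "no public exploit code is known",
    }),
    "TD": ("Target distribution", {
        "All": "all installations are affected",
        "Default": "sites using the default configuration are affected",
        "Uncommon": "only uncommon configurations are affected",
    }),
}

# "<score>/25 <vector>", accepting "/" or the division slash U+2215.
SCORE_RE = re.compile(r"\s*(\d+)\s*[/\u2215]\s*25\s+(\S+)\s*$")


def decode(security_string):
    """Return (score, [(metric name, value, description), ...]).

    Raises ValueError for malformed, unknown, repeated or missing metrics.
    """
    match = SCORE_RE.match(security_string)
    if match is None:
        raise ValueError(f"not a security string: {security_string!r}")
    score = int(match.group(1))
    if score > 25:
        raise ValueError(f"score {score} is above the maximum of 25")
    details, seen = [], set()
    for part in match.group(2).split("/"):
        key, _, value = part.partition(":")
        if key not in METRICS or key in seen:
            raise ValueError(f"unknown or repeated metric {key!r}")
        name, values = METRICS[key]
        if value not in values:
            raise ValueError(f"unknown value {value!r} for metric {key}")
        seen.add(key)
        details.append((name, value, values[value]))
    if seen != set(METRICS):
        missing = ", ".join(sorted(set(METRICS) - seen))
        raise ValueError(f"missing metrics: {missing}")
    return score, details


if __name__ == "__main__":
    score, details = decode("24∕25 AC:None/A:None/CI:All/II:All/E:Exploit/TD:Default")
    print(f"Risk score: {score} out of 25")
    for name, value, description in details:
        print(f"{name}: {value} ({description})")
```

Raising `ValueError` on anything unexpected, rather than silently skipping it, is the kind of validation the static page is missing.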

Once this was done, we wanted to do a basic analysis of the Security Advisories released to date and the security score of each. The official Security track record has some details, but it was neither up to date nor detailed enough, so we set out to gather the data ourselves. Although there were API endpoints such as

https://www.drupal.org/api-d7/node.json?type=sa&status=1
https://www.drupal.org/api-d7/node.json?taxonomy_forums=1852 
https://www.drupal.org/api-d7/node.json?taxonomy_forums=1856

we felt it was too much work to normalise the data from these endpoints. Instead, we scraped https://www.drupal.org/security using our good old Google Docs and some queries. This was not as straightforward as we initially thought it would be.
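For anyone who prefers the API route we skipped, a paginated listing could be walked with something like the following Python sketch. It assumes, without having verified it against drupal.org, that each page of the api-d7 JSON listing holds its nodes under a `list` key and links to the following page under `next`; `iter_nodes`, `fetch_json` and `normalise` are hypothetical helpers, and the field names kept in `normalise` are guesses that would need checking against a real response.

```python
import json
import urllib.request

START_URL = "https://www.drupal.org/api-d7/node.json?type=sa&status=1"


def fetch_json(url):
    """Download one page and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_nodes(start_url, fetch=fetch_json):
    """Yield every node from a paginated listing.

    Assumption: nodes sit under "list" and the next page's URL under "next",
    which is absent on the last page. Already-visited URLs stop the loop.
    """
    seen = set()
    url = start_url
    while url and url not in seen:
        seen.add(url)
        page = fetch(url)
        yield from page.get("list", [])
        url = page.get("next")


def normalise(node):
    """Keep a few fields; these key names are assumptions."""
    return {
        "nid": node.get("nid"),
        "title": node.get("title"),
        "created": node.get("created"),
    }


# Example (makes network requests, so it is left commented out):
# rows = [normalise(node) for node in iter_nodes(START_URL)]
```

Passing a fake `fetch` makes the pagination logic testable offline, and the same function would work for the forum-filtered endpoints listed above.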

Since the data was inconsistent, we had to use different queries for the different time windows in which the announcements were made. After an initial round of scraping, we manipulated the data to get everything relevant into the format we wanted.

The next challenge was that two different classification approaches had been used. Since August 6th, 2014, vulnerabilities have been categorised and scored using the NIST Common Misuse Scoring System (NISTIR 7864) mechanism, so that data is better structured. Before that, the Drupal security team had its own way of classifying vulnerabilities.

You can read about them here: https://www.drupal.org/drupal-security-team/security-risk-levels-defined

To get meaningful insights, we wanted a security risk score for the vulnerabilities reported before August 6th, 2014. So we did a reverse mapping, based on the new guidelines and the security risk level assigned to each vulnerability announced before that date.

We gave Highly Critical a score of 22.5, Critical 17, Moderately Critical 12, Less Critical 7 and Not Critical 2.
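These values appear to be the midpoints of the score range the new scheme assigns to each level (as we read the Security Risk Level Definitions page: 0-4 Not Critical, 5-9 Less Critical, 10-14 Moderately Critical, 15-19 Critical and 20-25 Highly Critical). A minimal Python sketch of that mapping, with the August 6th, 2014 cutoff deciding whether a real score is used, might look like this; `LEVEL_RANGES`, `proxy_score` and `estimated_score` are hypothetical names.

```python
from datetime import date

# Score range (lowest, highest) out of 25 for each risk level, as we read
# the Security Risk Level Definitions page.
LEVEL_RANGES = {
    "Not Critical": (0, 4),
    "Less Critical": (5, 9),
    "Moderately Critical": (10, 14),
    "Critical": (15, 19),
    "Highly Critical": (20, 25),
}

# SAs published on or after this date carry a real NISTIR 7864-style score.
CUTOFF = date(2014, 8, 6)


def proxy_score(level):
    """Midpoint of the level's score range, e.g. 'Highly Critical' -> 22.5."""
    low, high = LEVEL_RANGES[level.strip().title()]
    return (low + high) / 2


def estimated_score(published, level, score=None):
    """Use the real score for post-cutoff SAs; otherwise fall back to the midpoint."""
    if published >= CUTOFF and score is not None:
        return float(score)
    return proxy_score(level)
```

Normalising the level with `str.title()` lets labels such as "Highly critical" and "Highly Critical" map to the same range.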

Though these numbers are not accurate, they give us a broad sense. To get exact scores, we would need a rating for each of the six risk metrics of the new NISTIR 7864-based scheme. This can be time-consuming, so we have put it on hold for now. It would be great if somebody could rate the old SAs as per the new guidelines. If you would like to take a crack at it, you are free to do so here: Drupal core vulnerability analysis. All users have edit access, so please go ahead and update columns H to M, which are marked in orange and contain the text "Details not available". Once you are done, you can also update column G; you can use these values in https://security.drupal.org/riskcalc to find the risk score.
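If you help fill in the sheet, a small check like the following Python sketch can turn a row's six cells into an AC:.../A:.../... string before you enter the values in the risk calculator. It assumes, hypothetically, that columns H to M hold AC, A, CI, II, E and TD in that order; the allowed values are the ones used by the new scheme as we understand it, and `risk_vector` is an illustrative name, not part of any Drupal tool.

```python
# Allowed values per metric, in the order the vector string lists them.
# Assumption: spreadsheet columns H to M follow this same order.
ALLOWED = {
    "AC": ("None", "Basic", "Complex"),
    "A": ("None", "User", "Admin"),
    "CI": ("All", "Some", "None"),
    "II": ("All", "Some", "None"),
    "E": ("Exploit", "Proof", "Theoretical"),
    "TD": ("All", "Default", "Uncommon"),
}

PLACEHOLDER = "Details not available"


def risk_vector(row):
    """Build 'AC:.../A:.../...' from six cells, or return None if any is unrated."""
    if len(row) != len(ALLOWED):
        raise ValueError(f"expected {len(ALLOWED)} cells, got {len(row)}")
    parts = []
    for (metric, allowed), cell in zip(ALLOWED.items(), row):
        value = cell.strip()
        if value == PLACEHOLDER:
            return None  # this SA has not been rated yet
        if value not in allowed:
            raise ValueError(f"{metric}: {value!r} is not one of {allowed}")
        parts.append(f"{metric}:{value}")
    return "/".join(parts)
```

Returning `None` for rows that still contain the placeholder text makes it easy to skip SAs nobody has rated yet.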

With the currently available data we made some visualizations.

This chart was created using a Google spreadsheet. As you can see, the number of SAs has reduced over time, and it is especially interesting to note that vulnerabilities with a score below 10 have reduced drastically since January 2010. I am not sure whether this could be attributed to the automation tools that were around at that time.
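The trend described above boils down to a group-by on year and score. Here is a Python sketch of that tally, assuming each advisory has already been reduced to an ISO date string (YYYY-MM-DD) and a score, estimated for pre-2014 SAs; `yearly_counts` is a hypothetical name, and the default threshold of 10 simply mirrors the cut-off discussed above.

```python
from collections import Counter
from datetime import date


def yearly_counts(advisories, threshold=10):
    """Count advisories per year, split at the score threshold.

    advisories: iterable of (ISO date string, score) pairs.
    Returns {year: (count below threshold, count at or above threshold)}.
    """
    low, high = Counter(), Counter()
    for published, score in advisories:
        year = date.fromisoformat(published).year
        if score < threshold:
            low[year] += 1
        else:
            high[year] += 1
    years = sorted(set(low) | set(high))
    # Counter returns 0 for years missing from one of the two tallies.
    return {year: (low[year], high[year]) for year in years}
```

The resulting dictionary maps directly onto a stacked bar chart with one bar per year.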

Using the data from the Google spreadsheet, we created a couple of interactive maps in Power BI.

You can check out these interactive maps here.
