Over the last year I have become very interested in information security and in what I can do as a developer/researcher to improve security for the projects I lead. This blog post goes some way towards showing the strides we have made towards a secure environment, but it also shows that we still have a long way to go. As I was preparing this post, news hit that the WannaCry ransomware was causing chaos around the world. I am not suggesting that an organisation's failure to use https leads to a vulnerable system (or to ransomware), but it could be an indicator of the security culture within.
I blogged a while back that the Queen's Mary Job Site was hacked: malicious links were being injected into web pages. I am not aware of any reporting they have done to inform users that the platform was compromised. It was that blog post which resulted in a person getting in touch about a University portal for students, similar to Moodle but an in-house development.
The portal was not using https throughout, including on the login form. I decided to probe the site and found that the password reset function embedded the user's "old" plaintext password in the HTML page (in a hidden field; this is totally unacceptable). It was also possible to enumerate users and obtain all of their passwords (the user_id=kxxxx was reported in the URL request).
I reported the incident to the "Information Security Officer" of the University in question, who simply stated that https is costly to implement and that the site uses other security measures that make hacking, man-in-the-middle attacks etc. "practically impossible".
More recently, we had Wycombe District Council failing to renew its SSL certificate. This resulted in them offering up the following laughable advice:
They were telling users to ignore the security warning triggered by the browser and to proceed to the site. This is very bad practice: we should not be attempting to normalise security warnings; they are notices that users must pay close attention to.
I currently work on a range of digital projects that collect personal information for research purposes via web apps and REST APIs from a very sensitive population: the Armed Forces. It is important that we build services that do not allow this information to be intercepted or modified by a malicious third party as it travels over the web. But I wonder how many public sector sites actually use and enforce https; is our public infrastructure taking basic precautions?
Where are we at with https?
Scott Helme has written extensively on the use of security-based http response headers by the Alexa Top 1 Million websites. He performs this analysis every 6 months (Aug 2015, Feb 2016, Aug 2016 and Mar 2017) and it shows a general increase in the use of secure http response headers, including the use of https, over the period (from 6.7% in Aug 2015 to 18.4% in Mar 2017). This has led Troy Hunt to label it as:
That's it - I'm calling it - HTTPS adoption has now reached the moment of critical mass where it's gathering enough momentum that it will very shortly become "the norm" rather than the exception it so frequently was in the past.
Troy Hunt - HTTPS adoption has reached the tipping point
But you may think: if only 18.4% of sites (~184,000) use https, have we really reached "critical mass"? The answer is yes, and the reason is simple: these are some of the most visited websites on the planet, and it's about the perception and promotion of https, getting it out there and conditioning users to look out for it.
Another important side-note is that Google is actively seeking to end the unencrypted web by visually flagging to the user sites that are insecure or that use https incorrectly. I will concede that https is virtually useless in some circumstances (see Reed et al.) but it is certainly better than nothing!
We trust the UK Government (and linked agencies) with our data; we do not have a choice in the data it collects, processes and stores (much of it a legal requirement). We are seeing a change in attitude towards security in the private sector, but what about the public sector? Let's find out.
What are you measuring?
I am solely interested in looking at https, its uptake and its enforcement, and not at the security-based response headers that Scott Helme examines.
What data is included in the analysis?
I am interested in publicly accessible assets such as website(s) and API endpoints operated by public sector organisations (e.g. government, local councils, healthcare services, universities and schools).
Not surprisingly, there is no single directory/index available for obtaining a list of active domains and Top Level Domains (TLDs) in the public sector. The UK Government Digital Service provides a handy list of .gov.uk domains, as does the Department for Education with its downloadable list.
I also submitted a couple of Freedom of Information requests to agencies, asking for a list of active and publicly available domain names managed by the organisation. To make better sense of the data and to make it easier to analyse1, I grouped the domains by type. See footnote for pre-processing2:
Police: 220 sites were provided, representing police forces and related agencies in the UK. The TLD was solely "*.police.uk".
Government: 3,004 sites were provided, representing government, including local councils, across the UK. The TLD was solely "*.gov.uk".
Education: 20,650 sites were provided, representing educational establishments in England. TLDs included ".ac.uk", ".sch.uk", ".co.uk", ".com" etc.
Health: 6,756 sites were provided, representing health organisations including hospitals, trusts and GP surgeries across the UK. The TLD was solely "*.nhs.uk".
A request to the Ministry of Defence for "*.mod.uk" was declined on the grounds of national security.
This resulted in the following TLD composition (n=30,600)3:
The percentages are as follows:
- *.sch.uk (28.37)
- *.nhs.uk (22.07)
- *.co.uk (18.18)
- *.gov.uk (9.64)
- *.org.uk (7.96)
- *.com (5.38)
- *.org (4.92)
- *.net (1.43)
- *.police.uk (0.71)
- *.ac.uk (0.51)
- *.uk (0.18)
- *.info (0.10)
- *.academy (0.06)
- *.education (0.03)
- *.co (0.02)
- *.school (0.006)
- *.me.uk (0.006)
I tested the http version of each site to check whether a user would be redirected to https, following any redirects on the original request. The app I created for the job is a node.js application; I will write up a detailed commentary of the source code and release it via GitHub shortly. You can also use this tool to perform batched and single-site checks, including security headers.
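Until that write-up is out, the core of the check can be sketched as follows. This is a minimal, hypothetical sketch rather than the actual tool, and it assumes Node 18+ (which ships a global fetch that follows redirects by default):

```javascript
// Hypothetical sketch of the check: request the http:// version of a
// domain, follow any redirects, and see where the request ends up.

function redirectsToHttps(finalUrl) {
  // A site "enforces" https if the final URL after redirects is https
  return finalUrl.startsWith('https://');
}

async function checkSite(domain) {
  const response = await fetch(`http://${domain}/`, { redirect: 'follow' });
  return {
    domain,
    status: response.status,                // final http status code
    finalUrl: response.url,                 // URL after any redirects
    secure: redirectsToHttps(response.url), // did we land on https?
    server: response.headers.get('server'), // reported server type, if any
  };
}
```

Calling checkSite('example.gov.uk') resolves to an object showing whether the plain-http request ended up on https and what server type, if any, was reported.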
It took a total of 4 hours to perform the checks, with several additional hours spent verifying the data, generating the results and writing this post.
HTTP status codes
Not all of the sites provided were "online" or accessible. Below is a list of http status codes received for each group:
There were some interesting status codes; I have not seen 410 or 530 for a while, and I have never come across 479, so I welcome any comment on what it is used for. The vast majority of sites returning 302 were redirecting to https.
An empty response reflects a site that is not available (typically no DNS) or that returned no header. These are excluded from all subsequent analysis.
Below is a comparison between those that use http and https for each group.
Surprised? I certainly was. I expected each group to report a higher https uptake than was found. I re-checked the code to make sure it was working, re-ran the test and tested a sub-sample, and kept getting the same figures.
Police and government organisations reported a higher use of https than industry; however, https use in education and health was significantly lower. The vast majority of insecure health sites were patient-centred, such as GP surgeries and patient advice services. Probing further, I found most sites did use https on the patient appointment booking system (typically managed by a third party). Of great concern, however, was that the vast majority of educational sites did not use https at all, even on the e-learning or email login pages accessible by staff and students. I discovered one site which had its database server public-facing with no restrictions (reported).
There were also multiple "development" environments which were not secured, were publicly available and accepted default credentials such as "username: username, password: password", with some having the information pre-populated (this was reported to the organisation). It is concerning that, with access to the development systems, you could potentially start to map out the organisation's infrastructure, making it easier to perform certain types of attack.
Web server types
I'm concerned, really concerned. While checking the https status I decided to pull in the headers and check what was being reported as the server type. One of the possible causes of the NHS ransomware incident was the failure to apply updates and to upgrade from legacy operating systems. Over 60% of sites reported their server type.
The top 5 most commonly reported server types across the groups were nginx, Apache, Microsoft-IIS/8.0, Microsoft-IIS/7.5 and cloudflare-nginx. However, there was a sizable chunk (8%) of sites using Microsoft-IIS/5.0 and Microsoft-IIS/6.0. Some systems were running Microsoft Windows Server 2003, which is no longer officially supported and is potentially susceptible to the ransomware that did the rounds recently.
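Flagging those end-of-life versions from the collected headers is straightforward. A small sketch (the version strings are the ones observed above; the helper name is my own):

```javascript
// Server header values indicating end-of-life Microsoft IIS versions
// (IIS 5.0 shipped with Windows 2000; IIS 6.0 with Windows Server 2003).
const LEGACY_IIS = ['Microsoft-IIS/5.0', 'Microsoft-IIS/6.0'];

function isLegacyServer(serverHeader) {
  if (!serverHeader) return false; // many sites do not report a server type
  return LEGACY_IIS.some((version) => serverHeader.startsWith(version));
}
```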
Organisations need to get their act together and start using https; on a simple site it takes less than 5 minutes to set up and is free. Some organisations that are using https are using it incorrectly, serving mixed content or only deploying it on certain parts of the site. I have proactively reported several concerns to organisations, such as accessible patient record systems or diagnosis services. But for many, proactive reporting is not feasible.
Some may argue that government sites do not need to use https, but consider this: these are public sector sites which have 'authority' and command public trust. Any compromise could present the public with incorrect information or expose their details.
Each type/sector was evaluated separately. However, I did not observe any data present in one data source appearing in another (for example, .gov.uk sites in the Department for Education data). For each type/sector, duplicates were removed and TLD validation checks were undertaken. ↩
This is not an exhaustive list of all government sites and domains. ↩
30 sites were excluded as malformed URLs. ↩