“Scientia potentia est”
For decades, we have benefited from modern cryptography to protect our sensitive data during transmission and storage. However, we have never been able to keep the data protected while it is being processed.
Nearly 4 billion data records were stolen in 2016, and each one cost the record holder nearly $158. Simple arithmetic puts the cost of breaches in 2016 alone at a whopping $632 billion. The scale, sophistication, and cost of cyber-attacks escalate every year, and today’s technologies will struggle to keep pace. In such times, we need an encryption technology that disorients and discourages bad actors.
Looking further ahead, a fault-tolerant, universal quantum computer with millions of qubits could, many years from now, quickly sift through the probabilities and decrypt even the strongest encryption in common use, rendering this foundational security method, as we know it today, obsolete.
This is where homomorphic encryption comes in. It helps solve many of the cloud infrastructure security problems that today’s elliptic-curve cryptography (ECC) algorithms fail to address.
Shortcomings of today’s encryption techniques
When it comes to cloud security, our data is encrypted in two states: in transit and in storage.
In transit, the encryption techniques that we use today suffer from a problem called TLS/SSL termination. Interestingly, this very problem is also proudly marketed as a feature by reverse proxies such as Nginx and Envoy.
With TLS termination, the reverse proxy handles incoming connections, decrypts the TLS traffic, and passes the unencrypted request on to the appropriate servers. This is exactly the infrastructure limitation that attackers take advantage of. The whole threat model revolves around the fact that unencrypted data is available past the TLS termination point.
In the case of storage, there are two ways in which we do things today. We either store the data in our databases unencrypted, in plain text, or, in some cases, apply some form of encryption. With cloud providers like GCP, AWS, and Azure, this encryption is done using a Key Management Service (KMS). Even then, while the data may be stored encrypted, there always comes a time when the application needs to decrypt the data to perform any operation on it.
Every service that we know of today runs on unencrypted data. The trends that Twitter shows cannot be obtained by operating on encrypted data. The recommendation system on YouTube, the news feed on Facebook, and the predictions of every application that we see out there all operate on unencrypted data.
It is these very shortfalls that Homomorphic encryption aims to address.
Imagine if you could compute on encrypted data without ever decrypting it. What would you do?
― Flavio Bergamaschi
Lattice-based cryptography stands out because it hides data inside math problems that are believed to be very hard to solve. By the time computers are strong enough to crack today’s encryption, the world can be prepared with lattice cryptography. As of today, and to the best of our knowledge, lattice cryptography is quantum resistant: no known quantum algorithm can break it efficiently. Lattice cryptography is also the basis of Fully Homomorphic Encryption (FHE).
Homomorphic encryption is the ability to perform arithmetic operations on encrypted data. The encryption that we deploy today is not designed to allow that. Because of this ability, whoever processes our data never needs to decrypt it! This quite conveniently addresses the shortcomings of our existing encryption techniques. In transit, the TLS termination problem need not occur, as the reverse proxy no longer has to decrypt the data. In principle, it can perform its operations on the encrypted data itself and make the necessary decisions without ever terminating the TLS. Even in a persistent store, database queries can be performed on encrypted data.
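The idea of "arithmetic on ciphertexts" is easiest to see in a scheme that preserves just one operation. The sketch below assumes textbook RSA without padding, with tiny primes chosen purely for illustration. This is insecure, it is not how RSA is deployed in practice, and it is not the lattice-based FHE discussed in this article. It only shows that multiplying two ciphertexts can yield a ciphertext of the product.

```python
# Toy textbook RSA (no padding). Illustration only: NOT secure and NOT FHE.
p, q = 61, 53                  # tiny primes, an assumption for this demo
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt with the public key (e, n)."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Decrypt with the private key (d, n)."""
    return pow(c, d, n)


a, b = 7, 6
# Whoever holds only the two ciphertexts can multiply them...
c_product = (encrypt(a) * encrypt(b)) % n
# ...and the key holder later decrypts the product of the plaintexts.
assert decrypt(c_product) == a * b
```

A fully homomorphic scheme supports both addition and multiplication on ciphertexts, which is what makes arbitrary computation possible.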
Fully Homomorphic Encryption (FHE) protects us against honest-but-curious threat models. An honest-but-curious (HBC) adversary is a legitimate participant in a communication protocol who will not deviate from the defined protocol but will attempt to learn all possible information from legitimately received messages. To get an idea of what this means, a comparison helps a great deal.
With the way that we do things today, the usual arrangement is that Alice encrypts some data and sends it as an input to Bob. Bob can decrypt that data, process it, and store it at his end. Likewise, Bob can encrypt some data and send it over to Alice, who can decrypt and process it at her end. Such a mechanism protects us against man-in-the-middle (MITM) attacks, which is why Eve can’t eavesdrop on any communication between Alice and Bob. But Bob, on the other hand, has access to all this unencrypted data. Here, Bob is the honest-but-curious actor.
For the sake of convenience, we assume that Bob is only honest-but-curious, with no malicious intent. For threat models in which Bob sits inside our cloud infrastructure with malicious intent and free access to all this unencrypted data, there are other protocols that we can use in combination with homomorphic encryption.
Interestingly, with homomorphic encryption, along with protection against eavesdropping and MITM attacks, we get an added protection: because everything that gets stored is encrypted, Bob no longer sits on a gold mine of unencrypted data. This, however, does not take away Bob’s ability to perform operations on the data as he used to. One of the main benefits of homomorphic encryption is that, unlike the encryption techniques that we’ve seen till now, the data need not be decrypted to be processed. All the operations can be performed on the encrypted data itself.
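To make Bob's position concrete, here is a minimal sketch that assumes a toy Paillier cryptosystem. Paillier is only additively homomorphic and is not lattice-based, so it stands in for FHE here purely as an illustration. The key size is far too small for real use, and the names `bob_sum` and `readings` are hypothetical. Alice holds the private key, and Bob totals her values while seeing only ciphertexts.

```python
import math
import secrets

# Toy Paillier key material (illustration only, far too small to be secure).
P, Q = (1 << 31) - 1, (1 << 61) - 1                 # two Mersenne primes
N = P * Q                                           # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                                # private (Python 3.8+)


def encrypt(m: int) -> int:
    """Alice: randomized encryption under the public key N, with g = N + 1."""
    while True:
        r = secrets.randbelow(N)
        if r > 0 and math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Alice: decrypt using the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def bob_sum(ciphertexts, n2):
    """Bob: multiplying ciphertexts adds the hidden plaintexts. No key needed."""
    total = 1                      # 1 is a valid encryption of 0
    for c in ciphertexts:
        total = (total * c) % n2
    return total


readings = [120, 75, 310]                       # Alice's private values
encrypted = [encrypt(m) for m in readings]      # what Bob actually stores
encrypted_total = bob_sum(encrypted, N2)        # Bob computes blindly
assert decrypt(encrypted_total) == sum(readings)
```

Bob never touches `LAM` or `MU`, so he can produce the total without being able to read any individual value or the result.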
Applications of Homomorphic encryption
Right off the bat, some of the use-cases that we can consider for such an encryption technique are:
- Oblivious queries. Searching without revealing intent. For example, today, while requesting weather info, we need to reveal our location to cloud providers. With homomorphic encryption, our location stays encrypted throughout, so we reveal far less of our data.
- Set intersections. Today, in order to determine an overlap, we need to completely share both the sets. Using homomorphic encryption, we can determine the overlaps without disclosure of the entire sets.
- Extracting value from private data. We can run machine learning models, such as regression or neural network models, on our private data without exposing it.
- Secure outsourcing. Even today, quite a few enterprises maintain on-prem infrastructure due to a lack of trust in cloud providers. Homomorphic encryption, because it offers data privacy by design, can encourage wider cloud adoption.
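The first use-case above, oblivious queries, can be sketched with a toy additively homomorphic scheme. The example assumes Paillier encryption with an insecurely small key, made-up weather data, and hypothetical helper names (`client_query`, `server_answer`). It is not the lattice-based approach used by FHE libraries. The client sends an encrypted one-hot selector, and the server combines it with its table without learning which row was requested.

```python
import math
import secrets

# Toy Paillier key material owned by the client (illustration only).
P, Q = (1 << 31) - 1, (1 << 61) - 1                 # two Mersenne primes
N, N2 = P * Q, (P * Q) ** 2                         # N is the public key
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private
MU = pow(LAM, -1, N)                                # private (Python 3.8+)


def encrypt(m: int) -> int:
    """Randomized Paillier encryption with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N)
        if r > 0 and math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


# Server-side table (made-up values for the demo).
CITIES = ["Paris", "Berlin", "Madrid", "Rome"]
TEMPS_C = [18, 14, 27, 24]


def client_query(index: int, size: int) -> list:
    """Client: encrypt a one-hot vector that selects the wanted row."""
    return [encrypt(1 if j == index else 0) for j in range(size)]


def server_answer(selector, values, n2) -> int:
    """Server: compute Enc(sum(sel[j] * values[j])) from ciphertexts only."""
    answer = 1                                        # encryption of 0
    for c, v in zip(selector, values):
        answer = (answer * pow(c, v, n2)) % n2        # c**v encrypts sel * v
    return answer


query = client_query(CITIES.index("Madrid"), len(CITIES))
reply = server_answer(query, TEMPS_C, N2)
assert decrypt(reply) == 27
```

The server does work proportional to the whole table for every query, which hints at why computing on encrypted data is costly.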
Proof of Concept
Without making this article sound like an ad, let us get our hands dirty and see how homomorphic encryption can actually be implemented. Microsoft has the SEAL library, which supports homomorphic encryption. IBM too recently released a Fully Homomorphic Encryption toolkit for Linux. For the sake of simplicity, since IBM’s FHE toolkit is Docker-based, we will be using it for our POC.
First, we need to clone the repo:
$ git clone https://github.com/IBM/fhe-toolkit-linux.git
Once cloned, we need to run the FetchDockerImage.sh shell script. We also need to provide the container OS as an argument to the shell script. For simplicity, we will be using Ubuntu:
$ cd fhe-toolkit-linux
$ ./FetchDockerImage.sh ubuntu
The download and setup of the toolkit will take some time, depending on network bandwidth and hardware.
Next, we need to run the pre-built toolkit image that is published under ibmcom on Docker Hub:
$ ./RunToolkit.sh -p ubuntu
The output of the above command should be something similar to:
$ ./RunToolkit.sh -p ubuntu
WARNING: No swap limit support
INFO: Using system default persistent storage path...
INFO: Persistent data storage: "/home/pratik/Projects/fhe/fhe-toolkit-linux/FHE-Toolkit-Workspace"
INFO: CMake: Deleting cached built settings and reconfigure
INFO: Launching FHE tookit: docker run -d --name fhe-toolkit-ubuntu -v /home/pratik/Projects/fhe/fhe-toolkit-linux/FHE-Toolkit-Workspace:/opt/IBM/FHE-Workspace -p 8443:8443 ibmcom/fhe-toolkit-ubuntu
8fdcd97b1d203f0e71e4602ce6d24a76cd768c5fc2f8c5ee6b99ed7acb1a7886
CONTAINER ID   IMAGE                       COMMAND                  CREATED         STATUS                  PORTS                    NAMES
8fdcd97b1d20   ibmcom/fhe-toolkit-ubuntu   "code-server --bind-…"   6 seconds ago   Up Less than a second   0.0.0.0:8443->8443/tcp   fhe-toolkit-ubuntu
FHE Development is open for business: https://127.0.0.1:8443/
We now have a web server running at https://127.0.0.1:8443/. All our next operations will be through the browser.
After we open that URL in the browser and accept the warning caused by the self-signed certificate, the VS Code interface loads in the browser. Soon, it will ask us to select a kit. Make sure to select the option which says GCC for x86_64-linux-gnu 9.3.0.
Next, click Build in the CMake Tools status bar to build the selected target.
If you look into the examples/BGV_country_db_lookup directory, you can find the countries_dataset.csv file. It is a list of European countries and their capital cities. When we run the toolkit, it uses the BGV_country_db_lookup.cpp file to encrypt the contents of the CSV. That file also contains code that allows us to search the encrypted data. Given a country name as input, the program looks through the encrypted list of countries and outputs its matching capital.
Let’s proceed to run the toolkit:
If we follow the text instructions and enter any country, the program goes through the database and outputs that country’s capital.
Though homomorphic encryption is a great and extremely promising technology, is it ready for out-of-the-box use? Absolutely not. This is very much evident from the POC that we did. Searching an encrypted database with around 47 entries took 2-3 minutes. There is no denying that this is an awesome start and definitely in the right direction, but we still have a long way to go. Having said that, homomorphic encryption can very well be the next big breakthrough in the computer science industry.
We can only imagine the endless possibilities when the first FHE-enabled database is implemented. Or the first FHE-supported proxy. Either way, we’re surely in for some exciting times ahead!
Previously published at https://blog.pratikms.com/revolutionizing-data-security-by-design-ckdxa1fjz03jgids1cbxq4hgs