We are already used to encrypting data at rest or in flight. FHE offers the possibility of also performing calculations on it without ever actually decrypting it.
Yesterday, Ars spoke to IBM's senior research scientist, Flavio Bergamaschi, about the company's recent successful field trials on fully homomorphic encryption. We suspect that many of you will have the same questions as we do – starting with "What is fully homomorphic encryption?"
FHE is a type of encryption that enables direct mathematical operations on the encrypted data. After decryption, the results are correct. For example, you can encrypt 2, 3, and 7 and send the three encrypted values to a third party. If you then ask the third party to add the first and second values, multiply the result by the third value and return the result to you, you can decrypt that result – and get 35.
You never have to share a key with the third party that does the calculation. The data remains encrypted with a key that the third party has never received. While the third party performed the operations you requested, it learned neither the input values nor the output. You can also ask the third party to perform mathematical or logical operations that combine the encrypted data with non-encrypted data. For example, in pseudocode, FHE_decrypt(FHE_encrypt(2) * 5) equals 10.
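The arithmetic in these two examples can be sketched with a toy scheme. The Python below is a minimal and deliberately insecure illustration of the idea, not IBM's library or its lattice-based algorithms: it uses a simple symmetric integer scheme, and the names `keygen`, `encrypt`, `decrypt` and the parameter `T` are assumptions made for this sketch.

```python
import secrets

T = 1 << 16  # plaintext modulus: results are correct modulo T (toy parameter)

def keygen(bits=256):
    """Secret key: a random integer with its top bit set, so key >= 2**(bits - 1)."""
    return secrets.randbits(bits) | (1 << (bits - 1))

def encrypt(key, m, noise_bits=16, mask_bits=512):
    """Hide m under small noise (a multiple of T) plus a large multiple of the key."""
    r = secrets.randbits(noise_bits)
    q = secrets.randbits(mask_bits)
    return m + T * r + key * q

def decrypt(key, c):
    """Strip the multiple of the key, then the noise. Valid while the noise stays below the key."""
    return (c % key) % T

key = keygen()
c2, c3, c7 = (encrypt(key, m) for m in (2, 3, 7))

# The "third party" works only on ciphertexts and never sees the key.
c_result = (c2 + c3) * c7
c_scaled = encrypt(key, 2) * 5  # ciphertext combined with a plaintext constant

print(decrypt(key, c_result))  # 35
print(decrypt(key, c_scaled))  # 10
```

Every operation grows the hidden noise term, so this toy supports only a handful of multiplications before decryption stops returning the right answer.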
Homomorphic encryption options
The most obvious application of FHE is the solution of the so-called "sysadmin problem" – prevention of secret discovery by root privileged operators.
The most obvious impact of FHE is a solution to what I call the "sysadmin problem". When you perform your calculation on a system managed by a third party, that third party's root-privileged operators generally have access to the data. Encryption at rest protects the data only outside the scope of the calculation currently in progress. With root privileges, a system operator can still scan or alter the contents of RAM to gain access to whatever data is being worked on at the moment.
With FHE, you can perform these calculations without ever making the actual data accessible to the remote system. Obviously, this solves the sysadmin problem fairly thoroughly: if the computer itself never has access to the decrypted data, its operators won't either.
Of course, FHE is not the first solution to the sysadmin problem. AMD's Secure Encrypted Virtualization (SEV) is another, and it is much more efficient. When SEV is enabled, an operator with root privileges on a host system cannot inspect or alter the contents of the RAM used by a virtual machine running on that system. SEV is effectively free: SEV-protected VMs run no slower than unprotected VMs.
It is difficult to find all the possibilities of fully homomorphic encryption at first glance. Here are some to help us get started.
Fully homomorphic encryption offers many possibilities that Secure Encrypted Virtualization does not. Since all mathematical and logical operations can be constructed from additive and multiplicative operations, every calculation can in principle be carried out on FHE-encrypted data. This opens up a dizzying array of possibilities: you can search a database without ever telling the database owner what you were looking for or what the result was. Two parties can determine the intersection of their separately held records without either party disclosing the actual content of its data to the other.
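The claim that logic can be built from addition and multiplication alone is easy to check on plain bits. The sketch below is only an illustration on unencrypted 0/1 integers (the function names are chosen for this example); under FHE the same formulas would be applied to ciphertexts instead.

```python
from itertools import product

# Logic gates expressed purely with +, - and * on bits (0 or 1).
def NOT(a):
    return 1 - a

def AND(a, b):
    return a * b

def XOR(a, b):
    return a + b - 2 * a * b

def OR(a, b):
    return a + b - a * b

def full_adder(a, b, carry_in):
    """One-bit adder composed only of the arithmetic gates above."""
    partial = XOR(a, b)
    total = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(carry_in, partial))
    return total, carry_out

# Exhaustively confirm the adder agrees with ordinary integer addition.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert 2 * carry + s == a + b + c
```

Chaining such adders bit by bit gives multi-bit arithmetic, which is why two primitive operations are enough to express arbitrary computation.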
"Secure outsourcing" is the only one of these archetypes that is possible without fully homomorphic encryption. We want to spend most of our time on the other three, which do require it, because FHE is expensive.
Limitations of homomorphic encryption
This is a multiplicative scale, not a percentage scale. You are looking at 40-50 times the computing power and 10-20 times the memory required to perform ML operations with FHE-encrypted data.
The true/false positive prediction curves on the left are similar but not identical. FHE encryption itself is not lossy, but the model works with floating-point data.
This diagram of a cloud-based setup for machine learning and inference shows how storage and computation on private models can be handled by untrusted third parties.
Although fully homomorphic encryption enables things that would otherwise not be possible, it is expensive. Above are diagrams showing the additional computing power and memory required to operate on FHE-encrypted machine learning models: approximately 40-50 times the computing power and 10-20 times the RAM that the same work would require on unencrypted models.
In the next picture, we see that the result curves of a machine learning prediction task are almost identical, regardless of whether the operations were performed on data in the clear or on FHE-encrypted data. We wondered about the remaining difference: was FHE a bit lossy? Not exactly, said Bergamaschi. The model used is based on floating-point data, not integers, and it is the floating-point numbers themselves that are somewhat lossy, not the encryption.
Every operation that is performed with a floating point value reduces the accuracy a little – a very small amount for additive operations and a larger one for multiplicative ones. Since the FHE encryption and decryption are themselves mathematical operations, this leads to a slight additional deterioration in the accuracy of the floating point values.
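The rounding drift described here can be seen with ordinary Python floats. This is a minimal sketch of floating-point error in general; it involves no encryption and does not reproduce the curves from IBM's trial.

```python
import math

values = [0.1] * 10

# Naive accumulation: each addition rounds to the nearest representable float.
running = 0.0
for v in values:
    running += v

print(running == 1.0)              # False: rounding error has accumulated
print(math.fsum(values) == 1.0)    # True: fsum tracks the lost low-order bits
print(abs(running - 1.0) < 1e-12)  # True: the drift is real but tiny
```

The more floating-point operations a pipeline performs, the more chances there are for this kind of drift, which is consistent with the encrypted and unencrypted curves being close but not identical.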
We should emphasize that these diagrams are only directly applicable to machine learning and that not every task that lends itself to FHE is a machine learning task. However, other tasks have their own limitations. For example, we spent some time going back and forth on how a blind search (one in which the search operator knows neither what you searched for nor the result it gave you) can work.
When you query a database, the database does not normally have to perform a full-text search of every row in the queried table(s). The table(s) are indexed, and your search can be sped up significantly by using these indexes. If you carry out a blind search with an FHE-encrypted value, your encrypted query must be masked against the full text of every row in the queried tables.
This way you can both send your query and get your result without the database operator learning either. However, reading and masking the full text of every single row in the queried tables does not scale well. How problematic this is depends very much on the type of data queried and how much of it there is. We are probably looking again at something like machine learning's 50:1 compute penalty and 20:1 memory penalty, if not worse.
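The scaling difference between an indexed lookup and a masked scan can be sketched on plaintext data. Everything below is hypothetical: the table, the function names, and the assumption that masking means multiplying each row by a 0/1 match bit and summing. With FHE the match bits and the running total would be ciphertexts. Here they are plain integers so that the cost is easy to count.

```python
def indexed_lookup(index, key):
    """Ordinary search: a single hash probe, but the operator sees the key."""
    return index[key]

def masked_scan(rows, match_bits):
    """Blind-style search: every row is multiplied by a mask bit and summed.

    The operator touches all rows the same way, so it cannot tell which row
    (if any) contributed to the total.
    """
    total, multiplications = 0, 0
    for (_, value), bit in zip(rows, match_bits):
        total += bit * value
        multiplications += 1
    return total, multiplications

# Hypothetical table of 10,000 (account, balance) rows.
rows = [(f"acct-{i}", 1000 + i) for i in range(10_000)]
index = dict(rows)
wanted = "acct-4242"

# Plaintext stand-in for the per-row homomorphic equality test.
match_bits = [int(key == wanted) for key, _ in rows]

result, cost = masked_scan(rows, match_bits)
print(indexed_lookup(index, wanted), result, cost)  # 5242 5242 10000
```

Both paths return the same value, but the masked scan does work proportional to the table size, and each of those steps would carry the FHE overhead on top.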
Successful field trials
IBM has completed two field trials with FHE using real data in the financial industry – one with a large American bank and one with a large European bank.
This diagram from the FHE study with a large American bank shows the use of machine learning models based on financial data to predict the likelihood of a loan.
As daunting as the performance penalties for FHE may be, they are well under the threshold for usefulness. Bergamaschi told us that IBM initially estimated that the minimum efficiency needed to make FHE useful in the real world would be on the order of 1,000:1. With penalties below 100:1, IBM engaged a major American bank and a major European bank to conduct real-world field trials of FHE techniques using live data.
The American study was based on using machine learning models to predict the likelihood of a loan, drawing on retail banking, credit, and investment information for a large group of customers. The data set consisted of 364,000 entries with several hundred features, and the model was required to identify relatively rare events (around 1 percent) in the data set. Success was judged on encrypted predictions with accuracy similar to the baseline, variable selection with similar accuracy, and an acceptable computational overhead.
We don't have as many details about the European study as it is still under NDA – but the results of the American study were published in 2019 and can be viewed in detail in the Cryptology ePrint archive.
IBM's homomorphic encryption algorithms use lattice-based encryption, are significantly resistant to quantum computing, and are available as open source libraries for Linux, macOS, and iOS. Support for Android is on the way.