Securely combining data from multiple sources while preserving privacy
By: Betty Ann Bryanton, Canada Revenue Agency
Introduction
The increasing prevalence of technologies such as cloud, mobile computing, machine learning (ML), and the Internet of Things (IoT) creates opportunities for innovation and information sharing, but also creates challenges for data security and privacy. These challenges have been amplified during the global pandemic, as working from home has driven faster adoption of hybrid and cloud services. This situation has strained existing security capabilities and exposed gaps in data security (Lowans, 2020). Meanwhile, global data protection legislation is maturing, and every organization that processes personal data faces higher levels of privacy and non-compliance risks than ever before (Wonham, Fritsch, Xu, de Boer, & Krikken, 2020).
As a result, privacy-enhanced computation techniques, such as Secure Multiparty Computation, which protect data while it is being used, have been gaining popularity.
What is Secure Multiparty Computation?
Secure Multiparty Computation (SMPC) is a technique for combining information from different privacy zones to obtain insights on the combined data without having to reveal the raw data to the involved parties. It has evolved from a theoretical curiosity introduced by Andrew Yao's Millionaires' problem in the 1980s to an important tool for building large-scale privacy-preserving applications.
To illustrate the concept, Bob and Alice want to know if they are being paid the same but do not want to ask this awkward question. They buy four lockable suggestion boxes, each labelled with a dollar amount per hour: 10, 20, 30, 40. Bob earns $20/hr, so he only has a key to unlock the box labelled 20. Alice earns $30/hr, so she only has a key to unlock the box labelled 30. Bob and Alice each, unseen by the other, put a slip of paper in each box indicating 'yes' or 'no.' For example, Alice puts 'no' into 10, 20, and 40, and 'yes' into 30 for the $30/hr she makes. Bob unlocks the 20 box and learns that Alice is not paid $20/hr, but still does not know if her hourly rate is $10, $30 or $40. Alice unlocks the 30 box and learns that Bob does not make $30/hr, but does not know if his hourly rate is $10, $20 or $40. This is called 'oblivious transfer.' The ability to do oblivious transfers is the basis for performing SMPC.
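The locked-box exchange above can be sketched in a few lines of Python. This is only a toy model of who gets to read which slip, not a cryptographic oblivious transfer; the function names (`fill_boxes`, `open_own_box`, `compare_salaries`) are hypothetical and chosen for illustration.

```python
# Toy simulation of the locked-box exchange described above.
# It models only who can read what; it is NOT a cryptographic protocol.

RATES = [10, 20, 30, 40]  # box labels, in dollars per hour


def fill_boxes(own_rate):
    """The sender drops a 'yes' or 'no' slip into every box."""
    return {label: ("yes" if label == own_rate else "no") for label in RATES}


def open_own_box(boxes, own_rate):
    """The receiver holds a single key, so exactly one slip is ever read."""
    return boxes[own_rate]


def compare_salaries(sender_rate, receiver_rate):
    """Return True only if the slip the receiver can read says 'yes'."""
    boxes = fill_boxes(sender_rate)
    return open_own_box(boxes, receiver_rate) == "yes"


alice_rate, bob_rate = 30, 20
bob_view = compare_salaries(sender_rate=alice_rate, receiver_rate=bob_rate)
alice_view = compare_salaries(sender_rate=bob_rate, receiver_rate=alice_rate)
print(bob_view, alice_view)
```

In this sketch each receiver learns a single yes/no answer and nothing else about the other person's rate. A real protocol would have to enforce the "one key only" rule cryptographically rather than by convention.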
SMPC is a method of distributed computing and cryptography that combines data transformation (encryption) with specialized software. It enables multiple parties who do not trust each other, or any common third party, to jointly work with data that depends on all of their private inputs while keeping that data encrypted. Participants know only the results of the collaboration, and not the specific data others contributed. This enables collaboration between trusted partners or even between competitors.
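To make the idea of a joint result without shared inputs concrete, here is a minimal sketch using additive secret sharing. The article does not name a specific protocol, so this is one common building block swapped in for illustration. It assumes parties that follow the protocol, and it simulates every party inside a single process; the modulus, function names, and salary figures are all hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus; must exceed the true total


def make_shares(value, n_parties):
    """Split value into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs):
    """Simulate n parties computing the sum of their inputs from shares only."""
    n = len(private_inputs)
    # Row i holds the shares created by party i; column j is what party j receives.
    share_matrix = [make_shares(v, n) for v in private_inputs]
    # Each party adds up the shares it received and publishes only that subtotal.
    subtotals = [sum(row[j] for row in share_matrix) % MODULUS for j in range(n)]
    # The published subtotals combine to the true total and nothing more.
    return sum(subtotals) % MODULUS


salaries = [52_000, 61_500, 48_250]  # hypothetical private inputs
total = secure_sum(salaries)
print(total)  # 161750
```

Each individual share is a uniformly random number, so a party that sees only the shares sent to it learns nothing about another party's input. Only the final total is revealed, which mirrors the "participants know only the results" property described above.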
SMPC is often assumed to require the participation of multiple organizations; however, the specific requirement is for multiple privacy zones, i.e., two or more domains with different sets of privacy restrictions. Multiple privacy zones exist across multiple organizations with independent data owners, but they may also exist within a single organization across teams, departments, and/or jurisdictions.
Parties are trusted to adhere to the protocol. If a party is not trusted, additional measures, outside the scope of this paper, are required to prevent malicious or covert breach attempts.
Strengths
- Simultaneously achieves privacy, obliviousness, and authenticity
- Eliminates the need to trust a third-party data broker to access and process the data
- Allows inference on encrypted data: the model owner never sees the client's private data and therefore cannot leak or misuse it
- Eliminates the trade-off between data usability and data privacy, i.e., since the raw data is encrypted, there is no need to mask or drop any features in order to share and process it
- Opens new opportunities for enterprise collaborations that were not previously possible due to regulation or risk
- Confidentiality levels similar to Fully Homomorphic Encryption (FHE) but less computationally expensive and complex
Challenges
- SMPC techniques are extremely complex, requiring comprehensive, often complicated, cryptography; thus, they are difficult for non-experts to understand or implement
- Inability to see the input data may foster suspicion
- If functions are not carefully crafted and tested, security can be broken
- Significant computational overhead due to the complexity and distributed nature. Cost varies greatly depending on the collaboration required (e.g., number of parties, usage of different cloud providers) and the need for protection against malicious parties
- Sensitive to latency between nodes (Krikken, 2019)
- Requires additional infrastructure, which will add to the project planning and total cost of ownership calculation (Byun, 2019)
Why is it important?
According to the U.S. Director of National Intelligence, U.S. cybersecurity in both the public and private sectors is at continual risk, and both sectors should expect increasing attacks. Organizations rich with data and intellectual property (IP) are prime targets. Attackers often target this 'Crown Jewel' data because of its value and the potential for disruption (Enveil).
Organizations are increasingly concerned about data security in several scenarios, including:
- collecting and retaining sensitive personal information;
- processing personal information in external environments, such as the cloud; and
- information sharing, such as sharing and working on sensitive data in distributed settings, from healthcare to finance (Krikken, 2019).
SMPC can alleviate these concerns by allowing organizations to compliantly, securely, and privately share insights on distributed data without ever exposing or moving it.
This is important because the increasingly distributed nature of customer data means many organizations do not generate the necessary levels of data on their own to derive the unbiased insights required to provide new experiences, open new revenue streams and apply new business models. SMPC enables secure collaboration to provide mutual benefit to all parties, while preserving privacy and confidentiality.
Real World Applications
Though it is still emerging and there are challenges, SMPC is poised to significantly disrupt the enterprise data exchange space and to allow successful data sharing solutions amongst distrusting data owners. Listed below are notable successful deployments.
- Danish Sugar Beets Auction in 2008, the first successful example of SMPC deployment, where the privacy of farmer bids for contracts was assured
- Boston Women's Workforce Council Gender / Wage Gap Studies, first conducted in 2016, analyzing payroll data from multiple employers, to serve as a roadmap for change for the city and its employers
- Estonian government study in 2015, analyzing tax and education records to determine if working part-time while studying increased failure rates
Use Cases
SMPC is very popular for use cases where organizations need to share data with, and/or analyze data from, multiple parties without disclosing their data and/or their analytics model to each other.
This list illustrates the range and scale of SMPC applications.
- Collaboration with disparate parties, e.g., sharing citizen data amongst government departments and/or financial institutions; sharing electronic medical records amongst hospitals, pharmacies, insurance manufacturers
- Distributed data mining: collecting private data from independent data sources to learn something that is not possible from a single source, e.g., finding fraudulent taxpayers via private business data or other taxpayer data
- Key management: safeguarding authentication keys as they are being used
- Cloud computing: data exchange, data analytics, and ML across multiple, unknown cloud providers
- Multi-network security monitoring across entities to aggregate private data
- Spam filtering on encrypted email
- Medical discovery, e.g., disease or virus contact tracing apps, combining data of many hospitals for genomics research
- Satellite collision avoidance without operators disclosing their satellites' locations
Conclusion
Awareness is increasing that personal data can be compromised in a data breach or abused by companies whose interests do not align with those of their users. New regulations make holding personal data a liability risk for companies. SMPC has emerged as a powerful and versatile technique to gain insights from sharing data without ever exposing it directly.
Although there is no single product or technique that can satisfy every data security requirement, SMPC can be used as one defense alongside other data protection measures, such as data masking, and other privacy-preserving techniques, such as differential privacy and homomorphic encryption.
What's Next?
Gartner expects SMPC to be transformational in the next 5-10 years (Lowans, 2020). Given the amount of private data that many organizations hold, and the pressure to safeguard that data, interested organizations should continue to research SMPC and other privacy-preserving data protection techniques in order to be prepared.
Related Topics: data anonymization, differential privacy, homomorphic encryption, trusted execution environments / confidential computing, federated learning
Meet the Data Scientist
If you have any questions about my article or would like to discuss this further, I invite you to Meet the Data Scientist, an event where authors meet the readers, present their topic and discuss their findings.
References
- Acar, A., Celik, Z. B., Aksu, H., Uluagac, A. S., & McDaniel, P. (2017, Jul 6). Achieving Secure and Differentially Private Computations in Multiparty Settings. Retrieved from Cornell University arXiv: Achieving Secure and Differentially Private Computations in Multiparty Settings
- Accenture Labs. (2019, Oct 1). Maximize collaboration through secure data sharing. Retrieved from Accenture: Together, we can reinvent your business
- Balamurugan, M., Bhuvana, J., & Pandian, S. C. (2012). Privacy Preserved Collaborative Secure Multiparty Data Mining. Journal of Computer Science, 8(6), 872-878. Retrieved from Privacy Preserved Collaborative Secure Multiparty Data Mining
- Barot, S., & Agarwal, S. (2020, Oct 9). 2021 Planning Guide for Data Analytics and Artificial Intelligence (ID: G00732258). Retrieved from Gartner: Gartner
- Bogdanov, D., Kamm, L., Kubo, B., Rebane, R., Sokk, V., & Talviste, R. (2016, Jul). Students and Taxes: a Privacy-Preserving Social Study Using Secure Computation. Proceedings on Privacy Enhancing Technologies, 117-135. Retrieved from Students and Taxes: a Privacy-Preserving Social Study Using Secure Computation
- Byun, H. (2019, Apr 1). Homomorphic Encryption and Multiparty Computation. Retrieved from Baffle: Homomorphic Encryption and Multiparty Computation
- Choi, J. I., & Butler, K. R. (2019, Apr 2). Secure Multiparty Computation and Trusted Hardware: Examining Adoption Challenges and Opportunities. Security and Communication Networks, 2019(Article ID 1368905), 1-28. Retrieved from Hindawi: Secure Multiparty Computation and Trusted Hardware: Examining Adoption Challenges and Opportunities
- De Simone, S. (2020, May 24). Secure Multiparty Computation May Enable Privacy-Protecting Contact Tracing Solutions. Retrieved from InfoQ: Secure Multiparty Computation May Enable Privacy-Protecting Contact Tracing Solutions
- Enveil. (n.d.). The Data Security Triad. Retrieved from Enveil: The Data Triad
- Evans, D., Kolesnikov, V., & Rosulek, M. (2020). A Pragmatic Introduction to Secure Multi-Party Computation. Boston: NOW Publishers. Retrieved from A Pragmatic Introduction to Secure Multi-Party Computation
- Fehr, S. (2011, Dec 8). Secure Multiparty Computation (MPC) [PowerPoint]. Retrieved from Secure Multiparty Computation (MPC)
- Fritsch, J. (2020, Jan 27). Securing the Data and Advanced Analytics Pipeline (ID: G00464663). Retrieved from Gartner: Gartner
- Gidney, C. (2013, May 7). Explain it like I'm Five: The Socialist Millionaire Problem and Secure Multi-Party Computation. Retrieved from Twisted Oak Studios: Explain it like I'm Five: The Socialist Millionaire Problem and Secure Multi-Party Computation
- IBM Corporation. (2017, Nov). Protecting your company's most critical information.
- Information Security Forum. (n.d.). Protecting the Crown Jewels: How to Secure Mission-Critical Assets. Retrieved from ISF: Protecting the Crown Jewels: How to Secure Mission-Critical Assets
- inpher. (n.d.). What is Secure Multiparty Computation? Retrieved from inpher: What is Secure Multiparty Computation
- Krikken, R. (2019, Nov 26). Achieving Data Security Through Privacy-Enhanced Computation Techniques (ID: G00384386). Retrieved from Gartner: Gartner
- Li, Q., Gundersen, J. S., Heusdens, R., & Christensen, M. G. (2020, Sep 2). Privacy-Preserving Distributed Processing: Metrics, Bounds, and Algorithms. Retrieved from ArXiv: Privacy-Preserving Distributed Processing: Metrics, Bounds, and Algorithms
- Lindell, Y. (2021). Secure Multiparty Computation. Communications of the ACM, 64(1), 86-96. Retrieved from Secure multiparty computation
- Lopardo, A., Benaissa, A., & Ryffel, T. (2020, Jun 12). What is Secure Multi-Party Computation? Retrieved from Medium: What is Secure Multi-Party Computation?
- Lowans, B. (2020, Jul 24). Hype Cycle for Data Security, 2020 (ID: G00448204). Retrieved from Gartner: Gartner
- Ma, R., Li, Y., Li, C., Wan, F., Hu, H., Xu, W., & Zeng, J. (2020, May 1). Secure multiparty computation for privacy-preserving drug discovery. Bioinformatics, 36(9), 2872-2880. Retrieved from Oxford University Press: Secure multiparty computation for privacy-preserving drug discovery
- Pagter, J. (2017, Apr 27). Multiparty Computation (MPC): A short introduction. Retrieved from Sepior: An Introduction to Threshold Signature Wallets With MPC
- Parrish, K. (2016, Aug 10). Microsoft Research proposes method for exchanging secure data within the cloud. Retrieved from Digital Trends: Microsoft Research proposes method for exchanging secure data within the cloud
- Wikipedia. (n.d.). Yao's Millionaires' problem. Retrieved from Wikipedia: Yao's Millionaires' problem
- Wonham, M., Fritsch, J., Xu, D., de Boer, M., & Krikken, R. (2020, Oct 9). Guide to Data Security Concepts (ID: G00731430). Retrieved from Gartner: Gartner
- Yao, A. C. (1982). Protocols for Secure Computations. 23rd Annual Symposium on Foundations of Computer Science (FOCS 1982) (pp. 160-164). FOCS. Retrieved from Protocols for secure computations
- Zhao, C., Zhao, S., Zhao, M., Chen, Z., Gao, C.-Z., Li, H., & Tan, Y.-a. (2019, Feb). Secure Multi-Party Computation: Theory, practice and applications. Information Sciences, 476, 357-372. Retrieved from ScienceDirect: Secure Multi-Party Computation: Theory, practice and applications