We get it when our clients advocate for on-prem solutions to protect their data, but the cost of avoiding cloud technology is extremely high. In this article, we explain how our hybrid SaaS solutions, combined with specially developed anonymization techniques, let them have their cake and eat it too.
Everyone loves to talk about the scalability, efficiency and general value of cloud solutions. At the same time, for a crypto industry deeply concerned by frequent hacks and data leaks, trusting the cloud is not a trivial decision. The risk associated with a major breach can easily outweigh any benefit promised by cloud-based services.
Crypto is by no means the only industry facing this concern. Data-security incidents are ubiquitous these days, with successful attacks even against top-tier software giants, large banks and social media platforms.
So, while cloud-based solutions are by and large becoming the standard, some companies understandably prefer to keep as much control as possible over their own data and their users’ data. As a result, when sensitive information is involved, these companies still consider avoiding cloud solutions altogether and holding all of their data on-premises (on-prem).
Common sense suggests that keeping data bunkered in one location is safer than letting it travel. While that is often true, avoiding cloud solutions also carries significant costs. Major pain points of the on-prem approach include increased system complexity, harder support and debugging, foreign IT systems that become the responsibility of the client’s tech team, and slower, more difficult system updates, to name just a few.
Each of these drawbacks translates into man-hours and an inflated price tag, and that cost is never negligible.
In addition, on-prem data management is generally a lose-lose situation for both the client and the industry. Data that remains on-prem cannot be used as training sets, which are critical for developing the machine-learning models that benefit our clients in the long run. In effect, on-prem solutions prevent clients from taking full advantage of the precision and efficiency that ML-powered compliance solutions offer.
At Solidus Labs, we regularly encounter this cloud-wary sentiment when discussing data management with our clients. It is especially common for our solutions that involve PII (Personally Identifiable Information), such as our retail and institutional client onboarding solution.
In this blog post, we weigh the added security of keeping data strictly on-prem against these costs. We then explain how the right tailored security measures let us offer a solution that minimizes the vulnerabilities associated with cloud technology while retaining all of its benefits.
Enter the Hybrid SaaS Solution
Measured against our mission of helping our clients grow faster and safer, neither on-prem nor traditional SaaS approaches truly achieved the combination of service and PII security we wanted our onboarding solution to offer.
To solve this, we developed a middle-ground option: a hybrid SaaS solution that, when done right, lets our clients enjoy the best of both worlds.
Our approach minimizes the amount of Solidus Labs software that needs to run on the client’s on-prem system. We limit it to processes directly related to PII, such as query schedulers and data hashers, which are used as part of our data-anonymization process.
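The following minimal Python sketch shows the kind of work an on-prem data hasher could do before anything leaves the client’s environment. It assumes HMAC-SHA256 as the keyed one-way hash. It also assumes a hypothetical, hard-coded set of PII field names. This is an illustration of the split, not Solidus Labs’ actual component.

```python
import hashlib
import hmac

# Hypothetical PII field names; a real deployment would use the client's own schema.
PII_FIELDS = {"name", "email", "account_number"}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a PII value with a keyed one-way hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_upload(record: dict, key: bytes) -> dict:
    """Hash every PII field on-prem; non-PII fields pass through unchanged."""
    return {
        field: pseudonymize(str(value), key) if field in PII_FIELDS else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    secret_key = b"client-held-secret"  # illustrative only; never leaves the client
    record = {"name": "Alice Doe", "account_number": "04217", "trade_volume": 1500}
    outbound = prepare_for_upload(record, secret_key)
    print(outbound["trade_volume"])  # 1500: the non-PII value is untouched
    print(outbound["name"][:16], "...")  # start of a 64-character hex digest
```

For a given key, the hash is deterministic, so the same customer always maps to the same pseudonym. Cloud-side models can therefore still link a customer’s activity across records without ever learning who that customer is.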
In this setup, the parts of the system that do the heavy lifting stay within our own Solidus Labs cloud, from training and serving our machine learning models to automatically pushing alerts when manipulation is detected.
Keeping the highly complex parts in-house is what allows us to provide our clients with all the benefits usually reserved for cloud-based SaaS solutions. In tandem, the reduced complexity on the clients’ side removes as much of the burden associated with on-prem deployments as possible and makes their tech teams’ lives much easier.
That’s a clear win-win: clients keep their PII on-prem while still enjoying the advantages of cloud-based machine learning and services. The key, of course, is ensuring that PII never leaves our clients’ environments unless it is fully anonymized, and that the anonymized data we store as a vendor cannot be used to infer its original content. That brings us to data privacy.
Keeping Private Data Private
“Data is the oil of the 21st century” — Joe Kaeser, CEO, Siemens
As vendors of information systems, we understand and deeply respect the immense trust our clients place in us when they share their data. This principle drives our technology, and it is what led us to build specially developed data anonymization techniques into all of our products that handle PII:
Obfuscation Via One-way Hashing
The first line of defense is, in fact, deployed on-prem. A lightweight application that we deploy remotely into each client’s environment obfuscates all data leaving that environment and restores the original values when results come back.
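The obfuscation combines a one-way hashing technique with a secret key held only by the client. A one-way hash cannot be mathematically reversed. Without the client’s key, an outsider cannot even recompute the hash to test guesses. As a result, the original values can be restored only inside the client’s environment, where the key and the original records reside.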
In simpler terms, to anyone outside the client (including us at Solidus Labs), sensitive information such as account numbers and names is reduced to strings of gibberish. These strings regain meaning only once they are mapped back to the original values inside the client’s environment, and, as mentioned, only the client who owns the data can do that.
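Why does the secret key matter so much? Values like account numbers have little entropy, so an unkeyed hash of them can be recovered by exhaustive guessing. The following Python sketch contrasts a plain SHA-256 hash with a keyed HMAC-SHA256 hash. It also shows one hypothetical way a client could restore original values: a lookup table kept on-prem. The class and function names are illustrative assumptions, not Solidus Labs’ code.

```python
import hashlib
import hmac


def plain_hash(value: str) -> str:
    """Unkeyed SHA-256: anyone who knows the algorithm can recompute it."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def keyed_hash(value: str, key: bytes) -> str:
    """Keyed one-way hash (HMAC-SHA256): recomputing it requires the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def guess_account_number(token: str, digits: int = 5):
    """Attacker's dictionary attack: hash every possible account number and compare."""
    for n in range(10 ** digits):
        candidate = str(n).zfill(digits)
        if plain_hash(candidate) == token:
            return candidate
    return None


class OnPremLookup:
    """Hypothetical client-side table mapping pseudonyms back to original values."""

    def __init__(self, key: bytes):
        self._key = key
        self._table = {}  # pseudonym -> original value; never leaves the client

    def pseudonymize(self, value: str) -> str:
        token = keyed_hash(value, self._key)
        self._table[token] = value
        return token

    def resolve(self, token: str):
        """Map a pseudonym returned from the cloud (e.g. in an alert) back to its value."""
        return self._table.get(token)


if __name__ == "__main__":
    account = "04217"
    # An unkeyed hash of a low-entropy value falls to a simple dictionary attack.
    print(guess_account_number(plain_hash(account)))  # 04217
    lookup = OnPremLookup(key=b"client-held-secret")
    token = lookup.pseudonymize(account)
    # Without the key, the same attack finds nothing.
    print(guess_account_number(token))  # None
    # Inside the client's environment, the pseudonym resolves immediately.
    print(lookup.resolve(token))  # 04217
```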
Anonymization Via Distribution-preserving Dataset Transformations
Obfuscating PII is an important step, but it’s just one component of comprehensive protection against cyber-attacks aimed at retrieving personal information.
If an attack does give malicious actors access to the dataset, they can still try to link a person to obfuscated records by inference. Typically, they cross-match quasi-identifiers (attributes such as zip code or age that seem harmless on their own) against background knowledge, targeting parts of the data where diversity is insufficient. This threat is well known in medical research, for example, where extremely sensitive information is an integral part of many datasets.
As a thought experiment, imagine getting your hands on a medical dataset that you know contains a family friend’s record. By filtering on your friend’s zip code, you can quickly narrow the anonymized records down to a few that most likely belong to your friend.
Try it out with the table below — knowing only basic information about this individual, would you be able to infer your friend’s ailment? Most likely — yes.
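The same attack takes only a few lines of Python. The records below are invented for illustration. They are "anonymized" in the naive sense: names are removed, but zip code and age remain alongside the sensitive condition.

```python
# Hypothetical "anonymized" records: names removed, quasi-identifiers left intact.
records = [
    {"zip": "13053", "age": 28, "condition": "Heart disease"},
    {"zip": "13053", "age": 29, "condition": "Heart disease"},
    {"zip": "13068", "age": 21, "condition": "Flu"},
    {"zip": "13068", "age": 23, "condition": "Cancer"},
    {"zip": "14850", "age": 50, "condition": "Flu"},
]


def candidate_conditions(rows, zip_code, min_age, max_age):
    """Conditions consistent with what the attacker already knows about the target."""
    return {
        row["condition"]
        for row in rows
        if row["zip"] == zip_code and min_age <= row["age"] <= max_age
    }


# Background knowledge: the friend lives in 13053 and is in their late twenties.
print(candidate_conditions(records, "13053", 25, 30))  # {'Heart disease'}
```

Both matching records share the same condition. The attacker therefore learns the diagnosis without ever determining which row is the friend’s. This is the homogeneity attack referred to later in this post.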
To address this, we’re taking a leaf out of the medical researchers’ book and applying methods widely used in their domain to protect the integrity of the data our clients entrust to us.
Namely, these are data-desensitization methods that prevent individuals in our data from being re-identified while preserving the data’s statistical distribution.
Put simply, we actively manipulate the dataset, adding rows and mixing values around in a fully controlled and intentional way. The goal is to transform the data just enough to obscure individuals while having a minimal effect on its distributional characteristics.
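The exact transformations are not spelled out here, so the following is only one illustrative technique of the kind described: data swapping within strata. In this Python sketch, values of a sensitive column are shuffled among rows that share a coarse attribute. Each stratum therefore keeps exactly the same mix of values, while individual rows lose their link to their original value. The column names and data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def swap_within_strata(rows, stratum_key, column, seed=None):
    """Shuffle `column` among rows that share the same `stratum_key` value.

    Each stratum keeps exactly the same multiset of `column` values, so the joint
    distribution of (stratum, column) is unchanged, while the link between a row
    and its original value is broken. The input rows are not modified.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)  # stratum value -> indices of its rows
    for i, row in enumerate(rows):
        strata[row[stratum_key]].append(i)

    swapped = [dict(row) for row in rows]
    for indices in strata.values():
        values = [rows[i][column] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            swapped[i][column] = value
    return swapped


if __name__ == "__main__":
    # Hypothetical onboarding records: pseudonymized ID, region, and a risk tier.
    rows = [
        {"id": "a1", "region": "EU", "risk_tier": "high"},
        {"id": "b2", "region": "EU", "risk_tier": "low"},
        {"id": "c3", "region": "EU", "risk_tier": "low"},
        {"id": "d4", "region": "US", "risk_tier": "medium"},
        {"id": "e5", "region": "US", "risk_tier": "high"},
    ]
    out = swap_within_strata(rows, "region", "risk_tier", seed=42)

    def joint(table):
        return Counter((r["region"], r["risk_tier"]) for r in table)

    print(joint(rows) == joint(out))  # True: the (region, tier) distribution is preserved
```

The trade-off is that correlations between the swapped column and attributes outside the stratum are weakened. Strata, and any synthetic rows added alongside them, would need to be chosen so that the relationships downstream models depend on survive.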
Preserving the data’s statistical distribution is critical because it keeps the dataset representative of the original data. That lets us use it for machine-learning and general compliance R&D while rendering it useless to attackers trying to identify individuals.
We then verify the effectiveness and completeness of our transformations using established privacy metrics such as k-anonymity, l-diversity and t-closeness. This verification guards against privacy attacks such as homogeneity attacks, background-knowledge attacks and more (for additional information, read the linked academic articles).
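To show what these metrics measure, here is a minimal Python sketch.

- **k-anonymity** is the size of the smallest group of records that share the same quasi-identifiers.
- **Distinct l-diversity** is the fewest distinct sensitive values found in any such group.
- **t-closeness** (categorical attribute) uses total variation distance, which is what the Earth Mover's Distance reduces to when all categories are treated as equally far apart.

The table is a hypothetical, already-generalized dataset, and the function names are illustrative.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that are indistinguishable on the quasi-identifiers."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return list(classes.values())


def k_anonymity(rows, quasi_identifiers):
    """Every record hides among at least k records with the same quasi-identifiers."""
    return min(len(c) for c in equivalence_classes(rows, quasi_identifiers))


def l_diversity(rows, quasi_identifiers, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    return min(
        len({row[sensitive] for row in c})
        for c in equivalence_classes(rows, quasi_identifiers)
    )


def _distribution(rows, sensitive):
    counts = Counter(row[sensitive] for row in rows)
    return {value: n / len(rows) for value, n in counts.items()}


def t_closeness(rows, quasi_identifiers, sensitive):
    """Largest distance between any class's sensitive distribution and the table's."""
    overall = _distribution(rows, sensitive)
    worst = 0.0
    for c in equivalence_classes(rows, quasi_identifiers):
        local = _distribution(c, sensitive)
        # Total variation distance = EMD with equal ground distance between categories.
        distance = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        worst = max(worst, distance)
    return worst


if __name__ == "__main__":
    # Hypothetical table after generalizing zip codes and ages into ranges.
    table = [
        {"zip": "130**", "age": "20-29", "condition": "Heart disease"},
        {"zip": "130**", "age": "20-29", "condition": "Flu"},
        {"zip": "130**", "age": "20-29", "condition": "Cancer"},
        {"zip": "148**", "age": "40+", "condition": "Flu"},
        {"zip": "148**", "age": "40+", "condition": "Flu"},
        {"zip": "148**", "age": "40+", "condition": "Heart disease"},
    ]
    qis = ["zip", "age"]
    print(k_anonymity(table, qis))  # 3
    print(l_diversity(table, qis, "condition"))  # 2
    print(round(t_closeness(table, qis, "condition"), 3))  # 0.167
```

Higher k and l and lower t mean stronger protection. The thresholds a dataset must meet are a policy choice that these functions do not make.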
Applying these techniques also brings some unexpected perks. For example, the level of obfuscation we achieve effectively renders the data non-identifiable, easing PII-related concerns such as GDPR compliance.
To Sum It All Up
With mounting security concerns and an alarming rate of cyber attacks plaguing our industry (and others), finding a secure solution for your data that lets you sleep at night is an ever-increasing challenge.
In view of this, Solidus Labs offers a unique hybrid SaaS solution that gives our clients the best of both worlds. On the one hand, they get the security and peace of mind of on-prem data management. On the other, they get the seamlessly updatable, hassle-free experience of a cloud solution powered by state-of-the-art machine-learning algorithms that can propel their organization’s compliance and growth forward.
We have much more to say on this topic (and on data in general) and are always happy to start a conversation. Reach out anytime at firstname.lastname@example.org