
5 elements to improve ROI and maintain control over data

February 11, 2022

By David Maher

The era of big data and the cloud has hit an impasse. We’ve essentially just recreated our previous silos of data by stuffing even more data into cloud repositories. To truly realize the opportunities promised by the big data movement, we have to move to the next level. This won’t be large standalone silos of data controlled by one organization—the next level will be data cooperatives supported by trusted data exchanges built on distributed data sets.

One major issue has stymied big data projects. The new revenue streams and vast new businesses made possible by the increased use of data come only when multiple stakeholders can both contribute data and collaborate in its use. These stakeholders include not only business partners but also regulators, consumers, and potentially even competitors. Interests vary not only between these types of stakeholder but also within each type and within individual organizations, and they can change depending on the nature of the interaction. Accordingly, trusted data exchanges will need to dynamically support an extremely broad spectrum of data access and rights. Today’s big data projects are just not set up to support these sorts of data cooperatives.

The centralized nature of current big data projects brings its own issues. The data has to come from somewhere, and the process of transferring it entails not only increased costs but also security and governance risks. The increasing reliance on public clouds run by third parties adds further complexity because of their lack of interoperability and the high cost of transferring data out of them.

The trusted data exchanges needed to support these sorts of data cooperatives won’t have to rely on new, untested technologies. They can be built with technical approaches based on proven technologies. Here’s how control over data has the potential to reshape how companies consider and create trusted data exchanges.

Data Point No. 1: A highly scalable policy-based data governance layer

Here, data governance refers to the secure management of multi-party identity verification and access to data. This capability is different from traditional IAM in that it’s focused on access to data, not applications. The layer must also fully support a robust attribution and auditing function.

Data Point No. 2: Flexible policy-based data management

The data governance layer will use highly flexible policy-based data management technology. This technology will give all parties, from individuals to large organizations, fine-grained, policy-based access to the data-derived assets controlled by the trusted data exchange. These policies can be derived from internal policies, regulatory compliance requirements, or other factors.
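To make the idea of fine-grained, policy-based access concrete, here is a minimal sketch in Python. It is an illustration only: the `Policy` class, the `is_allowed` function, and the role, dataset, and action names are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Grants one role a set of actions on one dataset (hypothetical model)."""
    role: str
    dataset: str
    actions: frozenset


def is_allowed(policies, role, dataset, action):
    """Deny by default; allow only when a policy explicitly grants the action."""
    return any(
        p.role == role and p.dataset == dataset and action in p.actions
        for p in policies
    )


# Hypothetical policies: different stakeholders get different rights
# over the same dataset.
POLICIES = [
    Policy("owner", "claims", frozenset({"read_raw", "read_aggregate", "share"})),
    Policy("partner", "claims", frozenset({"read_aggregate"})),
    Policy("regulator", "claims", frozenset({"read_aggregate", "audit"})),
]

print(is_allowed(POLICIES, "partner", "claims", "read_aggregate"))  # True
print(is_allowed(POLICIES, "partner", "claims", "read_raw"))        # False
```

A real exchange would likely evaluate richer conditions (time, purpose, consent), but the deny-by-default check above captures the basic shape.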

Data Point No. 3: Secure algorithm execution environments

Similar to secure execution environments in operating systems, secure algorithm execution environments are sandboxed environments where algorithms, which may be provided by third parties, are strictly controlled as to which data they can access and where they can send it. These environments can be set up to only provide desired data analysis results to partners without exposing the raw data used by the algorithms to produce the results.
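The release rule described above can be sketched in a few lines of Python. This is not a real sandbox (it provides no process isolation); it only illustrates the control flow, assuming a hypothetical `run_controlled` gatekeeper, a hypothetical minimum cohort size, and made-up records.

```python
from statistics import mean

MIN_COHORT = 3  # hypothetical threshold below which no result is released


class ReleaseDenied(Exception):
    """Raised when a result is not allowed to leave the environment."""


def run_controlled(algorithm, records, allowed_fields):
    """Run a third-party algorithm on a restricted view and release only a number."""
    if len(records) < MIN_COHORT:
        raise ReleaseDenied("cohort too small to release a result")
    # The algorithm sees only the fields it has been approved for.
    view = [{field: row[field] for field in allowed_fields} for row in records]
    result = algorithm(view)
    # Only a single numeric aggregate may leave; rows and lists are rejected.
    if isinstance(result, bool) or not isinstance(result, (int, float)):
        raise ReleaseDenied("only numeric aggregates may be released")
    return result


# Made-up records held by the data owner.
RECORDS = [
    {"patient": "a", "age": 34, "cost": 120.0},
    {"patient": "b", "age": 51, "cost": 300.0},
    {"patient": "c", "age": 47, "cost": 180.0},
]


def average_cost(rows):
    """Example third-party algorithm: returns one aggregate value."""
    return mean(row["cost"] for row in rows)


print(run_controlled(average_cost, RECORDS, allowed_fields=("cost",)))  # 200.0
```

An algorithm that tried to return the rows themselves would raise `ReleaseDenied`, so the partner receives the analysis result without ever seeing the raw data.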

Data Point No. 4: Data virtualization

Data virtualization technologies allow queries to be sent to distributed datasets. This avoids the need to transfer data from its original location to data lakes or similar centralized data repositories, maintaining maximum control over data for its rights owners.
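As a rough illustration of sending a query to the data rather than moving the data, the sketch below uses two in-memory SQLite databases to stand in for datasets held by different owners. The `sales` table, its columns, and the helper names are assumptions made for this example; production data virtualization tools are far more capable.

```python
import sqlite3


def make_source(rows):
    """Create one stand-in data source that stays with its owner."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def federated_total(sources, region):
    """Send the same aggregate query to every source and combine the partial sums."""
    total = 0.0
    for conn in sources:
        (partial,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
            (region,),
        ).fetchone()
        total += partial  # only the partial sum leaves each source
    return total


# Two hypothetical owners, each keeping its own rows.
SOURCES = [
    make_source([("emea", 100.0), ("apac", 40.0)]),
    make_source([("emea", 25.5), ("amer", 70.0)]),
]

print(federated_total(SOURCES, "emea"))  # 125.5
```

No rows are copied into a central repository; each source answers the query locally and returns a single number.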

Data Point No. 5: Secure APIs for data ingress/egress

All the parties contributing data to the trusted data exchange can manage their data contributions via secure APIs.
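The article does not say what makes these APIs secure. One common building block, shown here purely as an assumption, is to have each contributor sign its requests with a shared secret so the exchange can reject tampered or unauthenticated submissions. The secret and payload below are made up.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute an HMAC-SHA256 signature over the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, body), signature)


# Hypothetical contributor secret and data contribution.
SECRET = b"contributor-42-shared-secret"
BODY = b'{"dataset": "claims", "rows": 1200}'

signature = sign(SECRET, BODY)
print(verify(SECRET, BODY, signature))                 # True
print(verify(SECRET, BODY + b" tampered", signature))  # False
```

In practice this would sit alongside transport encryption and the policy checks of the governance layer rather than replace them.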

These elements are essential to building the platforms that trusted data exchanges need in order to gain the trust of, and widespread adoption by, the many stakeholders who will need to participate. With that participation, they can truly deliver on the promise of big data.

About the author

David Maher is the CTO of Intertrust. He has over 30 years of experience in secure computing and is responsible for Research and Development at Intertrust. In addition, he is currently President of Seacert Corporation, a certificate authority for the Internet of Things, a developer of application security software, and Co-chairman of the Marlin Trust Management Organization which oversees the world’s only independent digital rights management ecosystem.

