What is a Data Trust?
A data trust is a legal and technical framework for sharing and managing data. A Data Trust promotes and facilitates data sharing amongst organizations by ensuring trust in the rules, data security, confidentiality and privacy. A data trust comprises two key elements: legal agreements and a technology platform to collect, aggregate, protect and manage the data.
Data trusts provide opportunities for the members to benefit from the aggregated data, often through new data insights or predictions which can motivate data sharing. The legal agreements cover what data is to be shared, who owns it, who can use or view it and how it is to be used. The platform is a technology solution where the actual data is uploaded, stored, aggregated and viewed. The data and/or the users who provided it can be anonymized, if required as defined by the agreement.
A data trust is a system and legal entity that manages someone’s data on their behalf. Often an independent third party, it stores the data and usually manages individual and collective rights of collection and access to the data.
A data trust is a legal framework for managing shared data and an opportunity for organizations to collaborate, making the data-sharing journey faster, less costly, and less risky. The Data Trust ensures that the data can be trusted, as described below, by controlling the process and structure of data going in and out via a secure data-sharing platform.
A Data Trust facilitates the agreement when a number of parties come together and decide to share data. This leads to the questions: What do we sign? What do we do? How do we put this agreement or mechanism to cooperate in place?
The Data Trust provides the software platform to collect, share, control access and mine the data to find trends and insights essential for all parties.
“Sharing data once is easy. Sharing data on an ongoing basis and as part of a business process is a more difficult challenge.”
– James Doyle, Creme Global
Benefits of a Data Trust
Data sharing and access – A single source of truth!
If your first questions when you need data are “Does it exist? Who has the spreadsheet? Is it up to date?”, then you are probably in trouble. A Data Trust provides a single, central source of truth for your data across all stakeholders, trustees, researchers and participants.
Stakeholders and data access and data use
Who shares and has access to the data? Is it industry? Is it private industry? Is it the regulators? Partners? When you talk about a Data Trust, who is actually putting the data in and getting it back out again?
With a Data Trust you have full security and control over who can access the data, how much, where and what they can do with it.
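As an illustration only, this kind of control can be thought of as a mapping from roles to permitted actions. The sketch below assumes hypothetical role names, action names and a hypothetical `can` helper; in a real Data Trust these would come from the legal agreement, and none of them describe an actual platform API.

```python
from dataclasses import dataclass

# Hypothetical roles and the actions each one may perform.
# In practice these rules would be taken from the Data Trust agreement.
PERMISSIONS = {
    "submitter": {"upload"},
    "approver": {"upload", "approve", "view_own"},
    "analyst": {"view_aggregate"},
    "regulator": {"view_aggregate"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str


def can(user: User, action: str) -> bool:
    """Return True if the user's role grants the requested action."""
    return action in PERMISSIONS.get(user.role, set())


alice = User("alice", "analyst")
print(can(alice, "view_aggregate"))  # True
print(can(alice, "upload"))          # False
```

Unknown roles get an empty permission set, so access is denied by default rather than granted by accident.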
Safer products (food and cosmetics)
When you look at aggregate data from all parties, suppliers and manufacturers you get to stand back and see the full picture. You get to see what’s going on everywhere. You can more easily discover anomalies, trends and risk factors and as a result, you can produce safer products. All by sharing data.
Data privacy and anonymity
Who can see the data, and what they can see, is an important question. Often there is quite sensitive data feeding into the Trust. Who can access it is entirely user-defined and case-dependent. It’s all about the granularity of who gets to see what and where, and agreeing that up front.
Aggregating data for insights
Data trusts can aggregate data from many different organizations and sources, providing a larger data set that contains more insights and predictive power than any individual organization’s data on its own.
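A minimal sketch of pooled aggregation, under stated assumptions: the records, organization names and the minimum-contributor threshold are all hypothetical. The threshold is one common way to avoid publishing an aggregate that could be traced back to a single member; it is not something the agreement described here is known to require.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (organization, product category, measured value).
records = [
    ("org_a", "leafy greens", 2.0),
    ("org_b", "leafy greens", 4.0),
    ("org_c", "leafy greens", 6.0),
    ("org_a", "seafood", 1.0),
    ("org_b", "seafood", 3.0),
]

MIN_CONTRIBUTORS = 3  # assumed threshold; in practice set by the agreement


def aggregate(rows, min_contributors=MIN_CONTRIBUTORS):
    """Average values per category, hiding categories with too few contributors."""
    values = defaultdict(list)
    contributors = defaultdict(set)
    for org, category, value in rows:
        values[category].append(value)
        contributors[category].add(org)
    return {
        category: mean(vals)
        for category, vals in values.items()
        if len(contributors[category]) >= min_contributors
    }


print(aggregate(records))  # {'leafy greens': 4.0}
```

Here the seafood category is suppressed because only two organizations contributed to it.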
Data security
The companies Creme Global works with have branches all over the globe. With a couple of hundred thousand employees, controlling access to the data requires the same level of security as any large public database, including similar functionality such as assigning user names and tracking and designating access. We ensure multifactor authentication is in place, even within a company, to control access to data from all across the globe.
With multiple stakeholders and companies getting involved, there is a lot more sensitivity around anonymizing the data and around who gets to see what kind of aggregated data. Even within one company, only certain people get to see certain levels of data, or can access or download it.
In the world of food and safety, you’re in the world of people misinterpreting data and litigation, so you have to be really careful about who gets access to what and who can see what, even internally in the company.
Data management
A Data Trust must address version control and data validation to ensure data hygiene across an organization and its partners. It must prevent, or compensate for, the possibility of having 15 different spellings for Walmart or Salmonella. It is important to ensure data standardization and interoperability, so that people are talking the same language. That doesn’t mean defining exactly the right template that everyone fills in exactly the same way. It’s more about understanding the terminology. That’s where the effort goes in a Data Trust: into the people involved rather than the engineering side.
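To make the spelling problem concrete, here is a small sketch of mapping free-text entries onto an agreed vocabulary using fuzzy matching from Python's standard library. The vocabulary, the `normalize` helper and the 0.8 similarity cutoff are assumptions for illustration; agreeing the vocabulary itself is the people-side work described above.

```python
import difflib

# Hypothetical controlled vocabulary agreed by the members.
CANONICAL_TERMS = ["Salmonella", "Listeria", "Walmart"]


def normalize(term, vocabulary=CANONICAL_TERMS, cutoff=0.8):
    """Map a free-text entry to the closest agreed term, or None if nothing is close."""
    lookup = {v.lower(): v for v in vocabulary}
    cleaned = term.strip().lower()
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


for raw in ["salmonela", " SALMONELLA ", "Wal-Mart", "E. coli"]:
    print(repr(raw), "->", normalize(raw))
```

Entries that match nothing return `None`, so they can be sent back to the submitter for review rather than silently guessed.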
Visualization and insights
When you take data on its own you can get so much insight from it simply by putting it in a chart. You don’t have to do anything fancy with it at all.
Looking at a table, the human eye can’t pick up much. Put that information into a chart or a plot and, all of a sudden, you tap into the power of the human eye, which is really powerful at seeing a trend or spotting an anomaly. That is something really difficult to train a computer to do, but with very simple data and a bit of visualization, the human eye can spot it quickly.
Analytics using Machine Learning and AI
Data trusts facilitate the collection of large volumes of clean, validated data – perfect for training Machine Learning (ML) models, a type of Artificial Intelligence (AI) model capable of learning from the data they are trained on.
When ML/AI models are combined with the continuous ingestion of validated data at scale, they are capable of detecting complex trends that could be imperceptible to the human eye. These models are best used to provide insight that subject matter experts can use to make informed decisions. By taking the cognitive burden off of the humans, and allowing ML/AI models to process the bulk data collected via the data trust, experts can spend more time considering the patterns in the data rather than the data itself.
In short, whilst ML/AI encompass a whole host of technologies, the data collected in data trusts are primarily used to arm leaders with the insights they need to make the best decisions possible.
Data collection
Organizations store data in multiple different formats such as PDFs and spreadsheets, or perhaps you might have some historical data on paper. There might even be something coming in live from a sensor. Use a Data Trust to collect it. And when you get that data, structure all that data, get it into your platform and begin getting insights from it.
How do you create a Data Trust agreement?
To create a Data trust agreement among different parties you must look at the following:
- Identify the stakeholders. Be sure to include those who contribute to the data and those who use or consume it.
- Agree a common set of rules for data security/privacy, including what is shared, how it is shared, who contributes, and who accesses it. Define the needs, responsibilities, and expectations of each party.
- Decide on a scalable and reliable process for taking in data from multiple sources onto a single Data Trust platform.
- Establish the goals for the collaborative programs and initiatives to come from the Data Trust.
- Agree on the level of access that stakeholders, leaders, and decision-makers in your jurisdiction will have.
- Identify and eliminate the inconsistencies across the types of data.
- Decide on an independent private trust to provide the platform and manage your data. You need a platform to act as a universal adaptor, and all parties need to know how to use it.
However, in our experience, the main issues in creating a Data Trust are coming up with the business case and guiding stakeholders through the process. Those are the barriers, and we at Creme Global are happy to help.
Data trusts examples
Regulators
There is a major shift occurring in the regulatory sector. The “New Era For Smarter Food Safety” by the US FDA is a prime example of this. Enabling data-driven decisions is central to this mandate.
Creme Global was recently awarded a sole source contract by the US FDA, bringing its unique, internationally recognised expertise to developing and deploying a Seafood Data Trust. The FDA is responsible for ensuring that the nation’s food supply is safe, sanitary, and honestly labeled.
“We are extremely pleased to have been awarded this contract by the US FDA. This is a very exciting project which is perfectly aligned with our company mission and goals. It paves the way for smarter collection and use of data for the benefit of all.”
– Cronan McNamara, CEO, Creme Global
Western Growers
After the Romaine lettuce and Salinas Valley outbreaks, among others, a mandate was given by the FDA for growers to improve food safety practices.
Western Growers has been at the forefront in driving food safety among its growers and took this mandate to heart. It really wanted to understand how to implement best practice among its members. Working with Creme Global, it set up a Data Trust with a data-sharing platform to understand what was happening across the growers, supply chain and processing.
If there is a risk of E. coli contamination where is it coming from? Is it from nearby ranches? Is it weather dependent? Does it depend on what factories the food goes through? Is it dependent on wild animals coming through the farms? Or are there other things going on?
“In God we trust, all others bring data.” – W. Edwards Deming
Without the data you’re another person with an opinion. The purpose of the platform was to get away from opinions. Come with data, come with facts. See what’s going on. And really try to get to the bottom of it.
The platform pulls very sensitive data in from a lot of different sources. Western Growers worked with Creme quite deeply to figure out who gets to see what, and at what granularity. There was Western Growers the organization but the actual users are the Growers and various other people along that supply chain. Some of them are customers of each other and so you have to be really careful, step back up from the data and look at how anonymized you make it. That was the challenge and exactly what a Data Trust is designed for.
The big goal for Western Growers is to find the most cost-effective preventative measures and ensure food safety for consumers. But until you can identify the risks, you don’t really know which measures are most effective to implement.
Food growing is a tight-margin industry, like a lot of food industries. To reduce costs, you’re trying to figure out the most cost-effective mechanism for implementing food safety and food protection measures. Sometimes it requires a fairly substantial capital investment. So, if you’re relying on your own opinions instead of data about what may work, it’s much more challenging to discover which solution will give the best return on investment. This becomes even more challenging when you are bringing in opinions from partners and suppliers.
Using a Data Trust and building on the shared data to make the best decisions provided an opportunity for companies to save significant quantities of money by making the right decision as to what works and what does not.
The revenue impact is compounded when you discover that the regulators are going to put very restrictive food safety controls in place because of the absence of clear data. The regulator has no choice but to take very conservative estimates as to what’s actually happening. The result is that those measures can be quite punitive on industry and can be unworkable. This is where industry and the regulator, in this case Western Growers and the FDA, can come together, share actual data and maximize the quantities in the appropriate location. Then, the opportunity arises to maximize revenue and consumer safety for all parties.
FIIN
Following a food safety scandal in the UK and Europe about 8 or 10 years ago, a report was published by Professor Chris Elliott from Queen’s University, commissioned by the Secretaries of State for Health and the Department for Environment, Food and Rural Affairs (DEFRA). Chris was asked to conduct a review into the integrity and assurance of food supply networks. The report highlighted the industry’s need to get together and share data to prevent this happening again.
In response, the Food Industry Intelligence Network (FIIN) was established for industry to create a ‘safe haven’ to collect, collate, analyze and disseminate information and intelligence to protect the interests of the consumer: in effect, a Data Trust.
The platform is gathering inspection data from member companies that include Walmart, Mars and Nestle. Originally this data was stored and shared using spreadsheets and email. FIIN approached Creme Global for a more robust way to do this.
One of the novel things about this project and the data is that it required a double blind anonymization step. Because industry are sharing incoming inspection data there were natural concerns from a regulatory perspective in relation to this data. Even we as data aggregators in Creme Global do not know who submits the data. There’s a mechanism that facilitates data anonymization from the first step including usernames for who is submitting the data.
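The source does not describe how the anonymization mechanism works, so the following is only a sketch of one way a double blind arrangement could be structured, not a description of the FIIN platform. It assumes a hypothetical independent `Registrar` that knows which company holds which pseudonym but never sees data, and a hypothetical `Platform` that sees data but only pseudonyms.

```python
import secrets


class Registrar:
    """Hypothetical independent party: knows identities, never sees the data."""

    def __init__(self):
        self._pseudonym_to_company = {}

    def enroll(self, company: str) -> str:
        # A random pseudonym reveals nothing about the company name.
        pseudonym = "member-" + secrets.token_hex(8)
        self._pseudonym_to_company[pseudonym] = company
        return pseudonym


class Platform:
    """Hypothetical data platform: sees the data, but only pseudonyms."""

    def __init__(self):
        self.master_dataset = []

    def submit(self, pseudonym: str, row: dict) -> None:
        self.master_dataset.append({"submitter": pseudonym, **row})


registrar = Registrar()
platform = Platform()

alias = registrar.enroll("Example Foods Ltd")  # made-up company name
platform.submit(alias, {"product": "olive oil", "result": "authentic"})

print(platform.master_dataset[0]["submitter"].startswith("member-"))  # True
```

Because the mapping lives only with the registrar, the platform operator cannot tell who submitted a given row.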
Consider, for a moment, the old method of sharing data.
Before the Data Trust, data validation was very difficult and time consuming. It was not possible to build in robust data validation in a spreadsheet, which meant there were often errors in the data. As the submitters were anonymous, all queries would have to go via the law firm to the company that submitted the data. It was a very inefficient process and could take over a month to get a response.
Now, through the Data Trust platform, that communication can happen in real time on the secure anonymized chat feature on the platform. Not to mention that the need for manual data validation has all but been eliminated as the submission portal has a very robust and rigorous data validation system built in. The user can self-correct before they are allowed to submit the data to the master dataset.
In FIIN there are 55 member companies interacting with the Data Trust. In addition, a lot of third-party labs do the testing for this data. With the platform, this data is ingested and accessed automatically via APIs, again removing manual access and the errors that come with it.
Different levels of access, permissions and approval were vital for all members.
There are a lot of people involved in submitting the data, especially with third-party labs.
If you’re responsible for what goes into the database from your own company’s perspective, you want to get a final look at it before it gets submitted into the master database. So you would have a data submitter and a data approver. Usually there are many submitters and one approver per company, and the approver goes through the data and signs it off before it is submitted.
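The submitter and approver roles can be pictured as a small state machine. The sketch below is illustrative: the `Submission` class, its status names and the role strings are assumptions, not the platform's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    company: str
    rows: list
    status: str = "draft"  # draft -> pending -> approved
    history: list = field(default_factory=list)

    def submit(self, user_role: str) -> None:
        """A submitter sends a draft for review."""
        if user_role != "submitter" or self.status != "draft":
            raise PermissionError("only a submitter can submit a draft")
        self.status = "pending"
        self.history.append("submitted")

    def approve(self, user_role: str) -> None:
        """The company's approver signs off before data reaches the master dataset."""
        if user_role != "approver" or self.status != "pending":
            raise PermissionError("only an approver can sign off a pending submission")
        self.status = "approved"
        self.history.append("approved")


batch = Submission(company="member-1", rows=[{"sample": "S1"}])
batch.submit("submitter")
batch.approve("approver")
print(batch.status)  # approved
```

Keeping a history list alongside the status gives each company an audit trail of who moved the data forward and when it was signed off.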
The FIIN data trust then sits in the law firm. The reason for the law firm is to give the data some legal privilege and protection, and member companies some distance and protection, while still getting the benefits from the data.
FIIN is structured to work on the issue of food integrity. It does not share data about food safety, which places it outside the remit of food regulators and means it cannot be compelled to hand over all data. Having the legal entity in there gives it some legal privilege.
Double blind anonymization offers additional protection for all members.
This structure is encouraged by Food Safety regulators around the world.
Ron McNaughton is a former police Detective Chief Inspector and now works in the regulatory office in Food Standards Scotland, their equivalent of the FDA.
With his police experience, Ron saw the value of information. He saw that by facilitating the industry to collect this data and giving them a little bit of breathing space they would be able to deal with it themselves. He could see that there was way more value in the industry collecting and sharing this data in a way they could address it. The alternative and traditional approach was to take a punitive approach from a regulatory perspective which discouraged collecting the data in the first place.
Food Standards Scotland was the first to agree that the consortium of companies could share the data and that it would not demand to see the raw data. The Food Safety Authority of Ireland (FSAI) and the Food Standards Agency (FSA) UK also agreed to this.
As you can see, the main challenge around a Data Trust is forming the agreement and the business reason for organizations. The technology is quite straightforward in comparison.
FIIN members are now getting insights that they couldn’t possibly get on their own: the wisdom of the crowd. Now that the platform and basic data sharing are in place, FIIN can start pulling other data sets into the platform to augment the members’ data and move from historic data to prediction and prevention.
That’s quite an exciting project.
RIFM
Creme Global was approached by two different organizations in the world of cosmetics and personal care: the Research Institute for Fragrance Materials (RIFM), based in New York, and Cosmetics Europe.
RIFM is the international scientific authority for the safe use of fragrance materials. RIFM generates, evaluates and distributes scientific data on the safety assessment of fragrance raw materials found in personal and household care products.
How fragrances are made and end up in products is a complex process.
The challenge was to figure out how to set safe limits across the industry and tally the data of intake limits for all of the different parameters involved.
Creme Global does the exposure science, represented by the sigma symbol in this diagram. But in order to get the most accurate data, we set up our Data Trust and sharing platform so that fragrance and cosmetic manufacturers could share their own formulations.
You can imagine how extraordinarily business sensitive this data set is.
Each company is essentially sharing their unique and very secret ingredients, formulations and concentrations for their best-selling products around the globe. So this needed to be done in a secure, confidential and anonymous manner.
We then combine those data sets with the use habits and practices of consumers: information on what quantity of different products people are using all across the US. This allows RIFM to estimate what concentration of chemical from all of those different sources is being used by a person in their daily life.
Also added to the mix is the scientific literature analyzing the retention or absorption of different chemicals into the body and how they are retained over time.
The Data Trust pulls all of that together. And then at the end of it you get an exposure model that all RIFM members can use.
Manufacturers and consumers get safer products onto the market. And companies get to maximize revenues because now they know where the maximum concentrations can be.
And when manufacturers run the model in reverse they see what concentrations and what headroom is available for different chemicals in different products to create new formulations.
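As a deliberately simplified illustration of the idea, aggregate exposure can be pictured as a sum over products of the amount used, the fraction retained and the ingredient concentration, with headroom being whatever remains under a limit. This is not the RIFM or Creme Global model; every number, name and the limit below is made up.

```python
# All numbers below are made up for illustration.
products = [
    # (product, grams applied per day, fraction retained, ingredient concentration)
    ("shampoo", 10.0, 0.01, 0.002),
    ("body lotion", 8.0, 1.0, 0.001),
]


def aggregate_exposure(rows):
    """Sum daily exposure (grams of ingredient per day) across all products."""
    return sum(amount * retention * conc for _, amount, retention, conc in rows)


def headroom(rows, safe_limit):
    """Exposure still available under an assumed safe daily limit."""
    return safe_limit - aggregate_exposure(rows)


total = aggregate_exposure(products)
print(f"aggregate exposure: {total:.4f} g/day")
print(f"headroom under a 0.01 g/day limit: {headroom(products, 0.01):.4f} g/day")
```

Running the calculation "in reverse" corresponds to the `headroom` function: start from the limit and see how much room is left for a new formulation.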
Data trust model
How do you get started?
Doing these types of data projects and implementing them is a balance between data standardization and interoperability.
You can have the most flexible system where everybody can use and access whatever they like. That is really easy to roll out and to allow people to get involved.
Or, on the opposite side, you provide users with a template explicitly stating what they can and cannot do. There are only dropdowns that users must select from. It’s formulaic, rigid and completely inflexible. This ends up being really difficult for an industry to implement on site.
So that’s why it’s always a balance. You’re always trying to find where the needs are on the spectrum, and what level of flexibility you need to facilitate them.
Thankfully you are rarely starting on a green field. Most organizations have a ton of historic data you can build on.
Getting an agreement in place between the people involved is where most of the discussions actually happen: agreeing on what is easiest to implement without ending up in the complete wild west of everybody using a different format and being drowned in data engineering to set it up.
At its core, implementing a good Data Trust is aligning on terminology.
That’s where the time and effort needs to be spent. Everybody needs to understand what this term means in this scenario, for this circumstance. And often you have global languages to add to the complexity of the matter. Especially when dealing with it as a multinational.
The end goal for a Data Trust is insight that’s going to lead to either capital investment, a recall, or whatever it might be. So you need to be sure everybody understands that they are talking about the same thing in the same way.
Human memory is flaky. We recollect things differently over time. When you’re talking about projects that run over the long haul, people’s frame of reference changes by the time they look at the data again, in six months, a year, or even 18 months’ time.
Having rigorous data in place eliminates confusion when you need it.
Find a business benefit
To find agreement in an organization and ensure success for a Data Trust you need to find the business benefit. All the other stuff can be fixed. All of the other mechanisms can be overcome.
Once you identify the benefits, a company will get over the hurdle of figuring out the rest. They will ensure new business processes are put in place to facilitate this mechanism of sharing data. Without a business benefit, it does not become somebody’s job. And if it is not somebody’s job it does not get done.
The added benefit of a Data Trust is that it facilitates better communication and interaction up and down the supply chain. Everyone is speaking the same language and seeing the same data.
Data trust companies and Data Trusts as a Service
Data Trusts as a Service (DTaaS) is a cloud-based platform that enables multi-party data sharing and connects data subjects, data custodians, and data consumers in a controlled and secure environment.
It is built upon a legal framework with an agreement between users upfront and an independent legal entity that manages the data on their behalf.
How can you trust your data?
A Data Trust and having trust in your data are two sides of the same coin.
There is so much data out there, in your own organization and across partners and suppliers. There is huge value in the data, but getting access, using it, and getting insights from it is difficult, both technically and bureaucratically.
Data trust means having confidence that your organization’s data is clean, reliable and up to date. It means you can pull out reliable insights and support well-informed decisions around your organization, market and product.
Ensuring data hygiene and building a culture around it can improve operations, streamline decision making and drive innovation across the organization.
But it means taking a proactive approach to ensuring your data can actually be trusted and implementing that approach across all levels of your organization.
How do you measure data trust?
The Data Management Association of the UK defines six dimensions of data quality:
- Accuracy
Data and people entering the data can be messy. For example, we have seen countless variations of the word ‘Salmonella’, which can make it very difficult to find and report on. Different people or global offices may enter dates using the US format MM/DD/YYYY or the European format DD/MM/YYYY, which may result in very different date ranges for data and results.
- Completeness
Data is considered complete when all the data required for a particular use is recorded and available. It’s not about ensuring 100% of your data fields are complete. It’s about determining which data is critical and which is optional. Completeness is not the same as accuracy as a full data set may still have incorrect values. You may have full information about the safety report, but this does not mean that the information is correct.
- Consistency
As an organization, can you ensure that all stakeholders, trustees, researchers and participants are using the same format and definitions? Using the date example above, different people could be entering 11/12/2024, 12/11/24, 24-NOV-12 and even 12th November 2024.
- Timeliness
Timeliness indicates whether the data is available when expected and needed. Timeliness can mean different things for different uses. In a factory environment timeliness is critical when monitoring product safety before each batch is shipped out. However, it may be acceptable to use previous quarterly figures to forecast production needs and plan future output. Data quality may diminish over time. Timeliness is important as it adds value to information that is particularly time sensitive.
- Uniqueness
No item or entity instance is recorded more than once, based upon how that item is identified.
- Validity or conformity
Validity is defined as the extent to which the data conforms to the expected syntax (format, type, or range). For example, every email address must have an ‘@’ symbol; zip / postcodes are only valid if they appear in the designated postcode list. Or the date value for a month should be between one and twelve. Having valid data means that it can be used with other sources. It also helps to promote the smooth running of automated data processes.
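Several of these dimensions lend themselves to automated checks at the point of submission. The sketch below is illustrative only: the field names, the `check_record` helper and the choice of ISO 8601 dates (YYYY-MM-DD) as the agreed format are assumptions, chosen because that format avoids the MM/DD versus DD/MM ambiguity described above.

```python
from datetime import datetime

REQUIRED_FIELDS = ("sample_id", "email", "test_date")  # hypothetical critical fields


def check_record(record, seen_ids):
    """Return a list of data-quality problems found in one record."""
    problems = []
    # Completeness: every critical field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append(f"missing {name}")
    # Uniqueness: the same sample must not be recorded twice.
    sample_id = record.get("sample_id")
    if sample_id in seen_ids:
        problems.append("duplicate sample_id")
    elif sample_id:
        seen_ids.add(sample_id)
    # Validity: values must conform to the expected syntax.
    if record.get("email") and "@" not in record["email"]:
        problems.append("invalid email")
    if record.get("test_date"):
        try:
            # Assumed agreed format: ISO 8601 (YYYY-MM-DD).
            datetime.strptime(record["test_date"], "%Y-%m-%d")
        except ValueError:
            problems.append("invalid test_date")
    return problems


seen = set()
good = {"sample_id": "S1", "email": "lab@example.com", "test_date": "2024-11-12"}
bad = {"sample_id": "S1", "email": "lab.example.com", "test_date": "11/12/2024"}
print(check_record(good, seen))  # []
print(check_record(bad, seen))
```

Returning a list of problems, rather than stopping at the first one, lets a submitter correct everything in a single pass before the data reaches the master dataset. Accuracy and timeliness are harder to check mechanically and usually need domain rules or human review.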
Organizational Culture
A strong organizational culture plays a crucial role in fostering collaboration and innovation, particularly in the realm of food safety. As highlighted in a recent article by Creme Global on securing leadership and organizational buy-in for data sharing, creating an environment that values data-driven decision-making can significantly enhance risk forecasting and mitigation. Organizations that embrace transparency and collaboration across the supply chain are better equipped to identify potential hazards early on. By integrating predictive analytics and real-time data sharing, companies can create a proactive approach to food safety, addressing risks before they materialize. This culture of shared responsibility, spanning from producers to processors and supply chain partners, not only strengthens safety protocols but also helps organizations stay ahead of evolving regulatory requirements. As industries face increasing pressure to ensure compliance, building a culture that supports data sharing and innovation can offer both enhanced safety and new market opportunities.
Next steps
If you are interested in finding out more or have any questions please get in touch.
We are happy to help.