A group of engineers and developers with backgrounds from the National Security Agency, Google, and Amazon Web Services are working on Gretel, an early-stage startup that aims to help developers safely share and collaborate with sensitive data in real time. TechCrunch reports:
It’s not as niche a problem as you might think, said Alex Watson, one of the co-founders. Developers can face this problem at any company, he said. Often, developers don’t need full access to a bank of user data – they just need a portion or a sample to work with. In many cases, developers could make do with data that looks like real user data. “It starts with making data safe to share,” Watson said. “There’s all these really cool use cases that people have been able to do with data.” He said companies like GitHub, a widely used source code sharing platform, helped to make source code accessible and collaboration easy. “But there’s no GitHub equivalent for data,” he said.
And that’s how Watson and his co-founders, John Myers, Ali Golshan and Laszlo Bock, came up with Gretel. “We’re building right now software that enables developers to automatically check out an anonymized version of the data set,” said Watson. This so-called “synthetic data” is essentially artificial data that looks and works just like regular sensitive user data. Gretel uses machine learning to categorize the data – like names, addresses and other customer identifiers – and attach as many labels to the data as possible. Once that data is labeled, access policies can be applied to it. Then, the platform applies differential privacy – a technique used to anonymize vast amounts of data – so that it’s no longer tied to customer information. “It’s an entirely fake data set that was generated by machine learning,” said Watson.
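For readers unfamiliar with differential privacy, here is a minimal sketch of one textbook form of it, the Laplace mechanism applied to a counting query. It is not Gretel's implementation, which the article does not describe; the function names, the `epsilon` parameter, and the sample records are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean
    # `scale` follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Hypothetical helper: a counting query changes by at most 1 when one
    person's record is added or removed (sensitivity 1), so Laplace noise
    with scale 1/epsilon gives epsilon-differential privacy for this query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Made-up records standing in for sensitive user data.
    users = [{"age": age} for age in range(100)]
    noisy = private_count(users, lambda u: u["age"] >= 50, epsilon=0.5)
    print(f"noisy count of users aged 50+: {noisy:.1f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one gives more accurate answers. Generating a fully synthetic data set, as Watson describes, involves considerably more than this single noisy query.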
The startup has already raised $3.5 million in seed funding. “Gretel said it will charge customers based on consumption – a similar structure to how Amazon prices access to its cloud computing services,” adds TechCrunch.