Generating Private Synthetic Data via LLM APIs

This research shows how differentially private (DP) synthetic tabular data can be generated using only API access to a large language model (LLM), without access to its weights.

Addresses the challenge of creating private synthetic data when model weights are inaccessible
Proposes novel algorithms that satisfy differential privacy while preserving the utility of the synthetic data
Enables organizations to use powerful third-party LLMs in applications involving sensitive data
Balances the privacy-utility tradeoff for practical deployment

This research matters because it makes privacy-preserving data synthesis more accessible: businesses can generate synthetic data with strong, formal privacy guarantees through a model's API alone, without direct access to model internals or specialized expertise.

Is API Access to LLMs Useful for Generating Private Synthetic Tabular Data?