The use of real data in training generative AI has recently been the subject of much debate (and some lawsuits). It raises numerous questions about ethical use, the protection of privacy and private property, and, more generally, the near impossibility of controlling how AI models use such data.

In this context, synthetic data has increasingly been put forward as an alternative. This type of data, artificially created by algorithms, mimics real data without the ethical implications attached to it. Synthetic data also has several considerable advantages over real data: it is more controllable, it allows datasets to be expanded at low cost, and it helps improve generative AI models.

From a semiotic perspective, synthetic data can be seen as a simulacrum of real data, and it has interesting ramifications for the issue of enunciation, as it challenges our understanding of authenticity and representation. This proposal aims to investigate the semiotic status of synthetic data and the implications of its use, with particular reference to veridiction, the construction of meaning, and the role synthetic data plays in the broader discourse on AI. To this end, it will analyze specific cases of generative image-generation models that use synthetic data, either as a substitute for or in conjunction with real data. The goal is to contribute to the understanding of this transformative and innovative tool.