My intervention will address a theory of machinic enunciation which, like corporeal enunciation, holds a specific status within the theory of language. I will consider image-generation models such as Midjourney and DALL-E in order to understand what kind of visual perception is at stake in the visual utterances they produce: utterances of a hybrid enunciation, determined, on the one hand, by a natural-language prompt chosen by a user and, on the other, by a machinic competence that depends on two actants, the algorithms employed and the databases called upon. We are interested in determining the relationship between the databases and the utterances produced, and, more broadly, between the system and the process. Databases are indeed encyclopedias of a special kind: they contain global knowledge already produced and available for further manipulation, but above all, in the case of generative artificial intelligence, they become a system of visual expression substances to be actualized whenever a form of verbal content is used as a prompt. It is the algorithm that determines the correspondence between a form of verbal content and a form of visual expression, a correspondence that results from the recombination of the relations between the verbal and the visual learned during the training phase. The aim of this intervention is to study the very specific enunciative process of generating images from a perception delegated to databases.
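The two-actant structure described above can be made concrete with a minimal sketch. Since Midjourney and DALL-E do not expose their internals, the sketch assumes an open stand-in, Stable Diffusion accessed through Hugging Face's diffusers library; the model identifier and output filename are illustrative choices, not claims about the systems discussed here.

```python
# A minimal sketch of the hybrid enunciation described above, assuming the
# open Stable Diffusion model via Hugging Face's diffusers library as a
# stand-in for proprietary systems such as Midjourney or DALL-E.
import torch
from diffusers import StableDiffusionPipeline

# The "machinic competence": pretrained weights encoding the verbal-visual
# correspondences learned from the training database.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed model choice
    torch_dtype=torch.float16,
).to("cuda")

# The user's contribution: a natural-language prompt, i.e. the form of
# verbal content that the algorithm will map onto a form of visual expression.
prompt = "a medieval city seen from above, in the style of a woodcut"

# The enunciative act: the model actualizes the prompt against its learned
# correspondences and produces a visual utterance.
image = pipe(prompt).images[0]
image.save("utterance.png")
```

In this sketch the division of enunciative labor is explicit: the prompt string is the only element the user supplies, while everything else, the architecture and the weights distilled from the database, belongs to the machinic side of the utterance.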