pas-de-loup@t-online.de on 20 Dec 2000 03:16:25 -0000
<nettime> A Visual Archive of Cinematographical Topics
Wolfgang Ernst / Harun Farocki

Joint project: A VISUAL ARCHIVE OF CINEMATOGRAPHICAL TOPICS

Visual archiving: Sorting and storing images

The cultural memory of images has traditionally linked images with texts, terms and verbal indexes. Confronted with the transition of images into digital storage, non-verbal methods of classification gradually gain importance. It is not the archival question which poses a problem to video memory; rather, the search methods used to find pictorial information are still limited to models which have been developed for retrieving texts:

Typically, available methods depend on file IDs, keywords, or text associated with the images. <...> they don't allow queries based directly on the visual properties of the images, are dependent on the particular vocabulary used. <Flickner et al. 1997: 7>

The question arises which new kind of knowledge will exist exclusively in the form of images, which part of traditional knowledge can be transformed into images, and which part might simply vanish. Techno-image archaeology aims at rethinking the notion of images from the vantage point of the process of archiving. The archive is here seen as a medium of storage and a form of organization of all that can be accessed as knowledge. The function of archives of images such as museums or data banks exceeds by far the mere storage and conservation of images. Instead of just collecting passively, archives actively define what counts as archivable at all. Insofar, they also determine what is allowed to be forgotten. In terms of technology, an archive is a coupling of storage media, format of contents and address structure. In this case the image is to be conceived as a data format. Methodologically, this implies leaving behind the contemplation and description of single images in favour of an investigation of sets of images.

In his 1766 essay Laocoon, G. E.
Lessing discussed the aesthetic conflict between the logic of language and the logic of images in terms of a genuinely multi-media semiotics: pictura is no longer - as declared by Horace - ut poesis; time-based media (like dramatic speech and linear narratives) differ from space-based media (like simultaneous pictures). The digitalization of images today provides a technical basis for inquiry into this conflict, so that the investigation can be grounded in the terms of the medium computer. It would not make sense to retell a teleological story of image processing which finally reached its aim in digitalization; on the contrary, this history of images is to be revised from the present point of view of digitalization. How can, for example, archives be related to algorithms of image processing, of pattern recognition and computer graphics?

In sharp contrast to hermeneutics, the media-archaeological investigation of image archives does not take images as carriers of experiences and meanings. The relation between vision and image cannot be taken as the guideline of investigation, since image processing by computers can no longer be re-enacted with the anthropological semantics of the human eye. The methodological starting point is rather a theory of technical media based on Michel Foucault's discourse analysis and Claude Shannon's mathematical theory of communication, as well as practices and notions of data-structure-oriented programming.

The artes memoriae have been visual techniques of memorization from the rhetoric of antiquity to the Renaissance. Museums, collections, picture galleries and their catalogues have since always dealt with the programming of material image banks. The striving for visual knowledge in the - literally - Age of Enlightenment in the eighteenth century led to visual encyclopedias and their visualizations (like the planches, i.e. the visual supplement of the great French Encyclopédie edited by Diderot and d'Alembert).
Photography then became the switching medium from perception to technology, creating the first technical image archives. Movies, in turn, have been archives themselves (Hollywood and the rules of image sequences). When it comes to (re-)programming image-oriented structures in digital databases of given image archives, priority lies with the development of a visually addressable image archive. By combining multiresolutional image representation with simple octree structures, a variable archive module might be applied. This allows testing the application of different algorithms which create different visual sequences and neighbourhoods. Most operators of image processing and pattern recognition, such as filters and invariant transformations, can be integrated into the structure of the database in order to make clusters of images accessible. The next step will be the development of an interactive visual agent capable of "intelligent" retrieval of images and visual sketches in large data banks.

Navigating images on the borderline of digital addressability

Occidental culture has for the longest time practically subjected the memory of images to verbal or numerical access (alphanumerical indexing by authors and subjects; even Sergei Eisenstein subjected films to the idea of deciphering their virtual story-book by transcribing moving images into a score - a kind of reverse engineering of the written script). The iconic turn, predicted by W. J. T. Mitchell, is still to come in the field of image-based multimedia information retrieval. In media culture there is still the problem that audio-visual analogue sources cannot or should not be addressed like texts and books in a library; these resources form a rather unconquered multi-media datascape. Addressing and sorting non-scriptural media remains an urgent challenge (not only for commercial TV) which, after the arrival of fast-processing computers, can be met by digitizing analogue sources.
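The octree-based archive module mentioned above can be sketched minimally: an octree over RGB space assigns every pixel a cell (at each level, one bit of each colour channel selects one of eight octants), and a coarser depth yields a coarser, "multiresolutional" signature of the image. The depth parameter and the histogram-intersection similarity used here are illustrative assumptions, not the project's actual implementation.

```python
# Sketch: a visually addressable index -- images located by colour
# signature rather than by keyword. Depth and similarity measure are
# illustrative assumptions.

from collections import Counter

def octree_cell(r, g, b, depth=3):
    """Map an RGB pixel to its octree leaf cell at the given depth."""
    cell = 0
    for level in range(depth):
        bit = 7 - level  # take one bit per channel, most significant first
        octant = (((r >> bit) & 1) << 2) | (((g >> bit) & 1) << 1) | ((b >> bit) & 1)
        cell = (cell << 3) | octant
    return cell

def signature(pixels, depth=3):
    """Normalised histogram of octree cells -- a visual 'address' of the image."""
    counts = Counter(octree_cell(r, g, b, depth) for r, g, b in pixels)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

def similarity(sig_a, sig_b):
    """Histogram intersection: 1.0 for identical signatures, 0.0 for disjoint ones."""
    return sum(min(sig_a.get(c, 0.0), sig_b.get(c, 0.0)) for c in sig_a)

# Two synthetic 'images': one reddish, one bluish.
reddish = [(200, 30, 40)] * 90 + [(180, 60, 50)] * 10
bluish = [(30, 40, 200)] * 100
print(similarity(signature(reddish), signature(reddish)))  # 1.0
print(similarity(signature(reddish), signature(bluish)))   # 0.0
```

Lowering the depth merges neighbouring colours into one cell, which is the sense in which the representation is multiresolutional: the same archive can be queried coarsely or finely without re-indexing the sources.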
This does not necessarily result in better image quality but rather in the unforeseen option of addressing not only images (by frames) but every single picture element (pixel). Images and sounds thus become calculable and can be subjected to algorithms of pattern recognition - procedures which will "excavate" unexpected optical statements and perspectives out of the audio-visual archive, which, for the first time, can organize itself not just according to meta-data but according to its proper criteria: visual memory in its own medium (endogenic). By translating analogue, photographic images (of which film, of course, still consists) into digital codes, not only do images become addressable by mathematical operations; their ordering as well can be literally calculated (a re-appearance of the principles of picture-hanging envisaged by Diderot in the eighteenth century). The subjection of images to words is not just a question of addressing, but also of still applying the structuralist-linguistic paradigm to audiovisual data.

Within the medium film, the practice of montage (cutting) has always already performed a kind of image-based image sorting (by similarity, for example). Cutting has two options: to link images by similarity or by contrast (Eisenstein's option). Only video - as a kind of intermediary medium between classical cinema and the digital image - has replaced the mechanical addressing of cinematographic images by different means (timecode), offering new options for navigating in stored image space. Automated digital linking of images by similarity, though, creates rather unexpected, improbable links - which are, in the theory of information, the most informative, the least redundant ones. It also allows for searching for the least probable cuts. Jurij Lotman explained in his film semiotics: "Joining chains of varied shots into a meaningful sequence forms a story."
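Both operations named above - linking images by similarity, and searching for the least probable (most informative) cut - reduce to a distance measure over frame signatures. A minimal sketch, assuming frames represented as normalised grey-value histograms and an L1 distance; all data here is synthetic:

```python
# Sketch: linking frames by similarity, and finding the least probable
# cut. Frames are normalised histograms; distance is the L1 difference.
# The largest frame-to-frame distance marks the least redundant -- in
# information-theoretical terms, the most informative -- transition.

def l1_distance(h_a, h_b):
    """Sum of absolute bin differences between two normalised histograms."""
    return sum(abs(a - b) for a, b in zip(h_a, h_b))

def most_similar(frames, query_idx):
    """Index of the frame closest to frames[query_idx] (link by similarity)."""
    others = [(l1_distance(frames[query_idx], h), i)
              for i, h in enumerate(frames) if i != query_idx]
    return min(others)[1]

def least_probable_cut(frames):
    """Index i such that the transition between frame i and i+1 is maximal."""
    diffs = [(l1_distance(frames[i], frames[i + 1]), i)
             for i in range(len(frames) - 1)]
    return max(diffs)[1]

# Four synthetic 4-bin histograms: frames 0 and 1 are near-identical,
# frame 2 is radically different, frame 3 resembles frame 2.
frames = [
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
    [0.1, 0.1, 0.2, 0.6],
]
print(most_similar(frames, 0))     # 1
print(least_probable_cut(frames))  # 1  (the jump between frames 1 and 2)
```

Linking by similarity minimises the distance; "Eisenstein's option" of contrast, and the least probable cut, maximise it - the two montage strategies are mirror images of one metric.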
This is contrasted by Roger Odin's analysis of Chris Marker's film La Jetée (1963): how can a medium consisting of single and discrete shots, in which nothing moves internally - photographic moments of time (frozen images) - create narrative effects? Cinematographic sequences are time-based, but film as such - the cinematographic apparatus - "has no first layer of narrativity" when looked at media-archaeologically <Gaudreault 1990: 72>.

The absence of reproduction of movement <...> tends to block narrativity since the lack of movement means that there is no Before/After opposition within each shot; the narrative can only be derived from the sequence of shots, that is, from montage. <Odin, as quoted in: Gaudreault 1990: 72>

What happens if that sequence is no longer arranged according to iconological or narrative codes, but rather in an inherently similarity-based mode, leading to a genuinely (image- or media-)archaeological montage? After a century of creating a genuinely audio-visual technical memory, a new cultural practice of mnemic immediacy emerges: the recycling and feedback of the media archive (a new archival economy of memory). With new options for measuring, naming, describing and addressing digitally stored images, this ocean demands to be navigated (cybernetics, literally) in different ways, no longer just ordered by classification (the encyclopedic, enlightened paradigm). This state of affairs has motivated the film director Harun Farocki and the media scholars Friedrich Kittler and Wolfgang Ernst to design a project performing an equivalent to lexicographical research: a collection of filmic expressions.
Contrary to familiar semantic research in the history of ideas (which Farocki calls "contentism", that is: the fixation on the fable, the narrative bits), such a filmic archive will no longer concentrate on protagonists and plots, listing images and sequences according to their authors, time and space of recording and subject; on the contrary, digital image data banks allow for systematizing visual sequences according to genuinely iconic notions (topoi or, for time-based images, the different notion of Bakhtinian chronotopoi) and narrative elements, revealing literally new insights into their semantic, symbolic and stylistic values. This is exactly what the film maker Harun Farocki strove for when, in summer 1995 at the Potsdam Einstein Foundation, he proposed the project of a kind of visual library of film which would not only classify its images according to directors, place and time of shooting, but beyond that digitally systematize sequences of images according to motifs, topoi and, for example, narrative statements, thus helping to create a culture of visual thinking with a visual grammar analogous to linguistic capacities. Unlike in the verbal domain, an active visual thesaurus and a grammar for linking images are still lacking; our predominantly scripturally directed culture still lacks the competence of genuinely filmic communication ("reading" and understanding).

Genuinely mediatic criteria for storing electronic or filmic images have been listed by the director of the Federal Archives of Germany (Kahlenberg) and the chief archivist of the nationwide public TV channel ZDF (Schmitt). Next to economically driven criteria (recycling of registered broadcasts), under historical-semantic-iconographical, content-related criteria ("inhaltsbezogene Kriterien") they name: 1. "Dominanzereignisse" (dominant historical events), 2. "politische und soziale Indikationen längerfristiger Entwicklungen und Tendenzen" (political and social indications of longer-term developments and tendencies), 3.
"Soziale Realität im Alltag" (social reality in everyday life). Under design-related or aesthetic criteria ("gestaltungsbezogene bzw. ästhetische Kriterien") follow: 1. "Optische Besonderheiten" (optical peculiarities, i.e. remarkable camera perspectives, such as "Bildverkantung und extreme Auf- oder Untersicht": tilted framing and extreme high- or low-angle shots), 2. "die dramaturgische Gestaltung von Bildsequenzen" (the dramaturgical composition of image sequences: cuts, oppositions of single frames), 3. "besondere Bildmotive" (particular image motifs: landscapes, people) - close to Farocki's topoi. Last but not least, of course, "medientypische Gesichtspunkte" (medium-specific aspects) - the media archives proper, documenting the history of a TV channel itself.

On the market, though, digital video browsing still seeks to reaffirm textual notions such as the story format as a segmentation of a video sequence - such as the news story, "a series of related scenes with a common content. The system needs to determine the beginning and ending of an individual news story." Beginning and end, though, are in technical terms nothing but cuts here. With film, time enters the pictorial archive. Once digitized, even the single frame is no longer a static photographic image, but a virtual object which is constantly being re-inscribed on the computer monitor by light beams in electronic refresh cycles. While the visual archive has for the longest time in history been an institution associated with unchangeable content, the memory of (time-based) images becomes dynamic itself. Thus, images acquire a temporal index. The equivalent of iconographic studies of images is the search for macroscopic time objects in moving images, "for instance larger sequences constituting a narrative unit". The media-archaeological look at film, on the contrary, segments serially. What do we mean by the notion of "excavating the archive"? The answer is media-archaeology instead of iconographical history: what is being digitally "excavated" by the computer is a genuinely media-mediated gaze on a well-defined number of (what we call) images.
In a different commercial news analysis system, Farocki's notion of kinetic topoi occurs: "Each segment has some informative label, or topic. It is this kind of table of contents that we strive to automatically generate" (i.e. by topic segmentation). Of course, "motion is the major indicator of content change"; a zoom shot, for example, is best abstracted by the first, the last, and one frame in the middle <Zhang et al. 1997: 143>. "Current video processing technologies reduce the volume of information by transforming the dynamic medium of video into the static medium of images, that is, a video stream is segmented and a representative image is ex-<...>"; that is exactly what indexing by words (description) does. How to avoid freezing the analysis into a data bank? "Image analysis looks at the images in the video stream. Image analysis is primarily used for the identification of scene breaks and to select static frame icons that are representative of a scene" <Hauptmann / Witbrock 1997: 222>, using colour histogram analysis and optical flow analysis, plus speech analysis for the audio component (which can be done by transforming the spoken content of news stories into a phoneme string). Thus the image stream is not subjected to verbal description but rather accompanied by an audio-visual frame analysis.

Retrieval and browsing require that the source material first be effectively indexed. While most previous research in indexing has been text-based (Davis 1993, Rowe et al. 1994), content-based indexing of video with visual features is still a research problem. Visual features can be divided into two levels <cf. Erwin Panofsky's three iconological image layers>: low-level image features <the radical surface>, and semantic features based on objects and events. <...> a viable solution seems to be to index representative key-frames (O'Connor 1991) extracted from the video sources - but what is "representative" in that archivo-archaeological context?
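The scene-break identification and key-frame selection described above can be sketched as thresholding a quantitative difference metric between consecutive frames; the threshold value, the choice of the first frame of each shot as its representative, and the data are illustrative assumptions, not the cited systems' actual parameters:

```python
# Sketch: temporal segmentation by colour-histogram difference.
# A cut is declared wherever the difference between consecutive frames
# exceeds a threshold; one key frame per detected shot is then selected.
# Threshold and frame data are illustrative assumptions.

def histogram_diff(h_a, h_b):
    """Sum of absolute bin differences between two normalised histograms."""
    return sum(abs(a - b) for a, b in zip(h_a, h_b))

def detect_shot_boundaries(frames, threshold=0.5):
    """Indices i where a cut falls between frame i and frame i+1."""
    return [i for i in range(len(frames) - 1)
            if histogram_diff(frames[i], frames[i + 1]) > threshold]

def key_frames(boundaries):
    """One representative frame index per shot: here, each shot's first frame."""
    return [0] + [b + 1 for b in boundaries]

# Three synthetic shots of two frames each (3-bin colour histograms).
frames = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1],  # shot A
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],  # shot B
    [0.1, 0.1, 0.8], [0.1, 0.2, 0.7],  # shot C
]
cuts = detect_shot_boundaries(frames)
print(cuts)             # [1, 3]
print(key_frames(cuts)) # [0, 2, 4]
```

Note how the procedure is purely statistical: "beginning and end" of a story are invisible to it; what it detects is the boundary between frames, which is precisely the boundary between the semantic and the archaeological gaze discussed here.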
"Key frames utilize only spatial information and ignore the temporal nature of a video to a large extent" <Zhang et al. 1997: 149>.

The basic unit of video to be represented or indexed is usually assumed to be a single camera shot, consisting of one or more frames generated and recorded contiguously and representing a continuous action in time and space. Thus, temporal segmentation is the problem of detecting boundaries between consecutive camera shots. The general approach to the solution has been the definition of a suitable quantitative difference metric which represents significant qualitative differences between frames <Zhang et al. 1997: 142>

- which is exactly the boundary between the iconological and the archaeological gaze, between semantics and statistics, between narrative and formal (in the sense of Wölfflin) topoi. Of course, a topos is a rhetorical category; rhetoric, though, is more a technique than a question of content: the philosopher Immanuel Kant, for example, considers the ordering art of topics to be a kind of storage grid for general notions, just as in a library the books are distributed and stored on shelves with different inscriptions. Do we always have to group image features into meaningful objects and attach semantic descriptions to scenes <Flickner et al. 1997: 8>, or does it rather make sense to concentrate on syntax, thus treating semantics as second-order syntax?

# distributed via <nettime>: no commercial use without permission
# <nettime> is a moderated mailing list for net criticism,
# collaborative text filtering and cultural politics of the nets
# more info: majordomo@bbs.thing.net and "info nettime-l" in the msg body
# archive: http://www.nettime.org contact: nettime@bbs.thing.net