Eugene Thacker on 15 Oct 2000 05:05:59 -0000
<nettime> GEML: Gene Expression Markup Language
GEML: Gene Expression Markup Language
Eugene Thacker

"We tried to create what we called a discovery system instead of a data retrieval system. A data retrieval system is something that works if you get the data out that you put in. A discovery system is one that finds connections you didn't know about before."
- James Ostell, NCBI (National Center for Biotechnology Information)

Despite the Web's talent for reducing the body to a clicking finger and a scanning eye, there's no shortage of bodies of all types on the Web itself, from the public bodies of outdoor Web-cams, to the private bodies of video chat, to the modular bodies of streaming porn. But recently, a very special kind of Web-body has emerged. It isn't a body that's visually represented over the Web, but rather a body that is directly encoded into computer databases and through software applications. It isn't a file that's referred to, the way a Web page refers to a Flash movie or a QuickTime clip; it is, in itself, a code, a programming language.

This past September, Rosetta Inpharmatics - their name itself significant - a bioinformatics company from the Pacific Northwest, announced that it had developed a cross-platform standard for computer-based molecular genetics and biotech research. Called Gene Expression Markup Language, or GEML, this programming language is designed to address the current file-format and compatibility problems in computer-based biotech research. Researchers in biotech can now, with the ease of cross-platform standardization, work between widely different biological databases, from Celera's human genome sequence, to the SWISS-PROT protein database, to the Human Gene Mutation database. All online, all digital, and now all in a standardized backend programming language.

GEML is based upon XML - Extensible Markup Language - and operates independently of any particular database file-format schema. GEML manages two types of genetic data: genetic patterns (gene expression, or analyses of which sets of genes are switched on or off, which may include biochemical pathway information or gene-protein relationships) and genetic profiles (digital scans of microarray chips, which are used to speedily and efficiently analyze genetic samples). GEML can also keep meticulous track of a given piece of genetic data, noting what the original file format was, where the data was retrieved from, the type of database search method used, and the operations performed on a given data file - presumably biotech research, like Photoshop, now includes several levels of "undo" as well. (A sketch of what such a document might look like appears further below.)

The impetus for the development of universal standards like GEML is the emerging field of bioinformatics. Put briefly, bioinformatics is the use of computer and networking technology to handle large amounts of genetic data. Uses of computer databases in molecular biology go back to the 1970s, but with the development of advanced computer processing power and the Internet, researchers began discovering that they could potentially do a lot more than just catalogue data. The deluge of genomic data generated by the human genome project has made it necessary to develop more sophisticated, standardized means of managing that information. But once you've encoded the genetic body, you make an incredibly important shift, from genetic "codes" to computer "codes." The former you could still experiment with in the lab, using recombinant DNA techniques, plasmids, vectors, and other micro-organismic tools.
But a shift (or an uploading) to computer codes adds another set of requirements to that genetic data. It demands, first of all, an abstraction of the body (already abstracted at a first, bio-scientific level through the discourse of genetic information) to the level of binary on-off switches, genetic molecules into pulses of light. At this level, we get a strange mixture of two systems - a genetic/cellular one and a computer/network one.

With sophisticated online databases, genomic analysis software (including the use of intelligent agents and data-mining protocols), and microarrays or DNA chips, bioinformatics is promising to automate the gene discovery process. Set up your search parameters, load in a blank CD-R, and press enter - now go home, relax, and come back tomorrow morning, when the results of the search will tell you if there are any hits, what novel genes and/or drug targets have been isolated as candidates, what their expression patterns are, what pathways they're involved in, and whether or not patents currently exist for those genes.

The recent race to sequence the human genome would not have been possible without the technical advancements made by tools companies such as Perkin-Elmer (the primary provider of automated DNA-sequencing computers). And the recent interest of the computer industry in biotech (Sun, IBM, Compaq, Motorola) has made bioinformatics a research field in itself. Last spring, the investment research firm Oscar Gruss projected that within five years, bioinformatics' market value could exceed $2 billion.

All of which is to say that, with biotech research, the computer is no longer just a tool. It has become that, and much more, extending its range of operations, bringing in computer science and programming, and transforming the "wet" biology lab into a networked computer lab. Unfortunately, researchers and companies still seem to accept bioinformatics tools such as GEML as transparent, as things that will aid in the advancement of science research without fundamentally altering research itself or the objects of study. When a GEML-based application accesses several genomic databases, is it also accessing genetic bodies? Or is this simply data acting on data, and if so, where exactly are the points of connection to material, biological bodies? No one is asking whether such bioinformatics techniques fundamentally change our notion of what the body is, or whether the level of complexity that bioinformatics can deal with will fundamentally challenge traditional bioscience and genetics research.

This places an "object" like GEML in a very strange position. On the one hand, GEML, as a programming language, refers or points to something in its tags, the same way that the tag <IMG SRC="mycells.jpg"> points to a digital image of my cells on a Web page. In this sense, GEML not only operates as HTML does (with referring tags and attributes); it also operates according to the traditional signifier-signified relationships that characterize modern linguistics. Only, the "thing" the language points to is another type of data, a genetic code, itself with its own set of rules and protocols for functioning. This also means that, as an XML-based language, GEML is developed from the ground up, so to speak, so that the types of tags and attributes used, as well as their interrelationships, are dictated by the ways in which genetic data itself operates.
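To make that concrete, here is a minimal sketch of what such a document might look like. The element and attribute names below are invented for illustration - Rosetta's actual GEML vocabulary may differ - but the structure follows the description above: a "pattern" carrying gene expression data along with its provenance (source database, original file format, search method), and a "profile" pointing to a digital microarray scan, much as an IMG tag points to an image file.

  <?xml version="1.0"?>
  <!DOCTYPE expression_set [
    <!-- hypothetical internal DTD: these declarations are
         illustrative, not Rosetta's actual GEML DTD -->
    <!ELEMENT expression_set (pattern+, profile*)>
    <!ELEMENT pattern (gene+)>
    <!ATTLIST pattern
              source_db     CDATA #REQUIRED
              source_format CDATA #REQUIRED
              search_method CDATA #IMPLIED>
    <!ELEMENT gene EMPTY>
    <!ATTLIST gene
              name       CDATA #REQUIRED
              expression (on|off) #REQUIRED
              pathway    CDATA #IMPLIED>
    <!ELEMENT profile EMPTY>
    <!ATTLIST profile scan_src CDATA #REQUIRED>
  ]>
  <expression_set>
    <!-- a genetic pattern: which genes are switched on or off,
         plus the provenance described above -->
    <pattern source_db="SWISS-PROT" source_format="flatfile"
             search_method="keyword">
      <gene name="gene_x" expression="on" pathway="pathway_y"/>
      <gene name="gene_z" expression="off"/>
    </pattern>
    <!-- a genetic profile: a pointer to a microarray scan, just
         as <IMG SRC="mycells.jpg"> points to an image -->
    <profile scan_src="chip_scan_001.tif"/>
  </expression_set>

The bracketed section at the top - the DTD - is where this becomes more than a formatting convenience: it is where a model of how genetic data operates gets written down as a formal grammar.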
Each use of a GEML implementation needs to be identified by a Document Type Definition, or DTD file, which declares the types of tags and attributes a document may use (the internal subset in the sketch above is one hypothetical version of such a DTD). This DTD file will be based, in the case of GEML, on the ways in which genetic code operates in the body - that is, according to sequences for genes, chromosomal positioning, gene-protein relationships, promoter-terminator regions, splice variants, gene polymorphisms, and so on. In other words, the DTD file for GEML is based on the current state of knowledge in biotech research - how reductive or complex that knowledge is, how rigid or flexible it is, and so on. What's being produced with GEML, then, is a kind of meta-code for approaching the molecular code of the genome. In a sense GEML doesn't add or modify anything in the genome - it is not a genetic engineering in the traditional sense of the term.

This interrelationship between molecular genetics and computer science means that bioinformatics will only be as complex, technically sophisticated, and potentially transformative as its DTD file - or the types of knowledge input into bioinformatics code. The conventional truism of molecular genetics - that, in a causal, linear fashion, "DNA makes RNA makes protein" - will only produce a bioinformatics of that same level of complexity. However, as researchers and laboratories have been acknowledging, most diseases and phenotypic markers are the product of multiple genetic triggers and multiple biochemical pathways, not to mention networked interactions with context or environment. This is why the most interesting alternative approaches within biotech research - such as systems biology - have demanded that both molecular genetics and computer science transform the discourse of genetics and biotech, moving away from the over-determinism of single-gene theories and towards more distributed and networked approaches.

There are hundreds of biological databases in existence, from human genome to protein to human gene mutation databases to tissue banks, some owned by research institutes, some owned by universities, some owned by corporations. In all of this, no one has addressed a basic question: where is "the body"? Or better, where is "the biological"? Addressing this question means going back to the fundamentals of molecular genetics, when the discourse of the "genetic code" first began gaining momentum. This is not just a scientific issue, but an issue concerning the possible tensions between bodies and machines, biologies and technologies.

The central compatibility issue for bioinformatics approaches such as GEML is not that between different computer-based genomic databases. Basically, the databases of Celera, DoubleTwist, or the public consortium all consist of digital files encoding sequences of As, Ts, Cs, and Gs, themselves encoded from a series of DNA samples from anonymous human donors. Creating trans-database compatibility is just a matter of writing more code. The real challenge - not just a technical one, but a philosophical one, an ontological challenge - is to create that same compatibility between "wet" cells and silicon databases, between genetic "codes" and computer "codes." After all, DNA in a blood sample is not a computer database...or is it? After all, you can encode DNA from a blood sample into a digital format, but not the other way around...or can you?

Links:

Celera Genomics <http://www.celera.com>
Fikes, Bradley. "Bioinformatics Tries to Find a Common Means to Express Biological Data." DoubleTwist (22 September 2000) <http://www.doubletwist.com>
HUGO Mutation Database Initiative <http://ariel.ucs.unimelb.edu.au:80/~cotton/mdi.htm>
Primer on Molecular Genetics (U.S. Dept. of Energy) <http://www.bis.med.jhmi.edu/Dan/DOE/intro.html>
Rosetta Inpharmatics <http://www.rosettainpharmatics.com>
SWISS-PROT <http://www.expasy.ch/sprot>

¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬
Eugene Thacker
e: maldoror@eden.rutgers.edu
w: http://gsa.rutgers.edu/maldoror/index.html
Pgrm. in Comparative Literature, Rutgers Univ.
¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬
CURRENT:
"Participating in the Biotech Industry: Notes on the Gene Trust" @ nettime <http://www.nettime.org>
"The Post-Genomic Era Has Already Happened" @ Biopolicy Journal <http://bioline.bdt.org.br/py>
"SF, Technoscience, Net.art: The Politics of Extrapolation" @ Art Journal 59:3 <http://www.collegeart.org/caa/publications/AJ/artjournal.html>
"Point-and-Click Biology: Why Programming is the Future of Biotech" @ MUTE (Issue 17 - archives at http://www.metamute.com)
"Fakeshop: Science Fiction, Future Memory & the Technoscientific Imaginary" @ CTHEORY <http://www.ctheory.com>
¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬
also: FAKESHOP <http://www.fakeshop.com>
¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬

# distributed via <nettime>: no commercial use without permission
# <nettime> is a moderated mailing list for net criticism,
# collaborative text filtering and cultural politics of the nets
# more info: majordomo@bbs.thing.net and "info nettime-l" in the msg body
# archive: http://www.nettime.org contact: nettime@bbs.thing.net