Motif finding in gene networks September 11, 2008
Posted by Geordie in For Developers.trackback
In complex biological systems, it is often known how to measure or otherwise extract the form of a system of interest. For example a protein’s primary sequence can be measured in a straightforward way. However deducing its function from this form, for example by calculating the shape of a protein (the function in this case is believed to be highly correlated to this shape) remains an unsolved problem. A problem with a similar flavor exists in systems biology, where the form of the system is composed of a set of interacting genes. These genes can regulate (ie turn on or off) other genes (and themselves). One way to represent this is to create a directed graph, where the vertices are the genes, and information about who is regulating who is stored in labels on the edges.
Several interesting gene networks are publicly available, including the regulation of early development in the sea urchin Strongylocentrotus purpuratus and Drosophila, and the gene regulation network of E. Coli. Here is a picture of a regulatory network for endomesoderm specification (a collection of genes from a sea urchin that makes something go, from PZ Meyer’s blog: the article is also interesting to read to understand why topological motifs matter to biologists):
One way to extract useful information from these gene regulatory networks is to identify function in a network with specific topological motifs, or patterns. For example if a certain pattern of regulation is known to have a certain function in one organism, if you can find a similar pattern in another organism you can infer that location has the same or similar functionality.
For example, let’s say we have a big gene network for another animal–say Drosophila–and we want to know if the sea urchin module above is replicated in Drosophila. In order to do this you can convert both gene networks into labeled graphs, and then find the Maximum Common Subgraph between them. You can do this by first forming an association graph from the two gene networks, and then finding the Maximum Independent Set of that graph, which is a problem that Orion web services can solve. As abonus with this approach, the Maximum Independent Set gives you the area and size of maximal overlap, so you have a metric for how similar your substructure is to structures in the other organism (it’s unlikely they will be identical in general, so having a similarity measure that is robust against changes is important).
More generically, any substructure of any gene network can be found this way. You first define the substructure you’re looking for as a labeled graph, form an association graph of the substructure and the target gene network, and then find the Maximum Independent Set of the association graph using Orion web services.

I don’t know what I did, but you must be ticked off to post all this stuff, right after I get PRK, so I’m getting a headache from looking at my monitor too much
Wow, you’ve really had crazy output, more in the last two weeks than sometimes a full quarter. I like what you wrote about the musical composition problem, I remember playing with automatically composed music a long time ago (aleatoric), where it sounded better the more rules you gave it (the less you left to chance, such as inputting a specific rhythm, or a set of scales and progressions, the better it sounded).
This is not something I thought of in terms of quantum computing, outside of the dream that QC could lead to strong AI that could understand the ideas of aesthetics, but upon further reflection, this should be much easier than strong AI learning musical appreciation…
Anyways, please pace the postings, because I feel like I’m getting behind in my homework!