Annzi's Project blog: May 2010

After a month of traveling I'm now back to devote my time to what's left of my project, which is about 8-9 weeks. My work is progressing; much of my time goes to human-computer interaction, but I'm also advancing the SPARQL queries and testing them for accuracy.

MoSS, as I have probably mentioned a couple of times before, is molecular substructure mining software written by Christian Borgelt, http://www.borgelt.net/moss.html. I implemented it as an application for Bioclipse in 2008, http://wiki.bioclipse.net/index.php?title=MoSS_in_Bioclipse, and I'm now making use of my own application.

As my chEMBL work is coming along, I'm at the moment working on a specific workflow, "from chEMBL to MoSS". Using SPARQL, I am now accessing compounds from various kinase protein families via Java methods. A method could look something like this:

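public IStringMatrix MossProtFamilyCompounds(String fam, String actType)
        throws BioclipseException {

    // Each PREFIX needs its namespace IRI in angle brackets; they are missing in this listing.
    // The query selects the SMILES of every molecule that has an activity of the
    // given type, measured in an assay against a target of the given protein family.
    String sparql =
        "PREFIX chembl: " +
        "PREFIX bo: " +
        "SELECT DISTINCT ?smiles where{ " +
        " ?target a chembl:Target;" +
        " chembl:classL5 ?fam. " +
        " ?assay chembl:hasTarget ?target . " +
        " ?activity chembl:onAssay ?assay ;" +
        " chembl:type ?actType ; " +
        " chembl:forMolecule ?mol ." +
        " ?mol bo:smiles ?smiles. " +
        // Case-insensitive match on the whole family name and activity type
        " FILTER regex(?fam, " + "\"^" + fam + "$\"" + ", \"i\")." +
        " FILTER regex(?actType, " + "\"^" + actType + "$\"" + ", \"i\")." +
        " }";

    // Send the query to the remote SPARQL endpoint and return the result table
    IStringMatrix matrix = rdf.sparqlRemote("http://rdf.farmbio.uu.se/chembl/sparql", sparql);
    return matrix;
}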

Inside this Java method there is a SPARQL query, held in a string named sparql. Running a query like this is possible thanks to the rdf project done by Egon. I use that feature when I call rdf.sparqlRemote; what that command basically does is access the SPARQL endpoint (a URL) with my query, which is passed as a String. So for this to work, an internet connection must exist.
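To make "accessing the endpoint with a query string" a bit more concrete, here is a minimal sketch of how a query can be packed into a request URL, following the SPARQL protocol convention of a GET request with a query parameter. This is not necessarily how rdf.sparqlRemote works internally; the class and method names (SparqlUrlSketch, buildRequestUrl) and the example.org endpoint are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Bioclipse: shows how a SPARQL query string
// can be turned into a GET request URL for an endpoint.
final class SparqlUrlSketch {

    static String buildRequestUrl(String endpoint, String sparql) {
        // Form-encode the query so spaces, '?', '{' and similar characters are URL-safe
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        // Append with '&' if the endpoint URL already carries parameters
        String separator = endpoint.contains("?") ? "&" : "?";
        return endpoint + separator + "query=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(buildRequestUrl(
            "http://example.org/sparql",
            "SELECT ?s WHERE { ?s ?p ?o }"));
    }
}
```

The resulting URL can be opened with any HTTP client; the endpoint then answers with the result table in a format such as SPARQL XML.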
I will try to find a way to check whether an internet connection exists, to make the application nicer to use (no connection -> no search).
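One possible way to do such a check, sketched with plain Java networking classes: try a short HEAD request against the endpoint and treat any HTTP answer as "reachable". This is only an assumption about what a check could look like, not something that exists in the application yet; the names EndpointCheck and isReachable are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: returns true if the endpoint answers an HTTP request
// within the timeout, false on any failure (no network, bad URL, timeout).
final class EndpointCheck {

    static boolean isReachable(String endpoint, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("HEAD");
            // Any status code means the server answered, so the network is up
            conn.getResponseCode();
            return true;
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // IOException: no connection or timeout (also a malformed URL);
            // IllegalArgumentException: not a valid absolute URI;
            // ClassCastException: not an http(s) URL
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        boolean ok = isReachable("http://rdf.farmbio.uu.se/chembl/sparql", 3000);
        System.out.println(ok ? "endpoint reachable" : "no connection -> no search");
    }
}
```

The wizard could call such a check before running the query and disable the search when it returns false.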

The compounds are saved into a file format supported by MoSS. This makes it possible for MoSS to run on the compounds drawn from the chEMBL database. A JavaScript environment is also available.
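As a rough sketch of the saving step, the SMILES strings from the query result can be written out one compound per line. The line layout used here (identifier, value, SMILES, separated by commas) is my assumption of a simple MoSS-style input and should be checked against the MoSS documentation; the class and method names are hypothetical, and a plain list of strings stands in for the IStringMatrix column.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a list of SMILES strings into text lines and
// writes them to a file. The "id,value,SMILES" layout is an assumption.
final class MossInputSketch {

    static List<String> toMossLines(List<String> smiles, int value) {
        List<String> lines = new ArrayList<>();
        int id = 1;
        for (String s : smiles) {
            // Skip missing or empty entries instead of writing broken lines
            if (s == null || s.trim().isEmpty()) {
                continue;
            }
            lines.add(id + "," + value + "," + s.trim());
            id++;
        }
        return lines;
    }

    static void writeMossFile(Path file, List<String> smiles, int value) throws IOException {
        // Files.write writes one list element per line, UTF-8 by default
        Files.write(file, toMossLines(smiles, value));
    }
}
```

For example, writeMossFile(Path.of("kinase.txt"), smilesList, 0) would write every compound with the same value 0; the file name is also just a placeholder.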

The pictures show (top) the moss-chembl wizard and (bottom) the moss wizard.

The moss-chembl application is dynamic, which means that you can search for the compounds you want and look at them directly. This eases the work a lot! It should also be mentioned that, at the moment, the compounds are only those that bind to a protein in a kinase family.

When a preferred data set is chosen, moss will read in the data, and you are then able to perform substructure mining on it!

Next problem to manage: visualization...


Monday, 17 May 2010

A moss-chembl application
