This blog follows my degree project (a master's thesis in Bioinformatics), titled "Pharmaceutical knowledge retrieval through reasoning of ChEMBL RDF".
My supervisor is Egon Willighagen, http://chem-bla-ics.blogspot.com/.

Monday, 17 May 2010

A moss-chembl application

After a month of traveling I'm back and can devote my time to what's left of my project, which is about 8-9 weeks. The work is progressing: much of my time goes into human-computer interaction, but I'm also refining the SPARQL queries and testing them for accuracy.

MoSS, as I have probably mentioned a couple of times before, is a molecular substructure mining program written by Christian Borgelt, http://www.borgelt.net/moss.html. I implemented it for Bioclipse in 2008, http://wiki.bioclipse.net/index.php?title=MoSS_in_Bioclipse, and I'm now making use of my own application.

As my ChEMBL work is coming along, I'm currently working on one specific workflow, "from ChEMBL to MoSS". Using SPARQL, I now access compounds from various kinase protein families through Java methods. Such a method could look something like this:

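public IStringMatrix MossProtFamilyCompounds(String fam, String actType)
        throws BioclipseException {

    // Note: the namespace IRIs that belong after "PREFIX chembl:" and
    // "PREFIX bo:" are missing from this listing and are left as they appear.
    String sparql =
        "PREFIX chembl: " +
        "PREFIX bo: " +
        // Select the SMILES of every molecule with an activity of the given
        // type, measured in an assay against a target of the given family.
        "SELECT DISTINCT ?smiles WHERE { " +
        " ?target a chembl:Target; " +
        " chembl:classL5 ?fam. " +
        " ?assay chembl:hasTarget ?target . " +
        " ?activity chembl:onAssay ?assay ; " +
        " chembl:type ?actType ; " +
        " chembl:forMolecule ?mol . " +
        " ?mol bo:smiles ?smiles. " +
        // Case-insensitive exact match on family name and activity type.
        " FILTER regex(?fam, \"^" + fam + "$\", \"i\"). " +
        " FILTER regex(?actType, \"^" + actType + "$\", \"i\"). " +
        " }";

    // Send the query to the remote SPARQL endpoint and return the result table.
    IStringMatrix matrix = rdf.sparqlRemote("http://rdf.farmbio.uu.se/chembl/sparql", sparql);

    return matrix;
}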

Inside this Java method there is a SPARQL query, held in a string named sparql. Running a query like this is possible thanks to the rdf project done by Egon. I use that feature when I call rdf.sparqlRemote; what that call basically does is send my query, as a String, to the SPARQL endpoint (a URL). So for this to work there must be an internet connection.
I will try to find a way to check whether such a connection exists, to make the application nicer to use (no connection -> no search).
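
One way such a connection check could be sketched, assuming only the standard java.net classes and nothing from Bioclipse: send a HEAD request to the endpoint with a short timeout and treat any HTTP answer as "reachable". The class and method names (EndpointCheck, isReachable) are made up for illustration, and this is a rough sketch rather than the check the application ends up using.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLConnection;

// Hypothetical helper, not part of Bioclipse or the rdf project.
final class EndpointCheck {
    private EndpointCheck() {}

    /**
     * Returns true if an HTTP server answers at endpointUrl within the timeout.
     * Malformed URLs, non-HTTP URLs and network errors all yield false.
     */
    static boolean isReachable(String endpointUrl, int timeoutMillis) {
        HttpURLConnection http = null;
        try {
            URLConnection conn = new URI(endpointUrl).toURL().openConnection();
            if (!(conn instanceof HttpURLConnection)) {
                return false; // e.g. a file: URL, which is not a remote endpoint
            }
            http = (HttpURLConnection) conn;
            http.setConnectTimeout(timeoutMillis);
            http.setReadTimeout(timeoutMillis);
            http.setRequestMethod("HEAD"); // headers only, no response body
            // Any valid status code means a server answered; -1 means it did not.
            return http.getResponseCode() > 0;
        } catch (IOException | URISyntaxException | IllegalArgumentException e) {
            // No network, unknown host, timeout, or a string that is not a URL.
            return false;
        } finally {
            if (http != null) {
                http.disconnect();
            }
        }
    }
}
```

The wizard could then call EndpointCheck.isReachable("http://rdf.farmbio.uu.se/chembl/sparql", 3000) before running a search and show a message instead of querying when it returns false. A server that answers HEAD with an error status still counts as reachable here, which seems reasonable when the only question is whether there is a connection at all.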

The compounds are saved to a file in a format supported by MoSS. This makes it possible to run MoSS on compounds drawn from the ChEMBL database. A JavaScript environment is also available.
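
As a rough sketch of the saving step: assuming MoSS's simple line-based input, where each line holds an identifier, a value and a SMILES description separated by commas (this layout is an assumption and worth checking against the MoSS documentation), the SMILES strings from the query result could be written out as below. MossInput, toMossLines and write are hypothetical names, and a plain List<String> stands in for the IStringMatrix so that the example does not depend on Bioclipse.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class MossInput {
    private MossInput() {}

    /**
     * Turns SMILES strings into lines of the assumed form "id,value,SMILES".
     * Ids are running numbers starting at 1; blank entries are skipped.
     */
    static List<String> toMossLines(List<String> smiles, int value) {
        List<String> lines = new ArrayList<>();
        int id = 1;
        for (String s : smiles) {
            if (s == null || s.trim().isEmpty()) {
                continue; // nothing to mine in an empty row
            }
            lines.add(id + "," + value + "," + s.trim());
            id++;
        }
        return lines;
    }

    /** Writes the prepared lines to a file, one molecule per line. */
    static void write(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```

For example, toMossLines(Arrays.asList("CCO", "c1ccccc1"), 0) gives the two lines "1,0,CCO" and "2,0,c1ccccc1". Giving every molecule the same value is a simplification: a real data set would need meaningful values wherever MoSS uses them to split molecules into groups.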

The pictures show the moss-chembl wizard (top) and the moss wizard (bottom).

The moss-chembl application is dynamic, which means that you can search for the compounds you want and look at them directly. This eases the work a lot! It should also be mentioned that, at the moment, the compounds are limited to those that bind to a protein in a kinase family.

Once a preferred data set has been chosen, MoSS reads in the data and you can then perform substructure mining on it!

Next problem to tackle: visualization...