After a month of traveling I'm now back to devote my time to what's left of my project, which is about 8-9 weeks. My work is progressing: much of my time goes to human-computer interaction, but I'm also advancing the SPARQL queries and testing them for accuracy.
MoSS, as I have probably mentioned a couple of times before, is molecular substructure mining software produced by Christian Borgelt, http://www.borgelt.net/moss.html. I implemented that application for Bioclipse in 2008, http://wiki.bioclipse.net/index.php?title=MoSS_in_Bioclipse, and I'm now making use of my own application.
As my chEMBL work is coming along, I'm at the moment working on a specific workflow, "from chEMBL to MoSS". With the functionality of SPARQL I am now, via Java methods, accessing compounds from various Kinase protein families. A method could look something like this:
    public IStringMatrix MossProtFamilyCompounds(String fam, String actType)
            throws BioclipseException {
        // SPARQL query: the SMILES of every molecule that has an activity of
        // the given type against a target in the given protein family.
        // Both filters match the whole value, ignoring case.
        String sparql =
            "PREFIX chembl: " +
            "PREFIX bo: " +
            "SELECT DISTINCT ?smiles where{ " +
            " ?target a chembl:Target;" +
            " chembl:classL5 ?fam. " +
            " ?assay chembl:hasTarget ?target . " +
            " ?activity chembl:onAssay ?assay ;" +
            " chembl:type ?actType ; " +
            " chembl:forMolecule ?mol ." +
            " ?mol bo:smiles ?smiles. " +
            " FILTER regex(?fam, " + "\"^" + fam + "$\"" + ", \"i\")." +
            " FILTER regex(?actType, " + "\"^" + actType + "$\"" + ", \"i\")." +
            " }";
        // Send the query string to the remote SPARQL endpoint.
        IStringMatrix matrix = rdf.sparqlRemote("http://rdf.farmbio.uu.se/chembl/sparql", sparql);
        return matrix;
    }

Inside this Java method there is a SPARQL query, held in a string named sparql. It is possible to run a query like this thanks to the rdf project done by Egon. I use that feature when I call rdf.sparqlRemote; what that command basically does is send my query, passed as a String, to the SPARQL endpoint (a URL). So for this to work, an internet connection must exist.
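Because fam and actType are spliced straight into the regex pattern, a value containing a character such as ".", "(" or a double quote would change the pattern or break the query string. Below is a minimal sketch of a helper that escapes the value before building the FILTER clause. The class name SparqlFilters and the method exactMatchFilter are hypothetical and not part of Bioclipse or the rdf project; the sketch only assumes that the endpoint follows the standard SPARQL regex and string-literal escaping rules.

```java
// Hypothetical helper, not part of Bioclipse: builds a case-insensitive,
// whole-value regex FILTER for a SPARQL variable from an arbitrary string.
final class SparqlFilters {

    // Characters with a special meaning in SPARQL (XPath-style) regex patterns.
    private static final String REGEX_META = "\\.[]{}()*+?^$|-";

    static String exactMatchFilter(String variable, String value) {
        // Step 1: escape regex metacharacters so the value is matched literally.
        StringBuilder pattern = new StringBuilder("^");
        for (char c : value.toCharArray()) {
            if (REGEX_META.indexOf(c) >= 0) {
                pattern.append('\\');
            }
            pattern.append(c);
        }
        pattern.append('$');

        // Step 2: escape backslashes and double quotes so the pattern
        // survives being written as a SPARQL string literal.
        String literal = pattern.toString()
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");

        return "FILTER regex(?" + variable + ", \"" + literal + "\", \"i\") .";
    }
}
```

With such a helper, the two FILTER lines in the method could be written as " " + SparqlFilters.exactMatchFilter("fam", fam) and " " + SparqlFilters.exactMatchFilter("actType", actType), leaving the rest of the query unchanged.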
I will try to find a way to check whether such a connection exists, to improve the usability of the application (no connection -> no search).
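One possible way to do such a check is to send a lightweight HTTP request to the endpoint with short timeouts and treat any HTTP response as "connection available". The sketch below uses only the standard java.net classes; the class name EndpointCheck and the method isReachable are hypothetical, and whether the chEMBL endpoint answers HEAD requests with a success code is not something this sketch relies on, since any response code at all is counted as reachable.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Bioclipse: reports whether an HTTP
// endpoint answers at all within the given timeout.
final class EndpointCheck {

    static boolean isReachable(String endpointUrl, int timeoutMillis) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection)
                    URI.create(endpointUrl).toURL().openConnection();
            connection.setRequestMethod("HEAD");       // no response body needed
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            connection.getResponseCode();              // forces the actual request
            return true;                               // any HTTP answer counts
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            // No network, refused connection, timeout, malformed or non-HTTP URL.
            return false;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
```

The wizard could then call EndpointCheck.isReachable("http://rdf.farmbio.uu.se/chembl/sparql", 3000) before running rdf.sparqlRemote and show a message instead of searching when it returns false. The 3000 ms timeout is only an illustrative value.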
The compounds are saved into a file in a format supported by MoSS. This makes it possible for MoSS to run on the compounds drawn from the chEMBL database. A JavaScript environment is also available.
The pictures show the moss-chembl wizard (top) and the moss wizard (bottom).
The moss-chembl application is dynamic, which means that you can search for the compounds you want and look at them directly. This eases the work a lot! It should also be mentioned that, at the moment, the compounds are only those that bind to a protein in a Kinase family.
Once a preferred data set is chosen, MoSS reads in the data and you can perform substructure mining on it!
Next problem to manage: visualization...