On this blog you can follow my degree project (master's thesis in Bioinformatics), which has the title "Pharmaceutical knowledge retrieval through reasoning of ChEMBL RDF".
My supervisor is Egon Willighagen, http://chem-bla-ics.blogspot.com/.


Friday, 12 February 2010

Approaching substructure mining

With a simple query like the one below, random compounds for kinases from the Tk family are collected. I would like to restrict the standard value to be below a certain threshold, but I have had some problems doing that in Bioclipse, although I have managed to create this filter when querying the SPARQL endpoint directly. I will work on it.

var allsmiles = " \
PREFIX onto: \
PREFIX blueobelisk: \
\
SELECT DISTINCT ?smiles \
WHERE { \
?target a onto:Target . \
?target onto:classL5 \"Tk\" . \
?target onto:classL6 ?L6 . \
?assay onto:hasTarget ?target . \
?activity onto:onAssay ?assay . \
?activity onto:standardValue ?st . \
?activity onto:forMolecule ?mol . \
?mol blueobelisk:smiles ?smiles . \
}LIMIT 20 \
";
var all = rdf.sparqlRemote("http://rdf.farmbio.uu.se/chembl/sparql", allsmiles)
The variable all now contains a list of molecules, which is saved to a file via the net.bioclipse.moss.business plug-in.
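The standard-value filter mentioned above can be expressed with a SPARQL FILTER clause. The following is only a sketch and has not been run against the endpoint: buildTkQuery and its parameters are hypothetical names, the PREFIX declarations are passed in by the caller because they are not reproduced here, and it assumes that ?st is stored as a numeric literal (if it is stored as a plain string, the comparison would need a cast).

```javascript
// Hypothetical helper: builds the Tk-family query with an upper bound
// on the standard value. `prefixes` must hold the PREFIX declarations
// for onto: and blueobelisk:, which are not filled in here.
function buildTkQuery(prefixes, maxStandardValue, limit) {
  if (typeof maxStandardValue !== "number" || !isFinite(maxStandardValue)) {
    throw new Error("maxStandardValue must be a finite number");
  }
  return [
    prefixes,
    "SELECT DISTINCT ?smiles",
    "WHERE {",
    "  ?target a onto:Target .",
    "  ?target onto:classL5 \"Tk\" .",
    "  ?target onto:classL6 ?L6 .",
    "  ?assay onto:hasTarget ?target .",
    "  ?activity onto:onAssay ?assay .",
    "  ?activity onto:standardValue ?st .",
    "  ?activity onto:forMolecule ?mol .",
    "  ?mol blueobelisk:smiles ?smiles .",
    // Assumption: ?st is numeric, so a plain comparison is enough.
    "  FILTER ( ?st < " + maxStandardValue + " )",
    "} LIMIT " + limit
  ].join("\n");
}
```

In a Bioclipse script the returned string could then be passed to rdf.sparqlRemote in place of allsmiles.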

moss.saveMoss(String fileName, List all)
This call creates a file in the format that MoSS supports (id, threshold value, description). An excerpt:
0,0,Cc1nc(N)sc1c2ccnc(Nc3cccc(c3)[N+](=O)[O-])n2
1,0,COc1cc(Nc2c(cnc3cc(OCCC4CCN(C)CC4)c(OC)cc23)C#N)c(Cl)cc1Cl
2,0,CCOc1cc(Nc2c(cnc3cc(OCC4CCN(C)CC4)c(OC)cc23)C#N)c(Cl)cc1Cl
3,0,COc1ccc(C)c(Nc2c(cnc3cc(OCC4CCN(C)CC4)c(OC)cc23)C#N)c1
4,0,COc1ccc(Cl)c(Nc2c(cnc3cc(OCC4CCN(C)CC4)c(OC)cc23)C#N)c1
5,0,COc1cc2c(Nc3ccc(C)cc3C)c(cnc2cc1OCC4CCN(C)CC4)C#N
6,0,COc1cc(Nc2c(cnc3cc(OCC4CCN(C)CC4)c(OC)cc23)C#N)c(C)cc1C
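The line format in the listing above is simple enough to sketch in a few lines. This is not the plug-in's implementation; toMossLines is a hypothetical name, and the sketch assumes that the id is just the position in the list and that every molecule gets the same value (0 in the listing).

```javascript
// Hypothetical sketch of the input format: one "id,value,description"
// line per molecule, with the SMILES string used as the description.
function toMossLines(smilesList, value) {
  var v = (value === undefined) ? 0 : value;
  var lines = [];
  for (var i = 0; i < smilesList.length; i++) {
    // The id is the zero-based index of the molecule in the list.
    lines.push(i + "," + v + "," + smilesList[i]);
  }
  return lines.join("\n");
}
```

Calling toMossLines with a list of SMILES strings returns the text of the file, one molecule per line.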


This file now has to be initialized and the parameters for the run have to be set; when that is done, simply run it.

> moss.saveMoss("/Moss/Test/collected", all)
> moss.init("/Moss/Test/collected")
done
> moss.setLimits(10,2)
> moss.run("/Moss/Test/collectedOut", "/Moss/Test/collectedOutId")


Only two basic parameter settings work at the moment; support for the others will be added as soon as possible. It will take time, though, since many parameters are set by combining flags, which I remember being crucial to get right.
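Combining flags usually means packing several on/off settings into one integer with bitwise OR. A minimal sketch of that idea follows; the flag names and values are made up for illustration and are not the MoSS constants.

```javascript
// Hypothetical flag values; the real MoSS constants are not shown here.
var MODE = {
  CLOSED: 1 << 0,       // 1
  IGNORE_TYPE: 1 << 1,  // 2
  KEKULE: 1 << 2        // 4
};

// Combine any number of flags into one integer with bitwise OR.
function combineFlags() {
  var mode = 0;
  for (var i = 0; i < arguments.length; i++) {
    mode |= arguments[i];
  }
  return mode;
}

// Test whether a given flag is set in a combined value.
function hasFlag(mode, flag) {
  return (mode & flag) === flag;
}
```

For example, combineFlags(MODE.CLOSED, MODE.KEKULE) packs both settings into a single integer, and hasFlag can later check each one individually.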

To read about how MoSS works, how to interpret the output files, etc., see Christian Borgelt's homepage, http://www.borgelt.net/doc/moss/moss.html.
Output file (excerpt):
id,description,nodes,edges,s_abs,s_rel,c_abs,c_rel
1,n1:c2:c(:c(-N-c3:c(-Cl):c:c(-Cl):c(-O-C):c:3):c(-C#N):c:1):c:c(-O-C):c(-O-C-C1-C-C-N(-C-C-1)-C):c:2,34,37,2,10.0,0,0.0
2,n1:c2:c(:c(-N-c3:c(-Cl):c:c:c(-O-C):c:3):c(-C#N):c:1):c:c(-O-C):c(-O-C-C1-C-C-N(-C-C-1)-C):c:2,33,36,3,15.0,0,0.0
3,n1:c2:c(:c(-N-c3:c(-Cl):c:c(-Cl):c(-O-C):c:3):c(-C#N):c:1):c:c(-O-C):c(-O-C-C(-C-C)-C):c:2,31,33,3,15.0,0,0.0
4,n1:c2:c(:c(-N-c3:c(-Cl):c:c:c(-O-C):c:3):c(-C#N):c:1):c:c(-O-C):c(-O-C-C(-C-C)-C):c:2,30,32,4,20.0,0,0.0
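To work with these rows in a script, each line can be split into named fields. The sketch below is an assumption about how one might do that, not part of the plug-in; parseSubstructureLine is a hypothetical name, and the field names are taken from the header line. It splits from the right so that the description is kept whole even if it should ever contain a comma.

```javascript
// Parses one data line of the substructure output file. The last six
// fields are numeric, so they are taken from the right and whatever
// lies between the id and those fields is kept as the description.
function parseSubstructureLine(line) {
  var parts = line.split(",");
  if (parts.length < 8) {
    throw new Error("expected at least 8 comma-separated fields");
  }
  var n = parts.length;
  return {
    id: parseInt(parts[0], 10),
    description: parts.slice(1, n - 6).join(","),
    nodes: parseInt(parts[n - 6], 10),
    edges: parseInt(parts[n - 5], 10),
    sAbs: parseInt(parts[n - 4], 10),
    sRel: parseFloat(parts[n - 3]),
    cAbs: parseInt(parts[n - 2], 10),
    cRel: parseFloat(parts[n - 1])
  };
}
```

The header line itself should be skipped before calling the function, since its fields are not numeric.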


Id output file (excerpt):
id:list
1:2,10
2:2,4,10
3:2,9,10
4:2,4,9,10
5:2,7,10
6:2,4,7,10
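The id file appears to list, for each substructure id, the ids of the input molecules that contain it. Assuming that reading is right, a small parser could turn it into a lookup table; parseIdFile is a hypothetical name and the sketch is not part of the plug-in.

```javascript
// Parses the "id:list" file into an object that maps each
// substructure id (as a string key) to an array of molecule ids.
function parseIdFile(text) {
  var map = {};
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].replace(/^\s+|\s+$/g, "");
    // Skip blank lines and the header line.
    if (line === "" || line === "id:list") continue;
    var sep = line.indexOf(":");
    if (sep === -1) {
      throw new Error("missing ':' in line " + (i + 1));
    }
    var key = line.substring(0, sep);
    var items = line.substring(sep + 1).split(",");
    var ids = [];
    for (var j = 0; j < items.length; j++) {
      ids.push(parseInt(items[j], 10));
    }
    map[key] = ids;
  }
  return map;
}
```

With the excerpt above, the entry for substructure 2 would be the array 2, 4, 10, which can then be matched against the ids in the input file.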


I want to be able to visualize the results in tables later on, perhaps together with the input and other information collected via SPARQL.

4 comments:

  1. Your code:

    moss.saveMoss("/Moss/Test/collected", all)
    moss.init("/Moss/Test/collected")
    moss.setLimits(10,2)
    moss.run("/Moss/Test/collectedOut", "/Moss/Test/collectedOutId")

    is not thread safe... we need to come up with a solution where the run time parameters are set in a thread-safe manner...

    One option is to use something like:

    moss.run("/Moss/Test/collectedOut", "/Moss/Test/collectedOutId", "limits", 10, 2)

    or another alternative of passing the parameters when run() is called...

    It might also be possible to use a separate Project approach for this... like the QSAR Project set up used by Ola...

  2. I believe that the latter option is preferable. There are about 30 different settings, so I excluded the first option when I approached the problem.

    I would like to know more about QSAR, so I will definitely look into it.

  3. JFYI: In case you'd like to filter on a max value for the standard value already when running the SPARQL query, you should be able to do that by adding a "FILTER" construct in the WHERE clause.

    So you'd then go something like ...
    ...
    WHERE {
    ...
    ?activity onto:standardValue ?st .
    ...
    FILTER ( ?st < [some-numerical-value] )
    }

    (I do this in the second code snippet at http://saml.rilspace.com/content/backtracking-key-difference-between-sparql-and-prolog though it was not the appropriate thing for what I tried to do :/ ... but I'm quite sure I got basic filtering to work at least, when testing).

  4. Or did you try that already?
