On this blog you can follow my degree project (a master's thesis in Bioinformatics), titled "Pharmaceutical knowledge retrieval through reasoning of ChEMBL RDF".
My supervisor is Egon Willighagen, http://chem-bla-ics.blogspot.com/.


Monday, 14 June 2010

A small but wonderful add-on

Look at the following scenarios:


(a)

> var camk = chembl.MossGetProtFamilyCompAct("camk", "IC50")

> chembl.MoSSViewHistogram(camk)

> var camkBounds = chembl.MossSetActivityBound(camk, 1, 1000000)

> camkBounds.getRowCount()

> chembl.MossSaveFormat("/ChEMBL-MoSS/Rapport/CAMKIC501", camkBounds)

(b)

> var camk = chembl.MossGetProtFamilyCompActBounds("CAMK", "IC50", 1, 1000000)

> camk.getRowCount()

> chembl.MossSaveFormat("/ChEMBL-MoSS/Rapport/CAMKIC502", camk)

(a)+(b) Scripts taken from the context of retrieving molecules for molecular substructure mining. (a) collects compounds that bind to proteins from the CAMK family with the activity type IC50. The activities of the compounds are inspected in a histogram, and the bounds are then set to keep only molecules with activities between 1 and 1,000,000. Finally, the result is saved to a file in the MoSS input format.

(b) Let's say you have worked with this set a couple of times and know exactly which parameters you want. The script in (b) then skips the unnecessary steps by passing the lower and upper values directly to the query, and finally saves the result as a MoSS input file.
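To make the idea of an activity span concrete, here is a minimal sketch in plain JavaScript of what restricting a result set to a lower and upper bound could look like. It does not use the Bioclipse chembl manager. The row layout (objects with a `smiles` and an `activity` field), the function name `filterByActivityBounds`, the sample values, and the choice of inclusive bounds are all my own assumptions for illustration.

```javascript
// Hypothetical stand-in for a query result: one object per compound.
// Field names and values are made up for illustration only.
var rows = [
  { smiles: "CCO", activity: 0.5 },
  { smiles: "CCN", activity: 120 },
  { smiles: "c1ccccc1", activity: 999999 },
  { smiles: "CC(=O)O", activity: 2500000 }
];

// Keep only the rows whose activity lies within [lower, upper].
// Inclusive bounds are an assumption, not confirmed behaviour of the manager.
function filterByActivityBounds(rows, lower, upper) {
  var kept = [];
  for (var i = 0; i < rows.length; i++) {
    var a = rows[i].activity;
    if (a >= lower && a <= upper) {
      kept.push(rows[i]);
    }
  }
  return kept;
}

var bounded = filterByActivityBounds(rows, 1, 1000000);
```

Whether the bounds are applied after fetching, as in (a), or passed along with the query, as in (b), the set of compounds that ends up in the file should be the same; (b) just gets there in one call.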

A small step, but wonderful when you run scripts all day!

Monday, 7 June 2010

The ChEMBL-MoSS interaction in Bioclipse

There are two ways of accessing the ChEMBL-MoSS feature in Bioclipse: through JavaScript and through a wizard. I will present both here!

In both cases I work with the same example: retrieving molecules for the kinase protein family tyrosine kinase, also known as TK. I want to look at the compounds that bind to any protein in this family with the activity type Ki, and I also want to specify the activity span my molecules should fall within.

Starting off with the wizard, this is what it looks like when it is first opened.

Only one box is accessible at first, the one for protein families. When a family is selected, a SPARQL query is run against the endpoint and returns the available activities for that family. Selecting an activity triggers another SPARQL query, which fills the table with compounds (limited to 50; the "add all" button, which has been pressed in the picture, will of course add them all =).
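As a rough sketch of the kind of query the wizard might send, the snippet below assembles a SPARQL SELECT string with an optional LIMIT. This is not the query Bioclipse actually uses: the `ex:` prefix, every predicate name, and the function `buildCompoundQuery` are hypothetical placeholders, since the real ChEMBL RDF vocabulary is not shown in this post.

```javascript
// Build a SPARQL SELECT for compounds of one protein family and activity type.
// The ex: prefix and all predicate names are hypothetical placeholders,
// NOT the real ChEMBL RDF vocabulary.
function buildCompoundQuery(family, activityType, limit) {
  // Escape backslashes and double quotes so values stay inside the literal.
  function lit(s) {
    return '"' + String(s).replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
  }
  var lines = [
    "PREFIX ex: <http://example.org/hypothetical#>",
    "SELECT DISTINCT ?smiles ?value WHERE {",
    "  ?target ex:proteinFamily " + lit(family) + " .",
    "  ?activity ex:onTarget ?target ;",
    "            ex:type " + lit(activityType) + " ;",
    "            ex:value ?value ;",
    "            ex:forMolecule ?mol .",
    "  ?mol ex:smiles ?smiles .",
    "}"
  ];
  // Append LIMIT only when a positive number was given.
  if (typeof limit === "number" && limit > 0) {
    lines.push("LIMIT " + Math.floor(limit));
  }
  return lines.join("\n");
}

var limitedQuery = buildCompoundQuery("TK", "Ki", 50); // like the initial table
var fullQuery = buildCompoundQuery("TK", "Ki");        // like "add all"
```

The only difference between the two queries is the trailing LIMIT clause, which matches how the table first shows 50 compounds and "add all" fetches the rest.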
Now I only want to collect the active compounds, so I first look at the graph displaying the activities.
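For readers without the wizard at hand, a histogram like this can be approximated by counting activity values per power of ten. The sketch below is only an illustration in plain JavaScript: the binning scheme, the function name `decadeHistogram`, and the sample values are my assumptions and not necessarily how the graph in Bioclipse is computed.

```javascript
// Count how many activity values fall into each power-of-ten bin,
// e.g. [1, 10), [10, 100), [100, 1000). Values below 1 share one bin.
// Assumes every value is a finite number.
function decadeHistogram(values) {
  var counts = {};
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    var label;
    if (v < 1) {
      label = "<1";
    } else {
      // Find the largest power of ten that does not exceed v.
      var lower = 1;
      while (v >= lower * 10) {
        lower *= 10;
      }
      label = String(lower);
    }
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}

// Made-up activity values, for illustration only.
var hist = decadeHistogram([0.3, 4, 12, 85, 850, 15000, 20000]);
```

Each key in `hist` is the lower edge of a bin, so a quick look at the counts tells you where most of the activities lie before you pick the lower and upper bounds.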
Once I know which activity span I want to work with, I enter the lower and upper values in their boxes and press "update table". When I then press "finish", a file in the MoSS input format is produced.

Performing almost the same task in JavaScript looks like this.

> var tkki = chembl.MossGetProtFamilyCompAct("tk","ki",50)
> tkki.getRowCount()
Here I collect 50 compounds from the TK family with the activity type Ki.

> var tkki = chembl.MossGetProtFamilyCompAct("tk","ki")
> tkki.getRowCount()
Here I do the same thing as above but without a limit, which returns 976 compounds, the same number that was returned when "add all" was pressed in the wizard.

> var tkkiActBound = chembl.MossSetActivityBound(tkki, 1, 15000)
> tkkiActBound.getRowCount()
> tkkiActBound

With an activity span between 1 and 15000 nM, the number of compounds is reduced to 850 (as in the wizard). If I type the name of the variable, a string matrix with all the information is displayed. To work with MoSS, however, the data has to be saved in a specific format. That is why we save the matrix to a file, just as we did when we pressed "finish" in the wizard.

> chembl.saveMossFormat("/chembl/Script/tkki",tkkiActBound)

Taken from the produced file(s) (they are exactly the same).



With this shown, I will soon let you know what MoSS can do with the saved data!