tag:blogger.com,1999:blog-27317969767375394502024-02-19T15:50:11.296-08:00Annzi's Project blogAnnzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.comBlogger15125tag:blogger.com,1999:blog-2731796976737539450.post-41856100702443193492010-06-14T07:35:00.000-07:002010-06-14T07:45:58.335-07:00A small but wonderful add-onLook at the following scenarios:
<br />
<br /> <!--StartFragment--> <p class="MsoNormal" style="margin: 0.1pt 0cm;"><span style=";font-family:";" >(a)<o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;font-family:courier new;"><span style="font-size:85%;">> var camk = chembl.MossGetProtFamilyCompAct("camk", "IC50")<o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;font-family:courier new;"><span style="font-size:85%;">> chembl.MoSSViewHistogram(camk)</span></p><p class="MsoNormal" style="margin: 0.1pt 0cm;"><span style=";font-family:Courier;font-size:85%;" ><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5xffiThyuOt7Dm4yBBSLfzsqYEFxc4931y80fr-tqcdb53mChI3kq_cEO5MGA__zdisAAXirgKOxuwSndO3Q9l6DIOXPxQqBjIcPpwzlZ47ssWgfgt_ZE4jbecjoVzHbrvU2NWSo0-Pw/s1600/histo.png"><img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 198px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5xffiThyuOt7Dm4yBBSLfzsqYEFxc4931y80fr-tqcdb53mChI3kq_cEO5MGA__zdisAAXirgKOxuwSndO3Q9l6DIOXPxQqBjIcPpwzlZ47ssWgfgt_ZE4jbecjoVzHbrvU2NWSo0-Pw/s320/histo.png" alt="" id="BLOGGER_PHOTO_ID_5482638996044206962" border="0" /></a></span></p><p class="MsoNormal" style="margin: 0.1pt 0cm;font-family:courier new;"><span style="font-size:85%;">
<br /></span><span style="font-size:85%;"><o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;font-family:courier new;"><span style="font-size:85%;">> var camkBounds = chembl.MossSetActivityBound(camk, 1,1000000)<o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;font-family:courier new;"><span style="font-size:85%;">> camkBounds.getRowCount()<o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;font-family:courier new;"><span style="font-size:85%;">2565<o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;"><span style=";font-family:Courier;font-size:10pt;" ><span style=";font-family:courier new;font-size:85%;" >>chembl.MossSaveFormat("/ChEMBL-MoSS/Rapport/CAMKIC501", camk)</span><o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;"><span style=";font-family:";" ><o:p> </o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;"><span style=";font-family:";" >
<br /></span></p><p class="MsoNormal" style="margin: 0.1pt 0cm;"><span style=";font-family:";" >(b)<span style="font-size:85%;"><o:p style="font-family: courier new;"></o:p></span></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;font-family:courier new;"><span style="font-size:85%;">> var camk=chembl.MossGetProtFamilyCompActBounds("CAMK","IC50",1, <span style=""> </span><span style=""> </span><span style=""> </span><span style=""> </span><span style=""> </span><span style=""> </span><span style=""> </span><span style=""> </span>1000000) <o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;font-family:courier new;"><span style="font-size:85%;">> </span><span style="font-size:85%;">camk</span><span style="font-size:85%;">.getRowCount()<o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;font-family:courier new;"><span style="font-size:85%;">2565<o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;"><span style=";font-family:Courier;font-size:10pt;" ><span style=";font-family:courier new;font-size:85%;" >> chembl.MossSaveFormat("/ChEMBL-MoSS/Rapport/CAMKIC502", camk)</span></span></p><p class="MsoNormal" style="margin: 0.1pt 0cm;">
<br /></p><p class="MsoNormal" style="margin: 0.1pt 0cm;">
<br /><span style=";font-family:Courier;font-size:10pt;" ><o:p></o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;"><span style=";font-family:";font-size:10pt;" ><o:p> </o:p></span></p> <p class="MsoNormal" style="margin: 0.1pt 0cm;"><b style=""><span style=";font-family:";font-size:10pt;" ></span></b><span style=";font-family:";font-size:10pt;" ><span style="font-size:100%;">(a)+(b) Scripts taken from the context of retrieving molecules for molecular substructure mining. (a) collects compounds that bind to proteins from the CAMK family with the activity type IC50. The activities of the compounds are inspected in a histogram, and the bounds are then set to keep only molecules with activities between 1 and 1,000,000. Lastly, the result is saved to a file in the MoSS input format.
<br /></span></span></p><p class="MsoNormal" style="margin: 0.1pt 0cm;"><span style=";font-family:";font-size:10pt;" ><span style="font-size:100%;">(b) Let's say you have been working with this set a couple of times and know your parameters exactly; then the script in (b) removes unnecessary steps by passing the lower and upper values directly to the query. Finally, the result is saved to a MoSS input file.
<br /></span></span></p><p class="MsoNormal" style="margin: 0.1pt 0cm;">
<br /></p><p class="MsoNormal" style="margin: 0.1pt 0cm;">A small step, but wonderful when you run scripts all day!
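<br /><br />Both scripts end by writing a MoSS input file. As a rough sketch of what that step amounts to: the code below assumes the matrix has an actval/smiles header row (as in the script output shown in the ChEMBL-MoSS post) and that each output line has the form id,0,SMILES (as in the file excerpt there). The name toMossLines is a hypothetical helper for illustration, not part of the chembl manager, and the constant 0 is simply copied from that excerpt without any claim about how MoSS interprets it.

```javascript
// Hypothetical helper (not a Bioclipse API): turn a matrix of the form
// [["actval","smiles"], [value, smiles], ...] into "id,0,SMILES" lines.
function toMossLines(matrix) {
  var lines = [];
  // Row 0 is assumed to be the header, so data rows start at index 1.
  for (var i = 1; i < matrix.length; i++) {
    var smiles = matrix[i][1];
    // The running row number is used as the id; the middle field is the
    // constant 0 seen in the example file.
    lines.push(i + ",0," + smiles);
  }
  return lines;
}

// Two rows taken from the script output in the ChEMBL-MoSS post.
var sample = [
  ["actval", "smiles"],
  ["160", "Cc1nc(N)sc1c2ccnc(Nc3cccc(c3)[N+](=O)[O-])n2"],
  ["700", "CCOc1nc(cc(N)c1Cl)C(=O)NCc2ccc(cc2)S(=O)(=O)C"]
];
var mossLines = toMossLines(sample);
console.log(mossLines.join("\n"));
```

Note that the activity value itself is not written to the file in this sketch; it is only used earlier, when the bounds are applied.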
<br /></p><p class="MsoNormal" style="margin: 0.1pt 0cm;">
<br /><span style=";font-family:";font-size:10pt;" ><o:p></o:p></span></p> <!--EndFragment--> Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com1tag:blogger.com,1999:blog-2731796976737539450.post-89762621572994799782010-06-07T05:03:00.000-07:002010-06-08T02:40:06.512-07:00The ChEMLB-MoSS interaction in BioclipseThere are two ways of accessing the <span style="font-weight: bold;"><a href="http://www.ebi.ac.uk/chembl/">chEMBL</a>- <a href="http://www.borgelt.net/moss.html">MoSS</a></span> feature in <a href="http://www.bioclipse.net/">Bioclipse</a>: by JavaScript and by wizard. I will present both here!<br /><br />In both cases I work with the same example: retrieving molecules for the kinase protein family tyrosine kinase, also known as TK. I want to look at the compounds that bind to any protein in this family with the activity type Ki, and to specify the activity span my molecules should fall in.<br /><br /><span style="font-weight: bold;">Starting off with the wizard,</span> this is what it looks like when it is first opened.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRvLEF1GM6aKu7TZK6S-SHq6qdy-WJTHSXNVn-_aDVhvvqmnNQ1P7PLHeVuEu8_PqRZ3jeh1xo9PEvX-gY-Ec6JtyqTxDWweBSTeRzFK02c4FwCoJYetBR32Qxr1webrQiFfmR5QUDdYc/s1600/1.png"><img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 226px; height: 320px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRvLEF1GM6aKu7TZK6S-SHq6qdy-WJTHSXNVn-_aDVhvvqmnNQ1P7PLHeVuEu8_PqRZ3jeh1xo9PEvX-gY-Ec6JtyqTxDWweBSTeRzFK02c4FwCoJYetBR32Qxr1webrQiFfmR5QUDdYc/s320/1.png" alt="" id="BLOGGER_PHOTO_ID_5480006924661791074" border="0" /></a>Only one box is accessible at first, the one for protein families. When a family is selected, a SPARQL query is run against the endpoint and returns the available activities for that family. 
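<br /><br />To give an idea of what such a query could look like, here is a minimal sketch that only builds the query string. It is not the wizard's actual code: the function names are hypothetical, the triple patterns are borrowed from the Java method shown in the moss-chembl post, and the ontology URI is passed in as a parameter because the real prefix is not reproduced here (the example.org value below is a placeholder). The one thing the sketch adds is escaping of the family name before it goes into the regex FILTER.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap text in double quotes as a SPARQL string literal,
// escaping backslashes and double quotes.
function toSparqlLiteral(text) {
  return '"' + text.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Hypothetical builder: list the activity types available for one family.
// ontologyUri is an assumption supplied by the caller.
function activitiesForFamilyQuery(ontologyUri, family) {
  var pattern = "^" + escapeRegex(family) + "$";
  return [
    "PREFIX chembl: <" + ontologyUri + ">",
    "SELECT DISTINCT ?actType WHERE {",
    "  ?target a chembl:Target ;",
    "          chembl:classL5 ?fam .",
    "  ?assay chembl:hasTarget ?target .",
    "  ?activity chembl:onAssay ?assay ;",
    "            chembl:type ?actType .",
    "  FILTER regex(?fam, " + toSparqlLiteral(pattern) + ", \"i\")",
    "}"
  ].join("\n");
}

// Placeholder URI, for illustration only.
console.log(activitiesForFamilyQuery("http://example.org/chembl/onto#", "tk"));
```

In Bioclipse the resulting string would then be handed to something like rdf.sparqlRemote together with the endpoint URL, as in the other posts.<br /><br />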
By simply selecting a preferred activity an other SPARQL query will update the table with compounds (with a limitation of 50, the button add all(which is done in the picture) will of course add them all=).<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKFCGHM5gYVcWtUgl3ba44USV3rNO7Vpp1uSy55EL6lGrLhB7WOUKDYyC0FUb9Y5nta8X2gTn3Wr-E2g6WBh2lN9WKcDQVNd-LHI-O9A80iCiW9o7oNCiiGwyW7X-FuaPSw-Pd8r3V-Jo/s1600/2.png"><img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 227px; height: 320px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKFCGHM5gYVcWtUgl3ba44USV3rNO7Vpp1uSy55EL6lGrLhB7WOUKDYyC0FUb9Y5nta8X2gTn3Wr-E2g6WBh2lN9WKcDQVNd-LHI-O9A80iCiW9o7oNCiiGwyW7X-FuaPSw-Pd8r3V-Jo/s320/2.png" alt="" id="BLOGGER_PHOTO_ID_5480007251964490258" border="0" /></a>Now I would like to only collect the active compounds hence I first look at the graph displaying the activities.<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6V07JdutT4LUUCpBmxVkA6ClVVTkttVDATuVJ8RWeEzgJVpUXtpgoWxi_Navp_iblCG1Gi7srGd-G1cJx0nqt9ITTDbRt7OayGW3XqSdZraHJ1SbPCq4ldvqYz17uohThGGFgkabgoNU/s1600/3.png"><img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 262px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6V07JdutT4LUUCpBmxVkA6ClVVTkttVDATuVJ8RWeEzgJVpUXtpgoWxi_Navp_iblCG1Gi7srGd-G1cJx0nqt9ITTDbRt7OayGW3XqSdZraHJ1SbPCq4ldvqYz17uohThGGFgkabgoNU/s320/3.png" alt="" id="BLOGGER_PHOTO_ID_5480007784504004818" border="0" /></a>When I know in what activity span I would like to work with I update the table with help from the lower and upper boxes and simply press update table. 
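<br /><br />What "update table" does to the data can be sketched in a few lines. This is an illustration under assumptions, not the wizard's code: filterByActivity is a hypothetical name, the matrix layout (header row, then activity value and SMILES per row) follows the script output shown further down, and treating both bounds as inclusive is my assumption.

```javascript
// Hypothetical helper: keep only the rows whose activity value lies
// within [lower, upper]. Both bounds are assumed to be inclusive.
function filterByActivity(matrix, lower, upper) {
  var kept = [matrix[0]]; // keep the header row
  for (var i = 1; i < matrix.length; i++) {
    var value = parseFloat(matrix[i][0]); // values arrive as strings
    if (!isNaN(value) && value >= lower && value <= upper) {
      kept.push(matrix[i]);
    }
  }
  return kept;
}

// A few rows in the same shape as the script output in this post.
var rows = [
  ["actval", "smiles"],
  ["160", "Cc1nc(N)sc1c2ccnc(Nc3cccc(c3)[N+](=O)[O-])n2"],
  ["700", "CCOc1nc(cc(N)c1Cl)C(=O)NCc2ccc(cc2)S(=O)(=O)C"],
  ["10000", "OCCCOc1cc2OCCCCCOc3nc(NC(=O)Nc2cc1Cl)cnc3C#N"],
  ["19.4", "Cc1cc(cc2nnc(Nc3ccc(OCCN4CCCC4)cc3)nc12)c5c(Cl)cccc5Cl"],
  ["950", "COc1cc2ncc(C#N)c(N[C@@H]3C[C@H]3c4ccccc4)c2cc1OC"]
];
var active = filterByActivity(rows, 1, 1000);
console.log(active.length - 1); // number of data rows kept
```

Rows whose value cannot be parsed as a number are dropped in this sketch; whether the real table does the same is not something I can confirm.<br /><br />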
When I now press finish a file that supports MoSS will be produced.<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQCmz2n4csT9feGxbQf9HYGV2kD3R0Ay3pBy5e2bjk7eSFxTofhwFLHevjDI15EI6_RsqWhUX0N-8B8B4Nm5-ez2j6nTp2En-6MiHT_Spy4ri-rjpk8l-9mxkQjmj8wuwn8u3da9kMamM/s1600/4.png"><img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 226px; height: 320px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQCmz2n4csT9feGxbQf9HYGV2kD3R0Ay3pBy5e2bjk7eSFxTofhwFLHevjDI15EI6_RsqWhUX0N-8B8B4Nm5-ez2j6nTp2En-6MiHT_Spy4ri-rjpk8l-9mxkQjmj8wuwn8u3da9kMamM/s320/4.png" alt="" id="BLOGGER_PHOTO_ID_5480008030176980530" border="0" /></a><br /><span style="font-weight: bold;">Javascript</span><br />Performing almost the same task now provides the following javascript.<br /><span style="font-size:85%;"><br /><span style="font-family:courier new;">> var tkki = chembl.MossGetProtFamilyCompAct("tk","ki",50)<br /><span style="font-size:100%;"><span style="font-weight: bold;"></span></span></span></span><span style="font-size:85%;"><span style="font-family:courier new;">> tkki.getRowCount()</span><br /><span style="font-family:courier new;">50<br /></span></span><span style="font-size:85%;"><span style="font-size:100%;"> Here I collect 50 compounds from the TK family with the activity of KI.</span></span><br /><span style="font-size:85%;"><span style="font-family:courier new;"><span style="font-size:100%;"><span style="font-weight: bold;"></span></span></span><span style="font-family:courier new;"><br />> var tkki = chembl.MossGetProtFamilyCompAct("tk","ki")<br /></span></span><span style="font-size:85%;"><span style="font-family:courier new;">> tkki.getRowCount()</span><br /><span style="font-family:courier new;">976<br /></span></span><span style="font-size:100%;">Here I perform the same thing as above without a limit leaving to returning 976 
compounds, the same number that was returned when "add all" was pushed in the wizard.</span><span style="font-size:85%;"><span style="font-family:courier new;"><br /></span><span style="font-family:courier new;"></span><span style="font-family:courier new;"></span></span><span style="font-size:85%;"><br /><span style="font-family:courier new;">> var tkkiActBound = chembl.MossSetActivityBound(tkki, 1,15000)</span><br /><span style="font-family:courier new;">> tkkiActBound.getRowCount()</span><br /><span style="font-family:courier new;">850<br /></span></span><span style="font-size:85%;"><span style="font-family:courier new;">> tkkiActBound</span><br /><span style="font-family:courier new;">[["actval","smiles"],</span><br /><span style="font-family:courier new;">["160","Cc1nc(N)sc1c2ccnc(Nc3cccc(c3)[N+](=O)[O-])n2"],</span><br /><span style="font-family:courier new;">["700","CCOc1nc(cc(N)c1Cl)C(=O)NCc2ccc(cc2)S(=O)(=O)C"],</span><br /><span style="font-family:courier new;">["10000","CS(=O)(=O)Nc1cc2OCCCCCOc3nc(NC(=O)Nc2cc1Cl)cnc3C#N"],</span><br /><span style="font-family:courier new;">["10000","OCCCOc1cc2OCCCCCOc3nc(NC(=O)Nc2cc1Cl)cnc3C#N"],</span><br /><span style="font-family:courier new;">["10000","OCCCc1cc2OCCCCCOc3nc(NC(=O)Nc2cc1Cl)cnc3C#N"],</span><br /><span style="font-family:courier new;">["19.4","Cc1cc(cc2nnc(Nc3ccc(OCCN4CCCC4)cc3)nc12)c5c(Cl)cccc5Cl"],</span><br /><span style="font-family:courier new;">["950","COc1cc2ncc(C#N)c(N[C@@H]3C[C@H]3c4ccccc4)c2cc1OC"],</span><br /><span style="font-family:courier new;">["10000","N#Cc1cnc2ccc(cc2c1N[C@@H]3C[C@H]3c4ccccc4)c5ccc(CN6CCOCC6)cc5"],</span><br /><span style="font-family:courier new;">["10000","N#Cc1cnc2ccc(cc2c1N[C@@H]3C[C@H]3c4ccccc4)c5cccc(CN6CCOCC6)c5"],</span><br /><span style="font-family:courier new;">…</span></span><br /><span style="font-size:85%;"><span style="font-family:courier new;"></span><span style="font-size:100%;">With the specification of an activity span between 1 and 15000 nm the 
number of compounds are reduced to 850(as in the wizard). If I write the name of the variable a string matrix will display all the information. But in order to work with MoSS it has to be saved in a certain way. That's why we save the matrix to a file just as we did when we pressed finish in the wizard.</span><br /><span style="font-family:courier new;"><br />> chembl.saveMossFormat("/chembl/Script/tkki",tkkiActBound)</span></span><br /><br /><span style="font-size:100%;">Taken from the produced file(s)(they are exactly the same).<br /></span><br /><div style="text-align: left;"><span style="font-size:85%;">1,0,Cc1nc(N)sc1c2ccnc(Nc3cccc(c3)[N+](=O)[O-])n2</span><br /><span style="font-size:85%;">2,0,CCOc1nc(cc(N)c1Cl)C(=O)NCc2ccc(cc2)S(=O)(=O)C</span><br /><span style="font-size:85%;">3,0,CS(=O)(=O)Nc1cc2OCCCCCOc3nc(NC(=O)Nc2cc1Cl)cnc3C#N</span><br /><span style="font-size:85%;">4,0,OCCCOc1cc2OCCCCCOc3nc(NC(=O)Nc2cc1Cl)cnc3C#N</span><br /><span style="font-size:85%;">…</span><br /><span style="font-size:85%;">…</span><br /><span style="font-size:85%;">…</span><br /><span style="font-size:85%;">848,0,Clc1cc2NC(=O)Nc3cnc(C#N)c(OCCCCOc2cc1NCc4cncs4)n3</span><br /><span style="font-size:85%;">849,0,OC[C@@H](NC(=O)c1cc(c[nH]1)c2[nH]ncc2c3cccc(Cl)c3)c4ccc(F)c(Cl)c4</span><br /><span style="font-size:85%;">850,0,FC(F)(F)c1cccc(c1)c2nnc3ccc(NC4CCNCC4)nn23</span><br /><span style="font-size:100%;"><br />With this shown I will</span> soon let you know what MoSS can do with the saved data!<br /></div>Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com3tag:blogger.com,1999:blog-2731796976737539450.post-66744702957230112482010-05-17T04:42:00.000-07:002010-05-19T03:11:56.834-07:00A moss-chembl applicationAfter a month of traveling I'm now back to devote my time to what's left of my project which would be about 8-9 weeks. 
My work is progressing and much of my time I'm working with human-computer-interaction but also advancing the SPARQL queries and test for accuracy.<br /><br />MoSS as I probably mentioned a couple of times before is a molecular substructure mining software produced by Christian Borgelt, <a href="http://www.borgelt.net/moss.html">http://www.borgelt.net/moss.html</a>. I implemented that application for Bioclipse in 2008,<a href="http://wiki.bioclipse.net/index.php?title=MoSS_in_Bioclipse"> http://wiki.bioclipse.net/index.php?tit</a><a href="http://wiki.bioclipse.net/index.php?title=MoSS_in_Bioclipse">le=MoSS_in_Bioclipse</a>, and I'm now making use of my own application.<br /><br />As my chEMBL work is coming along I'm at the moment working on a specific working flow, "<span style="font-style: italic;">from chEMBL to MoSS". </span> With the functionality of SPARQL I am now via java methods accessing compounds from various Kinase protein familes. A method could look like something like this<br /><br /><span style=";font-family:courier new;font-size:85%;" >public IStrin</span><span style=";font-family:courier new;font-size:85%;" >gMatrix MossProtFamilyCompounds(String fam, String actType)<br />throws BioclipseException{<br /><br />String sparql =<br />"PREFIX chembl: <http: se="" chembl="" onto=""> " +<br />"PREFIX bo: <http: org="" chemistryblogs="">"+</http:></http:></span><br /><span style=";font-family:courier new;font-size:85%;" ><http: se="" chembl="" onto=""><http: org="" chemistryblogs=""> "SELECT DISTINCT ?smiles where{ " + " ?target a chembl:Target;" +<br />" chembl:classL5 ?fam. " +<br />" ?assay chembl:hasTarget ?target . " +<br />" ?activity chembl:onAssay ?assay ;" +<br />" chembl:type ?actType ; " +<br />" chembl:forMo</http:></http:></span><span style=";font-family:courier new;font-size:85%;" ><http: se="" chembl="" onto=""><http: org="" chemistryblogs="">lecule ?mol ."+<br />" ?mol bo:smiles ?smiles. 
" +<br />" FILTER regex(?fam, " + "\"^" + fam + "$\"" + ", \"i\")."+<br />" FILTER regex(?</http:></http:></span><span style=";font-family:courier new;font-size:85%;" ><http: se="" chembl="" onto=""><http: org="" chemistryblogs="">actType, " + "\"^" + actType + "$\"" + ", \"i\")."+<br />" }";<br />IStringMatrix matrix = rdf.sparqlRemote("http://rdf.farmbio.uu.se/chembl/sparql",sparql);</http:></http:></span><br /><span style=";font-family:courier new;font-size:85%;" ><http: se="" chembl="" onto=""><http: org="" chemistryblogs=""> return matrix;</http:></http:></span><br /><span style=";font-family:courier new;font-size:85%;" ><http: se="" chembl="" onto=""><http: org="" chemistryblogs="">}</http:></http:></span><br /><br />Inside this java method there is a SPARQL query which is a string named sparql. It is possible to run a query like this due to the <a href="http://wiki.bioclipse.net/index.php?title=Bioclipse.Rdf">rdf project</a> done by <a href="http://chem-bla-ics.blogspot.com/">Egon</a>. I use that feature when I call <span style="font-family:courier new;">rdf.sparqlRemote</span>, what that command basically do is accessing the SPARQL endpoint(URL) with my query which is made into a String. So for this to work an internet connection must exist.<br />I will try to find something that can check if such a connection exist or not to improve the use of the application(no connection -> no search).<br /><br />The compounds are saved into a file supported by MoSS. 
This makes it possible for MoSS to run on the compounds dra<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEieY9vX06NN0cSGWSekNUbUuzs094OXrLBIq-Q8bg5MIcks4T4hmtAb3OziQZcV4bYY3T44yf-vSjiBWjNsH6BPRrNxnAjJ2dHsQTf01yx7G9Tb2-MKIEquXUohgdDNuAkGATT_nkwEosE/s1600/Screen+shot+2010-05-19+at+11.39.24.png"><img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 240px; height: 320px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEieY9vX06NN0cSGWSekNUbUuzs094OXrLBIq-Q8bg5MIcks4T4hmtAb3OziQZcV4bYY3T44yf-vSjiBWjNsH6BPRrNxnAjJ2dHsQTf01yx7G9Tb2-MKIEquXUohgdDNuAkGATT_nkwEosE/s320/Screen+shot+2010-05-19+at+11.39.24.png" alt="" id="BLOGGER_PHOTO_ID_5472920651392646002" border="0" /></a>wn from the chEMBL database. Also a java script environment is available.<br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBdCCkFKsFRxK6TPlRB3WWoUl58Q8jjV6Nsrk0WO_pMu4ohdZv45bu75ME48Xqx0HBexGZqJLg37QpYbJN1VXavMUWo8aJzUzOeCNhzitQNlX3QhTu3gB6CDtEQEGNgfZY9-9sJaTEnns/s1600/Screen+shot+2010-05-19+at+12.00.06.png"><img style="float: left; margin: 0pt 10px 10px 0pt; cursor: pointer; width: 247px; height: 308px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhBdCCkFKsFRxK6TPlRB3WWoUl58Q8jjV6Nsrk0WO_pMu4ohdZv45bu75ME48Xqx0HBexGZqJLg37QpYbJN1VXavMUWo8aJzUzOeCNhzitQNlX3QhTu3gB6CDtEQEGNgfZY9-9sJaTEnns/s320/Screen+shot+2010-05-19+at+12.00.06.png" alt="" id="BLOGGER_PHOTO_ID_5472919386741071090" border="0" /></a><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />The pictures shows(top) the moss-chembl wizard and (bottom) the moss wizard.<br /><br />The moss-chembl applications is dynamic which means that you can search for wanted compounds 
and look at them directly. This eases the work a lot! It should also be mentioned that, at the moment, the compounds are limited to those that bind to a protein in a kinase family.<br /><br />When a preferred data set is chosen, MoSS reads in the data and you are able to perform substructure mining on it!<br /><br />Next problem to manage: visualization...Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com1tag:blogger.com,1999:blog-2731796976737539450.post-53776607027271174332010-03-24T08:07:00.000-07:002010-03-24T09:00:32.379-07:00The things you can do with a wizard . . .Now I have started to get a feeling for SPARQL, but do you have one?<br />Well, I do not want to force anyone to learn new languages all the time, therefore I began to develop a wizard. This wizard is far from done, but it does manage some functions at the moment, which is really cool. As you write an id or keyword, SPARQL queries are run against <a href="http://rdf.farmbio.uu.se/chembl/snorql/">http://rdf.farmbio.uu.se/chembl/snorql/ </a> and the values are returned to the wizard. If you change your search, the old data is deleted and the new data displayed.<br /><br />A search can now be done with keywords, SMILES or a ChEBI id to find information about compounds. 
This search will expand as I implement biological networking to other knowledge bases(<a href="http://chebi.bio2rdf.org/sparql">http://chebi.bio2rdf.org/sparql</a> as an example).<br />If the checkbox for target is check a search with proteins id's, keywords, ec-number etc will take place instead.<br />As you write the table will fill up with various data depending on what you search on.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsCdTrgKMAoBrGEBPv_XEYSUYpJ85gmM2pqbctNak9rfc8BfMq9VoMwW6mDM9Q8PZMldloTePrkYHaAFRIop20A7KOicpSidWHNAJM2HVqTR7XDfYoun591HO2l6A0xCV1zBaBtj_Lyxs/s1600/Screen+shot+2010-03-24+at+16.29.04.png"><img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 257px; height: 320px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsCdTrgKMAoBrGEBPv_XEYSUYpJ85gmM2pqbctNak9rfc8BfMq9VoMwW6mDM9Q8PZMldloTePrkYHaAFRIop20A7KOicpSidWHNAJM2HVqTR7XDfYoun591HO2l6A0xCV1zBaBtj_Lyxs/s320/Screen+shot+2010-03-24+at+16.29.04.png" alt="" id="BLOGGER_PHOTO_ID_5452222974367801858" border="0" /></a><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimECY-IstzPYrt1jCDTy9cSooFnPGfZq3SL15Q2AsoOTUDo0vP4wg2uW528uVdbHg36ktnZZU3C31UR5VNwf3bR9uAd4gGSKZLnd-YZ4uA1cH0ZoFlD2G-8pnHVZEAwJvfkFUEov9dQyo/s1600/Screen+shot+2010-03-24+at+16.09.28.png"><br /></a>The upper picture searches for targets that have some connection to sodium channels. The bottom picture search for a chebi id from a SMILES. Unfortunately I don't know yet how to distinguish between strings written in the box so the line have to end with a # at the moment. 
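<br /><br />The current end-marker rule can be sketched as follows. This is only an illustration of the idea; completedSearchTerm is a hypothetical name and not the wizard's actual code.

```javascript
// Hypothetical helper: return the search term once the input ends with
// the "#" marker, or null while the user is still typing.
function completedSearchTerm(input) {
  var text = input.trim();
  // A lone "#" (or empty input) carries no search term.
  if (text.length < 2 || text.charAt(text.length - 1) !== "#") {
    return null;
  }
  // Strip the marker and any whitespace in front of it.
  return text.slice(0, -1).trim();
}

console.log(completedSearchTerm("sodium channel#"));
console.log(completedSearchTerm("sodium chan"));
```

One weakness of this rule is that # is also the triple-bond symbol in SMILES (as in C#N), so a half-typed SMILES such as CC# would already trigger a search.<br /><br />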
Working on solving that...<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiA6s6qGG2MRh4jExFsRx-BHbmdyonzqUBH_Frq5eLgGamZaYHjfcxZ9ufePh6-d3_B-FobDRgY400wtPMXQaIpacALMSJIL-9sDQtP7O6h5d-oISSFgae9bgrTRRgUPY7VOkolc8ttW1M/s1600/Screen+shot+2010-03-24+at+16.25.56.png"><img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 318px; height: 320px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiA6s6qGG2MRh4jExFsRx-BHbmdyonzqUBH_Frq5eLgGamZaYHjfcxZ9ufePh6-d3_B-FobDRgY400wtPMXQaIpacALMSJIL-9sDQtP7O6h5d-oISSFgae9bgrTRRgUPY7VOkolc8ttW1M/s320/Screen+shot+2010-03-24+at+16.25.56.png" alt="" id="BLOGGER_PHOTO_ID_5452222121539359826" border="0" /></a><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEimECY-IstzPYrt1jCDTy9cSooFnPGfZq3SL15Q2AsoOTUDo0vP4wg2uW528uVdbHg36ktnZZU3C31UR5VNwf3bR9uAd4gGSKZLnd-YZ4uA1cH0ZoFlD2G-8pnHVZEAwJvfkFUEov9dQyo/s1600/Screen+shot+2010-03-24+at+16.09.28.png"><span style="display: block;" id="formatbar_Buttons"><span class=" on" style="display: block;" id="formatbar_Add_Image" title="Lägg till bild" onmouseover="ButtonHoverOn(this);" onmouseout="ButtonHoverOff(this);" onmouseup="addImage();" onmousedown="CheckFormatting(event);;ButtonMouseDown(this);"><img src="http://www.blogger.com/img/blank.gif" alt="Lägg till bild" class="gl_photo" border="0" /></span></span></a>Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com0tag:blogger.com,1999:blog-2731796976737539450.post-75120246194907016652010-03-18T02:43:00.000-07:002010-03-23T06:18:46.241-07:00Interesting SPARQL queries for QSAR and PCM data!The following two <a href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a> queries are really interesting for QSAR projects and proteochemometric(PCM) project. 
By accessing <a href="http://www.ebi.ac.uk/chembl/">chEMBL</a> data via <a href="http://www.blogger.com/www.w3.org/RDF/">RDF</a> with <a href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a> I can easily retrieve necessary data to build up these kind of projects.<br /><br />For a QSAR project following query could be used:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">var forQSAR = "\</span><br /><span style="font-family:courier new;">PREFIX chembl: </span></span><http: se="" chembl="" onto=""><span style="font-size:85%;"><span style="font-family:courier new;">\</span><br /><span style="font-family:courier new;">PREFIX blueobelisk: </span></span><http: org="" chemistryblogs=""><span style="font-size:85%;"><span style="font-family:courier new;">\</span><br /><span style="font-family:courier new;">SELECT DISTINCT ?act ?ass ?conf ?mol ?SMILES ?val ?unit WHERE { \</span><br /><span style="font-family:courier new;"> ?act chembl:type \"IC50\" ; \</span><br /><span style="font-family:courier new;"> chembl:onAssay ?ass; \</span><br /><span style="font-family:courier new;"> chembl:forMolecule ?mol;\</span><br /><span style="font-family:courier new;"> chembl:standardValue ?val;\</span><br /><span style="font-family:courier new;"> chembl:standardUnits ?unit.\</span><br /><span style="font-family:courier new;"> ?mol blueobelisk:smiles ?SMILES. \</span><br /><span style="font-family:courier new;"> ?ass chembl:hasTarget </span></span><http: se="" chembl="" target="" t10885=""><span style="font-size:85%;"><span style="font-family:courier new;"> ; \</span><br /><span style="font-family:courier new;"> chembl:hasConfScore ?conf. 
\</span><br /><span style="font-family:courier new;">}";</span></span><br /><br />Since I run my queries through <a href="http://www.bioclipse.net/">Bioclipse</a> the <a href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a> query is given a name to ease up the following run ¨<br /><span style="font-size:85%;"><span style="font-family:courier new;">var qsar = rdf.sparqlRemote("http://rdf.farmbio.uu.se/chembl/sparql", forQSAR)</span><br /><span style="font-family:courier new;">chembl.saveCsv("/QSAR/q",qsar)</span></span><br /><br />The query will return unique id's for activity(?act), molecules(?mol) and assays(?ass), SMILES(?SMILES) for the molecules, values(?val) and units(?unit) for the activities and confidence values(?conf). And it is really easy expand the query to return more data!<br /><br />The query for PCM returns unique id's for targets(?target), molecules(?mol) and pubmeds(?pubmed), SMILES(?SMILES), protein sequences(?seq), varoius classifications(?l4, ?l5, ?l6), activities(?type) ans activity values(?val).<br />The activities are narrowed down to only include IC50 and Ki and the ion channels should only be Na(the last two lines in the query).<br /><br />The query looks like the following:<br /><span style="font-size:85%;"><span style="font-family:courier new;">var kic50na ="\</span><br /><span style="font-family:courier new;">PREFIX chembl: <http: se="" chembl="" onto="">\</http:></span><br /><span style="font-family:courier new;">PREFIX blueobelisk: <http: org="" chemistryblogs="">\</http:></span><br /><span style="font-family:courier new;">SELECT DISTINCT ?type ?target ?pubmed ?l4 ?l5 ?l6 ?mol ?SMILES ?val ?seq \</span><br /><span style="font-family:courier new;">WHERE {\</span><br /><span style="font-family:courier new;"> ?act chembl:type ?type;\</span><br /><span style="font-family:courier new;"> chembl:onAssay ?ass;\</span><br /><span style="font-family:courier new;"> chembl:forMolecule ?mol;\</span><br /><span style="font-family:courier new;"> 
chembl:standardValue ?val.\</span><br /><span style="font-family:courier new;"> ?ass chembl:hasTarget ?target;\</span><br /><span style="font-family:courier new;"> chembl:extractedFrom ?journal.\</span><br /><span style="font-family:courier new;">?ass chembl:hasTargetCount 1 .\</span><br /><span style="font-family:courier new;">?journal <http://purl.org/ontology/bibo/pmid> ?pubmed.\</span><br /><span style="font-family:courier new;"> ?mol blueobelisk:smiles ?SMILES.\</span><br /><span style="font-family:courier new;"> ?target a <http://rdf.farmbio.uu.se/chembl/onto/#Target> ;\</span><br /><span style="font-family:courier new;"> chembl:classL3 \"VGC\" ;\</span><br /><span style="font-family:courier new;"> chembl:classL4 ?l4 ;\</span><br /><span style="font-family:courier new;"> chembl:classL5 ?l5 ;\</span><br /><span style="font-family:courier new;"> chembl:classL6 ?l6 ;\</span><br /><span style="font-family:courier new;"> chembl:sequence ?seq.\</span><br /><span style="font-family:courier new;">FILTER regex(?l6, \"NA\")\</span><br /><span style="font-family:courier new;">FILTER (?type = \"Ki\" || ?type = \"IC50\")\</span><br /><span style="font-family:courier new;">}";</span></span><br /><br />One problem encountered here was that assays are not always specified for one target but for many, which led to the same information being returned for different targets. This was solved by <a href="http://chem-bla-ics.blogspot.com/">Egon</a>, who created </http:></http:></http:><span style="font-size:85%;"><span style="font-family:courier new;">?ass chembl:hasTargetCount 1 </span></span><http: se="" chembl="" onto=""><http: org="" chemistryblogs=""><http: se="" chembl="" target="" t10885="">for this purpose. 
That line says that each assay should have only one target, which gives accurate data for PCM.<br /></http:></http:></http:>Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com0tag:blogger.com,1999:blog-2731796976737539450.post-61230684636126413752010-03-11T04:25:00.000-08:002010-03-11T04:26:55.708-08:00Review of Towards pharmacogenomics knowledge discovery with the semantic webJonathan Alvarsson and I wrote a review of the article <a href="http://uucheminfoclub.blogspot.com/2010/03/review-of-towards-pharmacogenomics.html">Towards pharmacogenomics knowledge discovery with the semantic web</a>.<br /><br />Have a look!Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com0tag:blogger.com,1999:blog-2731796976737539450.post-4866872092336853462010-03-08T00:29:00.000-08:002010-03-08T00:38:00.357-08:00Background presentationI gave this presentation to the department last week. It's basically a presentation about the background and progress of the project. Enjoy!<br /><br /><div style="width:425px" id="__ss_3362884"><strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/annzi/pharmaceutical-knowledge-retrieval-through-reasoning-of-chembl-rdf" title="Pharmaceutical Knowledge retrieval through Reasoning of ChEMBL RDF">Pharmaceutical Knowledge retrieval through Reasoning of ChEMBL RDF</a></strong><object width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=pp2-100308022736-phpapp02&stripped_title=pharmaceutical-knowledge-retrieval-through-reasoning-of-chembl-rdf" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=pp2-100308022736-phpapp02&stripped_title=pharmaceutical-knowledge-retrieval-through-reasoning-of-chembl-rdf" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object><div 
style="padding:5px 0 12px">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/annzi">annzi</a>.</div></div>Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com1tag:blogger.com,1999:blog-2731796976737539450.post-4918159878861313472010-03-01T01:55:00.000-08:002010-03-01T02:18:43.386-08:00Update postI have so many half-finished sub-projects that I don't have anything interesting to blog about, hence this update post!<br /><br />My sub-projects:<br /><br />Looking into other syntax languages, especially Manchester OWL syntax. In <a href="http://uucheminfoclub.blogspot.com/">Journal Club</a> we read the article <span style="font-weight:bold;">Towards pharmacogenomics knowledge discovery with the semantic web</span> and encountered the Manchester syntax language. I will blog about it when I'm done. And speaking of Journal Club, I'm also writing a review together with <a href="http://www.blogger.com/profile/01302696904346930017">Jonathan</a>. And I have to find time to read the next article....<br /> <br />Moss Manager needs to be rearranged since the net.bioclipse.rdf plug-in no longer returns lists of ArrayLists. It now, amazingly, returns String matrices, which will make things so much easier, especially when I'm only interested in the SMILES part of the SPARQL result. <br /><br />I'm also working on a presentation that I'm going to give on Thursday 4/3. I will try to put it up here afterwards. It's about the background and status of this project. (Spent hours creating a Gantt chart in Excel... well, I'm not friends with Excel anymore...)<br /><br />And lastly, I'm trying to structure a new Bioclipse plug-in for retrieving drugs/compounds, targets and other valuable info, i.e. querying ChEMBL in an effective and powerful way with SPARQL. 
<br /><br />Lesson from this post: use TODO lists =0)Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com0tag:blogger.com,1999:blog-2731796976737539450.post-80775328554814217522010-02-19T06:16:00.000-08:002010-02-19T07:00:07.141-08:00moss + manager = trueMy goal this week was to integrate Moss into a Bioclipse manager.
<br />Well, I'm almost there. =0)
<br />
<br />That is what I have done in the latter part of the week. I also managed to run some SPARQL queries and ran into some problems that I need to figure out; they will be the main focus next week.
<br />
<br />Most parts of Moss now work, although some settings that involve combining masks are not quite finished yet. I actually think that I've spent most of my hours on this and I'm still not done... grr
<br />
<br />Since Moss has over 30 different parameters, I found it important to have a method that lists them. But just now I realized that this is what <span style="font-family: courier new;font-size:85%;" >man moss</span> is for. Well, it's the same text, so no worries there.
<br />Taken from the method, though, it looks something like this:
<br /><span style="font-family: courier new;font-size:85%;" >> moss.parameterDescription()
<br />Example: moss.createParameters("aromatic", "always"),
<br />moss.createParameters("minEmbed", 6)
<br />
<br />aromatic: ("aromatic", "never"/"upgrade"/"downgrade") |"String"
<br />canonic: ("canonic", false/true) |boolean
<br />canonicEquiv: ("canonicEquiv", true/false) |boolean
<br />carbonChainLength: ("carbonChainLength", true/false) |boolean
<br />class: not for use
<br />closed: ("closed", true/false) |boolean
<br />exNode: ("exNode", "Atom") |"String"
<br />exSeed: ("exSeed", "Atom") |"String"
<br />extPrune: ("extPrune", "none"/"full"/"partial") |"String"
<br />ignoreAtomTypes: ("ignoreAtomTypes", "never"/"always"/"in rings") |"String"
<br />ignoreBond: ("ignoreBond", "never"/"always"/"in rings") |"String"
<br />kekule: ("kekule", true/false) |boolean
<br />limits: not for use
<br />matchAromaticityAtoms: ("matchAromaticityAtoms", "never"/"always"/"in rings") |"String"
<br />matchChargeOfAtoms: ("matchChargeOfAtoms", "match"/"no match") |"String"
<br />matom: not for use
<br />maxEmbMemory: ("maxEmbMemory", value) |integer
<br />maxEmbed: ("maxEmbed",value) |integer
<br />maxRing: ("maxRing", value) |integer
<br />maximalSupport: ("maximalSupport", value) |double
<br />mbond: not for use
<br />minEmbed: ("minEmbed", value) |integer
<br />minRing: ("minRing", value) |integer
<br />minimalSupport: ("minimalSupport", value) |double
<br />mode: not for use
<br />mrgat: not for use
<br />mrgbd: not for use
<br />ringExtension: ("ringExtension", "none"/"full"/"merge"/"filter") |"String"
<br />seed: ("seed", "Atom") |"String"
<br />split: ("split", true/false) |boolean
<br />threshold: ("threshold", value) |double
<br />unembedSibling: ("unembedSibling", false/true) |boolean</span>
<br />
<br />Will immediately start working on the manager.
<br />
<br />I figured that it would be nice to have one method that sets the parameters, in this case createParameters(). The first argument specifies what you want to set and the second provides the value. The arguments are handled by the following method:<span style="font-family: courier new;font-size:85%;" >
<br />public String createParameters(String propertyName, Object value) throws Exception{
<br />
<br /> // A Double is truncated to an int before it is set
<br /> if(value.getClass().equals(Double.class)){
<br /> value= ((Double) value).intValue();
<br /> int values = (Integer) value;
<br /> mossbean.setParameters(mossbean, propertyName, values);
<br /> }else{
<br /> mossbean.setParameters(mossbean, propertyName, value);
<br /> }
<br /> return propertyName + " is set to " + value;
<br /> }</span>
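<br />A possible refinement, sketched under assumptions rather than taken from the plug-in: createParameters() above turns every Double into an int, so a value such as 0.5 for threshold would be truncated to 0. The hypothetical ParameterCoercion helper below converts a number to the type each parameter expects; the type table is only an illustration, filled in from the integer/double labels in the parameter listing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of the plug-in: converts a number to the type a parameter expects. */
public class ParameterCoercion {

    // Assumed parameter types, copied from the integer/double labels in the listing above.
    private static final Map<String, Class<?>> TYPES = new HashMap<>();
    static {
        TYPES.put("minEmbed", Integer.class);
        TYPES.put("maxEmbed", Integer.class);
        TYPES.put("threshold", Double.class);
        TYPES.put("minimalSupport", Double.class);
    }

    /** Returns the value converted to the declared type; anything else passes through unchanged. */
    public static Object coerce(String propertyName, Object value) {
        Class<?> expected = TYPES.get(propertyName);
        if (value instanceof Number && expected == Integer.class) {
            return ((Number) value).intValue();
        }
        if (value instanceof Number && expected == Double.class) {
            return ((Number) value).doubleValue();
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(coerce("minEmbed", 6.0));      // integer parameter
        System.out.println(coerce("threshold", 0.5));     // double parameter keeps its fraction
        System.out.println(coerce("aromatic", "always")); // strings are left alone
    }
}
```

With something like this, the manager method would only need to call coerce() before handing the value to the bean.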
<br />When trying out Moss myself I got irritated that I kept forgetting the values of my parameters, hence the method parameterValues() was created. It returns the current values of all parameters:
<br /><span style="font-family: courier new;font-size:85%;" >> moss.parameterValues()
<br />aromatic:
<br />canonic: true
<br />canonicEquiv: false
<br />carbonChainLength:
<br />class: class net.bioclipse.moss.business.backbone.MossBean
<br />closed: true
<br />exNode: H
<br />exSeed:
<br />extPrune:
<br />ignoreAtomTypes:
<br />ignoreBond:
<br />kekule:
<br />limits: 0.0
<br />matchAromaticityAtoms:
<br />matchChargeOfAtoms:
<br />maxEmbMemory: 0
<br />maxEmbed: 0
<br />maxRing: 0
<br />maximalSupport: 0.02
<br />minEmbed: 0
<br />minRing: 0
<br />minimalSupport: 0.1
<br />ringExtension: none
<br />seed:
<br />split: false
<br />threshold: 0.5
<br />unembedSibling: false</span>
<br />
<br />I will also create a method that restores the default values, since that is valuable to the end user.
<br />I can't figure out, though, how to return an ArrayList in a smooth way. For now I return it as a String; this is how I've done it:
<br /><span style="font-family: courier new;font-size:85%;" > public String parameterValues() throws Exception{
<br /> ArrayList<String> name = mossbean.getPropertyNames(mossbean);
<br /> String info="";
<br /> String names;
<br /> // Append one "name: value" line per parameter
<br /> for(int i=0; i<name.size(); i++){
<br /> names = name.get(i);
<br /> info= info + names +": " + mossbean.getProperty(mossbean, names) + " \n";
<br /> }
<br /> return info;
<br /> }</span>
<br />
<br />If you know something better, please tell!
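<br />One option, as a sketch only: build a list with one entry per parameter and join it just for display. The ParameterReport class and the plain Map standing in for MossBean are assumptions for illustration, not part of the plug-in.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a Map stands in for the bean's property names and values. */
public class ParameterReport {

    /** One "name: value" entry per parameter, in the map's iteration order. */
    public static List<String> toLines(Map<String, Object> parameters) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Object> entry : parameters.entrySet()) {
            lines.add(entry.getKey() + ": " + entry.getValue());
        }
        return lines;
    }

    /** Joins the entries with newlines for console display. */
    public static String toText(Map<String, Object> parameters) {
        return String.join("\n", toLines(parameters));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the parameters.
        Map<String, Object> parameters = new LinkedHashMap<>();
        parameters.put("canonic", true);
        parameters.put("exNode", "H");
        parameters.put("threshold", 0.5);
        System.out.println(toText(parameters));
    }
}
```

Returning the list keeps the values usable by other code, while the joined text is still there for printing in the console.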
<br />
<br />Mostly polishing is left when it comes to Moss, but (perhaps) also the bigger mask-combination parts; it depends on the outcome of my Moss tests (which I will run when it's not Friday afternoon and I have a sharp mind).
<br />
<br />Next week the main focus is to develop SPARQL queries again!
<br />
<br />Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com6tag:blogger.com,1999:blog-2731796976737539450.post-17222993243519405922010-02-12T00:03:00.000-08:002010-02-12T00:41:06.071-08:00Approaching substructure miningWith a simple query like the one below, random compounds for kinases of the Tk family are collected. I would like to restrict the standard value to be below a certain threshold, but I have some problems doing that in Bioclipse; directly against the SPARQL endpoint I've managed to create this filter. Will work on it.<br /><br /><div style="text-align: justify;"><span style="font-family: courier new;font-size:78%;" >var allsmiles = " \</span><br /><span style="font-family: courier new;font-size:78%;" >PREFIX onto: <http://rdf.farmbio.uu.se/chembl/onto/#> \</span><br /><span style="font-family: courier new;font-size:78%;" >PREFIX blueobelisk: <http://www.blueobelisk.org/chemistryblogs/> \</span><br /><span style="font-family: courier new;font-size:78%;" >\</span><br /><span style="font-family: courier new;font-size:78%;" >SELECT DISTINCT ?smiles \</span><br /><span style="font-family: courier new;font-size:78%;" > WHERE { \</span><br /><span style="font-family: courier new;font-size:78%;" >?target a onto:Target . \</span><br /><span style="font-family: courier new;font-size:78%;" >?target onto:classL5 \"Tk\" . \</span><br /><span style="font-family: courier new;font-size:78%;" >?target onto:classL6 ?L6 . \</span><br /><span style="font-family: courier new;font-size:78%;" >?assay onto:hasTarget ?target . \</span><br /><span style="font-family: courier new;font-size:78%;" >?activity onto:onAssay ?assay . \</span><br /><span style="font-family: courier new;font-size:78%;" >?activity onto:standardValue ?st . \</span><br /><span style="font-family: courier new;font-size:78%;" >?activity onto:forMolecule ?mol . \</span><br /><span style="font-family: courier new;font-size:78%;" >?mol blueobelisk:smiles ?smiles . 
\</span><br /><span style="font-family: courier new;font-size:78%;" >}LIMIT 20 \</span><br /><span style="font-family: courier new;font-size:78%;" >";</span><br /><span style="font-family: courier new;font-size:78%;" >var all = rdf.sparqlRemote("http://rdf.farmbio.uu.se/chembl/sparql", allsmiles)</span><br /><div style="text-align: left;"><span style="font-family: courier new;font-size:85%;" >var all </span>now contains a list of molecules that is saved to a file via the net.bioclipse.moss.business plug-in.<br /><br /></div><span style="font-family: courier new;font-size:85%;" >moss.saveMoss(String fileName, List all) </span><br />This creates a file in the format MoSS supports (id, threshold value, description); the listing below is not complete:<br /><span style="font-family: courier new;font-size:85%;" >0,0,Cc1nc(N)sc1c2ccnc(Nc3cccc(c3)[N+](=O)[O-])n2<br />1,0,COc1cc(Nc2c(cnc3cc(OCCC4CCN(C)CC4)c(OC)cc23)C#N)c(Cl)cc1Cl<br />2,0,CCOc1cc(Nc2c(cnc3cc(OCC4CCN(C)CC4)c(OC)cc23)C#N)c(Cl)cc1Cl<br />3,0,COc1ccc(C)c(Nc2c(cnc3cc(OCC4CCN(C)CC4)c(OC)cc23)C#N)c1<br />4,0,COc1ccc(Cl)c(Nc2c(cnc3cc(OCC4CCN(C)CC4)c(OC)cc23)C#N)c1<br />5,0,COc1cc2c(Nc3ccc(C)cc3C)c(cnc2cc1OCC4CCN(C)CC4)C#N<br />6,0,COc1cc(Nc2c(cnc3cc(OCC4CCN(C)CC4)c(OC)cc23)C#N)c(C)cc1C</span><br /><br />This file now has to be initialized; then add the parameters for the run and, when done, simply run.<br /><br /><span style="font-family: courier new;font-size:85%;" >> moss.saveMoss("/Moss/Test/collected", all)<br />> moss.init("/Moss/Test/collected")<br />done<br />>moss.setLimits(10,2)<br />> moss.run("/Moss/Test/collectedOut", "/Moss/Test/collectedOutId")</span><br /><br /><div style="text-align: left;">Only two basic parameter settings work at the moment; adding the rest is something to do as soon as possible. 
It will take time, though, since lots of parameters are set by combining flags, which I remember being a crucial thing to do.<br /><br />To read about how MoSS works, how to understand the output files etc., look at Christian Borgelt's homepage, <a href="http://www.borgelt.net/doc/moss/moss.html">http://www.borgelt.net/doc/moss/moss.html</a>.<br /></div>Output file (not complete):<br /><span style="font-family: courier new;font-size:78%;" >id,description,nodes,edges,s_abs,s_rel,c_abs,c_rel<br />1,n1:c2:c(:c(-N-c3:c(-Cl):c:c(-Cl):c(-O-C):c:3):c(-C#N):c:1):c:c(-O-C):c(-O-C-C1-C-C-N(-C-C-1)-C):c:2,34,37,2,10.0,0,0.0<br />2,n1:c2:c(:c(-N-c3:c(-Cl):c:c:c(-O-C):c:3):c(-C#N):c:1):c:c(-O-C):c(-O-C-C1-C-C-N(-C-C-1)-C):c:2,33,36,3,15.0,0,0.0<br />3,n1:c2:c(:c(-N-c3:c(-Cl):c:c(-Cl):c(-O-C):c:3):c(-C#N):c:1):c:c(-O-C):c(-O-C-C(-C-C)-C):c:2,31,33,3,15.0,0,0.0<br />4,n1:c2:c(:c(-N-c3:c(-Cl):c:c:c(-O-C):c:3):c(-C#N):c:1):c:c(-O-C):c(-O-C-C(-C-C)-C):c:2,30,32,4,20.0,0,0.0</span><br /><br />Output file Id (not complete):<br /><span style="font-family: courier new;font-size:78%;" >id:list<br />1:2,10<br />2:2,4,10<br />3:2,9,10<br />4:2,4,9,10<br />5:2,7,10<br />6:2,4,7,10</span><br /><br />I want to be able to visualize the result in tables later on, perhaps together with the input and other information collected via SPARQL.<br /></div>Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com4tag:blogger.com,1999:blog-2731796976737539450.post-13208950016482889522010-02-09T07:15:00.000-08:002010-02-12T00:42:26.870-08:00Fun stuff with SPARQLI will give you an example of a SPARQL query. I've been running them on the snorql interface <a href="http://rdf.farmbio.uu.se/chembl/snorql/">http://rdf.farmbio.uu.se/chembl/snorql/</a>, which is based on <a href="ftp://ftp.ebi.ac.uk/pub/databases/chembl/releases/chembl_02/">ChEMBL02</a>.<br /><br />Example.<br />This experiment started with me wanting to know more about activities: their standard values, units and types. 
But then I kept on going looking at molecules connected to a specific activity which led me to collecting their SMILES. Through the connection between activities and resource I managed to get their pubmed id's. Via assay id I managed to get targets and filtered organism to Homo sapiens. Figure 1 displays the result from the example code.<br />Example code:<span style=";font-family:courier new;font-size:78%;" >PREFIX chemblt: <http: se="" chembl="" targettype=""><br />PREFIX hmm: <http: org="" taxonomy=""><br />PREFIX onto: <http: se="" chembl="" onto=""><br />PREFIX blueobelisk: <http: org="" chemistryblogs="">PREFIX dbpedia: <http: org=""><br /><span style=";font-family:courier new;font-size:78%;" ><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiAxG9Cvv4-GTeXf_GjIIiP0buXEXZAPbI5HcuaKxEOgpJoopuCWSsDCWjCtZX9ARblY3sWCmu5igsskUxpFUAvFJYlUbqSNHS-DNzLE1SMh-_wHkGiB29OtcUxA0mybtYBEb6y-5qgndc/s1600-h/Screen+shot+2010-02-09+at+16.59.38.png"><br /></a></span>SELECT DISTINCT ?target ?organism ?activities ?smiles ?type ?unit ?sval ?res ?pubmed<br />WHERE {<br />#get activities with its data<br />?activities a onto:Activity .<br />?activities onto:standardValue ?sval .<br />?activities onto:type ?type .</http:></http:></http:></http:></http:></span><br /><span style=";font-family:courier new;font-size:78%;" ><http: se="" chembl="" targettype=""><http: org="" taxonomy=""><http: se="" chembl="" onto=""><http: org="" chemistryblogs=""><http: org=""> ?activities onto:standardUnits ?unit .<br />#get compounds for those activies<br />?activities onto:forMolecule ?mol .?mol blueobelisk:smiles ?smiles .</http:></http:></http:></http:></http:></span><br /><span style=";font-family:courier new;font-size:78%;" ><http: se="" chembl="" targettype=""><http: org="" taxonomy=""><http: se="" chembl="" onto=""><http: org="" chemistryblogs=""><http: org=""><br /># get resource id and pubmed article<br 
/>?activities onto:extractedFrom ?res .?res <http: org="" ontology="" bibo="" pmid=""> ?pubmed.<br /><br />#get assay for activity</http:></http:></http:></http:></http:></http:></span><span style=";font-family:courier new;font-size:78%;" ><http: se="" chembl="" targettype=""><http: org="" taxonomy=""><http: se="" chembl="" onto=""><http: org="" chemistryblogs=""><http: org=""><http: org="" ontology="" bibo="" pmid=""> ?activities onto:onAssay ?assay .<br />?assay onto:hasTarget ?target .<br />?target onto:hasTargetType chemblt:PROTEIN .?target onto:organism ?organism .</http:></http:></http:></http:></http:></http:></span><br /><span style=";font-family:courier new;font-size:78%;" ><http: se="" chembl="" targettype=""><http: org="" taxonomy=""><http: se="" chembl="" onto=""><http: org="" chemistryblogs=""><http: org=""><http: org="" ontology="" bibo="" pmid="">FILTER regex(?organism, "Homo sapiens") .<br />FILTER regex(?type, "^Kd") .<br /><br />}LIMIT 5</http:></http:></http:></http:></http:></http:></span><br /><span style=";font-family:courier new;font-size:78%;" ><span style=";font-family:courier new;font-size:78%;" ><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVWAfWT3sbAg0lM4Uzp6K20IkQguWp7YX2AnNZpSnkQlWNqPSkgAS2mU8OV65UZKwozJLzwC0tynzzcx5s5lcCjJpgSjfnqKqViBkN_7ZJu2MjjJaPXWASa0SqkoP1iA-t7UHZr2xzlUs/s1600-h/Screen+shot+2010-02-09+at+16.26.56.png"><img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 320px; height: 200px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVWAfWT3sbAg0lM4Uzp6K20IkQguWp7YX2AnNZpSnkQlWNqPSkgAS2mU8OV65UZKwozJLzwC0tynzzcx5s5lcCjJpgSjfnqKqViBkN_7ZJu2MjjJaPXWASa0SqkoP1iA-t7UHZr2xzlUs/s320/Screen+shot+2010-02-09+at+16.26.56.png" alt="" id="BLOGGER_PHOTO_ID_5436271231427204050" border="0" /></a></span></span><span style=";font-family:courier new;font-size:78%;" > Figure 1. 
The results for example 1. </span><br /><br />I also began to run queries that are more suitable for my work. Queries that are able to differentiate different kinase protein families (<a href="http://www.sarfari.org/kinasesarfari/family">http://www.sarfari.org/kinasesarfari/family</a>). For instance:<br /><http: se="" chembl="" targettype=""><http: org="" taxonomy=""><http: se="" chembl="" onto=""><http: org="" chemistryblogs=""><http: org=""><http: org="" ontology="" bibo="" pmid=""><span style=";font-family:courier new;font-size:78%;" ><span style=";font-family:courier new;font-size:78%;" ><span style=";font-family:courier new;font-size:78%;" ><span style=";font-family:courier new;font-size:78%;" ><span style=";font-family:courier new;font-size:78%;" ><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgq0ujf_zMDxuA97MfIIlTTTLB2D0tJknq2rpAwSDKZfNNzesKlOihQ3Kld27jqeBRs39OnoioZQlVMQyOzb2fdLbnvHOCbphATpOlVKOie64Vpj5nW0Htkr2c2DhVJhHRfAuugrysEPBI/s1600-h/Screen+shot+2010-02-09+at+17.01.17.png"><img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 400px; height: 315px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgq0ujf_zMDxuA97MfIIlTTTLB2D0tJknq2rpAwSDKZfNNzesKlOihQ3Kld27jqeBRs39OnoioZQlVMQyOzb2fdLbnvHOCbphATpOlVKOie64Vpj5nW0Htkr2c2DhVJhHRfAuugrysEPBI/s400/Screen+shot+2010-02-09+at+17.01.17.png" alt="" id="BLOGGER_PHOTO_ID_5436274594593790562" border="0" /></a>Figure 2. An example of targets that belong to the same protein family Tk.<br /></span></span></span></span></span></http:></http:></http:></http:></http:></http:>Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com2tag:blogger.com,1999:blog-2731796976737539450.post-24948921144809030442010-02-08T01:55:00.000-08:002010-02-08T02:15:44.880-08:00A whole new world...I've seen a completely new world when looking into the functions of Bio2RDF. 
I see great linking between knowledge sources. Being able to collect information from one knowledge base and link to another, obtaining more information and always extending knowledge, is great!<br /><br />In my work running queries against ChEMBL to collect active (later on also inactive) compounds, I find this linking valuable. For example, DrugBank holds lots of great information about a compound: not only physical information but also ids such as the ChEBI id, which makes linking to ChEBI possible.<br /><br />KEGG is another knowledge base that could be useful: Kegg:ligand, Kegg:drug and Kegg:compound. UniProt could provide article info, ChEBI compound info, PDB target protein data, etc.<br /><br />I believe that users should be able to decide what kind of information they want or need. This could be achieved through interaction with Bioclipse. My aim is to use substructure mining on the drugs, but of course other aims should be possible (using Bioclipse plug-ins other than Moss).<br /><br />Perhaps a table is a nice way to display the data. And if lots of information about a drug is wanted, perhaps an info page is the way to display it.<br /><br />DBpedia is another valuable way to get resources: names, descriptions, InChIs, SMILES, images etc. One major disadvantage is that unknown compounds will not be found.<br /><br />And now I got a link to chem2bio2rdf from my supervisor. It has collected all chemical URIs in one place; I will immediately look into it and run queries!Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com2tag:blogger.com,1999:blog-2731796976737539450.post-73354506469534168372010-02-03T23:43:00.000-08:002010-02-04T00:36:33.200-08:00As my project description changed a bit and there have been other obstacles to get past, I haven't got as far as I expected. But at least now I have a primary goal.<br /><br />I'm now focusing on selecting two protein families to run substructure mining on. 
Well, actually I have now even narrowed that down to looking at one family (since they are big!). I want to find a protein that has ligands that are active. As there are many different types of activities, I need to dig deeper in this area. I think I also have to find a threshold for activity to reduce the number of ligands, as there are so many; the higher the affinity the better.<br /><br />I also looked a bit at the substructure mining algorithm that I'm using, MoSS. I have been trying to run random ligands, but it takes forever; most of the time the run didn't finish. As I assume that there are bugs to fix, I will try to run the same ligands through another software to be sure that it is possible to handle such complex structures with this kind of algorithm. If that works, great: MoSS then has some improvement steps to look forward to.<br /><br />I also have to update MoSS to the current Bioclipse standard, for example by implementing a manager.<br /><br />Yesterday some parts of the SPARQL endpoint <a href="http://pele.farmbio.uu.se/chembl/snorql/">http://pele.farmbio.uu.se/chembl/snorql/</a> started to work again(!), which simplifies many things for me, as I have been able to run queries to find activities that are active and also to find their target ids (tid).<br /><br />A question on my mind is how to find family information. I can't seem to find any class...Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com0tag:blogger.com,1999:blog-2731796976737539450.post-60652691800540909262010-01-29T08:02:00.000-08:002010-01-29T08:17:28.999-08:00My first weekWell, this week mainly consisted of reading books, articles and tutorials. I'm really eager to start programming now! The semantic web touches lots of things that I had never heard of before, so my reading consisted of getting to know RDF, SPARQL and also some OWL. I also got to know git; I really enjoyed <a href="http://learn.github.com/">http://learn.github.com/</a>, a great tutorial. 
I also looked into ChEMBL <a href="http://www.ebi.ac.uk/chembl/">http://www.ebi.ac.uk/chembl/</a>, trying to get to know the structure of its database.<br /><br />I managed to check out Bioclipse and was also able to get my old project MoSS, <a href="http://wiki.bioclipse.net/index.php?title=MoSS_in_Bioclipse">http://wiki.bioclipse.net/index.php?title=MoSS_in_Bioclipse</a>, to run. Bioclipse has changed a lot since I last worked with it, and I need to catch up before I start working with it again, which I will do at the beginning of next week.Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com1tag:blogger.com,1999:blog-2731796976737539450.post-49871409111093971982010-01-28T02:42:00.000-08:002010-01-28T02:45:18.780-08:00First post!Finally my first post on the blog!Annzihttp://www.blogger.com/profile/00745732977527862482noreply@blogger.com2