As my project description changed a bit and there have been other obstacles to get past, I haven't got as far as I expected. But at least now I have a primary goal.
I'm now focusing on selecting two protein families to run substructure mining on. Well, actually I have now even narrowed that down to looking at one family (since they are big!). I want to find a protein that has active ligands. As there are many different types of activities, I need to dig deeper into this area. I think I also have to find an activity threshold to be able to reduce the number of ligands, as there are so many. The higher the affinity, the better.
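To make the threshold idea concrete, here is a minimal sketch in Python of how such a filter could look. Everything in it is an assumption for illustration: the `Activity` record, the ligand IDs and the values are made up, and the cut-off of 100 nM is just a placeholder, not a value I have settled on. It also assumes measurements like Ki or IC50, where a lower concentration means a higher affinity, and it keeps the activity types separate since they are not directly comparable.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """One measured activity of a ligand (hypothetical record layout)."""
    ligand_id: str
    activity_type: str  # e.g. "Ki" or "IC50"
    value_nm: float     # concentration in nM; lower means higher affinity


def filter_active(activities, activity_type, max_nm):
    """Keep ligands of one activity type at or below the cut-off,
    ordered from highest affinity (lowest value) to lowest."""
    kept = [
        a for a in activities
        if a.activity_type == activity_type and a.value_nm <= max_nm
    ]
    return sorted(kept, key=lambda a: a.value_nm)


# Made-up example data, only to show the call.
ligands = [
    Activity("lig1", "Ki", 5.0),
    Activity("lig2", "Ki", 5000.0),
    Activity("lig3", "IC50", 20.0),
    Activity("lig4", "Ki", 100.0),
]

selected = filter_active(ligands, "Ki", max_nm=100.0)
print([a.ligand_id for a in selected])
```

Tightening `max_nm` shrinks the ligand set, which is the knob I would use to get the input for the mining run down to a manageable size.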
I also looked a bit at the substructure mining algorithm that I'm using, MoSS. I have been trying to run it on random ligands, but it takes forever, and most of the time the run didn't finish. Since I assume there are bugs to fix, I will try running the same structures through another software package to make sure that such complex structures can be handled by this kind of algorithm. If that works, great: MoSS then has some improvement steps to look forward to.
I also have to update MoSS to the current Bioclipse standard, which includes implementing a manager.
Yesterday some parts of the SPARQL endpoint http://pele.farmbio.uu.se/chembl/snorql/ started to work again(!), which simplifies many things for me, as I have been able to run queries to find activities that are active and also to find their target IDs (tid).
A question on my mind is how to find family information. I can't seem to find any class for it.
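One generic way to look for a suitable class is to ask the endpoint which classes it contains at all. The sketch below, in Python, only builds the request URL for such a query and does not send it. The query itself is plain SPARQL and makes no assumption about the ChEMBL schema. The endpoint address `http://example.org/sparql` is a placeholder, since I am only assuming that the real query URL sits next to the snorql page, and `build_query_url` is a helper name made up for this example.

```python
from urllib.parse import urlencode

# Plain SPARQL: list the distinct classes that have at least one instance.
LIST_CLASSES_QUERY = """
SELECT DISTINCT ?class
WHERE { ?s a ?class }
LIMIT 100
"""


def build_query_url(endpoint, query):
    """Build a GET URL that passes the query in the 'query' parameter,
    as described by the SPARQL protocol."""
    return endpoint + "?" + urlencode({"query": query.strip()})


# Placeholder endpoint (an assumption, not the real address).
url = build_query_url("http://example.org/sparql", LIST_CLASSES_QUERY)
print(url)
```

If one of the returned classes looks family-related, the next step would be to list the properties used on its instances in the same way.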