In this paper we present a method for evaluating the importance of the GO terms that make up multi-attribute rules. The rules are generated for the biological interpretation of gene groups. Each multi-attribute rule is a combination of GO terms and, based on the relationships among them, one can obtain a functional description of a gene group. The method evaluates the influence of a given GO term on the quality of a single rule and on the quality of a whole set of rules. For each GO term, we compute how strongly it influences the quality of the generated rule set and, therefore, the quality of the obtained description. Based on the computed quality of GO terms, we propose a new rule induction algorithm that yields a more synthetic and more accurate description of gene groups than the one given by the initially determined rules. The resulting GO term ranking and the newly obtained rules provide additional information about the biological function of the genes in the analyzed group.
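The idea of scoring a GO term by its influence on rule quality can be illustrated with a minimal sketch. It is not the measure defined in the paper: it assumes a simple leave-one-out scheme in which a term's influence is the drop in rule precision when that term is removed from a rule premise. The names `covered`, `precision` and `term_influence`, the choice of precision as the quality measure, and the toy annotations are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Rule = FrozenSet[str]  # rule premise: a conjunction of GO term identifiers


def covered(rule: Rule, annotations: Dict[str, Set[str]]) -> Set[str]:
    """Genes annotated with every GO term in the premise."""
    return {gene for gene, terms in annotations.items() if rule <= terms}


def precision(rule: Rule, annotations: Dict[str, Set[str]], group: Set[str]) -> float:
    """Assumed quality measure: fraction of covered genes that lie in the described group."""
    cov = covered(rule, annotations)
    return len(cov & group) / len(cov) if cov else 0.0


def term_influence(
    rules: List[Rule], annotations: Dict[str, Set[str]], group: Set[str]
) -> Dict[str, float]:
    """Sum, over all rules, the quality lost when a term is dropped from a premise.

    An empty premise covers every gene, so its precision is the share of
    group genes among all annotated genes.
    """
    scores: Dict[str, float] = {}
    for rule in rules:
        q_full = precision(rule, annotations, group)
        for term in rule:
            q_reduced = precision(rule - {term}, annotations, group)
            scores[term] = scores.get(term, 0.0) + (q_full - q_reduced)
    return scores


# Toy data (hypothetical): five genes, two GO terms, one described group.
annotations = {
    "g1": {"GO:A", "GO:B"},
    "g2": {"GO:A", "GO:B"},
    "g3": {"GO:A"},
    "g4": {"GO:B"},
    "g5": {"GO:B"},
}
group = {"g1", "g2"}
rules = [frozenset({"GO:A", "GO:B"})]

ranking = sorted(term_influence(rules, annotations, group).items(),
                 key=lambda item: item[1], reverse=True)
```

In this toy setting, terms whose removal lets a rule cover many genes outside the group receive higher scores, which gives one possible ranking of GO terms over a rule set.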
@article{bwmeta1.element.bwnjournal-article-amcv20i3p555bwm,
  author = {Marek Sikora and Aleksandra Gruca},
  title = {Quality improvement of rule-based gene group descriptions using information about GO terms importance occurring in premises of determined rules},
  journal = {International Journal of Applied Mathematics and Computer Science},
  volume = {20},
  year = {2010},
  pages = {555-570},
  zbl = {1203.62200},
  language = {en},
  url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv20i3p555bwm}
}
Marek Sikora; Aleksandra Gruca. Quality improvement of rule-based gene group descriptions using information about GO terms importance occurring in premises of determined rules. International Journal of Applied Mathematics and Computer Science, Tome 20 (2010) pp. 555-570. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv20i3p555bwm/