In the next several posts, I’d like to discuss a bioinformatics project I’m involved with that takes advantage of the enormous amount of research data already available in the public domain. My collaborators and I have pulled together an RNA expression dataset consisting of 5,000 genes identified as dysregulated in 8 global gene expression studies of papillary renal cell carcinoma (pRCC). I clustered these genes based on their normalized expression patterns in 17 different comparisons and drew out biological networks specific to pRCC subtypes and survival outcomes. I will also demonstrate how we can correlate all these data to create expression signatures, both for single genes and, at the functional level, for sets of genes across pathways. Furthermore, I will demonstrate how to use this information to prioritize compounds, genetic perturbations and upstream regulators that affect these signatures, in the hope of contributing novel insights into diagnosing, targeting and changing the course of the disease.
Inspiration for this work on papillary kidney cancer came from an upcoming ‘Hackathon’ in May that brings together researchers, engineers and computer scientists to tackle challenging problems in the life sciences. This year they are focusing on papillary renal-cell carcinoma type 1 (p1RCC), a disease that accounts for 15 to 20% of all kidney cancers. Little is known about the genetic basis of sporadic papillary renal-cell carcinoma, and no effective forms of therapy for advanced disease exist. We felt that a rare and understudied disease like pRCC could benefit from a ‘wisdom of crowds’ approach: learning as much as we can about how genes behave and interact across the different forms of this cancer and the comparisons made between them.
In the first installment, I will give a little background on papillary kidney cancer (from a bioinformatics perspective) and describe the pRCC dataset. I will also discuss how we used Illumina’s Correlation Engine to collect and combine data from 8 different genomic studies that compared human papillary renal tumors against normal kidney tissue or against other forms of the cancer. Next, I will show how a statistical platform, which in this demonstration will be Partek Genomics Suite, can be used to cluster and visualize this information to discover groups of genes acting together to promote or inhibit different forms of the disease. After that, I will discuss using the bioinformatics networking platforms STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) and Elsevier’s Pathway Studio to derive biological meaning and interactions from these clustered genes. Finally, I will show how you can use Correlation Engine to look for compounds and/or genes that might be used to manipulate a group of genes of interest, and Cohort Analyzer (also from Illumina) to examine the clinical relevance of these findings.
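For readers who do like to code, here is a rough idea of what the clustering step could look like outside of a graphical platform. This is only a minimal sketch and not the method used in Partek Genomics Suite: the gene names and fold-change values are made up for illustration, the `cluster_genes` function and its `max_distance` cutoff are my own assumptions, and I have picked average-linkage clustering on a correlation distance simply because it is a common choice for expression data.

```python
from math import sqrt

# Hypothetical data: made-up log2 fold changes for four placeholder genes
# across four comparisons. These are NOT values from the pRCC dataset.
profiles = {
    "GENE_A": [2.1, 1.8, -0.5, 0.2],
    "GENE_B": [1.9, 2.2, -0.3, 0.1],
    "GENE_C": [-1.5, -1.2, 0.8, 0.4],
    "GENE_D": [-1.7, -1.0, 1.1, 0.3],
}


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / (sd_x * sd_y)


def cluster_genes(profiles, max_distance=0.5):
    """Average-linkage agglomerative clustering on distance = 1 - Pearson r.

    Starts with one cluster per gene and repeatedly merges the two closest
    clusters until the closest pair is farther apart than max_distance
    (an assumed cutoff chosen only for this illustration).
    """
    clusters = [[gene] for gene in sorted(profiles)]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


clusters = cluster_genes(profiles)
for number, members in enumerate(clusters, start=1):
    print(f"Cluster {number}: {', '.join(members)}")
```

On this toy input, the two genes that rise together in the first two comparisons end up in one group and the two that fall together end up in another. With thousands of genes you would want an optimized library rather than this simple loop, which slows down quickly as the gene count grows.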
It is my hope that these posts and videos might encourage others (especially people who love biology, with or without deep coding experience) to go out and search for cures for all kinds of diseases in the vast number of research studies that are already at our fingertips. Although I demonstrate with tools that I find easy to use for the non-programmer, all of the data mining, statistics and bioinformatics can be done in other ways with many different platforms and programs. I will also say that you don’t need complicated algorithms and machine learning to find interesting things; just visualizing all this information in one place can be enough to spur discovery. I was amazed by what I found when I combined all of this information, and I look forward to sharing it with the research community.
If you’d like to be notified when the posts and instructional videos in this series come out, please sign up on my website and/or subscribe to my YouTube channel, “Michael Edwards Bioinformatics”. I am also available for a free consultation to discuss specific bioinformatics or data analytics projects that I could take on as a paid service.
-Michael Edwards PhD, Bioinfo Solutions LLC