Design and implementation of a knowledge graph-based framework for integrating multi-omics data with prior biological knowledge into dynamic trans-omic networks. The approach separates biological entities and quantitative evidence into distinct node types, enabling provenance tracking, semantic versioning, and incremental updates, and couples this with a semantically enhanced GNN engine for large-scale systems biology and precision medicine applications.
Key Technologies: Property Graph Databases (AQL), Knowledge Graphs, Graph Neural Networks, Multi-Omics Integration, GO/ChEBI/RO Ontologies
Status: Framework design, implementation, and validation on TCGA cohorts
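The entity/evidence separation described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the framework's actual schema: the class names (`EntityNode`, `EvidenceNode`, `TransOmicGraph`), their fields, and all example values are assumptions made for this sketch. It shows how measurements can be attached or filtered by version without modifying the entity they describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityNode:
    """A biological entity (gene, protein, metabolite). Holds no measurements."""
    entity_id: str
    kind: str
    ontology_terms: tuple = ()


@dataclass(frozen=True)
class EvidenceNode:
    """One quantitative observation about an entity, with its provenance."""
    evidence_id: str
    entity_id: str
    omics_layer: str
    value: float
    source: str   # provenance: where the measurement came from
    version: str  # semantic version of the data release


class TransOmicGraph:
    """Hypothetical container keeping entities and evidence in separate stores."""

    def __init__(self):
        self.entities = {}   # entity_id -> EntityNode
        self.evidence = {}   # evidence_id -> EvidenceNode
        self.supports = {}   # entity_id -> list of evidence_ids

    def add_entity(self, node):
        # Existing entities are left untouched, so re-adding is harmless.
        self.entities.setdefault(node.entity_id, node)
        self.supports.setdefault(node.entity_id, [])

    def add_evidence(self, node):
        # Incremental update: only the evidence store and its link change.
        if node.entity_id not in self.entities:
            raise KeyError(f"unknown entity: {node.entity_id}")
        if node.evidence_id not in self.evidence:
            self.supports[node.entity_id].append(node.evidence_id)
        self.evidence[node.evidence_id] = node

    def evidence_for(self, entity_id, version=None):
        # Return all evidence for an entity, optionally for one release only.
        nodes = [self.evidence[e] for e in self.supports.get(entity_id, [])]
        if version is None:
            return nodes
        return [n for n in nodes if n.version == version]


# Placeholder values for illustration only, not real measurements.
graph = TransOmicGraph()
graph.add_entity(EntityNode("TP53", "gene", ("GO:0006915",)))
graph.add_evidence(EvidenceNode("ev-1", "TP53", "transcriptomics", 2.4, "cohort-A", "1.0.0"))
graph.add_evidence(EvidenceNode("ev-2", "TP53", "proteomics", 0.8, "cohort-A", "1.1.0"))
```

Because evidence nodes carry their own `source` and `version`, a new data release adds evidence nodes rather than overwriting entity attributes, which is what makes provenance queries such as `graph.evidence_for("TP53", version="1.1.0")` possible.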
Implementation of GNN methods (MOGONET, MOGAT, MPK-GNN) that integrate biological prior knowledge into graph architectures to improve multi-omics analyses such as cancer subtype classification. Exploring multimodal integration on extended graph layers, including TF–target, protein–protein, and miRNA–target interaction networks.
Key Technologies: Multi-modal Graph Learning, Prior Knowledge Integration, Omics Data Fusion
Status: Methodology development and benchmarking
Development and application of GNN architectures (GCN, GAT, CompGCN) for biological data analysis. Focus on node classification, link prediction, and graph-level tasks using standard benchmarks (Cora, Cornell, Chameleon) and biological datasets.
Key Technologies: PyTorch Geometric, Deep Graph Library (DGL), NetworkX, Graph Neural Networks
Status: Ongoing research at the University of Cambridge
Development of comprehensive knowledge graphs spanning multiple species for drug repurposing applications. Using GCN and CompGCN architectures for link prediction and drug-target interaction discovery.
Key Technologies: Knowledge Graph Embeddings, Cross-species Data Integration, Drug Repurposing Algorithms
Status: In preparation for publication
Implementation of LLM-driven knowledge graph construction and retrieval-augmented generation systems for biological data. This project combines large language models with structured biological knowledge to enhance information retrieval and reasoning.
Key Technologies: Large Language Models, Knowledge Graph Embeddings, RAG Architecture, Neo4j
Status: Active development with applications to nutrigenetics and general biological datasets
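The retrieval half of such a system can be illustrated with a small sketch, assuming the knowledge graph is available as a list of (head, relation, tail) triples. The in-memory list stands in for a graph database such as Neo4j; the function names, the naive substring entity matching, and the toy triples are all assumptions of this sketch rather than the project's implementation. The retrieved text would then be passed to a language model as context, a step omitted here.

```python
from collections import deque

# Toy triples for illustration; a real system would query a graph store.
TRIPLES = [
    ("MTHFR", "has_variant", "rs1801133"),
    ("rs1801133", "associated_with", "folate metabolism"),
    ("LCT", "encodes", "lactase"),
]


def build_adjacency(triples):
    """Map each node to (neighbor, triple index) pairs, ignoring direction."""
    adjacency = {}
    for index, (head, _relation, tail) in enumerate(triples):
        adjacency.setdefault(head, []).append((tail, index))
        adjacency.setdefault(tail, []).append((head, index))
    return adjacency


def retrieve_subgraph(question, triples, hops=2):
    """Return triples within `hops` edges of any entity named in the question."""
    adjacency = build_adjacency(triples)
    lowered = question.lower()
    # Naive entity linking: a node is a seed if its name occurs in the question.
    seeds = [node for node in adjacency if node.lower() in lowered]
    visited = set(seeds)
    picked = set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue
        for neighbor, index in adjacency[node]:
            picked.add(index)
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return [triples[i] for i in sorted(picked)]


def format_context(triples):
    """Render triples as plain sentences for use in a prompt."""
    return "\n".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in triples)


question = "What does the MTHFR gene influence?"
context = format_context(retrieve_subgraph(question, TRIPLES))
```

Restricting the context to a bounded neighborhood keeps the prompt short and tied to explicit graph edges; the `hops` parameter trades recall against prompt length.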
Advanced computational methods for single-cell RNA sequencing data analysis using graph-based approaches. Integration with tools like Seurat and development of novel network-based methods.
Key Technologies: scRNA-seq Analysis, Seurat, Graph-based Clustering, Cell Type Classification
Status: Active research collaboration
Computational strategies for constructing reference datasets of nutrition-associated genetic polymorphisms. Application of graph-based methods for personalized dietary recommendations.
Programming Languages
Graph Learning Frameworks
AI/ML Frameworks
Bioinformatics Tools
Data Management
Visiting Researcher - Department of Computer Science, University of Cambridge
Contribution to the development of a large language model-powered chatbot for RNA-seq expression data analysis, making transcriptomic analysis more accessible to researchers.
Technologies: LLMs, Transcriptomics, RNA-seq Analysis
Status: Delivered and operational
Development of graph-based retrieval-augmented generation (GraphRAG) systems applied to biological datasets, with a view to journal publication and scaling to larger datasets.
Technologies: GraphRAG, LLMs, Knowledge Graph Construction
Status: Presented at workshop, under development for journal submission
Implementation of knowledge graph construction spanning multiple species with Graph Convolutional Networks (GCN) and Composition-based GCN (CompGCN) for drug repurposing applications.
Technologies: Cross-species KG, GCN, CompGCN, Link Prediction
Status: In preparation for publication