Christopher W. Fletcher

Systems Biology

Project year: April, 2009 - August, 2010
Team Members: Narges Bani Asadi*, Eric Glass*, Stanford Nolan Lab*, Berkeley Reconfigurable Computing Group; *: Stanford University
Updated: June 1, 2009

The Systems Biology project is a cross-university effort between Berkeley and Stanford to learn Bayesian network structure on FPGAs.

The project's motivating application is learning the structure of signal transduction networks (STNs). Broadly speaking, STNs are a cell's information highway: they carry and amplify signals detected at the cell membrane to cellular components such as the nucleus or mitochondria. Small differences between individuals' STNs have been correlated with clinical outcomes, so a better understanding of STNs should let us better predict how effective a drug or treatment will be for different patients. Traditionally, STNs were believed to be linear, independent chains of proteins running from the cell membrane to the cell nucleus. This view has recently given way to one in which STNs are more complex networks, where any protein can potentially affect any other protein. This project's goal is to build models of these more complex protein networks from quantitative (flow cytometry) data taken from a cell. The hope is that biologists can use these models to better understand the causal relationships and conditional independences between proteins in the network, and thus be in a better position to treat patients suffering from cancer or other diseases.

This project models STNs as Bayesian networks. A Bayesian network is a graphical notation in which vertices represent variables (here, proteins) and edges represent the dependences between them, so conditional independences can be read off the graph. Learning Bayesian network structure from flow cytometry data, however, is an NP-hard problem. To tractably find the Bayesian network that best represents the implicit STN, we use Markov chain Monte Carlo (MCMC) sampling to perform a random walk in the space of graph orders. This approach exhibits a large amount of both coarse- and fine-grained parallelism, which motivates our FPGA implementation.
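
As a rough illustration of a random walk over graph orders, here is a minimal software sketch in Python. It is not the project's implementation. It assumes a Metropolis-style sampler whose proposal swaps two positions in the order, and a score for an order that sums, for each node, over all candidate parent sets drawn from that node's predecessors. The names `log_order_score`, `order_mcmc`, `toy_local_score`, and `TRUE_PARENTS`, the `max_parents` limit, and the toy scores are all hypothetical; a real system would derive local scores from the flow cytometry data.

```python
import itertools
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_order_score(order, local_score, max_parents=2):
    """Log score of an order: for each node, sum over every parent set
    (up to max_parents) chosen from the nodes that precede it."""
    total = 0.0
    for i, node in enumerate(order):
        preds = order[:i]
        terms = []
        for k in range(min(max_parents, len(preds)) + 1):
            for parents in itertools.combinations(preds, k):
                terms.append(local_score(node, frozenset(parents)))
        total += log_sum_exp(terms)  # independent per node: parallelizable
    return total


def order_mcmc(nodes, local_score, steps, seed=0, max_parents=2):
    """Random walk over orders; returns the best order seen and its score."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)
    current = log_order_score(order, local_score, max_parents)
    best_order, best = list(order), current
    for _ in range(steps):
        # Assumed proposal: swap two distinct positions in the order.
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        proposed = log_order_score(order, local_score, max_parents)
        # Metropolis acceptance for a symmetric proposal.
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            if current > best:
                best_order, best = list(order), current
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return best_order, best


# Hypothetical toy problem: a three-protein chain A -> B -> C.
TRUE_PARENTS = {"A": frozenset(), "B": frozenset({"A"}), "C": frozenset({"B"})}


def toy_local_score(node, parents):
    """Made-up log score: 0 for the 'true' parent set, -5 otherwise."""
    return 0.0 if parents == TRUE_PARENTS[node] else -5.0


best_order, best_score = order_mcmc(["A", "B", "C"], toy_local_score, steps=500)
```

In this sketch the per-node sums inside `log_order_score` do not depend on one another, and the terms within each sum are also independent, which is one way to picture the coarse- and fine-grained parallelism described above.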

At a high level, the FPGA implementation strives to map the network of interest directly onto the FPGA fabric, parallelizing the MCMC kernel loops by replicating hardware data paths and by hardware threading to maximize throughput. Each data path is coupled with FPGA block RAM FIFOs, which supply data originally generated by the flow cytometry process. The design's theme is thus to make the best use of the FPGA's large aggregate block RAM bandwidth and to optimize the algorithm's reduction steps at the gate level. To meet different networks' requirements, we have scaled this base design from one FPGA, to four FPGAs (one BEE3 board), and finally to multiple BEE3 boards.
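
The data-path organization described above can be pictured with a small software analogy in Python. This is only a sketch under stated assumptions, not the hardware design: it assumes each data path drains one FIFO of precomputed log scores and keeps a running log-sum-exp, and that the per-path results are then combined by a pairwise (adder-tree) reduction. The names `data_path` and `tree_reduce` and the sample values are hypothetical, and the loop over FIFOs runs sequentially here, whereas replicated hardware data paths would run concurrently.

```python
import math
from collections import deque


def data_path(fifo):
    """Drain one FIFO of log scores, keeping a running log-sum-exp."""
    acc = -math.inf
    while fifo:
        x = fifo.popleft()
        hi, lo = max(acc, x), min(acc, x)
        # log(exp(hi) + exp(lo)) computed stably; -inf acts as "empty".
        acc = hi if lo == -math.inf else hi + math.log1p(math.exp(lo - hi))
    return acc


def tree_reduce(values):
    """Sum values pairwise, level by level, like an adder tree."""
    values = list(values)
    if not values:
        return 0.0
    while len(values) > 1:
        nxt = [values[k] + values[k + 1] for k in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            nxt.append(values[-1])  # odd element passes through to next level
        values = nxt
    return values[0]


# Hypothetical input: one FIFO per data path, filled with made-up log scores.
fifos = [deque([0.0, 0.0]), deque([math.log(3.0)]), deque([-1.0])]
per_path = [data_path(f) for f in fifos]  # concurrent in hardware
total = tree_reduce(per_path)
```

The pairwise structure of `tree_reduce` needs a number of levels that grows logarithmically with the number of data paths, which is why reductions of this shape are a natural target for hardware optimization.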