Eurexpress
A Transcriptome Atlas Database for Mouse Embryo

The Eisen Cluster Program

The Eisen "Cluster" program is available from the Eisen Laboratory web page and described in a PNAS publication. It performs hierarchical, k-means and self-organising map (SOM) clustering, generating output files that can be read into a number of visualisation tools, notably the JavaTreeView program. Here we have used this code to cluster the gene-expression patterns as described by the anatomical annotations. For this, use the advanced query tool to generate the interaction matrix, which can then be input directly to the Cluster program. Here is an example in which we have used the version of "Cluster" available from the Human Genome Centre, Tokyo [direct link to software]:

Using Cluster software with EurExpress data

Access the EurExpress database using the advanced interface to generate the interaction matrix. Options to be set include gene-expression strength values and propagation options. For this application select tab-separated values and include the EurExpress assay ID and gene symbol as columns in the output matrix.

It is also possible to select sub-parts of the anatomical tree if data is to be compared within part of the embryo.

With the interaction matrix generated, you can check its content using a text editor. The first rows should be the anatomical IDs and names, and the first columns should be the EurExpress assay ID and gene symbol. The interaction matrix itself is represented as numerical values and will typically be very sparse.
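The same check can be scripted. The following is a minimal sketch, assuming the layout described above: two header rows (anatomical IDs, then names) and two label columns (assay ID, gene symbol). The function name, the example identifiers and the treatment of empty cells as zero are illustrative assumptions, not part of the EurExpress export.

```python
import csv
import io


def summarise_matrix(text, n_header_rows=2, n_label_cols=2):
    """Return (n_assays, n_terms, fraction_nonzero) for a tab-separated matrix.

    Assumed layout: n_header_rows rows of anatomical IDs and names, then one
    row per assay starting with n_label_cols label columns.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    header, body = rows[:n_header_rows], rows[n_header_rows:]
    if not header:
        raise ValueError("matrix has no header rows")
    n_terms = len(header[0]) - n_label_cols
    nonzero = 0
    for index, row in enumerate(body, start=1):
        values = row[n_label_cols:]
        if len(values) != n_terms:
            raise ValueError(
                f"data row {index}: expected {n_terms} values, got {len(values)}"
            )
        # Assumption: an empty cell means "no expression", i.e. zero.
        nonzero += sum(1 for v in values if v.strip() and float(v) != 0.0)
    total = len(body) * n_terms
    return len(body), n_terms, (nonzero / total if total else 0.0)


# Hypothetical example data; the IDs and names are made up for illustration.
EXAMPLE = (
    "ID\tSymbol\tterm_1\tterm_2\tterm_3\n"
    "ID\tSymbol\tbrain\theart\tliver\n"
    "assay_1\tGeneA\t0\t2\t0\n"
    "assay_2\tGeneB\t0\t0\t0\n"
)

if __name__ == "__main__":
    print(summarise_matrix(EXAMPLE))
```

A low non-zero fraction is expected for these matrices; a row-length error usually points to a file that was not exported as tab-separated values.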

The .txt matrix file is read into the Cluster program. Typically there is no need to filter the data, and clustering is done on the unmodified data. To provide a more colourful presentation of the matrix in JavaTreeView, the matrix values can be normalised. Here we show the hierarchical clustering tab with a number of its parameters. We have found that clustering using correlation or Euclidean distance seems to be the most effective.
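To make the two distance measures concrete, here is a small sketch of how each compares a pair of matrix rows. It uses the textbook definitions (Euclidean distance, and one minus the Pearson correlation); the exact formulas and options used inside the Cluster program may differ, so treat this as an illustration only.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length expression rows."""
    if len(a) != len(b):
        raise ValueError("rows must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson correlation: 0 for identical patterns, 2 for opposite ones."""
    if len(a) != len(b):
        raise ValueError("rows must have the same length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A constant row (e.g. all zeros in a sparse matrix) has no defined correlation.
        raise ValueError("correlation is undefined for a constant row")
    return 1.0 - cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical rows: the same pattern at two different strengths.
    weak = [0, 1, 2, 0]
    strong = [0, 2, 4, 0]
    print(euclidean_distance(weak, strong))
    print(correlation_distance(weak, strong))
```

The example rows share a pattern but differ in strength: correlation distance treats them as essentially identical, while Euclidean distance does not. This difference is worth keeping in mind when choosing between the two for strength-valued annotations.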

The number of rows corresponds to the number of assays (genes) submitted for clustering. The number of columns is the number of independent anatomical terms used for the given set of assays.

When the Cluster program completes, the cluster files are automatically saved to disk with names matching the original file (unless changed in Cluster) but with different file extensions. These can be used as input to the JavaTreeView program. Please see the JavaTreeView page for further detail on using this application.
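As a rough guide to which files to look for, the sketch below lists the names we would expect after a hierarchical clustering run, assuming the usual Cluster conventions: a .cdt file for the reordered data, a .gtr file when rows (genes) are clustered and an .atr file when columns are clustered. The helper function and the input file name are hypothetical; consult the Cluster documentation for the authoritative list.

```python
from pathlib import Path


def expected_cluster_outputs(input_path, cluster_rows=True, cluster_columns=True):
    """List output files expected next to input_path after hierarchical clustering.

    Assumes the default job name (same base name as the input file).
    """
    base = Path(input_path)
    outputs = [base.with_suffix(".cdt")]  # reordered data matrix
    if cluster_rows:
        outputs.append(base.with_suffix(".gtr"))  # row (gene) tree
    if cluster_columns:
        outputs.append(base.with_suffix(".atr"))  # column (anatomical term) tree
    return outputs


if __name__ == "__main__":
    # Hypothetical file name for an exported interaction matrix.
    for path in expected_cluster_outputs("eurexpress_matrix.txt"):
        print(path, "(found)" if path.exists() else "(missing)")
```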

Note that this application is also accessible as an applet within the browser window, provided you have enabled Java. This will load automatically with pre-calculated cluster analyses. See for example Cluster_080320.

For further information on the file structure and formats, and for guides to using the tools, please see the respective web-sites for the tools.