C3 Bio Data Challenge

The Problem

Crystallisation is essential for visualising biological molecules at atomic resolution.  It is also, however, expensive, tedious, and more likely to fail than not.

The Challenge

Find clues within large amounts of image data to fundamentally change our way of viewing biological experiment results.

The CSIRO Collaborative Crystallisation Centre (C3) (www.csiro.au/C3) is one of a small number of facilities around the world offering protein crystallisation services to external clients.  C3 is a global leader in both initial screening and extensive optimisation.  Aside from human inspection of each image, C3 have no method of evaluating the trials, for example to identify crystal growth.  The current tools for automatic crystal detection have proven unusable and are often seen as hindrances rather than time-saving mechanisms.

C3 have conducted roughly 2 million experiments (crystallisation drops) and have accumulated just over 24 million images as a result.  Images can be naturally clustered into ‘timecourses’.  A timecourse is the set of images of the same droplet captured over the course of months.  Typically, an experiment produces 14 images: 4 in week one, 2 in week two, then one per week until 14 is reached.
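To make the typical imaging schedule concrete, here is a minimal sketch that lists the week in which each of the 14 images would be taken. The function name and its parameters are illustrative only, and the schedule is the typical one described above, not a guarantee for every experiment.

```python
def imaging_weeks(total_images=14, week_one=4, week_two=2):
    """Return the 1-based week number in which each image is taken.

    Assumes the typical schedule: several images in weeks one and two,
    then one image per week until the total is reached.
    """
    weeks = [1] * week_one + [2] * week_two
    week = 3
    while len(weeks) < total_images:
        weeks.append(week)
        week += 1
    return weeks[:total_images]


schedule = imaging_weeks()
print(schedule)       # [1, 1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(schedule[-1])   # 10
```

Under this schedule a full timecourse spans about ten weeks, which fits the description of images captured over the course of months.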

Each drop is not set up in isolation: sets of 96 drops are set up for each unique sample, and each drop may differ in the crystallisation condition to which the sample is exposed.  Over the course of the experiment, the drop may change size and shape, and the focal plane of the image may vary as well.

Much effort has been put into the analysis of crystallisation images around the world, because an estimated 99% of experiments don’t produce crystals.  As a result, crystallographers are overwhelmed by the number of images collected and cannot view every image manually.  An estimated 25% or fewer of images are ever viewed.

We would like to achieve two main objectives:

1.  Assuming that only 10-25% of images are ever viewed, distinguish the best images.

2.  Find similarities in our images, e.g. the distribution of a certain texture of precipitate, which can be used for ‘similarity mapping’.

2a.  Compare similarity mapping to crystallisation conditions and identify any correlation.
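As one possible starting point for objective 2, the sketch below compares drops using a normalised grey-level histogram and cosine similarity. This is only an illustration: the histogram is a crude stand-in for a real texture descriptor, the function names are hypothetical, and the pixel values are made up for the example.

```python
import math


def intensity_histogram(pixels, bins=8, max_value=255):
    """Normalised grey-level histogram of the pixels inside a drop."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * bins
    for value in pixels:
        index = min(value * bins // (max_value + 1), bins - 1)
        counts[index] += 1
    return [count / len(pixels) for count in counts]


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up grey levels for three drops (flat lists of pixel values).
clear_drop = [200, 210, 205, 198, 202, 207]
light_precipitate = [190, 120, 200, 110, 205, 130]
heavy_precipitate = [60, 40, 70, 50, 65, 45]

features = {
    "clear": intensity_histogram(clear_drop),
    "light": intensity_histogram(light_precipitate),
    "heavy": intensity_histogram(heavy_precipitate),
}
print(cosine_similarity(features["clear"], features["light"]))
print(cosine_similarity(features["clear"], features["heavy"]))
```

In this toy data the clear drop scores as more similar to the light precipitate than to the heavy one. A pairwise similarity table of this kind, computed over all 96 drops of a sample, is the sort of map that could then be compared against the crystallisation conditions.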

The goal is not really to ‘find crystals’ as crystals are intrinsically a rare outcome, and by focusing on finding crystals we would ignore most of the data we have generated.

As far as we know, none of the existing analyses use timecourse information.  Timecourse is potentially an enormously powerful way of distinguishing which images are best.  Images which show change from the previous image in the timecourse are worth viewing.  To find significant change in sequential images, the drop must first be located in each image and the images in the timecourse then aligned.
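The idea of ranking images by how much they change from their predecessor can be sketched as follows. This is a simplified illustration, not a proposed solution: it assumes the images are already aligned, that they are small grey-level grids held as nested lists, and that a single drop mask applies to the whole timecourse (in practice the drop may change size and shape). All names here are hypothetical.

```python
import math


def change_score(previous, current, drop_mask):
    """Mean absolute grey-level difference inside the drop for two aligned images."""
    total = 0
    count = 0
    for prev_row, cur_row, mask_row in zip(previous, current, drop_mask):
        for prev_px, cur_px, inside in zip(prev_row, cur_row, mask_row):
            if inside:
                total += abs(cur_px - prev_px)
                count += 1
    return total / count if count else 0.0


def rank_by_change(timecourse, drop_mask):
    """Return (image_index, score) pairs, most changed first.

    Image 0 has no predecessor, so it is not scored.
    """
    scores = [
        (i, change_score(timecourse[i - 1], timecourse[i], drop_mask))
        for i in range(1, len(timecourse))
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def top_fraction(ranked, fraction=0.25):
    """Keep the top share of ranked images (at least one)."""
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Toy 3x3 timecourse: only image 2 changes inside the drop.
mask = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
img0 = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
img1 = [[255, 100, 100], [100, 100, 100], [100, 100, 100]]  # change outside the drop
img2 = [[255, 100, 100], [100, 20, 100], [100, 100, 100]]   # change inside the drop
img3 = [row[:] for row in img2]                             # no further change

ranked = rank_by_change([img0, img1, img2, img3], mask)
print(top_fraction(ranked))   # [(2, 16.0)]
```

The `fraction` parameter reflects the assumption that only 10-25% of images are ever viewed: the viewer's limited attention goes to the images with the largest change inside the drop.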

Challenge data:  we will provide a data set consisting of one or more sets of 96 image timecourses.  Each image will have associated data:

  • The location of the drop in the image
  • The boundary of that drop
  • The area of the drop
  • An alignment of the subsequent images to the first image in a timecourse.
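The format of this associated data is not specified here, so the following is only a hypothetical sketch of how one image's record might be held in code. Every field name is an assumption, as is the choice to represent the boundary as a polygon and the alignment as a simple translation. The shoelace formula is included as a sanity check that a supplied area agrees with a supplied boundary.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class DropImageRecord:
    """Hypothetical container for the per-image data listed above."""
    image_path: str
    drop_centre: Point          # location of the drop in the image (pixels)
    drop_boundary: List[Point]  # boundary as ordered polygon vertices (assumed)
    drop_area: float            # area of the drop (square pixels, assumed)
    offset_to_first: Point      # translation onto the first image (assumed)


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Made-up record: a 40 x 40 pixel square drop in the first image of a timecourse.
record = DropImageRecord(
    image_path="example_drop_t00.png",
    drop_centre=(50.0, 40.0),
    drop_boundary=[(30.0, 20.0), (70.0, 20.0), (70.0, 60.0), (30.0, 60.0)],
    drop_area=1600.0,
    offset_to_first=(0.0, 0.0),
)
print(polygon_area(record.drop_boundary))   # 1600.0
```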

We look forward to seeing your eloquent solutions at the eResearch conference.

For queries or information regarding the competition, please feel free to post comments here 


