Botanists disperse some ‘big data’ – University of Bristol Botanic Garden

Recently, botanists at Trinity College Dublin launched a database that documents significant ‘life events’ for nearly 600 plant species across the globe. The database is the result of contributions from individuals working across five continents, who compiled information on plant life histories over a span of nearly 50 years, and it is an example of big data.

What is ‘big data’?

Black pine (Pinus nigra), one of the species whose life history data is part of the database, is seen against a stunning backdrop of New Zealand. Credit: Yvonne Buckley.

In academic circles, the buzz-term across all disciplines seems to be ‘big data’, and it means exactly what it sounds like – a whole lot of information. More formally, of course, big data refers to data sets that are so large and complex that traditional methods of processing the information contained within them simply aren’t adequate. Big data draw upon many sources of information and represent a body of work that far exceeds what a single researcher, or indeed an entire research group, could gather in their careers.

While there are many challenges in working with big data – storing, analysing and visualising it, and ensuring its integrity, to name a few – the benefits of working with such large data sets may make overcoming these challenges worthwhile. Repositories of such vast amounts of information can not only help foster collaborations, but can also be used to answer questions surrounding some of the most complex and pressing issues society currently faces, including climate change, food security, and mass species extinctions.

Of course, what is considered to be big data today will not be big data tomorrow, as our management systems and computing capacity improve. This is the inevitable path of technological advancement: the Human Genome Project took 13 years (1990-2003) to sequence the human genome, and now it can be done in a day for a fraction of the cost.

The importance of sharing knowledge

Plantago lanceolata at Howth Head, Dublin, Ireland – one of the nearly 600 plant species on which researchers have gathered extensive life history data. Credit: Anna Csergo.

The researchers at Trinity have made their database, called COMPADRE, freely available in the hope that other scientists will use the information to advance their own research. The size of the database means it can help answer a wide range of questions – such as how plant communities may respond to climate change, or which physiological processes in plants might provide insights into our own ageing and health.

“Making the database freely available is our 21st Century revamp of the similarly inspired investments in living plant collections that were made to botanic gardens through the centuries,” said Yvonne Buckley, Professor of Zoology at Trinity’s School of Natural Sciences. “These were also set up to bring economic, medicinal and agricultural advantages of plants to people all over the world. Our database is moving this gift into the digital age of ‘Big Data’.”

Free knowledge sharing is becoming more common and is a critical step toward resolving some of our biggest challenges. The University of Bristol’s Cereal Genomics Group has made the wheat genome, along with hundreds of thousands of molecular markers, freely available through its searchable database, CerealsDB. These data can be used in wheat breeding programmes to develop new varieties of wheat that are more resistant to disease or drought, or that produce higher yields.

Our best chance of overcoming some of the global challenges of the 21st Century is to work together. Sharing knowledge through databases such as COMPADRE and CerealsDB will ensure every scientific contribution counts towards this united effort.