Big data

Chief Scientist Phil Reeves APVMA
2 August 2017

Big data is a term used to describe huge amounts of both structured and unstructured data. These data sets are so large that they can’t be processed using traditional database and software techniques. Their size is expressed with International System of Units (SI) prefixes: typically terabytes and petabytes and, at the extreme, up to yottabytes.

But it’s not the mind-boggling amount of data we need to worry about. The real challenge is that we, as scientists, must be “tech savvy” enough to know how to access and use big data. In this new age of measurement science we need to be able to select what we want from swathes of data sets and then interpret the material. We also need to be able to make our own research data available to others. Making the best use of big data requires us to develop new skills as data scientists, because the volume of big data will continue to grow exponentially.

Just 30 years ago we would have consulted most data in hard copy. Now the amount of material available online is almost overwhelming, with data on nearly every subject. Thankfully, the new technologies that allow us to access all this information also allow us to select and store what we want.

Data management has become a highly specialised discipline in its own right. Even large organisations struggle to create, manipulate, and manage big data sets because of the sheer scale of electronic “noise”. One can only imagine how many scientific discoveries have been overlooked because they were hidden in enormous data sets.

We have had to develop entirely new hardware and software systems to deal with all this material. We have come a long way since 1923, when Sir Joseph Thomson, the English physicist and Nobel laureate who discovered the electron (the first subatomic particle to be identified), founded the Journal of Scientific Instruments to log and describe measurement science. Interestingly, it is now named Measurement Science and Technology.

Image: computer-generated illustration of how data is interpreted, showing data curves and peaks.