Like gold, the form that data takes directly affects the ease with which you can extract, export and shape it into something more valuable. The more data you have, the greater the potential value, and everywhere the eye can see data prospectors are feverishly extracting volumes of data from patients. We are living through a data rush, and when you stand still and watch the rush an incontrovertible truth emerges: transforming gold nuggets into bars is easy; semantically integrating different forms of health data to create a true wealth of knowledge is not.

This is translational research

There are many clinical research studies taking place globally, and each produces large quantities of data. To get the best out of each dataset we need to align them and find patterns that lead to improved treatments for patients. The more data you align, the greater the chance a pattern will be found. However, before you can harmonise data you must first ensure a certain level of data quality through curation and standardisation, so that it can be meaningfully integrated with all other available data. Aligned data enables high-quality data storage, access, extraction and analysis. It is this type of data management that provides the fertile soil for the development of new treatments.
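To make the idea of alignment concrete, the short Python sketch below harmonises records from two hypothetical studies into one shared representation before pooling. The field names, units and codes are invented purely for illustration and are not taken from eTRIKS or any real standard.

```python
# Illustrative sketch only: harmonising records from two hypothetical studies
# into one shared representation before pooled analysis. All field names,
# units and codes are invented for illustration.

# Study A reports glucose in mg/dL and uses free-text sex labels.
study_a = [{"subject": "A-001", "glucose_mg_dl": 99.0, "sex": "male"}]

# Study B reports glucose in mmol/L and uses single-letter sex codes.
study_b = [{"subject": "B-001", "glucose_mmol_l": 6.1, "sex": "M"}]

SEX_CODES = {"male": "M", "female": "F", "m": "M", "f": "F"}
MGDL_PER_MMOLL = 18.0182  # unit conversion factor for glucose

def harmonise_a(rec):
    """Convert a Study A record to the shared representation."""
    return {
        "subject": rec["subject"],
        "glucose_mmol_l": round(rec["glucose_mg_dl"] / MGDL_PER_MMOLL, 2),
        "sex": SEX_CODES[rec["sex"].lower()],
    }

def harmonise_b(rec):
    """Convert a Study B record to the shared representation."""
    return {
        "subject": rec["subject"],
        "glucose_mmol_l": rec["glucose_mmol_l"],
        "sex": SEX_CODES[rec["sex"].lower()],
    }

pooled = [harmonise_a(r) for r in study_a] + [harmonise_b(r) for r in study_b]
print(pooled)  # one consistent table that can now be analysed together
```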

Standardising data is challenging; standardising all health data is socially, technically, economically and politically very challenging. However, non-standardised data runs the very real risk of becoming old and forgotten. It will get lost.

Who is best placed to address the data standards issue?

The European Medicines Agency (EMA) and the US Food and Drug Administration (FDA) work closely with organisations that establish data standards, for example the International Organization for Standardization (ISO) and the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). These groups provide a data standards regulatory framework through consultation with key pharmaceutical companies. The Joint Initiative Council (JIC) was set up in 2007 to interface six of the main standards development organisations (SDOs) in the industry, to determine which projects fall within each SDO's remit and how standards can be integrated 'end to end' to provide seamless integration from data capture to analysis. To a lesser extent, publishers also play a role in data standardisation by setting stringent requirements that must be met prior to publication in prestigious journals.

Clinical and omics data

For clinical data types (surgery, laboratory and hospital analysis), SDOs such as CDISC and HL7 work with the regulatory agencies and have, to date, provided a wealth of standards. The standardisation of the omics field (proteomics, lipidomics, transcriptomics, metabolomics, etc.) is also rapidly evolving, but standardisation efforts here are largely driven by communities of interest rather than formal SDOs. Individual initiatives are often funded as part of research grants, which have finite lifetimes. When these projects conclude, ongoing support for the standards they developed is difficult to obtain. Continued funding to maintain and evolve these standards is hard to secure once the novelty has worn off, and as a consequence initially valuable standards can quickly become neglected and outdated.

How much data has been standardised?

Perhaps 10% of the world's health data is standardised, and even this figure may be optimistic. Data standards are already available and recommended, for example the multiple suites of clinical trial standards recommended by CDISC. Adoption of these standards by the pharmaceutical industry is progressing rapidly, while the pace of adoption by academic institutions is picking up.

What makes the uptake and implementation of data standards so slow?

Landscape Blockers – The complexity of the environment is a key roadblock: there are tens of thousands of groups producing health data for many different purposes. Ontologies (structured descriptions of knowledge domains) typically have a defined purpose and scope, and work well within that scope. However, there is frequently a need to adapt, modify or extend existing ontologies and terminologies for new applications, and in the absence of rigorous yet efficient processes for adjusting existing terminologies, creating new competing or overlapping terminologies frequently becomes the short-term solution, one that ultimately compounds the difficulties in the drive for broader improvements in data quality and standardisation.

Technical Blockers – The standards themselves need to be defined, agreed, adopted and machine-readable. There are no fundamental blockers here, as the technology is available. Technology today is largely modular, and interestingly it is the sheer breadth of available technology that has become the blocker; the specific issue is one of module compatibility. To apply standards efficiently you need good-quality expertise to curate the data, and of course time. A highly modular environment requires vast curation effort to link everything together. This is counterproductive, and we need to slim down the number of modules in use.
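As a rough illustration of what "machine-readable" means in practice, the sketch below expresses a hypothetical standard's variable definitions as data, so that software rather than a person reading a PDF can check conformance. The variable names and code lists are invented and do not come from CDISC or any other SDO.

```python
# Minimal sketch of a machine-readable standard: variable definitions held as
# data so records can be checked automatically. All names are hypothetical.

STANDARD = {
    "SEX":   {"type": str, "codelist": {"M", "F", "U"}},
    "AGE":   {"type": int, "codelist": None},
    "VISIT": {"type": str, "codelist": {"SCREENING", "BASELINE", "WEEK4"}},
}

def validate(record):
    """Return a list of conformance problems for one data record."""
    problems = []
    for var, rule in STANDARD.items():
        if var not in record:
            problems.append(f"missing variable {var}")
            continue
        value = record[var]
        if not isinstance(value, rule["type"]):
            problems.append(f"{var}: expected {rule['type'].__name__}")
        elif rule["codelist"] and value not in rule["codelist"]:
            problems.append(f"{var}: '{value}' not in controlled terminology")
    return problems

print(validate({"SEX": "male", "AGE": 54, "VISIT": "BASELINE"}))
# -> ["SEX: 'male' not in controlled terminology"]
```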

Automated curation is being worked on, but it will never be 100% accurate, as language is too complex and new data types are constantly being produced; there will always be a need for a human hand. A great deal of work is currently being done on vast volumes of legacy data. This requires a lot of focused resource, again preventing timely progress in the field.
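A minimal sketch of this semi-automated idea follows: source terms that match a controlled terminology or a known synonym are mapped automatically, and everything else is queued for a human curator. The terminology and synonym table are invented for illustration.

```python
# Illustrative sketch of semi-automated curation with a human-review fallback.
# The terminology and synonym table below are invented for illustration.

TERMINOLOGY = {"MYOCARDIAL INFARCTION", "HYPERTENSION", "TYPE 2 DIABETES"}
SYNONYMS = {
    "HEART ATTACK": "MYOCARDIAL INFARCTION",
    "HIGH BLOOD PRESSURE": "HYPERTENSION",
}

def curate(terms):
    """Map free-text terms to the terminology; queue the rest for review."""
    mapped, needs_review = {}, []
    for term in terms:
        key = term.strip().upper()
        if key in TERMINOLOGY:
            mapped[term] = key
        elif key in SYNONYMS:
            mapped[term] = SYNONYMS[key]
        else:
            needs_review.append(term)  # the "human hand" the text refers to
    return mapped, needs_review

mapped, needs_review = curate(["heart attack", "t2dm", "Hypertension"])
print(mapped)        # automatically mapped terms
print(needs_review)  # ['t2dm'] -> sent to a curator
```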

Mindset Blockers – I think the penny has dropped, but commitment to the uptake of standards is lagging behind because many organisations and institutions fear the costs without fully appreciating the time savings and efficiency gains over the longer term. It is a complex business with a steep learning curve, but with many expensive legacy data standardisation projects under way, the value of historical data is clearly not being ignored, and recognition of the advantages and benefits of standards has long since begun, with many governments and foundations leading the way.

The most expensive misconception is that standards are something to retrofit existing data to. In reality, standards are designed to be applied right from clinical trial protocol design and data gathering through to data analysis. The cost savings from applying standards throughout the lifecycle of the data far outweigh the costs, and you are also creating data that can go on to be compared with similar studies, giving greater gains from these valuable medical studies; studies where patients have often given their blood, some sweat, but hopefully not any tears for the greater good.

How does eTRIKS contribute to the data standards landscape?

Work Package 3 of the eTRIKS project (standards research and coordination) is led by Michael Braxenthaler of Roche, Paul Houston of CDISC and Philippe Rocca-Serra of the University of Oxford. Their team works with global standards organisations on new standards development, and they frequently engage a world-class advisory board, ensuring a cutting-edge view of the data standards landscape. The team establishes and maintains interaction with IMI and non-IMI translational research projects concerning their data standard and data interchange requirements. Through CDISC, inter alia, they promote gold standards for translational research knowledge sharing, and these standards are being applied by eTRIKS client projects.

We also provide recommendations for omics standards. The current efforts are not about enforcement; rather, they offer recommendations as a guideline for standards decisions. With the recent inclusion of the University of Oxford's e-Research Centre as an eTRIKS partner, bringing highly valuable expertise in biomedical standards, we are now in a position not only to select standards for recommendation, but also to begin providing critically important tools that support the application of terminology standards in the experiment and trial design phase, as well as in the data acquisition and curation phases.

Finally, we are defining a metadata registry and repository approach to enable the consistent application of standards. In close collaboration with other work packages of eTRIKS, namely the curation work package (WP4) and the tranSMART platform development work package (WP2), we are focusing on ensuring the highest possible quality of translational data as a basis for progressing the development of new treatments and options to improve patients' lives.
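As a rough sketch of the registry idea, and assuming nothing about the actual eTRIKS implementation, the example below registers a data element once, with its definition and permitted values, so that studies refer to the registered element rather than re-inventing it. All identifiers and field names are hypothetical.

```python
# Minimal sketch of a metadata registry: each data element is registered once
# and studies point at it, so shared elements are comparable by construction.
# Identifiers, names and value domains are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class DataElement:
    identifier: str                                   # stable registry identifier
    name: str                                         # human-readable name
    definition: str                                   # agreed meaning
    value_domain: set = field(default_factory=set)    # permitted coded values

registry = {}

def register(element: DataElement):
    """Add a data element to the registry, keyed by its identifier."""
    registry[element.identifier] = element

register(DataElement(
    identifier="DE-0001",
    name="Smoking status",
    definition="Subject's self-reported smoking status at baseline",
    value_domain={"CURRENT", "FORMER", "NEVER"},
))

# A study dataset declares which registered element each column uses, so two
# studies that both point at DE-0001 describe the same thing the same way.
study_columns = {"SMOKSTAT": "DE-0001"}
print(registry[study_columns["SMOKSTAT"]].value_domain)
```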
