Medical research remains largely focused on disease stratification, but it is quickly moving towards personalisation. Consequently, medical research data is being produced at a mind-boggling rate. Innovation in sample analysis has surged in recent years, so much so that affiliated domains are struggling to keep up.
Biomedical analysis techniques that measure thousands of parameters simultaneously place a huge burden on those who manage the data. It is now generally accepted that data collection will become even more extreme with the advent of validated personal health monitoring applications. Instead of hundreds or thousands of clinical study participants, we will see millions.
How can this volume of medical research data be managed effectively and safely?
There are six steps:
- During the development of a research proposal, plan the management of your data in collaboration with experts in data management.
- Generate the data methodically and according to an accredited data management plan.
- Standardise and store the data using a platform whose framework enables the efficient import, export and sharing of data.
- Secure the data.
- Provide access to the data.
- Study the data.
eTRIKS facilitates all of these requirements.
At the heart of eTRIKS services stands the Work Package One lead, Ghita Rahal of the Centre National de la Recherche Scientifique (CNRS) French National Computer Centre (CC-IN2P3) in Lyon. Ghita contributes to the management of the data produced by high-energy physics, including the data produced by the CERN hadron collider. Recently, Ghita has adapted her skills to accommodate biomedical research.
Ghita explains that high-energy physics has been the principal focus of CC-IN2P3. Decades ago, it was realized in France and further afield that the huge amount of data produced by the next generation of colliders could only be supported by large, shared and sophisticated data-hosting facilities. The French national computer centre was therefore built to provide the computing power and storage needed for data-heavy fundamental research. A lot of knowledge was acquired during the subsequent two decades, and so it was felt that CC-IN2P3's role should extend to other sciences, for example computing in biomedical research.
“CC-IN2P3 must evolve rapidly to keep up with high-energy physics, and so cutting-edge technology and expertise are essential; we provide this technology and expertise to eTRIKS supported projects.”
Our facility holds four robotic tape libraries that contain 40,000 magnetic tapes, along with large amounts of disk storage for fast analysis. We provide long-lived storage, backup and archival capability. We understand that researchers need their data to be constantly available, batched, secured and hosted to enable sequential and/or parallel processing. Our centre belongs to the world-wide grid infrastructure for many virtual organizations, but we adapt to provide each group with its own solution, from cloud to high-throughput computing, and from high-performance computing to hosting. The computing centre also fulfils the most stringent requirements concerning electrical redundancy and infrastructure services.
We make petabytes of space available, but to face the hard challenges of present and upcoming needs in big data, we have already prepared for the future by building a new computing room that will reach its full occupancy by 2019. CC-IN2P3 has been part of the eTRIKS project since its conception, and we provide our capability to all eTRIKS supported projects.
In our capacity within eTRIKS work package 1 (platform service delivery) we provide a platform that enables researchers to analyse their data as efficiently as possible. Using the best-suited tools and applications that can be found on the market, as well as those that we develop, we focus on a robust and reliable platform with high power and storage to host project data. We make sure that the processes we use to manage data are secure and that any requested level of confidentiality is respected. We provide our finest experts at the computer centre to design a service platform that enables flexibility and reliability using cloud technology.
We listen to the requirements of the communities we work in, and we translate those requirements into reality, whilst working with and respecting national and international legislation on data management.
eTRIKS provides a great deal of guidance in all facets of data management.
We constantly communicate with our supported projects and help users in any way we can to enable them to do the work they wish to do. Our experience in high-energy physics has taught us that breakthroughs and discoveries need cutting-edge detectors and computing infrastructure. We apply this paradigm to bio-computing. eTRIKS has put together a suite of expertise that is able to address issues such as data curation, standardization, legal challenges and analysis.
In translational research, integrating data from different origins with different contexts from large samples can only be done if a quality computing platform is available. With our breadth of expertise and technology we can address most challenges. We are more than capable of accommodating the needs of most European projects that focus on systems biology.
Secure, open access to data is the ideal
We are particularly proud of the eTRIKS public server that we host. We now have eighty medical research datasets available on the public server. We use this data for purposes of curation and platform training.
We believe that to get the best out of clinical research data, it needs to be publicly available. This will enable the wider community to combine public datasets with newly produced work. This provides a much richer study experience that leads to more valuable outcomes and hypothesis generation.
Ultimately, we see this as the future of biomedical research. Unfortunately, to date none of the projects that have joined eTRIKS have provided data that can be shared publicly. In short, this holds back the best that translational research can offer. What we have on the public server has value in terms of training. We have trained well over 300 people to help manage their data, all made possible because of the platform we set up in eTRIKS.
Open access to innovation and development is the future
More recently, CC-IN2P3 has been hosting a growing variety of eTRIKS “labs” and analytical developments. eTRIKS and WP1 provide infrastructure capacity and knowledge to help bright new bioscientists develop their own applications, or build on top of existing ones, and share them with the community. These applications include SmartR, Disease Maps, a new Data Integration platform and a Shiny server to aid future developments. Collectively, we call these the eTRIKS Labs. The public has direct access to eTRIKS Labs through the eTRIKS.org website and the CC-IN2P3 eTRIKS portal.
eTRIKS Labs enables users and developers alike to work openly and together on our latest platform and analytical developments.
Previously, we provided a robust production platform to any kind of project using one application (tranSMART). With eTRIKS Labs, public access to our new developments establishes a co-creative experience for both interested users and our developers. Our users can watch in real time the progress eTRIKS developers make on their new applications, giving users the opportunity to contribute to developments and make requests.
CC-IN2P3 provides a test version of the eTRIKS platform as well as maintaining our own robust production platform. Each new lab comes with a new implementation. More often than not, developers do not have system administration skills, which means that deployment of our applications requires very close collaboration between our system administrators and developers. This is all possible within eTRIKS, and we are very capable of accommodating a variety of new software.
But what we would really like to see is more projects joining eTRIKS, as we have the resources to meet a wide range of project requirements. Remember the eTRIKS mantra:
Integrated and explorable data are valuable data.
However, we must first protect the donors of the data with the level of data protection they require. This is our first and foremost consideration. We do recognize that different disease domains have different requirements. If the medical research project participants want quick escalation of data analysis, requiring less protection but maximum research exposure, this is possible. If they want total protection of their data, this is also very possible. We are also open to any view that falls between those two extremes, so that the best can be made of medical research data whilst staying true to the view of those who most need it.
eTRIKS is in a position to accommodate innovation in biomedical sample analysis and the data produced, regardless of the volume, enabling all affiliated domains to manage and study their data effectively and in a timely manner.