Research Technology Services: The Data Challenge
7:30am – 9:00am Registration and Coffee
9:00am – 9:15am Welcome and Introduction
9:15am – 10:00am The Data Science Promise
Speaker: Mark Hahnel, CEO and Founder, figshare
No industry can capitalise on the promise of 'data science' like academia, so what is the hold-up? Openly available academic data on the web will soon become the norm. Funders and publishers are already preparing for how this content will best be managed. With the coming open data mandates, it is now a question of ‘when’, not ‘if’, the majority of academic outputs will live somewhere on the web, so the big question becomes: ‘what next?’ Through the power of linked open (accessible) data, the web should evolve to return more accurate data in response to any question posed to it. As the world's largest driver of knowledge, the academic system should provide data to better answer queries at all stages of the learning and educational process. So where are we with this? What tools exist to achieve it, and where are the stumbling blocks most likely to appear?
10:00am – 10:15am Break
10:15am – 11:00am Not Living the Big Data Dream: Real World Experiences at Petabyte Scales
Speaker: Chris Dwan, Director, Research Computing and Data Services, Broad Institute
The Broad Institute hosts perhaps 18 petabytes of unique data, housed on approximately 30 petabytes of usable file storage capacity. Living with data at this scale has required us to become expert at "the basics" of data management, since any change can take days to implement.
While file storage is necessary, it is far from interesting to our scientific research staff. We do not create data at this scale merely to archive it. Instead, we need to make our data available, in context, to a global community of researchers and collaborators. This is driving us to adopt object storage technologies both on premise and in various public clouds. It is coupled with a drive towards virtualization (via OpenStack) and containerization (via Docker) of services, which will allow us to send compute tasks to the data rather than the other way around.
All of this has demanded that we re-think the relationships between traditional IT, the software engineers, and the scientific methods developers.
This talk will be a retrospective on a year's journey through this technical landscape.
11:00am – 11:15am Break
11:15am – 12:00pm Service Challenges and Solutions
Speaker: Stephen Litster, Global Lead for High Performance and Scientific Computing, Novartis Institutes for Biomedical Research
The scientific research environment is becoming ever more complex, and traditional IT models are beginning to break down under today’s challenges. The data deluge is just one problem that grows more challenging as research activities shift from a local to a global scale and collaborations increasingly span beyond our institutional “walls”. In this talk, Stephen Litster will attempt to highlight a number of these issues, describe how we are beginning to address them and, more importantly, seek ideas, answers and initiatives from the audience.
12:00pm – 1:00pm Lunch
1:00pm – 1:50pm Group Activity #1
Team discussions around two topics: “Service Broker vs Service Provider” and “On Premise vs Off Premise”
2:00pm – 3:00pm Panel discussion: Software systems for “Big Data”
Panelists:
Chris Dwan, Director, Research Computing and Data Services, Broad Institute
Jacob Farmer, Cambridge Computer
Mark Hahnel, CEO and Founder, figshare
Kiran Keshav, Sr Director, Research Technologies, IT Services, Yale University
Stephen Litster, Global Lead for High Performance and Scientific Computing, Novartis Institutes for Biomedical Research
Lionel Zupan, Director, Research and Geospatial Technology Services, Tufts Technology Services, Tufts University
Moderator: James Cuff, Assistant Dean for Research Computing, Faculty of Arts and Sciences, Harvard University
3:00pm End