Difference between revisions of "Data Engineering"

From PHUSE Wiki
Revision as of 01:42, 21 October 2019


Project Overview

BioPharma companies have achieved huge efficiencies over the last few decades, notably in data capture (with the move to eCRFs), business process improvements, and data standardisation efforts by CDISC.

However, companies in the sector increasingly compete on the basis of their analytical capabilities, which requires a centralised, combined and, as far as possible, automated data environment to support these deeper insights. It is clear that data in R&D needs a major transformation: it is too siloed, fragmented and manually intensive to be utilised effectively.

This project will explore how established Data Engineering techniques, successfully deployed in other industries, could be utilised in our industry. From traditional data warehousing to the arrival of the big data lake, along with data marketplaces, ePRO and IoT, the challenge is on to identify analytical value from all of these disparate data sources.

The aims of this project are two-fold. The first is to gather the myriad resources available on traditional methods of data engineering, providing a breadth of knowledge that could immediately benefit our existing clinical data estate. This will be achieved by curating and organising content into an easy-to-use structure (such as a wiki).

The second is to prepare us for the “big data tsunami” that is expected to arrive shortly in our sector, so that we can learn about the more thought-leading subjects in this area and share this information with the Data Science & AI/Machine Learning co-projects (a natural fit for these new types of data analysis), alongside the more traditional methods on the PHUSE Wiki.

Terms of Reference:

  • Team meetings will be held no more often than fortnightly and no less often than monthly
  • Meeting agendas and minutes are stored in the project's Teamwork area (access is restricted to project members)
  • The Data Engineering section of the Educating for the Future Working Group Squarespace website (https://education.phuse.eu) will be maintained by admin volunteers and the co-leads. Any project member can prepare content for the website and upload it to our project space on Teamwork for review; the admins and co-leads then upload it to Squarespace

Project Leads

{| class="wikitable" style="width:100%"
|-
| Guy Garrett || Project Co-Lead || Achieve Intelligence || guy.garrett@achieveintelligence.com
|-
| Bev Hayes || Project Co-Lead || JNJ || bhayes2@its.jnj.com
|-
| Wendy Dobson || PHUSE Project Manager || PHUSE || wendy@phuse.eu
|}

Project Members

{| class="wikitable" style="width:100%"
|-
| Amy Gillespie || Participant || Merck || Amy_Gillespie@merck.com
|-
| Beate Hientzsch || Participant || HMS || Beate.Hientzsch@analytical-software.de
|-
| Ralf Goetzelmann || Participant || Bayer || Ralf.Goetzelmann@bayer.com
|-
| Mike Carniello || Participant || Astellas || michael.carniello@astellas.com
|-
| Mark Bynens || Participant || JNJ || Mbynens@its.jnj.com
|-
| Paul Slagle || Participant || Inventiv Health || Paul.Slagle@inventivhealth.com
|-
| Vince Marinelli || Participant || MDSOL || Vmarinelli@mdsol.com
|-
| Vijay Pasapula || Participant || Gilead || vijay.pasapula@gilead.com
|-
| Jatin Patel || Participant || Parexel || Jatin.Patel@parexel.com
|-
| Sagar Jain || Participant || Independent || jainwave@gmail.com
|-
| Berber Snoeijer || Participant || Clinline || b.snoeijer@clinline.eu
|-
| Mohit Juneja || Participant || Lyfe Science || mohit.juneja@lyfescience.com
|-
| Rohit Banga || Participant || Lyfe Science || rohit.banga@lyfescience.com
|-
| Parag Shiralkar || Participant || Sumptuous || paragraph.shiralkar@sumptuous-ds.com
|-
| Renu Shukla || Participant || JNJ || RShukla3@its.jnj.com
|-
| Allison Covucci || Participant || BMS || allison.covucci@bms.com
|-
| Andy Richardson || Participant || Independent || Andy.Richardson@phuse.eu
|-
| Susan Olson || Participant || EDJ Analytics || Solson@edjanalytics.com
|-
| Xiaohui Wang || Participant || Merck || Xiaohui_Wang@merck.com
|}

Project Updates

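{| class="wikitable" style="width:100%"
|-
| '''Topic''' || '''Presentation''' || '''Presenter'''
|-
| PHUSE EU Connect Frankfurt - 2018 || [https://www.phusewiki.org/docs/Frankfut%20Connect%202018/DH/Papers/DH02-dh02-19305.pdf Presentation on Data Engineering] || Guy Garrett & Bev Hayes
|}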

Objectives and Timelines

{| class="wikitable" style="width:100%"
|-
| Initialize project & build team || 01 February 2018
|-
| Pick initial topic areas || 01 March 2018
|-
| Gather & curate information || 01 June 2018
|-
| Produce papers/posters for EU Connect || 01 November 2018
|-
| To facilitate education of the pharmaceutical industry on data engineering solutions successfully implemented in other industries || ongoing
|-
| Topics should relate to current and foreseen challenges in pharma, e.g. vast data streams, big data, real-time learning and decision making || ongoing
|-
| Audiences may include, but are not limited to, data managers, statistical programmers/analysts, IT experts and executives || ongoing
|-
| Material shared is not to endorse or prescribe, but is to present data engineering techniques in a manner that allows the audience to draw their own conclusions regarding the potential application in pharma || ongoing
|}