DEPARTMENT OF EPIDEMIOLOGY, STATISTICS AND INFORMATICS (DESI)

BIOMEDICAL RESEARCH DATA RE-USE & MACHINE LEARNING

Who are we?

The Department of Biostatistics and Bioinformatics (DBB) was established in response to KEMRI’s mandate, vision and mission, and to strategic theme 1 on Research and Innovation and strategic theme 3 on Research Infrastructure. Its establishment is supported by strategic objective 1 (to strengthen investment in health research and innovation) and strategic objective 3 (to upgrade research infrastructure and automate processes), as outlined in the KEMRI Strategic Plan for 2018 to 2023. Recognizing the Volatility, Uncertainty, Complexity and Ambiguity (VUCA) of the research enterprise and the global environment, the Department has been redesigned as a self-adaptive Centre of Excellence, in accordance with institute policies, the laws of Kenya and international guidelines. Its establishment took into consideration national, regional and global benchmark standards, as well as KEMRI’s growth needs for the years to come.

KEMRI’s Large Pool of Research Data

A massive body of research data exists within KEMRI’s research ecosystem. Every year, KEMRI research scientists generate a large pool of primary data from the many ongoing projects in diverse fields of biomedical research. Beyond primary use, there is a growing school of thought that these strategic institutional datasets could be re-used to reveal insights and trends for modeling and other purposes. Using data for reasons other than those originally intended is broadly termed secondary use or, more appropriately, re-use of data. This calls for a centralized in-house research data hub from which data can be pooled for secondary use. The re-use of health-related and epidemiological data is not new and has been a basis for health systems strengthening. It is made possible by deploying High Performance Computing (HPC) with Application Programming Interface (API) modules, Machine Learning (ML) algorithms and coding for Artificial Intelligence (AI).

DESI’s strategic position to support modeling and gain new insights through re-use of biomedical research data

Our Strategic Objective

To create a scalable, cutting-edge data science platform that promotes innovation, effective solutions and sound decision-making in a rapidly changing research world.

Initial DESI Activities

The 1st KEMRI Data Re-use Hackathon Challenge 2022

To register: https://forms.gle/W6nfctkJL1QC2G9j8

· DESI plans to host the 1st KEMRI Data Re-use Hackathon Challenge in November 2022.

· It is open to students from public and private universities who are passionate about the use of computer apps, tools, techniques or packages to solve societal problems.

· RULES:

a) There is no limit on the number of submissions. Teams may make as many submissions as they like until the submission deadline. If a team makes multiple submissions, the one with the highest accuracy score will be considered its best and final submission.

b) No submissions will be accepted after the deadline.

c) Teams may use any technique they are aware of. There are no restrictions on techniques, tools or data coding platforms.

d) The winners of the hackathon will be announced by 31st December 2022.

e) This is a team hackathon. Each team may have a maximum of 3 members, irrespective of university.

f) Participants will use ONLY the data provided by the KEMRI judges. Teams are not allowed to use any external data to train or test their models.

g) The winners of the 1st KEMRI Data Re-use Hackathon Challenge 2022 will be presented with a trophy during the 13th KASH Conference to be held in February 2023. It is envisaged that the winning team will make a plenary presentation during the conference. Posters of all winners and runners-up will also be displayed during the 3-day conference.
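
For teams wondering how rules (a) and (b) interact, the selection of a final submission could be sketched roughly as below. This is an illustration only, not the judges' actual scoring procedure: the record fields, the deadline date and the scores are all hypothetical, since the rules above do not state an exact deadline or describe how accuracy is computed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Submission:
    team: str               # team name (hypothetical field)
    submitted_at: datetime  # time the submission was received
    accuracy: float         # accuracy score, assumed to lie between 0 and 1


def best_submission(
    submissions: List[Submission], team: str, deadline: datetime
) -> Optional[Submission]:
    """Return the team's highest-accuracy submission received by the deadline."""
    # Rule (b): anything received after the deadline is ignored.
    eligible = [
        s for s in submissions
        if s.team == team and s.submitted_at <= deadline
    ]
    if not eligible:
        return None
    # Rule (a): the highest accuracy score counts as the final submission.
    return max(eligible, key=lambda s: s.accuracy)


# Hypothetical deadline and scores, for illustration only.
DEADLINE = datetime(2022, 11, 30, 23, 59)

subs = [
    Submission("Team A", datetime(2022, 11, 10, 9, 0), 0.81),
    Submission("Team A", datetime(2022, 11, 20, 17, 30), 0.87),
    Submission("Team A", datetime(2022, 12, 1, 8, 0), 0.93),  # late, so ignored
    Submission("Team B", datetime(2022, 11, 15, 12, 0), 0.84),
]

final = best_submission(subs, "Team A", DEADLINE)
print(final.accuracy)  # 0.87
```

In this sketch the late submission is discarded even though it has the highest score, so an earlier, lower-scoring entry becomes the team's final submission.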

Potential Benefits from this initial activity

1. Demonstration of high-performance computing (HPC) capabilities to support re-use of existing biomedical data.

2. A data visualization and analysis platform to accelerate knowledge extraction from secondary data, supporting surveillance and prediction of events based on available epidemiological and surveillance data.