Genomic analysis is at the forefront of the fight against triple-negative breast cancer. However, not all populations are equally impacted.

May 6, 2022

Precision oncology, the approach of tailoring cancer treatment to an individual’s genetic makeup, has been touted as the next frontier in oncology. Genetics is already used to guide clinical decision-making in breast cancer (BC), and the growing use of biomarkers for personalized cancer therapy is expected to propel the precision medicine market to $119 billion by 2026. Among breast cancer subtypes, triple-negative breast cancer (TNBC), which is more common in younger women, is the most aggressive. The proportion of breast cancers that are TNBC ranges from 6.7% to 27.9% across countries, with one of the highest reported proportions in the Indian population [1]. Given the poor prognosis of this subtype and the lack of targeted therapies, it is essential to identify predictive markers that can support the development of personalized medicine for TNBC.

To address this problem and advance precision oncology for TNBC, EURECOM’s Professor Raja Appuswamy [2], an expert in cloud computing; Dr. Nimisha Chaturvedi from ACCELOM [3], an expert in multi-omics data analysis; and Dr. Jugnu Jain and Dr. Soma Chatterjee from SAPIEN Biosciences [4], an India-based biobank that is one of the 10 largest in the world, have launched a project to develop a cloud-native genomic data analysis platform and use it to analyse multi-omics data generated from TNBC tumour samples from patients of Indian ancestry.

Q. Why is genomics important for cancer diagnosis and therapy, especially for TNBC?

ACCELOM: With recent technological advances, we now know that cancer is a group of diseases caused by genetic alterations. The genes involved in cancer fall mainly into two classes: proto-oncogenes, which help cells grow, and tumour suppressor genes, which slow down cell division and cell growth. Mutations in these two types of genes can disrupt the normal cycle of cell growth and death, causing uncontrolled cell growth and eventually tumour formation. For example, mutations in the tumour suppressor genes BRCA1 and BRCA2 have been linked to a much higher risk of breast, ovarian and prostate cancer. Similarly, HER2-positive breast cancers involve amplification of the HER2 oncogene, which leads to overproduction of a protein that drives the growth of cancer cells. Identifying these cancer-causing alterations not only helps us understand the biology of the disease but can also be used to develop new targeted diagnostic and treatment strategies.

Genomics can also help reduce the burden of this disease on society by helping us understand family cancer syndromes. Just as you inherit your genes from your parents, you can also inherit gene mutations from them. Knowing more about our inherited mutations can tell us whether we are more likely to develop certain diseases, such as breast cancer. A famous example is Angelina Jolie, who underwent a preventive double mastectomy and later the removal of her ovaries and fallopian tubes after learning that she carried a BRCA1 mutation, which her doctors estimated put her risk of developing breast and ovarian cancer at 87% and 50%, respectively.

One of the most debilitating breast cancer subtypes is TNBC. It is a highly heterogeneous disease, and women with TNBC have a worse 5-year prognosis than women with other BC subtypes. Finding targeted treatment approaches for this aggressive disease is therefore of utmost importance. Moreover, the alarming burden of TNBC in the Indian population makes the discovery of actionable genomic targets, which could address the current lack of targeted therapies, an extremely important endeavour.

Q. What is the current treatment approach for TNBC, especially in Indian populations?

SAPIEN: TNBC is a devastating cancer with an extremely poor prognosis compared to other types of BC. Because these cancers lack the hormone receptors ER and PR as well as HER2 amplification, they cannot be treated with modern targeted therapies such as tamoxifen or trastuzumab, which have been used successfully for other BC subtypes. The current standard of care for TNBC patients in India is generalized cytotoxic chemotherapy, with variable clinical response. However, patients experience unwanted and debilitating side effects regardless of whether they respond. Researchers have already demonstrated that a genomics-driven precision medicine approach is a promising direction for managing TNBC. However, such an approach is difficult to apply in India, because current precision medicine approaches are predominantly based on genomic data from people of Caucasian or African ancestry. Very little data on the genomics of TNBC in the Indian population has been collected and published. Because genetic mutations differ between ethnic populations, and because genetic research in Asia lacks diversity and representation, the knowledge underlying precision medicine misses critical Indian population-specific genomic variations that are rare or absent in Caucasian or African populations. This not only holds us back from understanding the genomic landscape of such a deadly disease across populations but also risks exacerbating existing disease and healthcare disparities.

Q. Could you tell us more about this project and your partnership?

EURECOM: Given the growing burden of TNBC in India, the goal of the SIGO project is to bridge this knowledge gap by sequencing and analysing richly annotated tumour samples from patients of Indian ancestry. To achieve this goal, SIGO brings together the biological expertise of Dr. Jugnu Jain and Dr. Soma Chatterjee from SAPIEN Biosciences, India’s largest biobanking SME; the AI expertise of Dr. Nimisha Chaturvedi from ACCELOM, a French biotech SME; and the computational expertise of EURECOM’s Prof. Raja Appuswamy.

Q. What is the role and contribution of each partner in the collaboration on TNBC?

SAPIEN: A large collection of patient samples with well-annotated clinical, pathological and outcomes data is a critical requirement for developing new paradigms in personalized medicine. Biobanks such as Sapien play a crucial role here, as they curate high-quality patient samples that can be sequenced to generate valuable genomic data [5]. In a unique partnership with a pan-India network of 71 multi-specialty hospitals (Apollo Hospitals), we at Sapien Biosciences have systematically archived tissue samples removed from tumours during surgery, together with the associated medical and diagnostic records. BC has long been a focus for Sapien. We have curated a longitudinal database of 18,500 BC cases collected from 8 geographically dispersed hospitals to avoid regional bias, and approximately 20% of these cases are TNBC. For the SIGO project, we will use our curated collection of ~3,500 TNBC tissue samples to generate multi-omics data (exome and mRNA sequencing). Since little research has been done on the genomics of Indian TNBC cases, we expect the genomic data generated through this project to support the development of early-detection diagnostics and treatment-optimizing theranostics, while also enabling IP-rich discoveries such as new drug targets for TNBC.
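Selecting the TNBC cases from an annotated BC database comes down to the receptor definition given earlier: a case is triple negative when it is ER-negative, PR-negative and HER2-negative. The minimal sketch below shows that filter; the record type `BreastCancerCase` and its fields are hypothetical and are not Sapien's actual schema.

```python
from dataclasses import dataclass


@dataclass
class BreastCancerCase:
    """One annotated BC case. Field names are hypothetical, for illustration only."""
    case_id: str
    er_positive: bool    # estrogen receptor status
    pr_positive: bool    # progesterone receptor status
    her2_positive: bool  # HER2 amplified/overexpressed


def is_tnbc(case: BreastCancerCase) -> bool:
    # Triple negative: none of ER, PR or HER2 is positive.
    return not (case.er_positive or case.pr_positive or case.her2_positive)


def select_tnbc(cases: list[BreastCancerCase]) -> list[BreastCancerCase]:
    """Keep only the triple-negative cases, preserving input order."""
    return [c for c in cases if is_tnbc(c)]


def tnbc_fraction(cases: list[BreastCancerCase]) -> float:
    """Share of cases that are TNBC (0.0 for an empty collection)."""
    if not cases:
        return 0.0
    return len(select_tnbc(cases)) / len(cases)
```

Real pathology records are messier than booleans: HER2 results can be equivocal and need confirmatory testing, and receptor status may be missing, so a production filter would have to handle those states explicitly rather than treating them as negative.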

ACCELOM: Interest in decoding DNA and understanding human genomics took off around 2001, when scientists published the first draft of the human genome sequence. In the two decades since, genomics has seen immense technological advances and a giant shift from the era of the $1,000 genome towards a $100-per-genome future. These advances and falling prices have made it possible today to generate large genomic datasets of different kinds on thousands of individuals. But obtaining these datasets is one thing; extracting the important information from them is like trying to solve a jumbled-up, 3-billion-piece puzzle. On top of that, we often have multiple kinds of genomic data for the same set of individuals, such as copy number data (does a person have the normal two copies of a gene, or an abnormal number of copies?) and gene expression data (the level at which a gene is expressed within a cell). Each dataset describes only one aspect of the genome, and analysing them separately provides only a partial view of the big picture, much like blindfolded people describing an elephant by touching its separate body parts. An integrated analysis across multiple datasets can provide a more complete picture of genotype-phenotype associations, that is, the links between your genetic makeup (genotype) and your physical traits (phenotype). It can not only identify disease-driving genomic mutations but also help predict how tumours will respond to various treatments and match cancer patients to personalized treatment regimens and clinical trials.
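One of the simplest forms of the integrated analysis described above is to join copy number and expression data on gene and sample, then ask, for each gene, whether expression tracks copy number. The sketch below does this with a per-gene Pearson correlation; it assumes both datasets are nested dictionaries of the hypothetical form `{gene: {sample: value}}`, and it is an illustration of the idea, not ACCELOM's multi-omics pipeline.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def integrate(copy_number: dict, expression: dict, min_samples: int = 3) -> dict:
    """Correlate copy number with expression for every gene present in both datasets.

    Only samples measured in both datasets are used; genes with fewer than
    `min_samples` shared samples are skipped.
    """
    results = {}
    for gene in copy_number.keys() & expression.keys():
        samples = sorted(copy_number[gene].keys() & expression[gene].keys())
        if len(samples) < min_samples:
            continue
        cn = [copy_number[gene][s] for s in samples]
        ex = [expression[gene][s] for s in samples]
        results[gene] = pearson(cn, ex)
    return results


# Hypothetical toy data: copy number (2 = normal) and expression per sample.
copy_number = {
    "GENE_A": {"s1": 2, "s2": 4, "s3": 6, "s4": 2},
    "GENE_B": {"s1": 2, "s2": 2, "s3": 3, "s4": 1},
}
expression = {
    "GENE_A": {"s1": 10.0, "s2": 20.0, "s3": 30.0, "s4": 10.0},
    "GENE_B": {"s1": 5.0, "s2": 9.0, "s3": 1.0, "s4": 5.0},
}
```

On this toy data, `integrate(copy_number, expression)` gives a correlation of 1.0 for `GENE_A`, whose expression rises with every extra copy (the pattern expected of an amplified oncogene), and -0.5 for `GENE_B`. A real analysis would work with normalized values such as log2 copy-number ratios, far more samples, and multiple-testing correction.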

EURECOM: Integrated analysis is a computationally intensive process: a series of secondary processing steps must be performed on each dataset, followed by the execution of complex machine learning models in a tertiary stage. As the amount of genomic data grows exponentially, it becomes increasingly difficult to perform such an analysis in a timely and cost-efficient manner. EURECOM will therefore lend its expertise on the computational front to enable scalable analysis of genomic data. My team at EURECOM has been working on accelerating computationally intensive tasks in several domains, from relational databases [6] to bioinformatics [7]. The central idea is to develop novel data-parallel algorithms and heterogeneous programming techniques that execute these tasks on hardware accelerators, such as Graphics Processing Units (GPUs) or Field Programmable Gate Arrays (FPGAs), instead of on CPUs. In SIGO, we will collaborate with ACCELOM on developing a hardware-accelerated version of their multi-omics pipeline, with the goal of processing data 10x faster than their current CPU-based pipeline.
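The data-parallel idea can be illustrated with a common genomics kernel, k-mer counting: the reads are split into independent chunks, each chunk is processed by the same pure function (the "map" step), and the partial results are merged (the "reduce" step). This is a minimal sketch of that pattern only, not EURECOM's GPU/FPGA code; the function names are hypothetical, and a thread pool stands in for accelerator lanes. In CPython, threads will not actually speed up this pure-Python work, so the point is the structure, which is what maps onto GPUs or FPGAs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_kmers(reads: list[str], k: int) -> Counter:
    """Map step: count every length-k substring in one chunk of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split_into_chunks(items: list[str], n_chunks: int) -> list[list[str]]:
    """Split items into at most n_chunks contiguous, independent chunks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_kmer_counts(reads: list[str], k: int, workers: int = 4) -> Counter:
    """Count k-mers chunk by chunk in parallel, then reduce the partial counts."""
    chunks = split_into_chunks(reads, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_kmers, chunks, [k] * len(chunks)))
    # Reduce step: Counter addition sums the per-chunk counts.
    return reduce(lambda a, b: a + b, partials, Counter())
```

For example, `parallel_kmer_counts(["ACGT", "CGTA", "ACG"], 2)` counts `CG` three times, exactly as a single sequential call to `count_kmers` would. Because the chunks share no state, the same decomposition scales from a few CPU threads to thousands of accelerator cores. Note also that, by Amdahl's law, a 10x end-to-end speedup is only reachable if at least 90% of the original runtime lies in steps that can be accelerated.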

Q. What message would you like to share about the future?

SAPIEN: We would like to see cancer become a chronic disease, like hypertension or diabetes, manageable with affordable, less toxic and easy-to-take medicines. We chose to work on breast cancer because it affects women at a fairly young age. The median age at breast cancer diagnosis is around 55, an age at which a woman is typically very active, with a full family and work life, contributing so much to society. If we can succeed in extending the lives of breast cancer patients by one or even two decades, it would be life-changing. In the UK, breast cancer mortality has been cut by half over the last 40 years, through a better understanding of how to personalise treatment. If we can achieve something meaningful for breast cancer, we can do the same for other cancers, such as colon, pancreatic and renal cancer. Through this project, we are building a collaborative model: sharing Indian BC patient samples and data resources, integrating the resulting genomic data with publicly available data from The Cancer Genome Atlas (TCGA), and generating hypotheses that can be tested in the clinic with our data. Datasets like these capture the complete journey of patients, and it is important to use them to design drugs that optimally treat cancer patients.

ACCELOM: In the future, I would like to see Indian populations better represented in the field of genomics. It is, after all, a country of about 1.4 billion people. I have worked with medical practitioners and hospitals in the western world, and I can see how advances in genomics are bringing precision medicine to the forefront of patient care and personalized treatment plans. This can be done in India as well, and I truly hope that, with our small step, the research community will recognize the potential genomics holds for this type of disease in under-represented populations.

EURECOM: In our collaborative SIGO project, we bring together machine learning, computing, disease expertise and biobanking to address a gap in knowledge for one type of cancer in one underrepresented population. With this effort, we hope to build a blueprint for a Biobanking Data Analytics Platform: one that makes extracting knowledge from genomic data easy and cost-effective, and one that Sapien and other biobanks around the world can use to bring genomics-driven precision oncology to everyone.

by Dora Matzakou for EURECOM

REFERENCES

[1] Clin Breast Cancer 2018, Alarming Burden of Triple-Negative Breast Cancer in India, Krishan K Thakur, Devivasha Bordoloi, Ajaikumar B Kunnumakkara

[2] https://www.eurecom.fr

[3] https://accelom.com

[4] https://sapienbio.co.in/

[5] Int J Mol ImmunoOncol 2016, Using human medical waste to build biobanks, Jugnu Jain, Sreevatsa Natarajan, Soma Chatterjee

[6] VLDB 2021, XJoin: Portable, parallel hash join across diverse XPU architectures with oneAPI, Eugenio Marinelli, Raja Appuswamy

[7] BMC Bioinformatics 2021, Accel-align: A fast sequence mapper and aligner based on the seed–embed–extend method, Yiqing Yan, Nimisha Chaturvedi, Raja Appuswamy


Written by EURECOM Communication