Batch effect correction of amplicon read count data using dimensionality reduction
The strong correlation between cancer survival rates and disease stage at diagnosis highlights the importance of non-invasive, affordable, and robust early-detection procedures. A promising approach that has been tested in large-scale studies is based on liquid biopsies, or more precisely blood tests. Differences in cell-free DNA (cfDNA) fragmentation patterns between cancerous and healthy samples constitute an important liquid-biopsy biomarker that can be detected with an amplicon-based approach. This approach has the advantage of being sensitive, non-invasive, and cheap. However, the data it generates are often affected by batch effects and therefore need correction. Here, we use a linear dimensionality reduction technique combined with the Wilcoxon test to find and remove signatures that are strongly influenced by batch effect sources. We then implement a classifier to illustrate and quantify how much batch effect signal the classifier uses when it is trained on uncorrected data. The method shows promise, and future research may focus on refining the correction to capture non-linearities and to remove the need for prior identification of batch effect sources.
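The correction step described above can be illustrated with a minimal sketch. The abstract does not name the decomposition or the variant of the Wilcoxon test, so everything here is an assumption made for illustration: PCA computed by SVD stands in for the linear dimensionality reduction, a two-sample Wilcoxon rank-sum test compares component scores between two known batches, and the function name `remove_batch_components`, the threshold `alpha`, and the synthetic input data are hypothetical.

```python
import numpy as np
from scipy.stats import ranksums


def remove_batch_components(X, batch, alpha=0.01):
    """Drop principal components whose scores differ between two batches.

    X     : (samples x amplicons) matrix of normalized read counts.
    batch : boolean array, True for samples from one batch, False for the other.
    alpha : hypothetical significance threshold (no multiple-testing correction).
    Returns the corrected matrix and the indices of the removed components.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD: rows of Vt are component loadings, U * S are sample scores.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S

    flagged = []
    for k in range(scores.shape[1]):
        # Rank-sum test: do the scores on component k separate the batches?
        _, p_value = ranksums(scores[batch, k], scores[~batch, k])
        if p_value < alpha:
            flagged.append(k)

    # Reconstruct the data from the components that were not flagged.
    keep = np.ones(scores.shape[1], dtype=bool)
    keep[flagged] = False
    corrected = scores[:, keep] @ Vt[keep] + mean
    return corrected, flagged


# Synthetic illustration only: 40 samples, 20 amplicons, and a constant
# shift added to the first batch to mimic a batch effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))
batch = np.arange(40) < 20
X[batch] += 5.0

corrected, flagged = remove_batch_components(X, batch)
```

Removing whole components is a blunt choice: any biological signal that happens to lie along a flagged component is discarded with it, and the sketch requires the batch labels to be known in advance, which matches the limitation noted at the end of the abstract.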