INA Research Group

BWA-MEME: Design Overview

Summary

The growing use of next-generation sequencing and enlarged sequencing throughput require efficient short-read alignment, where seeding is one of the major performance bottlenecks. The key challenge in the seeding phase is searching for exact matches of substrings of short reads in the reference DNA sequence. Existing algorithms, however, present limitations in performance due to their frequent memory accesses.
BWA-MEME is the first full-fledged short-read alignment software that uses learned indices to solve the exact match search problem for efficient seeding. Its seeding algorithm is built on suffix array search and addresses the challenges of applying learned indices to SMEM search, which is used extensively in the seeding phase. Our evaluation shows that BWA-MEME achieves up to 3.45× speedup in seeding throughput over BWA-MEM2 by reducing the number of instructions by 4.60×, memory accesses by 8.77×, and LLC misses by 2.21×, while producing SAM output identical to that of BWA-MEM2.
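To make the idea of a learned index over a suffix array concrete, the toy sketch below fits a single linear model that maps the first few bases of a pattern to an approximate position in the suffix array, then runs a binary search only inside the model's error window. It is an illustration under stated assumptions, not BWA-MEME's implementation: the function names, the prefix length `K`, the single linear model, and the restriction to the alphabet A/C/G/T are all hypothetical choices made for this example, and it covers plain exact matching rather than SMEM search.

```python
import math

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}
K = 4  # hypothetical prefix length used to build the numeric key


def build_suffix_array(ref):
    # Naive construction; fine for a toy reference, far too slow for a genome.
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def key_of(s):
    # Encode the first K bases as a base-4 integer, padding short strings with "A".
    value = 0
    for ch in s[:K].ljust(K, "A"):
        value = value * 4 + BASES[ch]
    return value


def fit_linear(keys):
    # Least-squares line from key to suffix-array position, plus its worst-case error.
    n = len(keys)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    var = sum((x - mean_x) ** 2 for x in keys)
    cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_y - slope * mean_x
    max_err = max(abs(slope * x + intercept - i) for i, x in enumerate(keys))
    return slope, intercept, max_err


def _bound(ref, sa, pattern, lo, hi, upper):
    # Binary search in sa[lo:hi] for the first suffix whose prefix is >= pattern
    # (or > pattern when upper is True).
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = ref[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_exact(ref, sa, model, pattern):
    # Return all reference positions where pattern occurs, in ascending order.
    n = len(sa)
    lo, hi = 0, n
    if len(pattern) >= K:
        # Every matching suffix shares the pattern's key, so it lies within
        # max_err of the prediction; one extra slot guards against rounding.
        slope, intercept, max_err = model
        pred = slope * key_of(pattern) + intercept
        lo = max(0, math.floor(pred - max_err) - 1)
        hi = min(n, math.ceil(pred + max_err) + 2)
    start = _bound(ref, sa, pattern, lo, hi, upper=False)
    end = _bound(ref, sa, pattern, lo, hi, upper=True)
    return sorted(sa[start:end])


ref = "ACGTACGTTAGCACGT"
sa = build_suffix_array(ref)
model = fit_linear([key_of(ref[i:]) for i in sa])
print(find_exact(ref, sa, model, "ACGT"))
```

Patterns shorter than `K` fall back to a full binary search here, because their matches can span several keys. The narrower the model's error window, the fewer suffix-array entries the final search has to touch, which is the intuition behind trading repeated memory accesses for a prediction.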

Code

Publications

Bioinformatics

BWA-MEME: BWA-MEM emulated with a machine learning approach

Youngmok Jung and Dongsu Han

Bioinformatics, Mar 2022


Members

Youngmok Jung

Alumni

Dongsu Han

Principal Investigator