Scientists may now be one step closer to understanding the internal logic of artificial intelligence (AI) models used for genomics thanks to a new tool from a group at the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory (CSHL). In a new article published this week in Nature Machine Intelligence, they describe a computational tool called Surrogate Quantitative Interpretability for Deepnets (SQUID), which is designed to help interpret how deep neural networks (DNNs) analyze the genome.
In their paper, which is entitled “Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models,” the developers explain that SQUID uses “simple models with interpretable parameters” to “approximate the DNN function within localized regions of sequence space.” They claim that unlike other methods, SQUID “removes the confounding effects that nonlinearities and heteroscedastic noise in functional genomics data can have on model interpretation.” As evidence of its effectiveness, they present results from experiments showing that SQUID “consistently identifies binding motifs of transcription factors, reduces noise in attribution maps, and improves variant effect predictions.”
“The tools that people use to understand these patterns have mostly come from other fields like computer vision or natural language processing. While they can be useful, they are not optimal for genomics,” explained Peter Koo, an assistant professor at CSHL and senior author on the paper. “What we did with SQUID was leverage decades of quantitative genetics knowledge to help us understand what these deep neural networks are learning.”
SQUID works by generating an in silico library of variant DNA sequences, training a surrogate model called a latent phenotype model on the data using a program called Multiplex Assays of Variant Effects Neural Network, or MAVE-NN, and then visualizing and interpreting the parameters of the model. With this tool, scientists can run thousands of virtual experiments simultaneously and identify which algorithms make the most accurate predictions for variants.
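The three steps described above (build an in silico library, fit a surrogate, read off its parameters) can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the SQUID or MAVE-NN interface: `toy_model` is a hypothetical stand-in for a trained DNN, the motif, sequence, and function names are invented for the example, and ordinary least squares is swapped in for the latent phenotype model that SQUID fits with MAVE-NN.

```python
import numpy as np

ALPHABET = "ACGT"
MOTIF, MOTIF_START = "TGACTCA", 3  # hypothetical AP-1-like site and its offset


def toy_model(seq):
    """Hypothetical stand-in for a trained DNN: saturating response to motif matches."""
    matches = sum(seq[MOTIF_START + i] == base for i, base in enumerate(MOTIF))
    return 1.0 / (1.0 + np.exp(-(matches - 5.0)))


def make_library(wild_type, n_seqs, mut_rate, rng):
    """Build an in silico library by randomly mutating the wild-type sequence."""
    library = []
    for _ in range(n_seqs):
        chars = list(wild_type)
        for i, base in enumerate(wild_type):
            if rng.random() < mut_rate:
                chars[i] = rng.choice([b for b in ALPHABET if b != base])
        library.append("".join(chars))
    return library


def one_hot(seqs):
    """Encode sequences as an (n_seqs, length * 4) one-hot matrix."""
    idx = np.array([[ALPHABET.index(c) for c in s] for s in seqs])
    return np.eye(4)[idx].reshape(len(seqs), -1)


def fit_additive_surrogate(seqs, scores):
    """Fit score ~ intercept + sum of per-position, per-base effects."""
    features = np.hstack([np.ones((len(seqs), 1)), one_hot(seqs)])
    coef, *_ = np.linalg.lstsq(features, scores, rcond=None)
    theta = coef[1:].reshape(-1, 4)
    # One-hot columns are redundant, so center each position to fix the gauge.
    return theta - theta.mean(axis=1, keepdims=True)


rng = np.random.default_rng(0)
wild_type = "ACG" + MOTIF + "GTC"
library = make_library(wild_type, n_seqs=5000, mut_rate=0.1, rng=rng)
scores = np.array([toy_model(s) for s in library])  # query the black box
theta = fit_additive_surrogate(library, scores)

# Rows are positions, columns are A, C, G, T; large values mark influential bases.
for pos, row in enumerate(theta):
    print(pos, wild_type[pos], np.round(row, 3))
```

In this toy setting the fitted parameters should be largest inside the motif and close to zero in the flanking positions, which is the kind of attribution map the surrogate is meant to provide. A linear fit like this one does not model the global nonlinearity or the noise that the article says SQUID accounts for.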
While virtual experiments can’t exactly replace lab tests, “they can be very informative” in helping scientists generate hypotheses about how a particular region of the genome works or how a mutation might have a clinically relevant effect, said Justin Kinney, a CSHL associate professor and one of the study’s co-authors.
The scientists also describe using SQUID to study epistatic interactions at cis-regulatory elements as a way to evaluate its performance. To test whether SQUID could handle this task, they “implemented a surrogate model that describes all possible pairwise interactions between nucleotides within a sequence.” They then used the model “to quantify the effects of pairs of putative AP-1 binding sites.” Their results showed that the “pairwise interaction models” they created produced more accurate results than the “additive surrogate models.” Specifically, SQUID was able to “quantify epistatic interactions that were otherwise obscured by global nonlinearities in DNNs.”
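The difference between an additive surrogate and a pairwise one can be illustrated with a toy example. The sketch below is an assumption-laden illustration rather than the authors' code: `toy_model` is a hypothetical black box that responds only when two single-base "sites" are both intact, the sequence and all helper names are invented, and plain least squares is used for both fits with no global nonlinearity.

```python
import itertools
import numpy as np

ALPHABET = "ACGT"


def toy_model(seq):
    """Hypothetical black box: responds only when both 'sites' are intact."""
    return float(seq[1] == "G" and seq[4] == "C")


def make_library(wild_type, n_seqs, mut_rate, rng):
    """Build an in silico library by randomly mutating the wild-type sequence."""
    library = []
    for _ in range(n_seqs):
        chars = list(wild_type)
        for i, base in enumerate(wild_type):
            if rng.random() < mut_rate:
                chars[i] = rng.choice([b for b in ALPHABET if b != base])
        library.append("".join(chars))
    return library


def one_hot(seqs):
    """Encode sequences as an (n_seqs, length, 4) one-hot array."""
    idx = np.array([[ALPHABET.index(c) for c in s] for s in seqs])
    return np.eye(4)[idx]


def additive_features(seqs):
    """Intercept plus one column per (position, base)."""
    x = one_hot(seqs)
    return np.hstack([np.ones((len(seqs), 1)), x.reshape(len(seqs), -1)])


def pairwise_features(seqs):
    """Additive columns plus one column per pair of positions and pair of bases."""
    x = one_hot(seqs)
    n, length, _ = x.shape
    blocks = [additive_features(seqs)]
    for i, j in itertools.combinations(range(length), 2):
        # Outer product of the one-hot vectors at positions i and j (16 columns).
        blocks.append((x[:, i, :, None] * x[:, j, None, :]).reshape(n, -1))
    return np.hstack(blocks)


def held_out_r2(featurize, train, test):
    """Fit by least squares on the training split, score R^2 on the test split."""
    (train_seqs, y_train), (test_seqs, y_test) = train, test
    coef, *_ = np.linalg.lstsq(featurize(train_seqs), y_train, rcond=None)
    residual = y_test - featurize(test_seqs) @ coef
    return 1.0 - np.sum(residual ** 2) / np.sum((y_test - y_test.mean()) ** 2)


rng = np.random.default_rng(0)
library = make_library("AGTTCA", n_seqs=4000, mut_rate=0.3, rng=rng)
scores = np.array([toy_model(s) for s in library])
train = (library[:3000], scores[:3000])
test = (library[3000:], scores[3000:])

r2_additive = held_out_r2(additive_features, train, test)
r2_pairwise = held_out_r2(pairwise_features, train, test)
print(f"additive R^2: {r2_additive:.3f}, pairwise R^2: {r2_pairwise:.3f}")
```

Because the toy response depends on two positions jointly, the additive fit cannot capture it fully, while the pairwise fit has an interaction term for exactly that pair of bases. The number of pairwise columns grows with the square of the sequence length, which gives a rough sense of why richer surrogates cost more to fit.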
Compared to some other methods, SQUID is more computationally demanding, its developers note. They suggest it may work better for researchers performing in-depth analysis of specific sequences, such as disease-associated loci, than for those working on large-scale genome analyses.