Martina Elena Tarozzi Science Reviews - Biology, 2024, 3(1), 9-15
Applications of AI in genomics and transcriptomics
The complexity of the data generated by high-throughput sequencing technologies can make traditional analysis methods insufficient for identifying patterns and extracting insights. ML and DL methods have been applied to sequencing data for a wide range of purposes. Here we present a selection of the most relevant fields of application, focusing on common, promising or illustrative uses of AI methods on NGS data in biology and bioinformatics. This selection should not be considered a complete overview of all possible applications of AI in biology.
Liquid biopsies and personalized medicine
Liquid biopsies are minimally invasive diagnostic methods that analyze bodily fluids, such as blood, urine, or cerebrospinal fluid, to detect and monitor diseases. They are especially relevant for the early diagnosis of cancer and neurodegenerative diseases [14,15]. These samples contain cell-free nucleic acids (cfNA), circulating tumor DNA (ctDNA), circulating tumor cells (CTCs), exosomes, and other biomarkers from which genomic, transcriptomic and epigenomic information can be extracted. This information can be used for early detection, for monitoring disease progression and for supporting personalized therapeutic decisions [15]. These data are extremely complex, subject to many confounders and, for most features, characterized by a low signal-to-noise ratio. AI algorithms have significantly advanced the analysis and interpretation of these data, and consequently the whole field, in several respects, including risk assessment and early diagnosis [16], disease subtype classification [17], treatment response prediction [18] and monitoring of minimal residual disease [19]. For example, SVMs have been used effectively to predict the probability of recurrence from gene expression data or from specific gene signatures in different types of cancer [20,21]. This has improved both the monitoring of the molecular profile of the patient's tumor and the prediction of personalized treatments at different time points.
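As a rough illustration of how an SVM can map a gene expression signature to a recurrence label, the sketch below trains a linear SVM (hinge loss with an L2 penalty) by stochastic subgradient descent in plain Python. It is not the method used in [20,21]. The two-gene toy data, the function names `train_linear_svm` and `predict`, and the hyperparameters are all hypothetical. Real studies typically rely on established libraries, standardized expression values, tuned kernels and cross-validation.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200, seed=0):
    """Fit a linear SVM (hinge loss + L2 penalty) by stochastic subgradient descent.

    X: list of samples, each a list of (standardized) expression values for
       the genes of a hypothetical signature.
    y: list of labels, +1 = recurrence, -1 = no recurrence.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if y[i] * score < 1:
                # Inside the margin: hinge-loss subgradient plus L2 shrinkage.
                w = [wj - lr * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:
                # Correct side of the margin: only the L2 penalty acts.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (predicted recurrence) or -1 (predicted no recurrence)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy cohort: two signature genes, six patients.
X = [[2.0, 0.1], [1.8, 0.3], [2.2, 0.2], [-2.0, 0.2], [-1.7, 0.1], [-2.1, 0.3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

A linear kernel keeps the learned weights directly interpretable as per-gene contributions, which is one reason linear SVMs are a common starting point for gene signatures; non-linear kernels would require a different solver.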
Furthermore, ctDNA methylation patterns have been extensively studied with several ML classification and regression methods, as well as with neural networks, to achieve effective early detection both in cancer research [22] and in the context of neurodegenerative diseases [23]. Liquid biopsy is one of the areas where AI has achieved some of its most notable and tangible impacts on molecular biology and medicine, and its role in personalized medicine is expected to grow in the near future.
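A minimal sketch of a methylation-based classifier follows. It assumes that per-region counts of methylated and unmethylated reads are already available for each cfDNA sample. Each region is summarized as a methylation level (beta value), and a logistic regression, one of the simplest ML classification methods, is then fitted. The read counts, labels, offset and all function names are hypothetical, and the sketch is not taken from [22] or [23].

```python
import math


def beta_values(methylated, unmethylated, offset=1.0):
    """Methylation level per region: M / (M + U + offset).

    The offset (an assumption of this sketch) keeps low-coverage regions,
    common in shallow cfDNA sequencing, from causing division by zero.
    """
    return [m / (m + u + offset) for m, u in zip(methylated, unmethylated)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss.

    y uses 1 = case (e.g. cancer), 0 = control.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a sample is a case."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical (methylated, unmethylated) read counts for two regions per sample.
samples = [
    ([18, 5], [2, 5]), ([15, 4], [3, 6]), ([20, 6], [1, 4]),   # cases
    ([2, 5], [18, 5]), ([3, 6], [15, 4]), ([1, 4], [20, 6]),   # controls
]
labels = [1, 1, 1, 0, 0, 0]
X = [beta_values(m, u) for m, u in samples]
w, b = train_logistic(X, labels)
print([round(predict_proba(w, b, x), 3) for x in X])
```

In this toy data only the first region differs between cases and controls, so the fitted model relies mainly on that region. Real methylation panels involve thousands of regions and usually need regularization and careful handling of confounders.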
Regulatory genomics
Regulatory genomics is the field of genomics that studies the regulation of gene expression. It aims to identify regulatory regions (such as enhancers, promoters, transcription start sites (TSS), and regions of accessible chromatin) and the regulatory hierarchy between these regions and other genes. In this context, deep learning, and more specifically convolutional neural networks, has achieved some of the best results. A commonly used architecture treats the input DNA sequence as a series of categorical variables. Each position is one-hot encoded as a four-channel vector (A, C, G, T) in which only the channel of the nucleotide observed at that position is set to 1, and the resulting matrix is provided to the input layer.
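The one-hot encoding described above can be sketched as follows. This is an illustrative helper, not code from the cited studies. The name `one_hot` and the choice to encode ambiguous bases such as `N` as all-zero vectors are assumptions.

```python
NUCLEOTIDES = "ACGT"


def one_hot(sequence):
    """One-hot encode a DNA sequence as a list of 4-channel vectors (A, C, G, T).

    Each standard base sets exactly one channel to 1. Ambiguous bases such as
    'N' are encoded as all zeros, a common convention assumed in this sketch.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        vector = [0, 0, 0, 0]
        if base in index:
            vector[index[base]] = 1
        encoded.append(vector)
    return encoded


print(one_hot("ACGTN"))
```

A sequence of length L thus becomes an L x 4 matrix, which is the shape expected by a one-dimensional convolutional layer with four input channels.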
This input is then processed by convolutional layers, whose kernels (filters) scan the sequence and condense the information to extract the most relevant features. The convolutional filters are first trained on regions of interest with known regulatory properties. The knowledge gained by the convolutional neural network (CNN) during training can then be applied to new regions to make accurate predictions. This architecture has been applied successfully to various types of sequencing data and has provided particularly interesting results in epigenomic studies. For example, it has been applied to DNase-seq data to predict cell-type-specific regions of accessible chromatin [24], to identify promoters and distal regulatory regions along mammalian genomes [25], to predict cell-type-specific gene expression from DNA sequence data, as well as the alterations in expression associated with variant alleles [26], and to identify, from genomic and Hi-C data, the genomic regions responsible for three-dimensional chromatin folding in the nucleus [27].
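To make the convolution step more concrete, the sketch below runs a single filter over a one-hot encoded sequence, then applies a ReLU and max pooling. It is a hypothetical, minimal forward pass only. The `TATA_KERNEL` weights, the bias of -3 and the toy sequence were chosen by hand for illustration, whereas a real CNN learns many such filters during training on regions with known regulatory properties.

```python
def one_hot(sequence):
    """Encode DNA as 4-channel (A, C, G, T) vectors; unknown bases map to zeros."""
    return [[1.0 if base == c else 0.0 for c in "ACGT"] for base in sequence.upper()]


def conv1d(encoded, kernel, bias=0.0):
    """Slide a (width x 4) kernel along the sequence ('valid' convolution, stride 1).

    Each output is the dot product between the kernel and one window of the
    one-hot input plus a bias, passed through a ReLU.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(k * x for k, x in zip(kernel[offset], encoded[start + offset]))
        scores.append(max(0.0, s))  # ReLU activation
    return scores


def max_pool(values, size):
    """Non-overlapping max pooling: keep the strongest response in each block."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


# Hypothetical filter for the motif "TATA": +1 for the matching base at each
# position, -1 for the other three bases.
TATA_KERNEL = [[1.0 if c == m else -1.0 for c in "ACGT"] for m in "TATA"]

seq = "GCGTATAAGGCC"
activations = conv1d(one_hot(seq), TATA_KERNEL, bias=-3.0)
pooled = max_pool(activations, 3)
print(activations)
print(pooled)
```

Each window scores 2 x (matches) - 4, so with a bias of -3 only windows matching all four bases of the motif give a positive activation. Pooling then reports which block of the sequence contains the motif while discarding its exact position, which is the kind of information reduction the text refers to.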
Considering that both the experimental and computational technologies used in these studies are relatively young, this is arguably one of the most promising research fields for the coming decades, with the potential to answer many open questions in functional genomics.
Improvement of genome editing specificity