Signal Processing And Information Theory

This is a class for the Bachelor of Science in Bioinformatics for the Academic Year 2024-25. The course is in English, so this page is in that language. If you are looking for the page about AY 23-24, it's there.
A few words of welcome
I would like these few lines to serve as an incentive to favour the choice of this course over others that are also offered as possible alternatives for completing one's study plan.
Open me
To get an idea of the course content, you can consult the official pages, browse through last year's course, and view the latest version of the specially prepared slides.
In past years, in addition to those enrolled in the Bioinfomatics degree, a roughly equivalent number of students came from the Applied Computer Science and Artificial Intelligence (ACSAI) degree, probably attracted (in addition to the language of delivery, English) by the desire to learn concepts that are not addressed in the ACSAI degree course, even though the same concepts often re-emerge in the context of the applications and problems they face.
Whilst every effort has been made to give the theoretical part several practical approaches to the biological topics covered in the Bioinformatics degree, as discussed in these slides (1) (2) (3), the course itself is an introduction to the theory behind signal processing, including Fourier analysis of signals and images, sampling theorem, frequency response and convolution, correlation among signals and among random processes, numerical techniques (DFT, FFT and Zeta transform), filter analysis and synthesis. An important appendix deals with information theory, which makes it possible to quantify the information content of messages produced by a source described in statistical terms, as well as the information shared between pairs of observations, whether they are in an input/output or cause-and-effect relationship, or even express the observation of epiphenomena of the same system.
So, both categories of students can find in the course useful tools that they can use in their respective fields of interest.
But for any further curiosity, interested students need only contact me!
Course outline
In this box you find descriptions already given in the Google Classroom pages. For space reasons, these are given in an accordion-style fashion, so click the title to open one item, or show all , or close all
Schedule, location and links
- Start date: March 3 – End date: end of May
- Tuesday and Friday from 11:00 until 13:00 (or 14:00 on Tuesday)
- Classroom A1 Dipartimento Scienze Biochimiche (CU010)
- Google classroom is mainly used as a communication and storage tool, so if you plan to attend this year, please subscribe with code esr6xyn
Program
- Continuous signals in time and amplitudes, discrete time symbolic sequences; the case of the genome and proteins
- Operations on signals, sinusoids and complex numbers
- Space of signals. Energy and power of signals and sequences.
- Frequency representation of signals: series and Fourier transform, DTFT, DFT, zeta transform, relation between all of those. Bandwidth of a signal
- Sampling theorem. A/D and D/A conversion.
- Filtering of a signal, convolution, spectral product
- Indicator sequences and their transform, identification of exons on a frequency basis, mention of other techniques.
- Two-dimensional Fourier transforms as the basis for image processing in the spatial frequency domain, and three-dimensional Fourier transforms applied to X-ray crystallography, used to discover the shape with which proteins are folded
- Random signals. Autocorrelation of a signal and of a process, Wiener's theorem. Interpretation as a dot product. Evaluation of the SNR arising from quantization noise
- Long-distance correlation of genomic sequences, 1/f spectral density. Serial coding of protein sequences, its transform, consensus spectrum and characteristic frequency for homofunctional protein groups.
- FTIR spectrometry based on an interferometer, which from a SP point of view is a comb filter. By moving the interferometer mirror an autocorrelation function is measured, and by the Wiener theorem its anti-transform gives the power density at the output of the sample material under examination, giving cues about its chemical constituents
- Information by symbol and source entropy, without and with memory. The genetic code.
- Entropy of a Gaussian process, information measures for a pair of random variables: joint entropy, conditional entropies, average mutual information I(X;Y). Channel equivocation and noise entropy, channel capacity.
- Information theory concepts in biochemical signaling systems, estimation of mutual information I(X;Y) in between transcription factors and gene expression. Measure of the capacity of biological "channels", data processing inequality, bias, and rate-distortion theory
- Types of analog and digital filters. Numerical filtering via DFT for batch processing: discrete and circular convolution, convolution between finished duration sequences, convolution between an infinite input sequence and an impulse response with finite length. Method of Overlap and Add. FIR and IIR digital computational architectures used for real time processing
Texts adopted
During the previous years in which the course was taught, I created a series of slides to be projected in the classroom, which in my opinion represent a good compromise between expository synthesis and clarity of content, and which are available individually as indicated on https://teoriadeisegnali.it/to-slide-or-not-to-slide/, or all together in zipped format at this other address https://teoriadeisegnali.it/story/pub/bioinf/slide.zip. Furthermore, the part about X-ray crystallography and 3D signal processing can be found here https://teoriadeisegnali.it/items/le-armoniche-di-un-cristallo/
I have been working on a book on signals and telecommunications for over twenty years, and this course is giving me the opportunity to translate it. I will be very grateful if you would point out any corrections to me! You can download it's actual shape at https://teoriadeisegnali.it/items/signal-processing-and-information-theory/. Italian speakers may prefer to read the Italian version of the book, which can be accessed at https://teoriadeisegnali.it/sfoglia-il-testo-trasmissione-dei-segnali-e-sistemi-di-telecomunicazione/
Other material is still being identified, or open the next topic
Reference bibliography
These are some of the insights I have preliminarily found, I will give indications on what to read during the course
Genomic Signal Processing
- D. Anastassiou, Genomic Signal Processing (2001) - http://www.ece.iit.edu/~biitcomm/research/references/Other/Genomic%20Signal%20Processing/GSP.pdf
- P. Ramachandran, A. Antoniou, Genomic Digital Signal Processing (slides) - https://www.ece.uvic.ca/~andreas/RLectures/GenomicDSP04-Paramesh-Pres.pdf
- P.P. Vaidyanathan, Genomics and Proteomics: A Signal Processor’s Tour (2004) http://gladstone.systems.caltech.edu/dsp/ppv/papers/CASGeneGalley.pdf
- R. Palaniappan, Biological Signal Analysis, https://bookboon.com/en/introduction-to-biological-signal-analysis-ebook
- J.V. Lorenzo-Ginori et al, Digital Signal Processing in the Analysis of Genomic Sequences (2009) - https://www.researchgate.net/profile/Juan-Lorenzo-Ginori/publication/228359227_Digital_Signal_Processing_in_the_Analysis_of_Genomic_Sequences/links/0fcfd5111298c0ae8d000000/Digital-Signal-Processing-in-the-Analysis-of-Genomic-Sequences.pdf
Insights into some topics
- G. Alterovitz, M.F. Ramoni Ed., Systems Bioinformatics - An Engineering Case-Based Approach (2007) - https://www.codecool.ir/extra/202033231410343Systems%20Bioinformatics_%20An%20Engineering%20Case-Based%20Approach.pdf
- Edward R Dougherty et al, Genomic Signal Processing and Statistics (2005) - https://downloads.hindawi.com/books/9789775945075.pdf
More advanced topics
- Steven W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing, (2011) - http://www.dspguide.com/
- W. Zhang et al, Network-based machine learning and graph theory algorithms for precision oncology (2017) - https://www.nature.com/articles/s41698-017-0029-7
- A. Ortega et al, Graph Signal Processing: Overview, Challenges and Applications (2018) - https://arxiv.org/abs/1712.00468
- Y. Li et al, Deep learning in bioinformatics: introduction, application, and perspective in big data era (2019) - https://arxiv.org/abs/1903.00342
- X.M. Zhang et al, Graph Neural Networks and Their Current Applications in Bioinformatics (2021) - https://www.frontiersin.org/articles/10.3389/fgene.2021.690049/full
All the drive of downloaded articles
Progress of individual lessons
In addition to summarizing the topics covered from time to time, references to the support material used in class are also provided
tue 4th march - Course overwiev
- After finding the classroom and managing to connect the projector, we introduced ourselves to each other, and then with the help of slides a first introduction is made regarding the nature of signals and their processing.
- Then the comparison is made between the transfer of information typical of the TLC world with that which occurs at the cellular level, taking the first steps in information theory.
- Some applicative aspects of signal processing and information theory in the biochemical, medical, genetic, microscopic and structural fields are then outlined, up to a mention of the most recent applications.
- link to the 1st slide set Course Overview
- some waveform and spectral analysis tools, such as Audacity and Friture
fri 7th march - Signals and Systems
- This group of slides illustrates basic concepts and forms a vocabulary on which the continuation of the course will be based
- Concepts such as energy, power, impulsive and periodic signals, bandwidth occupation, impulse and frequency response, filtering, autocorrelation, analog and digital transmission, sampling and quantization, are introduced
- The lesson ends by illustrating the definitions of mean value, even and odd symmetry, causality, and with the conditions for defining a signal as periodic, power, energy, impulsive and of limited duration
- link to the 2nd slide set Signals and Systems
- a small Octave (Matlab clone) script to listen to short segments of speech and display their frequency content
- A very nice filtering app dfilter
- VLC is a nice player with effects and frequency equalizer
tue 11 march - Signals and Systems & Fourier Series
- We have dealt with signal operations such as time shift, inversion, scale change, combinations of these, and how to draw the result. Commonly used signals are introduced, and then complex numbers are addressed with their properties and different representations. Euler's formula therefore allows us to link complex numbers with trigonometric functions. The set of slides ends with definitions relating to the characterization of the physical systems through which the signals pass.
- A new series of slides is started on the topic of Fourier series for periodic signals. Phasors are introduced first, by which an alternative expression for a cosine wave can be obtained. After illustrating the concept of harmonic frequencies, the expression of the Fourier series is introduced, as well as the formula for calculating the homonymous coefficients that appear in it.
- link to the 3rd slide set - Fourier series and signal spaces
- A very nice graphic frequency analysis tool for periodic signals
- An handy audio manipulation tool - Audacity
- An in-depth look at vector sum or parallelogram rule
fri 14 march - Fourier Series
- After some reflections (carried out using Genius) on the imaginary zeros of the function x2+1, and on the rotation property of the phase following the product by the imaginary unit j, we continued the exposition with the properties of the Fourier series, which will also be found in the case of the transform
- Conjugate or Hermitian symmetry for the Fourier coefficients of a real signal, trigonometric series, rectangular wave, truncated series, Parseval’s theorem, orthogonality of complex exponentials, Power spectrum of periodic signals.
tue 18 march - Signal spaces & Fourier transform
- A review of linear algebra notions is undertaken, defining the nature of a vector space, and of its orthogonal representation basis, then addressing the definition of normed space, and that of scalar (or inner, or dot) product for finite-dimensional spaces
- The validity of Schwartz's inequality allows the scalar product to induce a norm, and the orthogonality of the basis allows to obtain the vector coefficients through a scalar product. If the basis is orthonormal, the scalar product can be evaluated as the sum of products between the homologous representation coefficients
- These concepts are then applied to the case of infinite-dimensional spaces, allowing to give a geometric interpretation to quantities and operations already defined (or still to be defined) for periodic, energy and power signals.
- it is time to start studying the Fourier transform, which we apply immediately to a rectangular signal; the demonstration of the connection between the Fourier series of a periodic signal, and the transform of a single period, allows us to find a result that is already known
- the concepts illustrated for vector spaces are then adapted to define the mutual energy between energy signals, the relative orthogonality condition, to extend Parseval's theorem to them, to demonstrate the unitarity of the Fourier transform, and to define the energy density spectrum.
- finally, the linearity characteristics of the transform, of conjugate symmetry for real signals, of duality, and of the initial value are illustrated.
- link to the 4rd slide set - Fourier transform and convolution
fri 21 march - Fourier transform and convolution
- Transform of a leading or lagging signal, effect of the phase spectrum on the shape of the signal, lack of shape information in the energy density, frequency shift, change of time scale
- Definition of the Dirac pulse, its genesis as a demonstration of the transform of a constant in time and frequency, its application in the transform of a periodic signal
- Sampling and sieving properties of the Dirac delta, definition of impulse response for an LTI system, superposition of effects in the case of multiple input impulses, definition of convolution integral as an extension of the representation of the input signal by sieving. Graphical analysis of how convolution calculates each single output value
- Audacity has been been used for listening to some audio impulse responses recorded in reverberant places around the world
- Falstad's tool has been used for checking the linear phase effect on the signal waveform
tue 25 march - Fourier transform properties
- Convolution with a translated impulse, convolution theorem, Frequency multiplication (filtering), measure of the frequency response and of the phase difference, linear phase and delay, cascaded systems, intro to filter synthesis, transform of a triangle
- Time multiplication (modulation and windowing). Rectangular vs. triangular (or Bartlett) windows, spectral resolution and leakage. Short time spectral analysis
- Transform of the derivative and integral of a signal
- Pulse train definition, its transform, and its applications to the transform of a periodic signal
- Enjoy playing with the Falstad's Dfilter app - hint: download the java version (the .jar file) and run it locally. Try different kind of input signals such as the frequency sweep, and also watch at the phase response of different kind of filters (do you remember about the need for a linear phase?)
- Experiment with spectrogram analysis using different window lengths and type by Audacity
- This ends the telling about 4rd slide set - Fourier transform and convolution
fry 28 march - Sampling
- There were only four students, one of whom was absent last time. So we started with a review, carried out with the help of the discussion of the mid-year test of the previous academic year, as well as experimenting with windowing options using Audacity.
- Finally, it's the turn of the fifth set of slides, which begins with a demonstration of how a band-limited analog signal can be completely described by its time samples, provided they are taken at a rate at least twice the maximum frequency present. Otherwise, a distortion called aliasing occurs. This type of processing happens inside any cell phone, PC, television set, audio player, so that humans can no longer live without it.
- The link to the 5rd slide set - Sampling and digital signal processing
- The course web index also contains the text of the intermediate tests carried out in previous years, complete with solutions, so that they can be used to check one's understanding of the subject. These pdf files are those whose name is a date.
- By the way! This year's intermediate test is planned to be held on April 11th or 15th
tue 1 April - A/D & D/A implementation, Discrete Fourier Transform
- The manufacturing aspects that allow analog-to-digital conversion (and vice versa) have been discussed
- Numerical calculation methods are introduced that allow to carry out the spectral analysis, based on the numerical sequence obtained by sampling the analog signal: DTFT, DFT and its inverse, zeta transform, the relations that connect them.
- The link to the set of slides involved
- Together with the students, we chose to set the date for the intermediate verification test as April 15th. At the end of the lesson, the link to the track will be distributed in PDF format, and there will be time for two more days to dedicate ourselves to solving the questions asked. On Friday 18th we will meet in classroom A1 where the teacher will collect the papers (those who are away from Rome can send a CLEARLY READABLE photo instead)
fry 4 april - Filterbank, DNA Frequency analysis, circular convolution, overlap and add
- First of all, I would like to thank for their presence the only two students who came today, hoping to find many of you next Tuesday in class, since we will approach the new topic of two-dimensional DFT, applied to images.
- It has been discussed how the result of a DFT can be thought of as an output vector from a bank of band-pass filters. It has also been observed how the DFT can be computed starting from numerical sequences not derived from the sampling of an analog signal, or even starting from purely symbolic sequences, made numerical following some embedding method.
- This aspect was then applied to the case of identifying the coding regions within the biological sequences represented by the chain of bases (A, T, G, C) that compose the DNA hosted in the cellular nucleus of every living being, animal or plant, identifying those regions for which the short-term spectrum highlights the presence of a periodicity of three positions within the sequence.
- We then started to deal with the topic of numerical filters, which can be achieved either by convolution in time or by frequency product: we then reasoned on how this last method is equivalent to a circular convolution, and on how the overlap and add expedient allows to perform the convolution via DFT even in the case in which one of the two sequences has an indefinite duration.
- The link to the set of slides about DNA Frequency analysis (the one about the topics on filtering is given in the upper accordion)
- We played with some Octave code to show how to set up some code that can perform filtering: files
filtra_audio2.m
,filtra_audio2_conv.m
,filtra_audio_passabanda.m
tue 8 April - Crystallography in a Signal Processing mood
- The opportunity has come to extend Fourier frequency analysis to two dimensions, applying it to the case of image signals. Attention is paid to the nature of the representation basis functions, themselves 2D signals, and to how the pair of indices that distinguish them can be represented within a matrix, or lattice, whose elements are associated with the values obtained by means of the 2D DFT
- After the rearrangement of the quadrants which places the zero frequency at the center of the lattice, the effect of some properties already known for the one-dimensional case is illustrated by examples
- We then proceed to a further dimensional leap, defining the three-dimensional Fourier transform, to be used in the context of X-ray crystallography. Here the 3D signal is a (electrons) density, as are the functions of the representation basis, the volumetric harmonics. The lattice made up of the relative triples of indices now takes on the appearance of a 3D matrix, which in crystallography is indicated with the name of reciprocal space
- Since a crystal is a 3D periodic structure obtained by repeating the same molecule, the overall electron density is the result of the convolution between the density of a molecule and a box of stars, which in frequency corresponds to the product between the 3D transforms, from which the definition of structure factors is derived
- In the remaining time we retrace the description of things in the terms adopted in crystallography, introducing the Bravais lattices, the Miller indices, and the relation of the latter with the reciprocal space, that is, the lattice in which the volumetric harmonics are mapped. But the most peculiar aspect is that crystallography is an inverse problem, seeking to discover the electronic density, starting from the structure factors, which in turn are observable only through the diffraction undergone by X-rays
- The link to the study material about Crystallography in a DSP mood
- Play with Fourier Transform Filtering Techniques in Optical Microscopy Primer - noise removal and 2D Inverse Fourier Transform Playground lets to draw 2D filters by hand, also shows 2D basis functions in reciprocal space
fry 11 april - Crystallography, Tomography
- The exposition on X-ray crystallography has been concluded, which has given the opportunity to address the extension of DFT to 2 and 3 dimensions. The concepts underlying the functioning of crystallography have been briefly illustrated, and how these lead to the measurement of structure factors, which in terms of spectral analysis are nothing other than Fourier coefficients.
- We then moved on to describe the basics of the functioning of tomography devices, which also use X-rays, which allow us to obtain a measurement (the sinogram) called projection, which represents the absorption due to the tissues crossed by the rays, as the angle varies. The back projection of the lines of the sinogram allows us to trace the image corresponding to the absorption, which however is blurred due to the excessive importance given to the low spatial frequencies. Thanks to the Fourier slice theorem, we understand the reason, and we find the solution
- Study material about tomography is inside of the slide set about Signal processing in Bioinformatics
tue 15 april - Intermediate test and on-line Q&A
- As agreed, the text of the intermediate test will be published, to be carried out at home, and returned on Friday 18
- The classroom is busy for other purposes, so I will connect online between 12 and 13 (or later, if necessary)
- Lessons will resume starting from Tuesday 29 April, continuing continuously for 5 weeks, during which the following topics will be carried out: analog and numerical filters, random processes, information theory
- Link to the PDF of the intermediate test
- Link to the on-line Q&A (12-13h)
tue 29 april - Test outcome and Filters
- My own solution to the intermediate test has been published, as well as the outcome of the evaluation.
- Slide set #7 has been approached, starting with analog filters, the Laplace representation for the transfer function H(s), from which the H(f) derives. H(s) takes the form of a ratio of well-known polynomials, synthesized by applying approximation algorithms to satisfy a tolerance scheme mask, and factorized in order to build the filter as a cascade of elementary cells.
- Then we approach delay-based filters, in between the analog and the digital domain, defined through a computational scheme. They exhibit periodicity of H(f) in frequency, thus allowing to derive taps by a Fourier series expansion, and a FIR design allows to fullfill linear phase requirements. Some examples are given, as the moving average and the first order FIR, also working as a differentiator or comb. Lastly, we introduce a feedback in the architecture, used by the first-order IIR and the exponential average.
- Link to the 7th slide set
- It's time to compile the OPIS questionnaire, code D1WAY0PW
fri 2 may - Design of digital Filters
- Design techniques for digital filters were introduced. The taps of FIR filters can be computed by Fourier series of the periodic, truncated and delayed H(f). multiplying h(t) by a window function can reduce oscillations and lateral tails of H(f). Otherwise, the values of hn can be obtained as IDFT of an iteratively obtained H(f) starting from a tolerance template, achieving a reduced complexity.
- After thinking a bit about the zeta transform of a delayed sequence, and of a shifted Kronecker delta, we obtained the expression of the transfer function H(z) for a generic IIR filter, which takes on the appearance of a ratio between polynomials, and which is stable as long as its poles fall within the unit circle. The two calculation schemes obtained directly from the polynomial coefficients are then shown.
- The last aspect covered concerns how to obtain the coefficients of H(z). We use the consolidated techniques for the synthesis of analog filters, applying to the result H(s) a change of variable capable of correctly mapping the s and z planes, and therefore obtaining the corresponding H(z).
- Link to the Signal sub-package of the SciPy library, from which to experiment some coding
- Link to the 7th slide set
- It's time to compile the OPIS questionnaire, code D1WAY0PW
tue 6 may - Random signals ergodicity and correlation
- Since it is not possible to compute a Fourier transform for random signals, it is necessary to adopt a statistical description of it. The concept of random process is then introduced, its ensemble means and time averages are illustrated, then the conditions for defining it as a stationary and ergodic process are explained. The new results allow us to return to slide set #5, to address the calculation of the quantization SNR. After some examples, a 2D random variable is extracted from the process, defining correlation and covariance between the two components, both from a probabilistic and a statistical point of view
- The autocorrelation function defined for both random processes and deterministic signals is then introduced, as well as the intercorrelation between different signals; the relevant properties are discussed, and links to convolution are clarified. Few words are finally spent on the link between Pearson’s coefficient and Schwartz’s inequality, just as the matched filter can also be thought of in this way
- Link to the 8th slide set
- It's time to compile the OPIS questionnaire, code D1WAY0PW
fri 9 may - Wiener theorem and filtering of random processes
- Wiener’s theorem is a valuable tool for obtaining the spectral density for any type of signal, be it deterministic or random, as well as for opening an alternative path towards already known results. Examples of its application are given
- Then, the issues of random signals and filtering are jointly addressed: in fact, the expression of the spectral density at the filter output is considered, and then its statistical description is evaluated. The section ends by analyzing the result of adding or multiplying random processes
- Link to the 8th slide set
- As an example of application, some questions proposed in the previous intermediate tests have been discussed, such as in the cases of 31 maggio, 3 giugno, 30 maggio
- It's time to compile the OPIS questionnaire, code D1WAY0PW
tue 13 may - Multivariate Gaussian and spectral estimation
- A set of random variables is jointly Gaussian if their joint probability density function takes a particular form, completely described by the vector of means, and the covariance matrix. If the r.v. are uncorrelated, then the covariance m. is diagonal, and this allows us to show that in this case the random variables are also statistically independent
- A random process is Gassian if the random variables taken from it at different instants of time are jointly Gaussian. If the process is also ergodic, the elements appearing in the corresponding covariance matrix can be obtained from the autocorrelation function calculated on one of its members
- Spectral estimation using the periodogram may seem like a procedure already seen, but it is now approached from the point of view of estimation theory. Even if not biased, the periodogram turns out to be inconsistent, since for each frequency its variance does not decrease as the interval on which it is calculated increases. On the other hand, this does not happen in the case of a sinusoid immersed in noise
- A different spectral estimation technique known as LPC, of parametric nature, relies on the signal production model as the result of a pole-only filtering. The IIR filter coefficients are obtained from the difference equation describing the filter, with the aim of minimizing the expected value of the prediction error energy
- Link to the 8th slide set
- A student asked me how active noise cancelling headphones work. I put together some links on the subject
- In the next three lessons I hope complete also the part about information thoery, so we can fix the date
- In the next three lessons I hope to also complete the part on information theory, so we can set the date on which the second test is assigned to May 27, with time to take it until 30. For those who intend to take a project, it is not necessary to take the second test. A proposed grade will follow, which can be modified after an optional oral interview. For those who have not taken the tests or the project, an oral interview will be held following the appeal dates that will be open from June onwards
- It's time to compile the OPIS questionnaire, code D1WAY0PW
At the end of each lesson, a reference to the slides used is provided. Please note that this material is often updated based on deficiencies observed during the lesson, so possession of the most recent version can be verified by checking the date reported in the online index