Signal Processing And Information Theory

This is a web page supporting the course held as part of the three-year degree in Bioinformatics for the academic year 2025-26. The course is in English, so this page is in that language. If you're looking for the (previous) academic year 2024-25 page, you can find it on this same site.

A few words of welcome

I would like these few lines to serve as an incentive to favour the choice of this course over others that are also offered as possible alternatives for completing one's study plan.

Open me

To get an idea of the course content, you can consult the official pages, browse through last year's course, and view the latest version of the specially prepared slides.

In past years, in addition to those enrolled in the Bioinfomatics degree, a roughly equivalent number of students came from the Applied Computer Science and Artificial Intelligence (ACSAI) degree, probably attracted (in addition to the language of delivery, English) by the desire to learn concepts that are not addressed in the ACSAI degree course, even though the same concepts often re-emerge in the context of the applications and problems they face.

Whilst every effort has been made to give the theoretical part several practical approaches to the biological topics covered in the Bioinformatics degree, as discussed in these slides (1) (2) (3), the course itself is an introduction to the theory behind signal processing, including Fourier analysis of signals and images, sampling theorem, frequency response and convolution, correlation among signals and among random processes, numerical techniques (DFT, FFT and Zeta transform), filter analysis and synthesis. An important appendix deals with information theory, which makes it possible to quantify the information content of messages produced by a source described in statistical terms, as well as the information shared between pairs of observations, whether they are in an input/output or cause-and-effect relationship, or even express the observation of epiphenomena of the same system.

So, both categories of students can find in the course useful tools that they can use in their respective fields of interest.

But for any further curiosity, interested students need only contact me!

Course outline

In this box you find descriptions already given in the Google Classroom pages. For space reasons, these are given in an accordion-style fashion, so click the title to open one item, or show all , or close all

Schedule, location and links

  • Start date: March 3 – End date: end of May
  • Tuesday and Friday from 11:00 until 13:00 (or until 14:00 on Tuesday)
  • Classroom III Dipartimento Scienze Biochimiche (CU026 – E01PS1L086)
  • Google classroom is mainly used as a communication and storage tool, so if you plan to attend this year, please subscribe with code esr6xyn

Program

  • Continuous signals in time and amplitudes, discrete time symbolic sequences; the case of the genome and proteins
  • Operations on signals, sinusoids and complex numbers
  • Space of signals. Energy and power of signals and sequences.
  • Frequency representation of signals: series and Fourier transform, bandwidth of a signal. Properties of the Fourier transform, filtering of a signal, convolution, spectral product. Pulse train.
  • Sampling theorem. A/D and D/A conversion. Numerical frequency analysis: DTFT, DFT, zeta transform, relation between all of those. Numerical filtering via DFT for batch processing: discrete and circular convolution, convolution between limited duration sequences, convolution between an infinite input sequence and an impulse response with finite length. Method of Overlap and Add.
  • Indicator sequences and their transform, identification of exons on a frequency basis, mention of other techniques.
  • Two-dimensional Fourier transforms as the basis for image processing in the spatial frequency domain, and three-dimensional Fourier transforms applied to X-ray crystallography, used to discover the shape with which proteins are folded
  • Basic principles of tomography for the reconstruction of volumetric cross-sections from radial projection data
  • Frequency response of analog filters starting from the ratio of polynomials in the Laplace variable. Tolerance mask, phase linearity and second-order cell cascade. Delay-based filters, periodic frequency response, analysis and synthesis of FIR filters, special cases, IIR filters. Synthesis of fully numerical FIR filters, difference equation described by polynomials in the zeta transform variable, canonical architecture, synthesis of IIR filters by change of variable.
  • Random signals. Autocorrelation of a signal and of a process, Wiener's theorem. Interpretation as a dot product. Matched filter concept. Evaluation of the SNR arising from quantization noise. Filtering of random processes. Multivariate Gaussian and spectral estimation: periodogram and linear predictive coding.
  • Long-distance correlation of genomic sequences, 1/f spectral density. Serial coding of protein sequences, its transform, consensus spectrum and characteristic frequency for homofunctional protein groups.
  • FTIR spectrometry based on an interferometer, which from a SP point of view is a comb filter. By moving the interferometer mirror an autocorrelation function is measured, and by the Wiener theorem its anti-transform gives the power density at the output of the sample material under examination, giving cues about its chemical constituents
  • Information by symbol and source entropy, without and with memory. The genetic code.
  • Entropy of a Gaussian process, information measures for a pair of random variables: joint entropy, conditional entropies, average mutual information I(X;Y). Channel equivocation and noise entropy, channel capacity.
  • Information theory concepts in biochemical signaling systems, estimation of mutual information I(X;Y) in between transcription factors and gene expression. Measure of the capacity of biological "channels", data processing inequality, bias, and rate-distortion theory

Exam Procedure

The exam dates found on the Infostud system correspond to the exam periods scheduled by Sapienza University. Students must register on Infostud before I can record the exam results.

Regarding the exam, there are two midterm tests scheduled during the lecture period. These tests are to be taken at home and are primarily aimed at those who are attending the lectures, with whom I hope to establish a more direct relationship. These midterm tests are also intended to incentivize attendance. The results of these midterm tests, if accepted, will subsequently be recorded via the Infostud system.

For those who do not participate in the midterm tests, the dates published on Infostud will still apply, but I do not plan to complete a new written assignment each time. On the contrary, those who take the exam after the conclusion of the lessons will do so orally.

Texts adopted

During the previous years in which the course was taught, I created a series of slides to be projected in the classroom, which in my opinion represent a good compromise between expository synthesis and clarity of content, and which are available individually as indicated on https://teoriadeisegnali.it/to-slide-or-not-to-slide/, or all together in zipped format at this other address https://teoriadeisegnali.it/story/pub/bioinf/slide.zip. Furthermore, the part about X-ray crystallography and 3D signal processing can be found here https://teoriadeisegnali.it/items/le-armoniche-di-un-cristallo/

I have been working on a book on signals and telecommunications for over twenty years, and this course is giving me the opportunity to translate it. I will be very grateful if you would point out any corrections to me! You can download it's actual form at https://teoriadeisegnali.it/items/signal-processing-and-information-theory/. Italian speakers may prefer to read the Italian version of the book, which can be accessed at https://teoriadeisegnali.it/sfoglia-il-testo-trasmissione-dei-segnali-e-sistemi-di-telecomunicazione/

Other material is still being identified, or open the next topic

Reference bibliography

These are some of the insights I have preliminarily found, I will give indications on what to read during the course

Genomic Signal Processing

Insights into some topics

More advanced topics

All the drive of downloaded articles

Progress of individual lessons

In addition to summarizing the topics covered from time to time, references to the support material used in class are also provided. It may happen that after the lesson I am late in updating the page, but you can still find all the slides of the course here, in reverse chronological order

tue 3th march - Course overwiev

  • After finding the classroom and managing to connect the projector, we introduced ourselves to each other, and then with the help of slides a first introduction is made regarding the nature of signals and their processing.
  • Then the comparison is made between the transfer of information typical of the TLC world with that which occurs at the cellular level, taking the first steps in information theory.
  • Some applicative aspects of signal processing and information theory in the biochemical, medical, genetic, microscopic and structural fields are then outlined, up to a mention of the most recent applications.

fri 6th march - Signals and Systems

  • This group of slides illustrates basic concepts and forms a vocabulary on which the continuation of the course will be based
  • Concepts such as energy, power, impulsive and periodic signals, bandwidth occupation, impulse and frequency response, filtering, autocorrelation, analog and digital transmission, sampling and quantization, are introduced
  • Then the definitions of mean value, even and odd symmetry, causality are given, along with the conditions for defining a signal as periodic, power, energy, impulsive and of limited duration
  • The lesson ends by illustrating signal operations such as time shift, inversion, scale change, combinations of these, and how to draw the result.

tue 10 march - Signals and Systems & Fourier Series

  • Commonly used signals are introduced, and then complex numbers are addressed with their properties and different representations. Euler's formula therefore allows us to link complex numbers with trigonometric functions. The set of slides ends with definitions relating to the characterization of the physical systems through which the signals pass.
  • A new series of slides is started on the topic of Fourier series for periodic signals. Phasors are introduced first, by which an alternative expression for a cosine wave can be obtained. After illustrating the concept of harmonic frequencies, the expression of the Fourier series is introduced, as well as the formula for calculating the homonymous coefficients that appear in it.
  • We then continue the exposition with the properties of the Fourier series, which will also be found in the case of the following formulations of the transform.
  • Finally, the conjugate (or Hermitian) symmetry property of the Fourier coefficients obtainable for a real signal is explored
    • link to the 3rd slide set - Fourier series and signal spaces
    • A very nice graphic frequency analysis tool for periodic signals
    • VLC is a nice player with effects and frequency equalizer, with which you can experience the effect of signal filtering
    • a small Octave (Matlab clone) script to listen to short segments of speech and display their frequency content
    • An in-depth look at vector sum or parallelogram rule
    • You can use Genius) for the drawing of complex functions, as done for the complex exponential, of for finding the imaginary roots of the function x2+1

fri 13 march - Fourier Series

  • Interpretation of Fourier coefficients as phasors and trigonometric series
  • Fourier coefficients for a rectangular wave and how they change while varying τ
    • During this discussion, an apparent inconsistency emerged regarding the case where τ>T/2. Fortunately, Claude AI was able to provide the correct interpretation, showing how the graph of the Xn modulus follows the expected trend, while the graph of the absolute values ​​instead shows alternating values ​​that lend themselves to incorrect conclusions. Here's the link to the explanation, in Italian, but you can easily get a translated version using a browser plugin, or by asking Claude herself.
  • Truncated series and Gibbs phenomenon
  • Parseval’s theorem, orthogonality of complex exponentials, Power spectrum of periodic signals. Fourier series of a cosine

tue 17 march - Signal spaces & Fourier transform

  • There was no one in class today from the Bioinformatics program! Was it a coincidence, or did the idea spread that they couldn't follow the mathematical arguments? Please come and ask me for clarification on anything you don't understand!
  • A review of linear algebra notions is undertaken, defining the nature of a vector space, and of its orthogonal representation basis, then addressing the definition of normed space, and that of scalar (or inner, or dot) product for finite-dimensional spaces
  • The validity of Schwartz's inequality allows the scalar product to induce a norm, and the orthogonality of the basis allows to obtain the vector coefficients through a scalar product. If the basis is orthonormal, the scalar product can be evaluated as the sum of products between the homologous representation coefficients
  • These concepts are then applied to the case of infinite-dimensional spaces, allowing to give a geometric interpretation to quantities and operations already defined (or still to be defined) for periodic, energy and power signals.
  • it is time to start studying the Fourier transform, which we apply immediately to a rectangular signal; the demonstration of the connection between the Fourier series of a periodic signal, and the transform of a single period, allows us to find a result that is already known
  • the concepts illustrated for vector spaces are then adapted to define the mutual energy between energy signals, the relative orthogonality condition, to extend Parseval's theorem to them, to demonstrate the unitarity of the Fourier transform, and to define the energy density spectrum.
  • then the linearity characteristics of the transform, of conjugate symmetry for real signals, of duality, and of the initial value are illustrated
  • Finally, we dealt with the transform of a leading or lagging signal, effect of the phase spectrum on the shape of the signal, lack of shape information in the energy density, frequency shift, change of time scale
    • link to the 4rd slide set - Fourier transform and convolution
    • Falstad's tool has been used for checking the linear phase effect on the signal waveform

fri 20 march - Fourier transform and convolution

There was only one student in class today. Will the others return, or will they only show up for the midterms? Perhaps I wasn't entirely clear. The midterms are for those who attend classes; otherwise, the exam is oral at the end of the course. I'll post the dates on Infostud now.

  • Definition of the Dirac pulse, its genesis as a demonstration of the transform of a constant in time and frequency, its application in the transform of a periodic signal
  • Sampling and sieving properties of the Dirac delta, definition of impulse response for an LTI system, superposition of effects in the case of multiple input impulses, definition of convolution integral as an extension of the representation of the input signal by sieving. Graphical analysis of how convolution calculates each single output value
  • Convolution with a translated impulse, convolution theorem

tue 24 march - Fourier transform properties

  • Frequency multiplication (filtering), measure of the frequency response and of the phase difference, linear phase and delay, cascaded systems, intro to filter synthesis, transform of a triangle
  • Time multiplication (modulation and windowing). Rectangular vs. triangular (or Bartlett) windows, spectral resolution and leakage. Short time spectral analysis
  • Transform of the derivative and integral of a signal
    • Enjoy playing with the Falstad's Dfilter app - hint: download the java version (the .jar file) and run it locally. Try different kind of input signals such as the frequency sweep, and also watch at the phase response of different kind of filters (do you remember about the need for a linear phase?)
    • Experiment with spectrogram analysis using different window lengths and types with Audacity – You can generate a DTMF signal consisting of two tones and/or open a voice audio file, or even record your own voice, and then play with the spectrogram display
    • Some principles of vocal signal production have also been provided, and anyone interested can consult the book and translate it into their own language
    • I forgot to mention the multitude of window functions that have been defined over time, they are listed in this Wikipedia page, along with the associated spectral shape
    • The 4rd slide set - Fourier transform and convolution exposition is almost finished, next lesson will deal with the sampling theorem

fry 27 march - Pulse train and Sampling

Today we have finally addressed the sampling theorem, which marks the boundary between the analog and digital worlds, allowing us to speak more properly of signal processing! It's a shame that there were only two students in the classroom. I hope that those studying remotely will find my slides well-crafted.

  • Pulse train definition, its transform, and its applications to the transform of a periodic signal
  • We started the fifth set of slides - Sampling and digital signal processing: sampling theorem statement, understanding, frequency domain aspects and aliasing.
  • Implementation aspects of sampling: Oversampling, decimation, reconstruction, interpolation. Approximation of Dirac impulses, A/D and D/A conversion

tue 31 March - Discrete Fourier Transform

  • Brief introduction to Uniform quantization and binary coding
  • Numerical calculation methods are introduced that allow to carry out the spectral analysis, based on the numerical sequence obtained by sampling the analog signal: DTFT, DFT and its inverse, zeta transform, the relations that connect them. DFT as a filter bank

fry 17 april - DNA Frequency analysis, circular convolution, overlap and add, polynomial representation of filters

  • The DFT can be calculated from numerical sequences not derived from sampling an analog signal, or even from purely symbolic sequences, rendered numerical using an embedding method. An example of this viewpoint is DNA, where a "frequency" within the coding regions is found equal to 2/3*pi, corresponding to a periodicity of three bases, those that make up a codon.
  • We then started to deal with the topic of numerical filters, which can be achieved either by convolution in time or by frequency product: we then reasoned on how this last method is equivalent to a circular convolution, and on how the overlap and add expedient allows to perform the convolution via DFT even in the case in which one of the two sequences has an indefinite duration.
  • Slide set #7 has been approached, starting with analog filters, and the Laplace representation for the transfer function H(s), from which the H(f) derives. H(s) takes the form of a ratio of well-known polynomials, characterized by the position of the poles on the left of the complex s plane - or inside the unit circle of the z plane.
    • The link to the set of slides about DNA Frequency analysis (the one about the topics on batch filtering is given in the upper accordion)
    • Link to the 7th slide set - Filters
    • We played with some Octave code that performs filtering via a library call that accepts the input xn and the hn samples, and execute an overlap and add implementation through repeated frequency multiplications. The code is here, it works on an audio file that you can find in this web directory, and.. I HAVE JUST TRANSLATED COMMENTS AND PRINTOUTs FOR YOU! Try it if you can, it will ask you something at the end. Change parameters and kind of hn. Or try writing an equivalent program in your favorite language (maybe Python?), albeit with the help of an AI assistant..

tue 21 april - Filters

  • The Laplace representation for the transfer function H(s), from which the H(f) derives, takes the form of a ratio of well-known polynomials, synthesized by applying approximation algorithms to satisfy a tolerance scheme mask, and factorized in order to build the filter as a cascade of elementary cells.
  • Then we approach delay-based filters, in between the analog and the digital domain, defined through a computational scheme. They exhibit periodicity of H(f) in frequency, thus allowing to derive taps by a Fourier series expansion, and a FIR design allows to fullfill linear phase requirements. Some examples are given, as the moving average and the first order FIR, also working as a differentiator or comb. Lastly, we introduce a feedback in the architecture, used by the first-order IIR and the exponential average.
  • Design techniques for digital filters were introduced. The taps of FIR filters can be computed by Fourier series of the periodic, truncated and delayed H(f). multiplying h(t) by a window function can reduce oscillations and lateral tails of H(f). Otherwise, the values ​​of hn can be obtained as IDFT of an iteratively obtained H(f) starting from a tolerance template, achieving a reduced complexity.
  • After thinking a bit about the zeta transform of a delayed sequence, and of a shifted Kronecker delta, we obtained the expression of the transfer function H(z) for a generic IIR filter, which takes on the appearance of a ratio between polynomials, and which is stable as long as its poles fall within the unit circle. The two calculation schemes obtained directly from the polynomial coefficients are then shown.
  • The last aspect covered concerns how to obtain the coefficients of H(z). We use the consolidated techniques for the synthesis of analog filters, applying to the result H(s) a change of variable capable of correctly mapping the s and z planes, and therefore obtaining the corresponding H(z).
    • Link to the 7th slide set
    • Link to the Signal sub-package of the SciPy library, from which to experiment some coding
    • Link to the Python code (copied from examples) used to generate the picture in the last page of the Filters slide set
    • Link to the Python code (made by Claude AI) that designs and implements a Butterworth band-pass filter for audio signals
      • The idea is to stimulate you (the student) to experiment the concepts illustrated by means of some working code examples. Probably, there will be some request in this sense in the upcoming intermediate test (1st of May)

fry 24 april - Random signals ergodicity and correlation

  • Since it is not possible to compute a Fourier transform for random signals, it is necessary to adopt a statistical description of it. The concept of random process is then introduced, its ensemble means and time averages are illustrated, then the conditions for defining it as a stationary and ergodic process are explained. The new results allow us to return to slide set #5, to address the calculation of the quantization SNR. After some examples, a 2D random variable is extracted from the process, defining correlation and covariance between the two components, both from a probabilistic and a statistical point of view
  • The autocorrelation function defined for both random processes and deterministic signals is then introduced, as well as the intercorrelation between different signals; the relevant properties are discussed, and links to convolution are clarified. Few words are finally spent on the link between Pearson’s coefficient and Schwartz’s inequality, just as the matched filter can also be thought of in this way
    • Link to the 8th slide set
    • It's time to compile the OPIS questionnaire, code 2P7U9KBC

tue 28 april - Wiener theorem, filtering of random processes, FTIR, gaussian process, and spectral estimation

  • After having illustrated the properties of the autocorrelation function, we move on to the statement of Wiener's theorem, which is a valuable tool for obtaining the spectral density for any type of signal, be it deterministic or random, as well as for opening an alternative path towards already known results. Examples of its application are given
  • Then, the issues of random signals and filtering are jointly addressed: in fact, the expression of the spectral density at the filter output is considered, and then its statistical description is evaluated. The section ends by analyzing the result of adding or multiplying random processes
  • The operating principle of an instrument for the analysis of the chemical composition of a material, known as FTIR (Fourier Transform Infrared spectrometry), is then explained. It is based on a device (the interferometer) in which a moving mirror implements a first-order FIR filter, or rather a comb, in which the delay varies over time. This causes the detector to observe a signal (the interferogram) that represents an autocorrelation function, whose FFT (by Wiener's theorem) is the received power density. Since the latter depends on the frequency response of the material under observation, it is ultimately possible to trace its chemical composition by identifying the absorption peaks.
  • A set of random variables is jointly Gaussian if their joint probability density function takes a particular form, completely described by the vector of means, and the covariance matrix. If the r.v. are uncorrelated, then the covariance m. is diagonal, and this allows us to show that in this case the random variables are also statistically independent
  • A random process is Gaussian if the random variables taken from it at different instants of time are jointly Gaussian. If the process is also ergodic, the elements appearing in the corresponding covariance matrix can be obtained from the autocorrelation function calculated on one of its members
  • Spectral estimation using the periodogram may seem like a procedure already seen, but it is now approached from the point of view of estimation theory. Even if not biased, the periodogram turns out to be inconsistent, since for each frequency the variance of the estimate does not decrease as the interval on which it is calculated increases. On the other hand, this does not happen in the case of a sinusoid immersed in noise
  • While the periodogram can be used in the spectral estimation of a process that is assumed to be stationary, when this assumption is unfounded as in the case of speech, the short-term spectral estimation can be conducted by the technique known as LPC, which is parametric in nature. It relies on the signal production model as the result of an all-pole filtering. The IIR filter coefficients are obtained from the difference equation describing the filter, with the aim of minimizing the expected value of the prediction error energy
    • Link to the 8th slide set
    • Link to the 6th slide set where at page 42 the FTIR is illustrated
    • There will be no class next Friday (May 1st, Labor Day). Instead, the text of the midterm exam will be distributed, and students will have a week to complete it, as homework. For those who would like to have an idea of ​​the type of questions that will be asked, at the bottom of this page you can find the files of the previous exam tests, together with the solutions.
    • It's time to compile the OPIS questionnaire, code 2P7U9KBC

tue 5 may - Signal Processing in Bioinformatics

  • Today we return to address some applications of the theory presented so far to the field of bioinformatics.
  • After a brief summary of the concepts related to the cell cycle and the nature of the genetic code, we discuss how the identification of the genes responsible for cyclin production was made possible by defining an indicator known as the "Fourier score."
  • We then move on to describe how the coding regions of DNA can be located after defining an "embedding" of the nucleotide sequence in terms of an indicator sequence, on which a DFT-based spectral analysis can be performed, revealing the presence of a periodicity of three bases, which is exactly the length of the codons. But better results can be obtained by proceeding instead of with the DFT, with a very narrow bandpass filter.
  • We then moved on to illustrate the frequency analysis of protein sequences, which shows the presence of a spectral peak characteristic of proteins that perform a similar function.
  • A further interesting result is obtained by calculating the autocorrelation function of genomic sequences, from which a power density spectrum is obtained which highlights the strong correlation existing between particularly distant regions of the genome, which finds its explanation in an evolutionary mechanism.
  • At this point, we begin the discussion of X-ray crystallography, used to determine how proteins fold. Since a crystal is a three-dimensional periodic structure, it seems logical that it could be described by a three-dimensional Fuorier series, but... three-dimensional!!! To address this aspect, we begin by discussing how to perform a "merely" two-dimensional frequency analysis, as is done in the case of image signals.

All the teaching material shown in the classroom, in the form of slides or additional material, has been collected in a single zip file, updated to May 28, 2025. Or you can still find the individual files by accessing the accordions of the lessons held, or through the online index online index

Questions in the past tests, with answers

In order to facilitate preparation for the exam, we report the links to the verification tests carried out both this year and in previous years