Signal Processing And Information Theory

This is a class for the Bachelor of Science in Bioinformatics. The course is in English, so this page is in that language. If you were looking for the page about AY 22-23, it's there.

The lesson cycle is over

This year I added new material to the topics covered, namely two-dimensional Fourier transforms, the basis for image processing in the spatial frequency domain, and three-dimensional Fourier transforms applied to X-ray crystallography, used to determine how proteins are folded. You can find these aspects on Google Drive (ask for permission to access them)

All the other updated slides have instead been collected in a single zip file, either as is, or in a four-page-per-sheet format

OPIS code 5I43PU9I - you will need it to answer the questionnaire about the course. Please be clement :)

Schedule, location and links

  • Tuesday and Friday from 11:00 until 13:00 (or 14:00 on Tuesday)
  • Classroom Psicologia II, Fisiologia Generale e Antropologia Farmacia e Medicina (CU026, E01PS1L084)
  • Google classroom link, to be used as a communication and storage tool

Study material

It consists of the slides shown in class, in addition to

  • my book in the original or translated version (please report errors!)
  • the material linked in the classroom pages
  • other material proposed from time to time in the classroom
  • during the lesson cycle I updated the slides from the previous academic year. You can find them all collected in a single zip file, either as is, or in a four-page-per-sheet format, plus the document on Crystallography in a SP mood (ask for permission)

Past verification tests

To facilitate preparation for the exam, here are the links to the verification tests carried out both this year and in previous years

Progress of individual lessons

In addition to summarizing the topics covered from time to time, references to the support material used in class are also provided

  • tue 5/3, fri 8/3 - Course Overview: After illustrating the reference telematic tools for the course, we introduce ourselves. A brief introduction to the nature of signals and their processing follows, to provide an overview of the objects we will be dealing with. Some applicative aspects of signal processing and information theory in the biochemical, medical, genetic, microscopic and structural fields are then outlined, up to a mention of the most recent applications. But, really, don't be afraid of all that: I just wanted to pique your curiosity to study signal processing!
    • link to the 1st slide set Course Overview
    • some waveform and spectral analysis tools, such as Audacity and Dfilter
    • a new PDF with some aspects that were introduced at the blackboard but are not reported in the slides. The first issue is about binary notation
  • tue 12/3 - Signal and Systems: This series of slides covers the contents of chapter 1 of the Signal Basics book. After a brief introduction on the (co)domains of signals, on the family of Fourier analyses, and on the relationships between impulse response, convolution, frequency response and filtering, different classes of signals are defined on the basis of their behavior over time and asymptotically, and some notes are provided on common operations performed on signals, on their combination, and on graphs. Then the more commonly used signals are illustrated.
  • fri 15/3 - Signal and Systems 2: a complex algebra review is provided and an in-depth analysis is undertaken on the role of the complex exponential in frequency analysis. Since signal processing is typically implemented as the transit of the signal itself through a linear operator, the characteristics of the latter are explained, and therefore the differences with respect to non-linear systems are highlighted.
Fourier series and signal spaces: After an introduction to phasor notation, the representation of periodic signals as an ordered set of Fourier coefficients is given, together with their reconstruction formula, known as the Fourier series. Then the alternative representations for real signals are given
  • tue 19/3 - Today there was no one in the classroom, not even the two students from last time. I want to hope that it is just a mild flu - I feel a little run down myself! I think we could also recover today's lesson, maybe next week. If you think I go too quickly, or too slowly, tell me!
  • fri 22/3 - Today there were three of us again! I have decided to recover the two hours lost last time one at a time, at the end of the next two lessons :)
Fourier series and signal spaces 2: After a short discussion of the effects of using only a limited set of coefficients, the section closes with the proof of Parseval's theorem, a typical consequence of the orthogonality property of the exponentials. Then we deal with the concepts of vector algebra as applied to signal spaces and, as a way to motivate the validity of the Fourier series, begin to talk about the basis of representation of a vector space and the norm of a vector
After deepening the concepts of the dimension of a vector space and of linear independence for its basis of representation, the concept of the norm of a vector is introduced, followed by that of the scalar product, and of the norm it induces, by virtue of the existence of the Schwartz inequality, which allows us to define an angle between vectors. Therefore, given an orthonormal basis, the scalar product between vectors can be calculated from the respective coefficients, as happens in a Euclidean space, obtaining an identical definition of distance. These concepts are then applied to the spaces of periodic, energy and power signals, after having defined for them the formula that evaluates the inner product, and showing how many of the properties valid for signals are related in a simple way to the particularities of metric spaces. Furthermore, it is shown how any linear transformation or operator can be thought of as the result of evaluating an inner product, so that many signal processing results, such as the Fourier transform, convolution, and signal correlation, fall into this case.
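These metric-space facts are easy to check numerically; here is a minimal sketch of mine (random sampled signals, with the DFT playing the role of the orthogonal basis) verifying Parseval's theorem and the Schwartz inequality.

```python
import numpy as np

# Illustration of mine (not from the slides): for sampled signals the inner
# product can be computed either in the time domain or from the Fourier
# coefficients -- Parseval's theorem -- and obeys the Schwartz inequality.
rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)
y = rng.standard_normal(N)

# time-domain inner product <x, y>
ip_time = np.dot(x, y)

# frequency-domain: <X, Y> / N, with the DFT's orthogonality normalization
X, Y = np.fft.fft(x), np.fft.fft(y)
ip_freq = np.real(np.vdot(X, Y)) / N    # vdot conjugates its first argument

assert np.isclose(ip_time, ip_freq)

# Schwartz inequality: |<x, y>| <= ||x|| * ||y||
assert abs(ip_time) <= np.linalg.norm(x) * np.linalg.norm(y)
```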
Fourier transform and convolution: The extension of Fourier series analysis to non-periodic signals is introduced, the Fourier transform is defined, as well as its inverse, and the relationship between the Fourier series and the transform of a single period is illustrated. The concepts of cross energy, orthogonality and the Schwartz inequality are stated, as well as Parseval's theorem for energy signals
  • tue 26/3 - Fourier transform and convolution 2: a first list of properties of the Fourier transform is proved. The definition of the Dirac pulse allows us to obtain an expression for the transform of constant and periodic signals, to define the impulse response of a physical system and the convolution integral. A graphical interpretation of the evaluation of the convolution integral was illustrated, showing the memory function realized by the impulse response, with many interactive tools introduced. The equivalence between a product in time and a convolution in frequency (and vice versa) is asserted, thus opening an alternative path to perform signal processing operations, such as measuring the frequency response instead of h(t). The concepts of linear phase, delay and cascaded systems have been revised. But instead of analyzing a filter, we can also synthesize it, as discussed thanks to Falstad's tool
    • Test on the Falstad simulator how a periodic wave changes shape by altering only the phase of its Fourier coefficients
    • Experiment yourself with the convolution animations whose links are on page 34 of the 4th slide set
    • Experiment yourself with the Falstad's filter synthesis tool
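For those who want to replay the flip-and-slide interpretation in code, here is a toy sketch of mine (signals invented for the purpose) that evaluates y[n] = Σ x[k] h[n−k] by hand and compares it with the library convolution.

```python
import numpy as np

# Sketch of the graphical interpretation discussed in class: evaluate the
# convolution sum by flipping and sliding h, then compare with numpy.
x = np.array([1.0, 2.0, 3.0, 4.0])        # input (made up for the example)
h = np.array([0.5, 0.5])                  # impulse response: 2-tap average

def conv_flip_slide(x, h):
    N = len(x) + len(h) - 1
    y = np.zeros(N)
    for n in range(N):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]   # memory effect realized by h
    return y

y_manual = conv_flip_slide(x, h)
y_numpy = np.convolve(x, h)
assert np.allclose(y_manual, y_numpy)
```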
  • fri 5/4 - The Easter holidays are over! Today, in four hours of lessons, we made good steps forward!
Fourier transform and convolution 3: After evaluating the transform of a triangle, the effects of the product between signals over time and frequency were analyzed, thus discussing filtering and frequency response, as well as modulation and windowing. Finally, the impulse train is defined, and its spectrum assessed.
Sampling and digital signal processing: The theory that allows the transition from the analog domain to the digital domain is illustrated, showing how a signal with a limited bandwidth can be completely described starting from the knowledge of its samples only, provided that they are taken at a sufficiently high rate. The phenomenon of aliasing due to an incorrect sampling period is then explored, and the implementation techniques necessary to comply with the requirements are illustrated, without incurring excessive implementation complexity.
  • link to the 5th slide set Sampling and digital signal processing
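A quick aliasing check, with frequencies chosen by me for the example: a 7 Hz sine sampled at 10 Hz (below its Nyquist rate of 14 Hz) produces exactly the same samples as a sine at 7 − 10 = −3 Hz, i.e. it folds back below fs/2.

```python
import numpy as np

# Minimal aliasing demo (my own numbers, not from the slides): sampling a
# sine above fs/2 makes it indistinguishable from a lower-frequency sine.
fs = 10.0                                  # sampling frequency, Hz
n = np.arange(50)
t = n / fs

x_fast = np.sin(2 * np.pi * 7 * t)         # 7 Hz, under-sampled
x_alias = np.sin(2 * np.pi * (-3) * t)     # its alias at 7 - fs = -3 Hz

# the two sequences of samples coincide exactly
assert np.allclose(x_fast, x_alias, atol=1e-9)
```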
  • fri 12/4 - Last Tuesday I was unable to hold lessons; we will see if and when to recover those two hours.
Sampling and digital signal processing 2: After mentioning how the quantization and binary coding processes are carried out, we calculated the DTFT as a zeta-transform value on the unit circle, allowing us to express the positive-frequency bandwidth of the spectrum as π, as well as fc/2, or N/2 for the points of the DFT, or 1/2 as normalized frequency. Finally, some programming resources were described. In addition, the intermediate test was handed out to be done at home.
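The statement above can be tested numerically; in this sketch of mine (a random sequence of my choosing) the DFT values coincide with the zeta transform X(z) = Σ x[n] z⁻ⁿ evaluated at N equispaced points of the unit circle.

```python
import numpy as np

# Check (my own illustration): the DFT equals the zeta transform evaluated
# at z = exp(j 2 pi k / N); bin k = N/2 is the normalized frequency 1/2.
rng = np.random.default_rng(1)
N = 16
x = rng.standard_normal(N)

def zeta_on_unit_circle(x, k, N):
    z = np.exp(2j * np.pi * k / N)
    return sum(x[n] * z ** (-n) for n in range(N))

X_dft = np.fft.fft(x)
X_zeta = np.array([zeta_on_unit_circle(x, k, N) for k in range(N)])
assert np.allclose(X_dft, X_zeta)
```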
  • tue 16/4 - Sampling and digital signal processing 3: Considerations about the DFT as a filterbank led us to a first look at Signal processing in Bioinformatics: we commented on a method introduced in 1998 to identify which genes were responsible for cell division, interpreting it from a signal processing perspective. Crystallography in a SP mood: This is the first year that I have lectured on this topic. The role of Fourier analysis in X-ray crystallography is addressed, and since multidimensional transforms are better explained in two dimensions, the discussion begins with images. We talked about the bidimensional DFT and its basis functions, the frequency (or reciprocal) lattice, the 2D DFT values, and started to discuss the visual evaluation of the properties of a 2D DFT, up to the convolution theorem
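As a toy illustration of the frequency lattice (my own synthetic image, not one from the slides), a purely horizontal spatial cosine lights up exactly two conjugate-symmetric nodes of the 2D DFT.

```python
import numpy as np

# Toy 2D DFT example of mine: an image made of a single horizontal cosine
# concentrates all its energy at the lattice nodes (0, u) and (0, N - u).
N, u = 32, 5                              # image size, horizontal frequency
cols = np.arange(N)
img = np.cos(2 * np.pi * u * cols / N) * np.ones((N, 1))  # identical rows

F = np.fft.fft2(img)
mag = np.abs(F)

# only the two conjugate bins exceed half the peak magnitude
peaks = {tuple(idx) for idx in np.argwhere(mag > mag.max() / 2)}
assert peaks == {(0, u), (0, N - u)}
```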
  • fry 19/4 - Crystallography in a SP mood 2: Continuing in the extension of frequency analysis to image signals, we extended to them the concept of filters defined in frequency, and reflected on how easy it is to remove periodic noise, masking the spatial harmonics caused by it.
After highlighting the relevance of phase information, we lay the foundations to treat the periodicity of the crystal. We define the 2D discrete impulse, and we extend its sifting property, while the impulse train now becomes a bed of nails, an instrument whose convolution with a limited-support signal makes it periodic. The transform of the previous convolution is equal to the product between the transform of the signal and that of a bed of nails (BON). Since the DFT of the latter is in turn a frequency BON, the periodization in space involves a frequency BON, whose impulse areas are given by the (frequency) samples of the signal spectrum preceding its periodization.
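The periodization mechanism is easy to verify in one dimension (a toy signal of mine stands in for the 2D case): convolution with an impulse train replicates the signal, and the DFT of the result keeps only samples of the original spectrum.

```python
import numpy as np

# 1D sketch of the "bed of nails" idea (my own toy signal): convolving a
# finite-support signal with an impulse train periodizes it, and the
# transform of the result samples the original spectrum.
N, P = 64, 16                             # total length, repetition period
g = np.zeros(N); g[:4] = [1, 2, 2, 1]     # finite-support signal
train = np.zeros(N); train[::P] = 1       # impulse train (1D bed of nails)

# circular convolution with the train replicates g every P samples
x = np.fft.ifft(np.fft.fft(g) * np.fft.fft(train)).real
assert np.allclose(x[:P], x[P:2 * P])     # x is periodic with period P

# the spectrum is nonzero only every N/P bins, where it equals the samples
# of G(k) scaled by the number of repetitions
X, G = np.fft.fft(x), np.fft.fft(g)
k = np.arange(N)
assert np.allclose(X[k % (N // P) != 0], 0, atol=1e-9)
assert np.allclose(X[:: N // P], (N // P) * G[:: N // P])
```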
We are therefore able to face the 3D transform, the meaning of a 3D integral, and that of the density that is its integrand function.
A periodic 3D density can be expressed in terms of its Fourier series, to whose coefficients correspond three indices (h,k,l) related to volumetric harmonics. The set of possible values for the latter is related to the coordinates of the nodes of a 3D reciprocal lattice. Some views of the basis functions of the Fourier series expansion are then shown. The concept of the impulse train is now further extended to the 3D case, taking the name of box of stars (BOS)
At this point, the application of these concepts to the representation of crystals, or rather of their 3D electronic charge density, is extremely direct, and allows the definition of the structure factors.
We continue with the geometric aspects, such as the coordinate system and the Bravais lattices, as well as Miller indexing for the crystal planes, to find that these are the same indices used for the structure factors.
In crystallography, electronic density is obtained by calculating the Fourier series relating to the structure factors, which in turn are obtained from the diffraction image. The argument winds through the topics of X-ray plane waves, scattering and diffraction of X-rays, diffraction by Bragg planes while varying the angle θ, diffraction from approximately positioned particles, and the intensity of a diffraction spot.
  • tue 23/4 - Crystallography in a SP mood 3: After a summary for those who were not there last time, we reviewed the multiple personality of the Miller indices, identifying a set of parallel planes with a vector orthogonal to them, and expressing the coordinates of a vector touching the node in the reciprocal lattice identified by a triple hkl of volumetric frequency indices; the vector has a length reciprocal to the inter-plane distance. Then, the total extension of the reciprocal lattice is explained, as well as the rotation property of an n-dimensional Fourier transform. The Ewald sphere finally makes us able to label the diffraction spots with the hkl triple of the volumetric harmonic frequencies, and to estimate the structure factors. The reciprocal lattice can be sampled when its nodes intersect the Ewald sphere, and this happens as the crystal is rotated. Finally, we discussed some facts about bandwidth, dimensions, and shape
  • fri 26/4 - Signal Processing in Bioinformatics 2: We discussed the frequency analysis of the genome, aimed at detecting DNA coding regions within ORFs by FFT analysis, due to the repeated presence of codons of length 3. The sequence coding of the genome can be based on indicators, or on more compact encodings based either on tetrahedra or on complex mapping. Some encodings have also been proposed for the codon sequence. A further topic concerns the spectral analysis of the amino acid sequence of proteins, carried out on the basis of their EIIP. This allows a functional role to be attributed to proteins based on a characteristic frequency revealed by the consensus spectrum method. But before continuing, we need to face a new slide set:
Random signals and Wiener's theorem: Since it is not possible to compute a Fourier transform of random signals, a statistical description must be adopted instead. The concept of a random process is therefore introduced, and how to calculate its ensemble and time averages is discussed; then the conditions for defining it as a stationary and ergodic process are explained. We then applied the concepts exposed by returning to the determination of the SNR for the quantization noise according to the number of bits/sample adopted
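Back to the coding-region detection by FFT discussed just before this slide set: a synthetic sketch of mine (an idealized codon-repeating sequence, not real genome data) shows the period-3 spectral peak produced by the four indicator sequences.

```python
import numpy as np

# Toy version of the FFT-based coding-region idea (synthetic data of my
# own making): a codon-repeating sequence shows a peak at frequency N/3.
N = 300
coding = "ATG" * (N // 3)                 # idealized coding region

def indicator(seq, base):
    return np.array([1.0 if s == base else 0.0 for s in seq])

# total spectrum: sum of |FFT|^2 of the four indicator sequences
S = sum(np.abs(np.fft.fft(indicator(coding, b))) ** 2 for b in "ACGT")

k = np.argmax(S[1 : N // 2]) + 1          # skip the DC component
assert k == N // 3                        # peak at the period-3 frequency
```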
  • tue 30/4 - Random signals and Wiener's theorem 2: After some examples of the equivalence between time and ensemble averages, a 2D random variable is extracted from the process, and the correlation and covariance between its components are discussed, both from a probabilistic and a statistical point of view. The autocorrelation function, defined for both random processes and deterministic signals, is then introduced, as well as the intercorrelation between different signals, and the links to convolution are clarified. The properties of autocorrelation are stated, and a few words are spent on the link between Pearson's coefficient and the Schwartz inequality. Wiener's theorem is a valuable tool for obtaining the spectral density for any type of signal, be it deterministic or random, as well as for opening an alternative path towards already known results. Examples of its application are given.
    • link to the Octave simulation on autocorrelation and more
    • OPIS survey - I received the following communication: invite students to fill in the OPIS questionnaire during the lesson, providing them with the OPIS code and dedicating 15 minutes of the lesson to filling it out. OPIS code: 5I43PU9I. Dedicated instructions are available for courses in English
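In the same spirit as the Octave simulation linked above, here is a small Python check of mine of Wiener's theorem in its discrete, circular form: the DFT of the circular autocorrelation of a sequence equals its power spectrum.

```python
import numpy as np

# Numeric check (my own illustration) of the discrete Wiener theorem:
# DFT{ autocorrelation } = |X(k)|^2 for a circular autocorrelation.
rng = np.random.default_rng(3)
N = 128
x = rng.standard_normal(N)

X = np.fft.fft(x)
power_spectrum = np.abs(X) ** 2

# circular autocorrelation r[m] = sum_n x[n] * x[(n + m) mod N]
r = np.array([np.dot(x, np.roll(x, -m)) for m in range(N)])

assert np.allclose(np.fft.fft(r).real, power_spectrum)
assert np.isclose(r[0], np.sum(x ** 2))   # r(0) is the signal energy
```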
  • fri 3/5 - Today I gave a summary of the last two lessons, given that the audience present had missed them, and that whoever had followed them was absent. The only new topic was the matched filter, as an additional aspect of the similarity between the convolution integral and the correlation function between signals
  • tue 7/5 - As an immediate application of Wiener's theorem, we returned to the slides on bioinformatics applications, and discussed the properties of the autocorrelation of the genome, observing how it presents long-distance similarities, which support an evolutionary hypothesis of duplication, mutation and concatenation. Then we returned to the random signals slides, to study the power density at the filter output, for the transit of both deterministic and random signals. This time the bioinformatics application is that of FTIR spectrometry, which is based on an interferometer that in practice implements a comb filter. By moving a mirror, the autocorrelation function of the input signal is observed at the detector, and the power density of the source is obtained as a Fourier transform. By inserting the sample under measurement before the detector, it is thus possible to measure its absorption spectrum. Back to random signals, we completed the review of the statistical properties of a process at the output of a filter, and the effect of their sum or product. Finally, we defined the multidimensional Gaussian PDF, and showed how in this case uncorrelatedness implies statistical independence. We then showed how to obtain the covariance matrix if the r.v. are extracted from a Gaussian process
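The |H(f)|² rule for the power density at a filter output can be verified with averaged periodograms; here is a sketch of mine, with unit-variance white noise and a two-tap filter of my own choosing.

```python
import numpy as np

# Check (my own example): the output power density of a filter driven by
# white noise is |H(f)|^2 times the input's, here estimated by averaging
# periodograms over many realizations to tame their variance.
rng = np.random.default_rng(6)
N, runs = 256, 400
h = np.array([1.0, 0.5])                  # simple FIR filter (my choice)
H = np.fft.fft(h, N)

S_out = np.zeros(N)
for _ in range(runs):
    x = rng.standard_normal(N)                  # white noise, PSD = 1
    y = np.fft.ifft(np.fft.fft(x) * H).real     # circular filtering
    S_out += np.abs(np.fft.fft(y)) ** 2 / N     # periodogram of the output
S_out /= runs

# S_y(f) ~= |H(f)|^2 * 1, up to the residual periodogram fluctuation
assert np.allclose(S_out, np.abs(H) ** 2, rtol=0.3, atol=0.05)
```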
  • fri 10/5 - Spectral estimation by means of a periodogram has been explored. Information theory and biochemical applications: After formalizing the scenario of transmitting information from a source through a channel, the measure of information is derived, and the concept of entropy is defined, together with its bounds for a discrete source, without and with memory, finding that the entropy gives the minimum possible coding rate. An example of a finite-state Markov source introduces the concept of conditional entropy. We then move on to continuous sources, for which differential entropy is suitable for comparison between sources with the same variance. It is then shown that the Gaussian source has the greatest differential entropy, so that it is the preferred model in cases of partial knowledge. A very brief introduction to the topic of source coding, both without and with loss of information, concludes the first part
    • link to the 9th slide set Information theory and biochemical applications
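A worked micro-example (distributions picked by me, not from the slides) of the entropy definition and its log2(M) bound for a discrete memoryless source:

```python
import numpy as np

# Entropy of a discrete memoryless source (toy distributions of mine),
# bounded by log2(M), with equality for the uniform distribution.
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                        # 0 * log 0 = 0 by convention
    return -np.sum(p * np.log2(p))

M = 4
H_uniform = entropy([0.25, 0.25, 0.25, 0.25])
H_skewed = entropy([0.5, 0.25, 0.125, 0.125])

assert np.isclose(H_uniform, np.log2(M))   # maximum entropy: 2 bits
assert np.isclose(H_skewed, 1.75)          # 0.5*1 + 0.25*2 + 2*0.125*3
assert H_skewed < H_uniform                # the entropy bound
```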
  • tue 14/5 - Information theory and biochemical applications 2: Now it is time to operate on two different random variables, and to define for them the joint entropy and the two conditional entropies, as well as the more useful average mutual information I(X;Y), symmetric with respect to the variables, which can be evaluated by combining different types of entropy, and which leads to the definitions of channel equivocation and noise entropy. The section concludes with the definition of channel capacity and related properties
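The identities above can be checked on a toy joint PMF of my own choosing (a binary symmetric channel with crossover probability 0.2): I(X;Y) = H(X) + H(Y) − H(X,Y), symmetric in the two variables and bounded by the marginal entropies.

```python
import numpy as np

# Toy joint PMF of mine (binary symmetric channel, crossover 0.2):
# compute the average mutual information from entropies and verify
# its symmetry and its bounds.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])             # joint PMF of X and Y

def H(p):
    p = p[p > 0]                          # flatten and drop zero entries
    return -np.sum(p * np.log2(p))

p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

I_xy = H(p_x) + H(p_y) - H(p_xy)

# symmetry: the same value comes from the transposed joint PMF
I_yx = H(p_y) + H(p_x) - H(p_xy.T)
assert np.isclose(I_xy, I_yx)
assert 0 <= I_xy <= min(H(p_x), H(p_y))
```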
  • fri 17/5 - Information theory and biochemical applications 3: The last part deals with the application of information theory concepts to biochemical signaling systems, showing how the variability of chemical reactions can make the biological signal ambiguous, and cause erratic cellular behavior. Average mutual information I(X;Y) has very useful properties, but its value can be limited due to a small number of states for the signal. The section concludes with some comments on how to measure the capacity of biological channels and how to apply the data processing inequality, bias, and rate-distortion theory to them
  • tue 21/5 - Today we did not add new topics, in the classroom there was the student who was absent the last time, so we took the opportunity to review the contents of the last lesson
  • fri 24/5 - The last topic treated is filter devices. Let's go back to the slides on Sampling and digital signal processing, in whose last section the topic of batch filtering is addressed. The numerical filter via DFT is discussed, discrete and circular convolution is introduced, and the convolution between sequences of finite duration is described, as well as the convolution between an input sequence of indefinite duration and a discrete impulse response of finite length. Then, let's move on to the slides on Filters. Analog filters are briefly discussed, introducing terms still valid to describe different classes of numerical filters, since the result of the synthesis techniques for analog filters can be converted into a zeta transform that describes the transfer function of a numerical filter. Digital filters are a class of filters that operate according to a computational architecture rather than a circuit, and may represent the effect of natural causes, or be defined by design. The transversal (or FIR) filter is then discussed, both from the point of view of analysis and of synthesis, highlighting some implementation aspects, as well as the particular case of the moving average filter.
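The batch filtering idea can be sketched as follows (input and moving-average filter are my own choices): zero-padding to N ≥ Lx + Lh − 1 points makes the circular convolution computed via DFT coincide with the linear one.

```python
import numpy as np

# Sketch of batch filtering via the DFT (example signals of mine): with
# enough zero-padding, circular convolution equals linear convolution.
rng = np.random.default_rng(5)
x = rng.standard_normal(50)            # finite-duration input
h = np.ones(5) / 5                     # moving-average FIR, 5 taps

N = len(x) + len(h) - 1                # enough points to avoid wrap-around
y_fft = np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h, N)).real
y_lin = np.convolve(x, h)

assert np.allclose(y_fft, y_lin)
```

With fewer than N points the circular convolution wraps around, which is exactly why block-based schemes must pad (or overlap) their batches.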
  • tue 28/5 - The last lesson of the course concludes the brief exposition regarding filters, treating the first-order FIR filter with its variants of differentiator and comb filter, followed by the first-order IIR filter, with the particular cases of the integrator and the exponential moving average. Then we move on to the general problem of the synthesis of FIR filters of any order, either with the method of sampling the windowed impulse response, or by designing the frequency response first and then sampling it to obtain hn through an IDFT. Finally, the synthesis of IIR filters is faced, whose input-output relationship is described by a finite-difference equation, to which corresponds a zeta transform H(z) equal to a ratio of polynomials, which in turn is reflected in a canonical computational architecture. The filter coefficients can therefore be obtained starting from those of the transfer function H(s) obtainable for an analog filter through established methods, after applying a change of variable, of which three typical example cases are provided.
    • link to the 2022 [1] [2] and 2023 previous intermediate and final tests
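A minimal first-order IIR sketch of mine, the exponential moving average mentioned in the lesson (the coefficient is chosen arbitrarily): its impulse response h[n] = (1−a)aⁿ never ends, which is what "infinite impulse response" means.

```python
import numpy as np

# First-order IIR example (my own coefficient): the exponential moving
# average y[n] = a*y[n-1] + (1-a)*x[n].
a = 0.9
N = 60

def ema(x, a):
    y = np.zeros(len(x))
    acc = 0.0
    for n, v in enumerate(x):
        acc = a * acc + (1 - a) * v    # the recursion y[n] = a*y[n-1] + ...
        y[n] = acc
    return y

delta = np.zeros(N); delta[0] = 1.0
h = ema(delta, a)                       # impulse response
assert np.allclose(h, (1 - a) * a ** np.arange(N))

# DC gain is 1: a constant input converges to the same constant
y = ema(np.ones(500), a)
assert abs(y[-1] - 1.0) < 1e-6
```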
  • fri 31/5 - The last evaluation test has been carried out; here you can find its text, complete with the solution