Riemannian geometry can safely be tagged as a "revolutionary" theory in mathematics. Firstly, the theory put forward a radical view of space and geometry by generalizing "flat" Euclidean space to curved manifolds. Later, it became the basis for a major revolution in physics when Albert Einstein used the theory to explain space and gravity in what we know as the theory of general relativity.

Riemannian geometry has found uses in machine learning as well. In this article, we will learn about Geodesic Regression, an extension of linear regression to Riemannian space. It is assumed that readers have a good understanding of the least-squares linear regression model and some knowledge of Riemannian geometry. Before diving into the subject matter, it is worth discussing the basics of the geometry involved. The explanation will be in simple terms, just enough for understanding the subject matter; we will not go into the fine details of the geometry. Hence, the explanation is casual and may not conform precisely to standard mathematical definitions.

#### Let's talk about space!

In school and high school, we are taught tons of geometric theorems built on the foundation of Euclidean space. Euclidean geometry was the only thing prevalent until the genius of Gauss and Riemann hinted that things can be different. They put forward the theory of curved space, which does not necessarily satisfy Euclidean properties. Since the underlying foundation changes, those theorems no longer hold for curved space (a manifold). Thus, we are required to rethink everything we have learnt so far.

Euclidean space is well known; however, let's define it technically: a finite-dimensional space of real tuples *(a_1, a_2, a_3, …, a_n)* with a well-defined inner product (dot product). We commonly visualize it as a vector space* spanned by mutually perpendicular linear axes. In contrast, a Riemannian space is a span of non-linear axes. A common example is the surface of a sphere, which is spanned by the lines of latitude and longitude. Things get strange when we dive into the geometry of curved space. For example, no notion of a global vector space exists, the Pythagorean theorem is no longer valid, and the lines of longitude, which seem parallel at the equator, eventually converge and meet at the poles. Having said that, let's go on and explain some of its concepts.

*A vector space is a space that is closed under vector operations (addition, subtraction, etc.).

**Distance**

Calculating the distance between points is a common geometric operation. In Euclidean space, the Pythagorean theorem makes distance calculation easy. Now let's think about finding the distance between two points on a sphere. It becomes obvious that the Pythagorean theorem does not work here once we realize that the curve connecting the points cannot be straight. However, if we zoom into a sufficiently small segment of the curve, it can safely be considered straight, and we have a locally defined vector space. The distance between the points is then the integral of these small segment lengths along the curve.

Note: Since the coordinate axes can be non-orthogonal in the local vector space, calculating the length of the small segment is not straightforward. We need to make use of something called the "Riemannian metric" for distance calculation. If interested, the following link covers the Riemannian metric in good detail: https://www.ime.usp.br/~gorodski/teaching/mat5771/ch1.pdf

**Geodesic**

This is the generalization of the notion of a "straight line" to a curved manifold. A geodesic is the shortest path between two points in space, the "straightest possible path" in a curved manifold. As depicted in Figure 2, there can be an infinite number of paths connecting point *P* to *Q*. But the shortest one, highlighted in red, is the geodesic. The mathematical notation for geodesic distance is *d(P,Q)*.

The thing to note here is that we cannot connect points *P* and *Q* by a straight line to get the shortest path between them, because that line would pass through the inside of the sphere. Remember, every point of a curve must lie on the manifold on which the curve is defined, and the inside of the sphere is not part of the spherical manifold.

**Tangent Space**

It is the space spanned by the tangent vectors at a point on a manifold. This can simply be visualized as a tangent plane at a point on a sphere. Please note that we do not have vectors in Riemannian manifolds; therefore, the tangent space is not part of the manifold. It is a separate Euclidean space, and tangent vectors in it are obtained by a mapping from points on the Riemannian manifold.

Please note that vectors can only be defined on the tangent space at a point, not on the manifold.

**Exponential Map**

This is an important concept for this article. Let's recall that translating one point to another via the shortest route in Euclidean space simply requires vector addition. Since the notion of a vector space does not exist in Riemannian manifolds, vector operations (in this case, addition) do not make sense. We need a different mathematical operation to be able to get from one point to another. This is where the exponential map comes into the picture.

We will take help from the tangent space at a point on a manifold *M*. As depicted in Figure 3, we can have a flat tangent plane at a point on a spherical manifold. The tangent plane is a vector space, and we can carry out translations there. For instance, in Figure 3, *T_pM* is the tangent space at point *P* of a spherical manifold, and we have translated point *P* by a vector *v* in the tangent plane. Now the point is to map the vector back to the spherical manifold, as in the mapping of *v* to point *A* on the sphere. This mapping is called the exponential map, and finally, we are able to travel from *P* to *A* on the manifold; the path we get is the shortest path between the points.

(*Image by author*)

Please note that there are mathematical formalities involved with the existence of the exponential map which will not be discussed here. In short, finding the shortest path between points on a manifold boils down to three operations: (1) realize a tangent space at a point, (2) translate the point in the tangent space by a vector, (3) map the translated point back to the manifold. This way, we end up with a geodesic curve on the manifold. Also, the mapping is defined such that the norm of the vector *v* (*||v||*) is equal to the geodesic distance *d(p, A)*. Mathematically, the exponential map is written as *A = Exp(p, v)*. The mapping function *Exp(p, v)* depends upon the manifold and can be determined analytically.

The exponential map in Riemannian space plays a role similar to that of the equation of a straight line in Euclidean space.

**Log Map**

This is the inverse of the exponential map; it maps points from the manifold to the tangent plane. In Figure 3, this is the mapping from point *A* to vector *v*. It is mathematically defined as *v = Log(p, A)*. Since the norm of the vector is equal to the geodesic distance, we can infer that *||Log(p, A)|| = d(p, A)*.

**Parallel Transport**

There is another thing we need to take care of when we define tangent space and vectors in tangent space. In Euclidean geometry, a vector with the same magnitude and direction carries the same meaning at any point in space. In contrast, for curved manifolds vectors and tangent spaces are defined only locally, thus we cannot have the same global interpretation.

To understand this, let's bring the spherical manifold into the picture again, as in Figure 4. We can imagine the sphere as our Earth and **A** and **B** as two places on it. For a person living at point **A**, vector **V_A** is a horizontal direction. But when we go on and define the same vector at point **B**, it becomes a vertical direction. To define the horizontal direction at point **B**, we need to modify the original vector (in this case, rotate it). Vector **V_B** is a modification of vector **V_A** such that **V_B** holds the same meaning for point **B** as **V_A** holds for point **A**. This modification is called parallel transport, and we say "vector **V** is parallel transported from point **A** to **B**". The mathematical notation for parallel transport is as shown in Figure 4.

(*Image by author*)

Parallel transport is a crucial concept in understanding curved surfaces and defining curvatures in a Riemannian Manifold. However, for our purposes here, we stick with our simple definition. Like exponential map, the expression for parallel transport is dependent on the nature of the manifold itself.

Now, we are equipped with fairly sufficient mathematical concepts to dive into the main topic. Let's review the case of linear regression first.

**Linear Regression**

Given an independent variable *X = (x_1, x_2, …, x_n)* and the corresponding dependent variable *Y = (y_1, y_2, …, y_n)*, the goal of linear regression is to represent *Y* as a linear function of *X*:

*y = wx + b*

The parameters *w* and *b* are determined by minimizing the least-squares error:

*E(w, b) = Σ_i (y_i − wx_i − b)²*

We have a closed-form solution for parameters *w* and *b* that can be determined analytically by the use of linear algebra. We use a similar approach to define geodesic regression parameters as a least-squares minimization task in the subsequent section.
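As a quick illustration, the closed-form least-squares fit can be sketched with NumPy (the function name and variables here are illustrative, not from the article's code):

```python
import numpy as np

# Fit y = w*x + b by ordinary least squares.
def fit_linear(x, y):
    A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix with columns [x, 1]
    (w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return w, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0          # data generated from w = 2, b = 1
w, b = fit_linear(x, y)
```

For noiseless data like this, the fit recovers the generating parameters exactly (up to floating-point error).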

**Geodesic Regression**

The motivation behind GR is that not all real-life data points lie in Euclidean space. For example, on a spherical manifold, data points are represented by points on the surface of a sphere. Any manifold can still be realized in a Euclidean space of higher dimension, just as the two-dimensional spherical manifold can be realized as a surface in three-dimensional Euclidean space. Thus, non-linear regression in Euclidean space can model data points residing on a manifold. But such models will be rather complex because of the increase in dimensionality and the complex non-linear relationship, and model complexity can lead to high variance.

For data residing on a non-Euclidean manifold, it is natural to model it with GR. Let's assume an independent variable *X* in Euclidean space and a dependent variable *Y* on a manifold *M*. The GR model is the generalization of the LR model:

*y = Exp(p, xv)*

Note that for Euclidean space, the exponential map is just vector addition, so the equation above reduces to the LR model when *M* is a Euclidean space. The curve *Exp(p, xv)* traced out as *x* varies is a geodesic, hence the name Geodesic Regression. The parameters *p* and *v* are determined by the following minimization:

*E(p, v) = Σ_i d(Exp(p, x_i v), y_i)²*

Unlike LR, where we had a closed-form solution for the error minimization task, here we need to perform gradient descent on the error function

*E(p, v) = Σ_i d(Exp(p, x_i v), y_i)²*

The gradient should be computed with respect to the parameters *p* and *v*. The gradient of the squared geodesic distance with respect to one of the points is just the negative of the log map (let's take that for granted for now):

*∇_q d(q, y)² = −2 Log(q, y)*
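This identity can be sanity-checked numerically on the unit sphere. The sketch below (helper functions and tolerances are my own, not from the article's code) compares a finite-difference directional derivative of *d(q, y)²* with *−2 Log(q, y) · u*:

```python
import numpy as np

def dist(p, q):
    # Geodesic (arc-length) distance on the unit sphere
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def exp_map(p, v):
    # Exponential map on the unit sphere
    n = np.linalg.norm(v)
    return p if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def log_map(p, q):
    # Log map: tangent vector at p pointing toward q, with norm d(p, q)
    u = q - np.dot(p, q) * p
    n = np.linalg.norm(u)
    return np.zeros_like(p) if n < 1e-12 else dist(p, q) * u / n

q = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
u = np.array([0.0, 0.6, 0.8])      # a unit tangent direction at q
eps = 1e-5

# Directional derivative of d(., y)^2 at q along u, by central differences
fd = (dist(exp_map(q, eps * u), y) ** 2
      - dist(exp_map(q, -eps * u), y) ** 2) / (2 * eps)
analytic = -2.0 * np.dot(log_map(q, y), u)
```

Both quantities agree to within finite-difference accuracy.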

The challenge here is to derive the gradient of the exponential map *Exp(p, xv)*. Let us first develop an intuition for this.

*M* is a Riemannian manifold containing point *p*. The left part of Figure 5 refers to differentiating *Exp(p, xv)* with respect to parameter *p*. Red lines are exponential mappings (geodesics) with the same translation vector *v* but at different points *p*. The result of differentiation is a vector field, denoted by blue arrows across the geodesic lines. The right part of the figure explains differentiation with respect to parameter *v*, keeping point *p* constant. The result is again a vector field denoted by blue arrows. The vector fields arising in both cases are known as **Jacobi fields**.

**Jacobi fields** can also be understood as separation vectors between nearby geodesics.

**Alright, how are they computed?**

Like the exponential map, Jacobi fields are properties of the manifold itself. Thus, they are computed analytically and depend on the manifold. Note that the derivation of Jacobi fields requires some core concepts of Riemannian geometry and is out of the scope of this article. Details can be found in standard references.

Now that we have figured out determining gradients of geodesic distance and exponential map, we can go on and compute the gradient of the error function and carry out gradient descent. The resulting parameters are our parameters of the GR model.

So we see, there is not much new in GR once we are familiar with LR. Still, everything we have discussed is theoretical, and if we are learning about GR for the first time, the implementation can get really confusing, especially around exponential maps and Jacobi fields. So let's walk through a simple example.

**Implementation on a Spherical Manifold**

For LR, we required a set of independent and corresponding dependent variables. For GR, we need one additional piece of information: the nature of the manifold where the dependent variables lie. Let's take the example of a two-dimensional spherical manifold embedded in three-dimensional Euclidean space.

First of all, let's define the expressions for geodesic distance, exponential map, log map, and parallel transport for a spherical manifold.

(A unit sphere with center at (0,0,0) is assumed where p and q are points on it).

Below is the python code for function definitions:
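The original code listing did not survive in this copy, so below is a minimal NumPy sketch of these four operations for the unit sphere (function names and numerical tolerances are my own):

```python
import numpy as np

def dist(p, q):
    # Geodesic (arc-length) distance between points p and q on the unit sphere
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def exp_map(p, v):
    # Map tangent vector v at p onto the sphere along the geodesic from p
    n = np.linalg.norm(v)
    return p if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def log_map(p, q):
    # Inverse of exp_map: tangent vector at p pointing toward q,
    # with norm equal to the geodesic distance d(p, q)
    u = q - np.dot(p, q) * p          # project q onto the tangent plane at p
    n = np.linalg.norm(u)
    return np.zeros_like(p) if n < 1e-12 else dist(p, q) * u / n

def parallel_transport(p, q, w):
    # Transport tangent vector w from T_pM to T_qM along the geodesic p -> q
    d = dist(p, q)
    if d < 1e-12:
        return w
    lpq, lqp = log_map(p, q), log_map(q, p)
    return w - (np.dot(lpq, w) / d ** 2) * (lpq + lqp)
```

For example, at *p = (1, 0, 0)* with *v = (0, π/2, 0)*, `exp_map` lands on *(0, 1, 0)*, and `log_map` inverts it; transporting the geodesic's own tangent direction to the endpoint yields *(−1, 0, 0)*, the continuation of the great circle.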

As mentioned before, the regression model is *y = Exp(p, xv)* with error function *E(p, v) = Σ_i d(Exp(p, x_i v), y_i)²*, where *p* and *v* are the model parameters. Computing the gradient involves calculating Jacobi fields for a sphere, and the gradients can be approximated analytically. The exact expressions for these gradients can be found in the paper **Robust Geodesic Regression**, referenced below.

The final step is updating the parameters, which is a little tricky. Please note that the gradient we obtain is a vector. In LR, we can simply scale the gradient vector by the learning rate and subtract it from the current parameter value. But as discussed already, **p** is a point on a manifold, and we have no such vector operations there. Hence we have to apply the exponential map to update parameter **p**.

For updating parameter **v**, let us recall Figure 3: **v** is a vector in the tangent space itself. So it can simply be updated by vector subtraction.

However, there is one more thing to take care of. Referring to Figure 3, we see that vector **v** is defined at the same point where **p** lies. So, when we update **p** to **p_new**, the new vector must lie at point **p_new**. But since the updated vector **v'** still lies at the old point **p**, we need a way to define **v'** at the new point **p_new**. This can be done by parallel transporting the updated vector **v'** from **p** to **p_new**.

This completes our parameter update. To implement this in Python, let us first generate synthetic data on a spherical surface and then implement GR on the data.
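The data-generation code is not reproduced in this copy; below is a sketch of one way to do it, sampling points around a ground-truth geodesic with Gaussian noise in the tangent plane (the ground-truth parameters, noise level, and random seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def exp_map(p, v):
    # Exponential map on the unit sphere
    n = np.linalg.norm(v)
    return p if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

# Ground-truth geodesic: base point p_true and tangent vector v_true at p_true
p_true = np.array([1.0, 0.0, 0.0])
v_true = np.array([0.0, 1.0, 0.5])            # orthogonal to p_true, hence tangent

x = rng.uniform(0.0, 1.0, size=50)            # independent variable
y = []
for xi in x:
    noise = 0.05 * rng.standard_normal(3)
    noise -= np.dot(noise, p_true) * p_true   # keep the noise in the tangent plane
    y.append(exp_map(p_true, xi * v_true + noise))
y = np.array(y)                               # noisy points on the unit sphere
```

Since the exponential map preserves the unit norm, every generated point lies exactly on the sphere.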

Now that we have independent variable **x** and corresponding data points **y**, we are ready to learn a GR model to fit the data:

The decreasing value of error function (normalized over the number of data points) for 100 iterations is as below:

(*Image by author*)

The obtained parameter values are good estimates of the true values used to generate the data.

The data points and resulting geodesic curve as described in the example above can be visualized as in Figure 7:

(*Image by author*)

The red dotted points are noisy data points on the surface of a sphere and the red solid curve is the geodesic fitted to the data points. This is not the exact scenario for the example in the text but very close to it.

**Conclusion:**

In this article, we extended our understanding of regression from Euclidean to Riemannian space. More generally, we approached a machine learning problem from a different perspective. We discussed an example of GR on a spherical manifold, but more practical applications lie in shape analysis and computer vision tasks, especially in medical imaging and biometrics. Machine learning algorithms for data lying in Riemannian space are being researched and developed for those applications.

Moreover, we not only explored a new regression technique but discussed a new geometrical theory itself. The adoption of Riemannian geometry in machine learning is still in its nascent stage but its effectiveness is promising.

**References:**

1. Geodesic Regression on Riemannian Manifolds: https://www.sci.utah.edu/publications/fletcher11/Fletcher_MFCA2011.pdf

2. Robust Geodesic Regression: https://arxiv.org/abs/2007.04518

3. The sphere S^n in the Manifold-valued Image Restoration Toolbox (MVIRT): https://ronnybergmann.net/mvirt/manifolds/Sn.html

4. A Nonlinear Regression Technique for Manifold Valued Data with Applications to Medical Image Analysis: https://www.cv-foundation.org/openaccess/content_cvpr_2016/app/S18-44.pdf