Summer Internship at IBM as a Data Scientist

My summer internship at IBM as a Data Scientist has ended gracefully, and I am very grateful for the opportunity. During that time, IBMers from various positions and backgrounds were asked by the team behind BDU and DSWB to help fill the educational gaps that exist in Data Science and Big Data. Undoubtedly, BDU is becoming more well-respected among both beginners and professionals outside of these areas who want to step into Data Science and Big Data and advance their careers.

I helped create some basic to advanced courses in BDU, all of which will become available soon; I will update this post accordingly. Among them, I enjoyed creating tutorials and supervising a junior intern for the Apache Spark Makers Build, a competition hosted by IBM promoting Apache Spark, Machine Learning and other applications. The tutorials come in various flavours, as follows:

Graph computations using GraphX to analyze Amazon datasets with Scala and verify some of the claims made by J. Leskovec et al. See this tutorial here.

Analyzing and visualizing the Yelp dataset challenge with Apache Spark SQL using PySpark. See this tutorial here.

Analyzing and visualizing Bitcoin transactions here with PySpark.

You only need to fire up your workbench and enjoy.

General Monty Hall Simulation

The idea for this post came to my mind last night. I'm assuming you have already heard about the famous Monty Hall Problem (if you haven't, watch the quick intro in this Numberphile clip). Here I'd like to demonstrate a simulation taking the general case into account, i.e. assume we have n bins (boxes or doors, whatever), there's a prize in one of them, and you don't know which one has the prize. You pick one of those bins at random, and since I'm the host and I know where the prize is located, I choose k boxes and discard them from the game (obviously not the prize and not your first choice, so 1 \leq k \leq n - 2). Then, I ask you whether you want to switch to another box or stick to your first choice. Finally, I reveal your choice and see if it contains the prize or not. It's not hard to compute the probability of winning if you do switch. That's in fact,

P(\text{Winning if switching}) = \dfrac{n-1}{n(n-k-1)} > \dfrac{1}{n}=P(\text{Winning if not switching})

Thus, the best strategy is to always switch!
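
To see where this comes from, condition on the first pick: switching wins exactly when your first choice was wrong (probability \dfrac{n-1}{n}) and you then land on the prize among the n-k-1 remaining bins, so

P(\text{Winning if switching}) = \dfrac{n-1}{n} \cdot \dfrac{1}{n-k-1} = \dfrac{n-1}{n(n-k-1)},

which exceeds \dfrac{1}{n} since \dfrac{n-1}{n-k-1}>1 for every 1 \leq k \leq n-2.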

Now, I’d like to confirm this with data by doing simulation in Python.

Note: All the code is available on my GitHub and on nbviewer.

So first, I create a MontyHall class representing my simulation object as follows:

#!/usr/bin/env python3

import numpy as np
from numpy.random import RandomState
from random import sample

class MontyHall(object):
    """ Creates a simulation game object

    n_games:    int         Number of games
    n_bins:     int         Number of bins
    n_discards: int         Number of discarded options, between 1 and n_bins-2
    switch:     boolean     Switch or not
    seed:       int         Seed number
    """
    def __init__(self, n_games, n_bins, n_discards, switch=False, seed=123):
        self.n_games = n_games
        self.n_bins = n_bins
        if 1 <= n_discards <= n_bins - 2:
            self.n_discards = n_discards
        else:
            raise ValueError("n_discards must be between 1 and n_bins-2")
        self.switch = switch
        self.seed = seed

    def set_prize(self):
        """ Set prize randomly for each game with fixed
            RandomState to get same numbers in different calls
        """
        prng = RandomState(self.seed)
        return prng.randint(0, self.n_bins, self.n_games)

    def player_first_choice(self):
        """ Player first choice in each game with fixed
            RandomState to get same numbers in different calls
        """
        prng = RandomState(2 * self.seed)
        return prng.randint(0, self.n_bins, self.n_games)

    def player_final_choice(self):
        """ Player final choice after the host discards some options"""
        if not self.switch:
            return self.player_first_choice()
        else:
            opts = list(range(self.n_bins))
            # switch to a bin that is neither the first choice nor discarded
            arr = np.vstack([self.player_first_choice(), self.host_discard()])
            final = self._col_choice(opts, arr, 1)
            return final

    def host_discard(self):
        """ Host choice for removing n_discards bins"""
        opts = list(range(self.n_bins))
        # the host never discards the prize or the player's first choice
        arr = np.vstack([self.set_prize(), self.player_first_choice()])
        disc = self._col_choice(opts, arr, self.n_discards)
        return disc

    def _col_choice(self, opts, arr, n_disc):
        """ Per game (column of arr), sample n_disc of the options
            not appearing in that column"""
        res = np.apply_along_axis(
            lambda x: sample([v for i, v in enumerate(opts)
                              if i not in set(x)], n_disc),
            0, arr)
        return res

    def score(self):
        """ Calculate the number of wins"""
        return np.sum(self.player_final_choice() == self.set_prize())

    def proba(self):
        """ Compute the winning probability"""
        return self.score() / self.n_games

    def __str__(self):
        if not self.switch:
            return 'Probability of winning if not switching: %.4f' \
                % self.proba()
        else:
            return 'Probability of winning if switching: %.4f' \
                % self.proba()
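
Since __str__ is defined, you can also print a game object directly as a quick check:

print(MontyHall(n_games=100000, n_bins=3, n_discards=1, switch=True))
# Probability of winning if switching: ~0.6667 (exact digits depend on the seed)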

Now, for example, we can confirm the famous Monty Hall result by defining

def simulation_proba(n_games, n_bins, n_discards, switch):
    """ Compute simulation probability of n_games with n_bins
        and n_discards options
    """
    g = MontyHall(n_games=n_games, n_bins=n_bins, n_discards=n_discards,
                  switch=switch)
    return g.proba()

and then calling

print(simulation_proba(100000, 3, 1, switch=True))

We'll get 0.6665 \simeq \dfrac 23 as the winning probability if you switch, having played the game 100,000 times and recorded all the results for the case n = 3, k = 1.

Let's see the results for n = 4 and k = 1, 2:

print(simulation_proba(100000, 4, 1, switch=True))
# 0.3746
print(simulation_proba(100000, 4, 1, switch=False))
# 0.2500
print(simulation_proba(100000, 4, 2, switch=True))
# 0.7500
print(simulation_proba(100000, 4, 2, switch=False))
# 0.2499
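
These numbers line up with the closed-form expression above. A tiny helper (not part of the simulation class, just a sanity check) makes the comparison explicit:

def theoretical_proba(n_bins, n_discards, switch):
    """ Closed-form winning probability from the formula above."""
    if switch:
        return (n_bins - 1) / (n_bins * (n_bins - n_discards - 1))
    return 1 / n_bins

print(theoretical_proba(4, 1, switch=True))   # 0.375
print(theoretical_proba(4, 2, switch=True))   # 0.75
print(theoretical_proba(4, 2, switch=False))  # 0.25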

Okay! To see the results better, let's plot the probabilities against the number of bins.


import matplotlib.pyplot as plt

def simulation_2dplot(n_games, max_bins, n_discards=1, switch=True):
    """ Simulation 2D plot"""
    X = np.array(range(n_discards+2, max_bins))
    Y = np.array([simulation_proba(n_games, b, n_discards, switch)
                  for b in X])
    plt.plot(X, Y, linestyle='-', color='b', lw=2)
    plt.xlabel('Number of Bins')
    if switch:
        plt.ylabel('Winning Probability after switching')
    else:
        plt.ylabel('Winning Probability if not switching')
    plt.title('Monty Hall Simulation with %d games and %d discards'
              % (n_games, n_discards))
    plt.ylim(0.0, 1.0)
    plt.savefig('simulation_2dplot.png', dpi=300)

simulation_2dplot(n_games=100, max_bins=101, n_discards=1, switch=True)

or even with 20 discard options:
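
A call along the following lines produces that plot (the parameters behind the original figure aren't shown, so max_bins here is illustrative):

simulation_2dplot(n_games=100, max_bins=121, n_discards=20, switch=True)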




Monty Hall Surface

Now, let’s see what the 3D surface plot looks like.

from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter

def simulation_3dplot(n_games, max_bins, max_discards, switch):
    """ Simulation 3D plot; assumes max_bins - 3 == max_discards - 1,
        as in the call below"""
    X = np.array(range(3, max_bins))
    Y = np.array(range(1, max_discards))
    X_grid, Y_grid = np.meshgrid(X, Y)
    # only the upper triangle satisfies n_discards <= n_bins - 2
    triu_idx = np.triu_indices(n=max_discards-1)
    X_grid_utri, Y_grid_utri = X_grid[triu_idx], Y_grid[triu_idx]

    vect_simulation_proba = np.vectorize(simulation_proba)
    Z = vect_simulation_proba(n_games, X_grid_utri, Y_grid_utri, switch)
    nZ = np.zeros((max_discards-1, max_discards-1))
    nZ[triu_idx] = Z

    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(projection='3d')
    surf = ax.plot_surface(X_grid, Y_grid, nZ, rstride=1, cstride=1,
                           cmap=cm.coolwarm, linewidth=0,
                           antialiased=False)
    ax.set_zlim(0.0, 1.0)
    ax.zaxis.set_major_locator(LinearLocator(10))
    ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
    ax.set_xlabel('Number of Bins')
    ax.set_ylabel('Number of Discards')
    if switch:
        ax.set_zlabel('Winning probability after switching')
    else:
        ax.set_zlabel('Winning probability if not switching')
    ax.set_title('Monty Hall Probability Surface for %d games' % n_games)

    fig.colorbar(surf, shrink=0.5, aspect=5)

    fig.savefig('3d_simulation.png', dpi=300)

simulation_3dplot(100, 11, 9, switch=True)


Restaurant Revenue Prediction with BART Machine

In this post, I'd like to show you how to use the newly written package on Bayesian Additive Regression Trees, i.e. bartMachine, for Restaurant Revenue Prediction with R. The datasets are part of a past Kaggle competition and can be found here.

What is BART?

BART is the Bayesian sibling of Random Forest. For some applications, Bayesian approaches with probabilistic assumptions work better than purely algorithmic ensembles of trees, thanks to the underlying prior assumptions. BART puts priors on the parameter estimates and on the splits that occur in classic tree models. There are multiple distinctions between BART and Random Forest. One is the user's ability to set a generalized Bernoulli prior on which features get selected for each split, whereas in Random Forest this prior is uniform over the features. Others are having more hyperparameters controlling the growth of the trees than Random Forest, and the ability to find explicit interactions among features, which can hardly be found with general Machine/Statistical learning algorithms/models.
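
As a sketch of that first distinction, bartMachine exposes this feature-selection prior through its cov_prior_vec argument; the weights below are purely illustrative, using the X and y defined later in this post:

# propose splits on the first feature 5x more often than on the rest
bart_weighted <- bartMachine(X, y, cov_prior_vec=c(5, rep(1, ncol(X)-1)))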

BART in action

Speaking of the competition, TFI has been one of the most unusual Kaggle competitions so far, and it confused and hurt many contestants, myself included, for the following reasons:

    Great disparity between the number of training and testing samples: the training set has 137 samples with 37 obfuscated features out of 42, with revenue as the response variable, while the test set has 100,000 samples (poisoned data and robust-model indications!).
    It was very hard to get consistent cross-validation scores, even with repeated CV.
    Almost all of the features are weak, and some are incompatible between the train and test sets.
    Many of the top seed Kaggle Masters did poorly in this competition, which adds to its strangeness; perhaps if you knew less, you could have been more successful in this competition!

So, let's briefly look at the data first:

data <- read.csv('train.csv', sep=',', stringsAsFactors=F)
test <- read.csv('test.csv', sep=',', stringsAsFactors=F)
revenue <- data$revenue
train <- data[,-ncol(data)]
all <- rbind(train, test)

As you go through exploring the combined all dataset, you will figure out many odd cases. For the feature City, for example, the number of unique cities in the training set, length(unique(data$City)), is 34, whereas the number of unique cities in the test set is 57. You can check the forum to see more odd cases. Of course, these cases can happen in real data science and everyone should know how to handle them. This post, however, does not deal with those kinds of behaviors.

By plotting the histogram of revenue, one can easily see its skewness, so the natural log(revenue) better demonstrates its (nearly) log-normal distribution and perhaps suggests the existence of outliers.

hist(data$revenue,
     main="Histogram of Revenue")

To plot the correlations among the (numeric) features and the response, we can use the corrplot package:

library(corrplot)
correlations <- cor(data[,6:43])
corrplot(correlations, method="ellipse", order="hclust")


Before going further, I'd like to emphasize that when the data is small, it's very likely you'll find some seemingly good, complicated interactions among features. For example, by looking at the p-values of a simple linear regression and tweaking a couple of significant features, I found a (good-but-turned-out-to-be-useless) interaction, logRevenue ~ I(P6*P20) + P28 + I(P28^2) + Age + I(Age^2), including the Age of the restaurant, which at the time I thought might be a good feature to add; the deserved winner argued the opposite!

Now, let's dig into bartMachine. First, I'm going to throw out the incompatible features, including Open.Date, City, City.Group and Type, as well as Id, to unify our training and testing sets, and initialize the bartMachine as follows:

SEED <- 123
options(java.parameters="-Xmx5000m") # must be set before loading the package
library(bartMachine)
y <- log(revenue)
X <- all[1:nrow(train), -c(1,2,3,4,5)]
X_test <- all[(nrow(train)+1):nrow(all), -c(1,2,3,4,5)]
bart <- bartMachine(X, y, num_trees=10, seed=SEED) # default num_trees might be excessive

The results are

bartMachine v1.2.0 for regression

training data n = 137 and p = 37
built in 0.3 secs on 4 cores, 10 trees, 250 burn-in and 1000 post. samples

sigsq est for y beforehand: 0.159
avg sigsq estimate after burn-in: 0.19178

in-sample statistics:
L1 = 41.5
L2 = 22.15
rmse = 0.4
Pseudo-Rsq = 0.2953
p-val for shapiro-wilk test of normality of residuals: 0.08099
p-val for zero-mean noise: 0.97545

We can also see the effect of changing the num_trees parameter with rmse_by_num_trees:

rmse_by_num_trees(bart, tree_list=c(seq(5, 50, by=5)))

So perhaps num_trees=20 works better, with less out-of-sample error.

Since bartMachine essentially does MCMC, heavily using a Gibbs sampler, checking convergence seems necessary (see the bartMachine paper for more details):

bart <- bartMachine(X, y, num_trees=20, seed=SEED)
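
The convergence diagnostics plot comes from bartMachine's built-in helper:

plot_convergence_diagnostics(bart)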


We can also check the mean-centeredness assumption and the heteroskedasticity of the data with the QQ-plot:
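
In bartMachine, the QQ-plot and the other residual diagnostics are produced by the built-in helper:

check_bart_error_assumptions(bart)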



One of the main features of bartMachine is showing variable importance via permutation tests:

var_selection_by_permute(bart, num_reps_for_avg=20)


Moreover, bartMachine is capable of finding explicit interactions among features:

interaction_investigate(bart, num_replicates_for_avg=5)


However, since we want a simple model to start with and to score well on the poisoned test set, focusing too much on interactions can hurt a lot, as it did to me through inevitable overfitting. Now, I'm going to pick P6, P17 and P28, retrain my model, perform a CV and finally submit.

nX <- X[, c("P6", "P17", "P28")]
nX_test <- X_test[, c("P6", "P17", "P28")]
nbart <- bartMachine(nX, y, num_trees=20, seed=SEED)
nbart_cv <- bartMachineCV(nX, y, num_tree_cvs=c(10,15,20),
                          verbose=TRUE) # bartMachine CV win: k: 3 nu, q: 3, 0.9 m: 20

log_pred <- predict(nbart_cv, nX_test)
pred <- exp(log_pred)
submit <- data.frame(Id=test$Id, Prediction = pred)
write.csv(submit, file="bartMachine_P6P17P28.csv", row.names=F) 

The result of this submission is 1731437.14026 on the public leaderboard and 1839841.77388 (ranking around 700 out of 2257 teams) on the private leaderboard.

Rank below 100 out of 2257:

To improve our model and make it more robust, we should remove outliers. Deciding what threshold to use takes some trial and error; there are many methods that can help, but they're not within the scope of this post. Looking at the log(revenue) distribution should suggest where to put the threshold. Doing so can hugely boost your private leaderboard ranking, to around position 80 with a score of 1786605.94364, which is a fairly good incentive to explore bartMachine more.
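
A minimal sketch of that idea, assuming a cutoff read off the log(revenue) histogram (the threshold below is illustrative; the one actually used is in the repository):

cutoff <- quantile(y, 0.975) # illustrative threshold, tune by trial and error
keep <- y <= cutoff
nbart_trim <- bartMachine(nX[keep, ], y[keep], num_trees=20, seed=SEED)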

You can find the code, including the improved model and the plots, on my GitHub. Enjoy!

P.S. During the competition, the BAYZ,M.D team managed to perfectly overfit the public leaderboard with interesting and creative methods. You can read more about that here.

Vector Bundles, Locally Free Sheaves and Divisors on a Curve

In this post, I’ll be summarizing the basics of the correspondence between vector bundles, locally free sheaves and divisors on a smooth curve (defined over an algebraically closed field k of characteristic zero) together with some of their individual properties.

Locally free sheaves and Vector bundles:

Proposition 1: a) A coherent sheaf \mathcal{E} on a curve C is locally free \iff the fibers (stalks) \mathcal{E}_p are free at every point p \in C. (Note that the statement is local.)

b) A subsheaf of a locally free sheaf is locally free. (Use the fact that the local rings of a (smooth) curve are DVRs, hence PIDs, and a submodule of a finitely generated free module over a PID is free.)

c) A non-zero map from a rank one locally free sheaf to a locally free sheaf is injective. (If there were a non-zero kernel, by b) it would be locally free of rank one, and then the cokernel would be a torsion sheaf injecting into a locally free sheaf!)

d) Let \mathcal{E} be a locally free sheaf of rank r and \mathcal{E}' a subsheaf of rank r'. There exists a subsheaf \mathcal{E}'' of rank r' containing \mathcal{E}' s.t. \mathcal{E}/\mathcal{E}'' is locally free. In particular, if \mathcal{E}' is maximal (w.r.t. inclusion) among such subsheaves, the quotient is already locally free. (Think about how to kill the torsion!)

Theorem: There is a natural one-to-one correspondence between vector bundles of rank r and locally free sheaves of rank r on C.

Idea: Given a vector bundle, the sheaf of sections is the corresponding sheaf. The converse is a little tricky. Given a locally free sheaf \mathcal{E}, one can define a vector bundle as follows

E:= \{ (p,t)|\; p \in C, \;\; t \in \mathcal{E}_p/{\mathfrak{m}_p \mathcal{E}_p} \}

where \mathfrak{m}_p is the unique maximal ideal of the local ring \mathcal{O}_p.

Remark: Subsheaves of a locally free sheaf do not necessarily correspond to subbundles of its associated bundle. The point is that an injective map of locally free sheaves may fail to remain injective when reduced modulo the maximal ideal of some point.

Lemma 1: A non-trivial global section of a vector bundle E corresponds to a non-zero map of sheaves \mathcal{O} \to \mathcal{E}, where \mathcal{E} is the sheaf associated to E.


A Weil divisor on C is a finite formal sum of points with integer multiplicities. If f is a rational function on C, the divisor corresponding to f is defined as (f)=\sum_p \text{ord}_p(f)\, p.

A Cartier divisor on C is given by a covering U_i of C together with functions f_i \in \mathcal{O}(U_i) s.t. f_i/f_j are invertible on U_i \cap U_j. A principal divisor is given by the cover consisting of C alone and a function on C (a global section of \mathcal{O}_C).

Given a Cartier divisor, one can associate a Weil divisor to it by considering, on each open set U_i, the zeros minus the poles of f_i; this is well-defined since f_i/f_j is invertible on U_i \cap U_j. Conversely, given a Weil divisor D, one can construct a Cartier divisor by choosing open sets that contain at most one of the points in the support of D and functions that vanish at these points with the assigned multiplicities. Therefore, these two seemingly different notions are equivalent over a smooth curve C.

Line Bundles (locally free sheaves of rank one) and Divisors:

Given a Cartier divisor, one can define a locally free sheaf of rank one by taking the trivial sheaf \mathcal{O}(U_i) and gluing them by the isomorphisms f_i/f_j on U_i \cap U_j.

Conversely, given an invertible sheaf and a trivialization \{ U_i, f_{i,j} \}, one can define a Cartier divisor as follows: take an arbitrary open set, say U_0, and define a Cartier divisor by (U_i, f_{i,0}); here f_{j,0}=f^{-1}_{0,j} and f_{0,j}f_{i,0}=f_{i,j}, which is a unit on U_i \cap U_j.

Definition: Let D=\sum n_p p be an effective divisor. Define the sheaf \mathcal{F} on the support of D by \mathcal{F}_p=k^{n_p}. Now, define the skyscraper sheaf k_D as the extension by zero outside of D.

Lemma 2: A line bundle \mathcal{L} corresponds to an effective divisor D if and only if \mathcal{L} has a non-zero global section (by Lemma 1, there is a non-zero map of sheaves \mathcal{O} \to \mathcal{L}, which is injective by Proposition 1 b)). In this case, one has the following short exact sequence of sheaves

0 \longrightarrow \mathcal{O} \longrightarrow \mathcal{L} \longrightarrow k_D \longrightarrow 0

One then writes \mathcal{L}=\mathcal{O}(D).

Riemann-Roch Theorem: If \mathcal{L} is an invertible sheaf (line bundle), then the Euler-Poincaré characteristic can be computed as

\chi(\mathcal{L})=\deg(\mathcal{L})+\chi(\mathcal{O})=\deg(\mathcal{L})+1-g

where g=h^1(\mathcal{O}) is the genus of the curve.

Proof: First, assume that \mathcal{L} corresponds to an effective divisor D. Then by Lemma 2 there is a short exact sequence, and the associated long exact sequence gives \chi(\mathcal{O}(D))= \chi (\mathcal{O})+ \chi (k_D). Since k_D is a skyscraper sheaf, its support is a zero-dimensional set, hence h^1(k_D)=0 and \chi(k_D)=h^0(k_D)=\deg D, whence the result.

In the general case, assume that \mathcal{L} corresponds to D-D' with both D,D' effective. Then \mathcal{O}(D-D')=\mathcal{O}(D) \otimes \mathcal{O}(-D'), where \mathcal{O}(-D')=\mathcal{O}(D')^*. Indeed, \mathcal{O}(-D') is a locally free sheaf, hence flat, so tensoring the above short exact sequence with \mathcal{O}(-D') one obtains

0\longrightarrow \mathcal{O}(-D') \longrightarrow \mathcal{L} \longrightarrow k_D \otimes \mathcal{O}(-D') \longrightarrow 0

Therefore, \chi(\mathcal{L})= \chi (\mathcal{O}(-D'))+ \chi(k_D \otimes \mathcal{O}(-D')). On the other hand, replacing D by D' in the above sequence, tensoring with \mathcal{O}(-D') and doing the same as above leads to the desired formula.
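
Explicitly, the second sequence gives \chi(\mathcal{O}(-D'))=\chi(\mathcal{O})-\deg D', while k_D \otimes \mathcal{O}(-D') is again a skyscraper sheaf of length \deg D, so

\chi(\mathcal{L})=\chi(\mathcal{O})+\deg D-\deg D'=\deg(\mathcal{L})+1-g.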

Lemma 3: Given a locally free sheaf \mathcal{E} of rank r there exists a short exact sequence of sheaves

0 \longrightarrow \mathcal{L} \longrightarrow \mathcal{E} \longrightarrow \mathcal{E}' \longrightarrow 0

where \mathcal{L} is an invertible sheaf and \mathcal{E}' a locally free sheaf of rank r-1.

Using lemma 3 and induction, one can prove the following formula for the calculation of \deg(\mathcal{E}_1 \otimes \mathcal{E}_2) where \mathcal{E}_1, \mathcal{E}_2 are locally free sheaves of rank r_1, r_2 respectively.

\deg(\mathcal{E}_1 \otimes \mathcal{E}_2)=r_1 \deg(\mathcal{E}_2) + r_2 \deg(\mathcal{E}_1)

where the degree of a locally free sheaf \mathcal{E} of rank r is defined by

\deg(\mathcal{E})=\chi(\mathcal{E})-r \chi (\mathcal{O})

Definition: A sheaf \mathcal{F} is said to be generated by global sections if the natural map

H^0(\mathcal{F}) \otimes \mathcal{O} \longrightarrow \mathcal{F}

is onto.

Proposition 2: If \mathcal{E} is a locally free sheaf on C, there exists a positive divisor D on C s.t. \mathcal{E}(D):=\mathcal{E} \otimes \mathcal{O}(D) is generated by global sections.

Atiyah’s theorem: Given a locally free sheaf \mathcal{E} of rank r generated by global sections, there exists an exact sequence

0 \longrightarrow \mathcal{O}^{r-1} \longrightarrow \mathcal{E} \longrightarrow \mathcal{L} \longrightarrow 0

where \mathcal{L} is an invertible sheaf.

Classification of Vector Bundles on Elliptic curves

I'm supposed to give a talk on this subject for one of my courses, so I consider this post a “pre-exposition.” In writing it, I learned from and heavily used the great exposition “Vector bundles on curves” by Montserrat Teixidor I Bigas. I wrote up the prerequisites here.

In 1957, Atiyah, in his famous paper “Vector bundles over an elliptic curve,” classified indecomposable vector bundles of arbitrary rank and degree. Briefly, every vector bundle (locally free sheaf) decomposes uniquely (up to order) into a direct sum of indecomposable vector bundles, and the set of isomorphism classes of indecomposable vector bundles of a fixed rank and degree is isomorphic to the Jacobian of the curve, which in turn is isomorphic to the curve itself. The isomorphisms are canonical for vector bundles of degree zero; for higher degree, they depend on the choice of a line bundle of degree one (a base point on the curve) [“Vector bundles on curves” by Montserrat Teixidor I Bigas, section 4].

An elliptic curve C is a smooth projective curve of genus one over an algebraically closed field k. One may assume that \text{char}(k)=0. Throughout, the terms vector bundle E and locally free sheaf \mathcal{E} are used interchangeably.

Denote by U(r,d) the set of isomorphism classes of indecomposable vector bundles of rank r and degree d, where the degree of a vector bundle E of rank r is defined as the degree of its associated locally free sheaf \mathcal{E}, which is

\deg(\mathcal{E})= \chi (\mathcal{E})-r \chi (\mathcal{O})=\chi (\mathcal{E})-r(1-g).

One can also show that \deg(\mathcal{E}) is equal to the degree of its determinant.

Case 1: Vector bundles of degree zero, d=0.

For every positive integer r there exists a unique (self-dual) indecomposable vector bundle \overline{E}_{r,0} of rank r and degree zero with only one section. It turns out that any indecomposable vector bundle of rank r and degree zero is isomorphic to \overline{E}_{r,0} \otimes L for a unique line bundle L of degree zero. Therefore, the set (moduli space) of indecomposable vector bundles of degree zero on C, i.e. U(r,0) is (canonically) isomorphic to the moduli space of degree zero line bundles on C which in turn by definition is the Jacobian of C.
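
Concretely, \overline{E}_{r,0} can be built inductively, as in Atiyah's paper, as the unique non-split extension

0 \longrightarrow \mathcal{O} \longrightarrow \overline{E}_{r,0} \longrightarrow \overline{E}_{r-1,0} \longrightarrow 0

starting from \overline{E}_{1,0}=\mathcal{O}. (Uniqueness holds because \text{Ext}^1(\overline{E}_{r-1,0}, \mathcal{O}) is one-dimensional.)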

Case 2: Vector bundles of non-negative degree d \geq 0 (the general case).

In general, let L be a fixed line bundle of degree one on C (which corresponds to fixing a base point); then there is an isomorphism U(r,d) \tilde{\to} U(r,d+rk) sending E \mapsto E \otimes L^k. Note that \deg(E \otimes L^k)=\text{rk} (L^k) \deg (E)+\text{rk} (E) \deg(L^k)=d+rk.

If r>0, there is an isomorphism U(r,d) \tilde{\to} U(r+d,d) sending E to E', where E' is given by the following extension by \mathcal{O}^d:

0 \longrightarrow \mathcal{O}^d \longrightarrow E' \longrightarrow E \longrightarrow 0

By these two operations, we can assume that d \geq 0 and r>0. Moreover, if d \geq r we have U(r,d) \cong U(r,d-r), and if d<r then U(r,d) \cong U(r-d,d). Note that none of these operations changes h=\gcd(r,d). Thus, we can construct a sequence of isomorphisms U(r_i,d_i) \cong U(r_{i+1}, d_{i+1}) with r_i>0, \; d_i \geq 0 s.t. \gcd(r_i,d_i)=\gcd(r_{i+1},d_{i+1}) and r_i+d_i> r_{i+1}+d_{i+1}. This is a decreasing sequence of positive integers, so it terminates, precisely when r_i=h and d_i=0. Hence, we have established the isomorphism

U(r,d) \cong U(h,0)

and by case 1, U(h,0) is isomorphic to the Jacobian of C (and is isomorphic to the curve itself.)
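
For example, starting from (r,d)=(6,4), the reduction runs (6,4) \to (2,4) \to (2,2) \to (2,0), so U(6,4) \cong U(2,0), matching h=\gcd(6,4)=2.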

That being said, the above isomorphism is completely determined by the choice of a line bundle L of degree one.

So far, we have achieved what we wanted. Let's now try to dig a bit deeper. Denote by E^L_{r,d} the element in U(r,d) corresponding to \overline{E}_{h,0} in U(h,0).

Proposition 1: a) Every indecomposable vector bundle E of rank r and degree d can be written as E^L_{r,d} \otimes L' for some line bundle L' of degree zero.

b) If \gcd(r,d)=1, then E^L_{r,d} \otimes (E^L_{r,d})^* \cong \bigoplus^{r^2}_{i=1} L_i, where the L_i run over the line bundles whose order (in the Picard group) divides r.

c) If r \geq s, then \overline{E}_{r,0} \otimes \overline{E}_{s,0} \cong \overline{E}_{r-s+1,0} \oplus \overline{E}_{r-s+3,0} \oplus \cdots \oplus \overline{E}_{r+s-3,0} \oplus \overline{E}_{r+s-1,0}.

d) If \gcd(r,d)=1, then E^L_{r,d} \otimes \overline{E}_{h,0}=E^L_{rh,dh}.

e) If \gcd(r,r')=\gcd(r,d)=\gcd(r',d')=1, then E^L_{r,d} \otimes E^L_{r',d'}=E^L_{rr',rd'+r'd}.

Here are some results involving stability and semi-stability of vector bundles on C.

Proposition 2: a) An indecomposable vector bundle of degree zero is semi-stable but not stable.

b) Indecomposable vector bundles are semi-stable and they are stable if and only if \gcd(r,d)=1.

For a detailed and complete description of vector bundles on an elliptic curve, especially when the ground field is of \text{char}(k)=p, look at Atiyah's original paper.

Some Homological Algebra Computations

In this post, I'm going to write down detailed proofs of some of the exercises in Rotman's Homological Algebra. They were asked on ML and then answered by me.

1. Let A be a torsion abelian group. Then \text{Ext}^1_\mathbb{Z}(A, \mathbb{Z}) \cong \text{Hom}_\mathbb{Z}(A,S^1), where S^1 is the unit circle.

One point is that the structure of the circle group, as an abstract group, is \mathbb{Q}/\mathbb{Z} \oplus \mathbb{R}. Since A is torsion, \text{Hom}(A,\mathbb{R})=0. Therefore, it is enough to show that \text{Ext}^1_\mathbb{Z}(A, \mathbb{Z}) \cong \text{Hom}_\mathbb{Z}(A,\mathbb{Q}/\mathbb{Z}).

We know that the \text{Ext}(A,-) functor is the right derived functor of the left exact (covariant) functor \text{Hom}(A,-). Consider the following exact sequence of abelian groups (an injective resolution of \mathbb{Z}):

0 \longrightarrow \mathbb{Z} \longrightarrow \mathbb{Q} \longrightarrow \mathbb{Q}/\mathbb{Z} \longrightarrow 0

where obviously, the second arrow is inclusion and the third one is the projection.

Applying our left exact (covariant) functor \text{Hom}(A, -) to the above exact sequence, we will obtain the following long exact sequence

0 \longrightarrow \text{Hom}(A,\mathbb{Z}) \longrightarrow \text{Hom}(A,\mathbb{Q}) \longrightarrow \text{Hom}(A,\mathbb{Q}/\mathbb{Z}) \longrightarrow \text{Ext}^1(A, \mathbb{Z}) \longrightarrow 0

Notice that \mathbb{Q} and \mathbb{Q}/\mathbb{Z} are divisible groups, hence injective \mathbb{Z}-modules; thus \text{Ext}^i(A, \mathbb{Q})=\text{Ext}^i(A, \mathbb{Q}/\mathbb{Z})=0 for i>0.

Similarly, since A is torsion, \text{Hom}(A,\mathbb{Z})=\text{Hom}(A,\mathbb{Q})=0. Hence,

\text{Ext}^1(A, \mathbb{Z}) \cong \text{Hom}(A,\mathbb{Q}/\mathbb{Z})

as desired.

2. Let A, B be finite abelian groups. Prove that \text{Ext}^1_\mathbb{Z} (A,B) \cong A \otimes_\mathbb{Z} B.

By the classification of finitely generated abelian groups, we can assume that A \cong \mathbb{Z}/q_1 \oplus \mathbb{Z}/q_2 \oplus \cdots \oplus \mathbb{Z}/q_k and B\cong \mathbb{Z}/p_1 \oplus \mathbb{Z}/p_2 \oplus \cdots \oplus \mathbb{Z}/p_l where q_i, p_j are (not necessarily distinct) prime powers.

Lemma: Let A be a finite abelian group, then

\text{Ext}^1(A, \mathbb{Z}/m) \cong \frac{\text{Hom}(A, \mathbb{Q}/\mathbb{Z})}{m \text{Hom}(A, \mathbb{Q}/\mathbb{Z})}.

Proof of the lemma: Consider the following injective resolution for \mathbb{Z}/m

0 \longrightarrow \mathbb{Z}/m \longrightarrow \mathbb{Q}/\mathbb{Z} \longrightarrow \mathbb{Q}/\mathbb{Z} \longrightarrow 0

where the second arrow is multiplication by 1/m and the third one is multiplication by m.

Applying \text{Hom}(A, -) to the above short exact sequence, we will get

0 \longrightarrow \text{Hom}(A, \mathbb{Z}/m) \longrightarrow \text{Hom}(A,\mathbb{Q}/\mathbb{Z}) \longrightarrow \text{Hom}(A,\mathbb{Q}/\mathbb{Z}) \longrightarrow \text{Ext}^1(A, \mathbb{Z}/m) \to 0

Note that the injectivity of \mathbb{Q}/\mathbb{Z} implies that \text{Ext}^i(A,\mathbb{Q}/\mathbb{Z})=0 for i>0; hence the required isomorphism. (Notice that the third arrow is the multiplication by m induced from our resolution.)

In particular, setting A=\mathbb{Z}/n in the lemma and using the result of question 1, we get

\text{Ext}^1(\mathbb{Z}/n, \mathbb{Z}/m) \cong \frac{\text{Hom}(\mathbb{Z}/n, \mathbb{Q}/\mathbb{Z})}{m \text{Hom}(\mathbb{Z}/n, \mathbb{Q}/\mathbb{Z})} \cong \frac{\text{Ext}^1(\mathbb{Z}/n, \mathbb{Z})}{m \text{Ext}^1(\mathbb{Z}/n, \mathbb{Z})} \cong \frac{ \mathbb{Z}/n}{m \mathbb{Z}/n} \cong \mathbb{Z}/d

Note that \text{Ext}^1(\mathbb{Z}/n, \mathbb{Z}) \cong \mathbb{Z}/n and \mathbb{Z}/d \cong \mathbb{Z}/n \otimes \mathbb{Z}/m, where d=\gcd(n,m). Therefore,

\text{Ext}^1(\mathbb{Z}/n, \mathbb{Z}/m) \cong \mathbb{Z}/n \otimes \mathbb{Z}/m \;\;\;\; (\star)

Utilizing the commutativity of \text{Ext} and of the tensor product with (finite) direct sums, together with (\star), we obtain the required isomorphism:

\text{Ext}^1(A,B)=\text{Ext}^1(\mathbb{Z}/q_1 \oplus \mathbb{Z}/q_2 \oplus \cdots \oplus \mathbb{Z}/q_k, \mathbb{Z}/p_1 \oplus \mathbb{Z}/p_2 \oplus \cdots \oplus \mathbb{Z}/p_l) \cong \bigoplus_{i,j} \text{Ext}^1(\mathbb{Z}/q_i, \mathbb{Z}/p_j) \cong \bigoplus_{i,j} (\mathbb{Z}/q_i \otimes \mathbb{Z}/p_j) \cong (\mathbb{Z}/q_1 \oplus \mathbb{Z}/q_2 \oplus \cdots \oplus \mathbb{Z}/q_k) \otimes (\mathbb{Z}/p_1 \oplus \mathbb{Z}/p_2 \oplus \cdots \oplus \mathbb{Z}/p_l) \cong A \otimes B.
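
For instance, \text{Ext}^1(\mathbb{Z}/4, \mathbb{Z}/6) \cong \mathbb{Z}/\gcd(4,6)=\mathbb{Z}/2 \cong \mathbb{Z}/4 \otimes \mathbb{Z}/6, consistent with (\star).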

3. Let _RF be a flat left R-module and P its projective cover, i.e. there is an R-module epimorphism p: P \rightarrow F. Prove that if P is flat, then \text{Ker}(p) is flat.

Rotman's definition of the projective cover of a module F is an ordered pair (P, p), where P is projective and p:P \to F is a surjective morphism with \text{ker}(p) a superfluous submodule of P.

I will use the following homological characterization for (left) flat R-modules,

\text{Tor}^R_n(-,M)=0 \;\; \text{for all} \;\; n \geq 1 \iff _RM \;\; \text{is a flat left} \; R\text{-module}

By assumption, 0 \longrightarrow M \hookrightarrow P \longrightarrow F \longrightarrow 0 is a short exact sequence of left R-modules, where M:=\text{ker}(p).

Let A be an arbitrary right R-module. Applying the right exact functor A \otimes_R - to the above exact sequence, we obtain the long exact Tor sequence

0 \to \text{Tor}^R_1(A,M) \to \text{Tor}^R_1(A,P) \to \text{Tor}^R_1(A,F) \to A \otimes_R M \to A \otimes_R P \to A \otimes_R F \to 0

By the characterization, \text{Tor}^R_n(A,P)=\text{Tor}^R_n(A,F)=0 for n\geq 1; in particular \text{Tor}^R_2(A,F)=0, which justifies the leading 0 in the sequence above. Therefore \text{Tor}^R_1(A,M)=0, and dimension shifting in the long exact sequence gives \text{Tor}^R_n(A,M) \cong \text{Tor}^R_{n+1}(A,F)=0 for all n \geq 1. Since A was arbitrary, _RM is a flat module.