
Count insertions a user made in a git repository

Posted on 2017-05-09

How many lines did a certain user actually contribute to a git repository? How many did they delete? This post shows a way to answer these questions with the help of git and the Unix shell.

Here is the full command; the impatient reader can skip the explanation below:

git log --author="Linus Torvalds" --pretty=format: --shortstat \
    | sed -e 's/\([0-9]*\) insertion.*/\n\1/;s/.*\n//' -e t -e d \
    | tr '\n' '+' | sed 's/.$/\n/' | bc

The approach nicely follows the filter-map-reduce pattern and thus consists of three main steps:

  1. Filter: from all commits, consider only those whose author is Linus Torvalds. git offers this functionality. The cryptic --pretty=format: was the only way I found to suppress all information besides the one given by --shortstat. It is necessary to keep commit subject lines containing strings like "3000 insertions" from being matched by the regular expressions in the map step.
  2. Map: from every commit returned by the previous filter step, extract the count of added lines. sed does the job. I will not go into much detail here, just noting that the expression given to the first -e chains two search-and-replace expressions together: the first puts a newline in front of the count and drops everything after it, the second removes everything up to that newline. The following -e t and -e d ensure that lines not matching the first expression are filtered out. Note that using \n in the replacement text relies on GNU sed.
  3. Reduce: sum up all the counts returned by the previous map step. This is done by bc, with a little help from tr and sed, which team up to form a valid arithmetic expression: tr joins the counts with + signs, and sed turns the trailing + back into a newline.
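
For readers without GNU sed or bc at hand, the map and reduce steps can also be sketched with awk alone. This is a swapped-in technique, not the command from this post, and sum_shortstat is a hypothetical helper name chosen for illustration. It assumes the input is --shortstat output produced with --pretty=format:, so that only stat lines arrive on stdin.

```shell
# Sum the numbers that precede a keyword ("insertion" by default)
# in `git log --shortstat` output read from stdin.
# sum_shortstat is a hypothetical helper name, not a git command.
sum_shortstat() {
    awk -v key="${1:-insertion}" '
        {
            # Map: find the field that starts with the keyword;
            # the field right before it is the count.
            for (i = 2; i <= NF; i++)
                if (index($i, key) == 1) total += $(i - 1)
        }
        # Reduce: print the accumulated sum (0 if nothing matched).
        END { print total + 0 }
    '
}

# Intended use inside a repository:
#   git log --author="Linus Torvalds" --pretty=format: --shortstat | sum_shortstat
#   git log --author="Linus Torvalds" --pretty=format: --shortstat | sum_shortstat deletion
```

Passing the keyword as an argument means the same function can total either insertions or deletions, and matching on the start of the field covers both the singular and the plural form that git prints.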

With very small changes in the map step, the command above can be adapted to count deleted instead of inserted lines. It does not, however, account for the fact that lines added by the user may later have been deleted or overwritten by another user. If you are not interested in historical commit data but rather want to know how many lines in the checked-out revision of the repository are attributed to a certain user, the command below can be used. It might take a long time on large repositories, though.

find . -type f -exec git blame {} \; 2>/dev/null \
    | grep '^[a-z0-9]* (Linus Torvalds' \
    | wc -l
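
A somewhat stricter variant of the counting step is sketched below. It assumes the blame output is produced with --line-porcelain, which repeats an "author <name>" line for every blamed line, so the match no longer depends on the layout of the default blame output. count_blame_lines is a hypothetical helper name, and the author name is assumed to be spelled exactly as git records it.

```shell
# Count lines attributed to an author in `git blame --line-porcelain`
# output read from stdin.
# count_blame_lines is a hypothetical helper name, not a git command.
count_blame_lines() {
    # -F: treat the pattern as a fixed string
    # -x: match whole lines only
    # -c: print the number of matching lines
    # File content lines start with a TAB in porcelain output,
    # so source text that happens to read "author ..." cannot match.
    grep -cxF "author $1"
}

# Intended use inside a repository (tracked files only):
#   git ls-files -z | xargs -0 -n1 git blame --line-porcelain \
#       | count_blame_lines "Linus Torvalds"
```

Feeding the file list from git ls-files instead of find restricts the run to tracked files, so nothing under .git is visited. The run time caveat remains, since git blame is still invoked once per file.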


© 2018 Johannes Tax (johannes@johannes.tax)