
How to Use GIT Java APIs to Diff Different Versions

February 3rd, 2013

Last week I introduced the JGit Java API with a simple sample illustrating how to read content from HEAD. If you have multiple versions of a source code or text file, you may want to see the differences between them. An easy tool for this is the standard diff.

The JGit Java API has built-in support for generating a diff between any two versions of a file, be it source code, a properties file, an XML file, or any other text file. Here is a sample that shows how to do this.

/** Copyright Steve Jin 2013 */
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.util.List;

import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.diff.DiffEntry;
import org.eclipse.jgit.diff.DiffFormatter;
import org.eclipse.jgit.lib.ObjectId;
import org.eclipse.jgit.lib.ObjectReader;
import org.eclipse.jgit.treewalk.CanonicalTreeParser;

public class JGitDiff
{
  public static void main(String[] args) throws Exception
  {
    // Open the existing repository in the working directory
    File gitWorkDir = new File("C:/temp/gittest/");
    Git git = Git.open(gitWorkDir);

    // Hash of the older commit to compare against HEAD
    String oldHash = "d7db296cc2730ca562f91cfa539d6955a21284b6";

    // Resolve both revisions to their tree objects
    ObjectId headId = git.getRepository().resolve("HEAD^{tree}");
    ObjectId oldId = git.getRepository().resolve(oldHash + "^{tree}");

    // Set up one tree iterator per revision
    ObjectReader reader = git.getRepository().newObjectReader();
    CanonicalTreeParser oldTreeIter = new CanonicalTreeParser();
    oldTreeIter.reset(reader, oldId);
    CanonicalTreeParser newTreeIter = new CanonicalTreeParser();
    newTreeIter.reset(reader, headId);

    // Compute the list of changed entries between the two trees
    List<DiffEntry> diffs = git.diff()
                               .setNewTree(newTreeIter)
                               .setOldTree(oldTreeIter)
                               .call();

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DiffFormatter df = new DiffFormatter(out);
    df.setRepository(git.getRepository());

    for (DiffEntry diff : diffs)
    {
      // Write this entry's patch to the buffer, print it, then clear the buffer
      df.format(diff);
      String diffText = out.toString("UTF-8");
      System.out.println(diffText);
      out.reset();
    }
  }
}

The output of the program is as follows:

diff --git a/file1.txt b/file1.txt
index 7702b88..805e7c6 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1 +1 @@
-DoubleCloud.org rocks!
\ No newline at end of file
+DoubleCloud.org really rocks!
\ No newline at end of file

As you may have noticed, the sample uses a hash string, which is rarely used to identify a version in practice. For one thing, it’s hidden in the .git/objects folder along with other objects identified by hash strings. Most likely you would use a tag or a branch head to identify a particular version. Hard-coding the hash here is just a trade-off to simplify the sample.
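
To illustrate the point about tags and branch heads, here is a minimal sketch of building a revision expression from a name instead of a hash. The class RevSpec, its method names, the tag name v1.0 and the branch name master are all hypothetical; the only piece taken from the sample is the "^{tree}" suffix passed to resolve(). It assumes the fully qualified ref prefixes refs/tags/ and refs/heads/, which Git uses for tags and local branches.

```java
// Hypothetical helper, not part of JGit: builds revision expressions that
// could be passed to Repository.resolve() in place of oldHash + "^{tree}".
class RevSpec {
    // Appends the "^{tree}" suffix used in the sample to any revision name.
    static String treeOf(String rev) {
        if (rev == null || rev.isEmpty()) {
            throw new IllegalArgumentException("revision must not be empty");
        }
        return rev + "^{tree}";
    }

    // Expression for the tree of the commit a tag points to.
    static String tag(String name) {
        return treeOf("refs/tags/" + name);
    }

    // Expression for the tree of the commit at the tip of a local branch.
    static String branch(String name) {
        return treeOf("refs/heads/" + name);
    }

    public static void main(String[] args) {
        System.out.println(tag("v1.0"));       // assumed tag name
        System.out.println(branch("master"));  // assumed branch name
        System.out.println(treeOf("HEAD~1"));  // the commit before HEAD
    }
}
```

With something like this, the line that resolves oldId in the sample could become git.getRepository().resolve(RevSpec.tag("v1.0")), assuming a tag named v1.0 exists in the repository. resolve() returns null when the name cannot be resolved, so it is worth checking the result before handing it to the tree parser.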

  3. Mike
    February 5th, 2014 at 12:28 | #3

    Do you have any documentation on the best way to handle printing out the fileNames from a given commit?

  4. February 5th, 2014 at 13:19 | #4

    Hi Mike,

    I don’t remember off the top of my head, as I haven’t touched it for quite some time.


  5. Ameger
    July 16th, 2014 at 03:39 | #5

    May I know what oldHash is in this code?
