Using Git Commits for Hacks

Here I show how to use Git as a mechanism for programmers to load libraries and patches, as an alternative to the usual approach of downloading, installing, and loading libraries into a program. This is using Git in an unusual way, as Git is normally used as a version control system.

With this mechanism patches gain the ease of use and the flexibility of libraries: an end-programmer can pick and choose which patches they want, and patches can be loaded in the same way and just as easily as loading libraries.

Libraries gain the lightweight characteristics of patches: if a library is missing some feature or two libraries conflict, it’s easy to share a patch which resolves the issue.

As patches and libraries become more alike and can be shared and used in the same way, I use “hack” as a generic term to mean something which is either a patch or a library, or a bit of both.

Because I’ve just started working on this idea, the examples show using raw Git commands to pick which hacks you want. Presumably, if this works out, we’ll eventually write some higher level commands that will be easier to use, or maybe even implement something ourselves that isn’t as complicated as Git is.

See libs for a mechanism to load libraries using Git without getting gratuitous Git conflicts.

Using Git for hacks vs. using Git as a version control system

When using Git for hacks:

A Git checkout is a program or library and the libraries and patches that it depends on.
A Git repository can store the code and for multiple libraries, patches, and programs.
Each Git commit represents a particular library or patch.
Dependencies (library A uses library B which needs library C) are represented by ancestors of commits, and are automatically pulled in using Git’s merge mechanism.

When Git is used in the usual way as a version control system:

A Git checkout is a particular version of one program or library.
A Git repository stores the code and development history for that program or library.
Each Git commit is a snapshot of the development history.
Loading libraries and dependencies are managed in some way outside of Git: for example, one library may “import”, “use”, or “require” another.

You can use Git for both if you want, even in the same Git respository: you can develop a library using Git in the usual way as a version control system keeping track of your development process and then use Git to share your library.

Using Git to pull in some hacks

As an example of using Git to pull in some libraries and their prerequisites, suppose someone wanted to write a program that used my table-reader-writer hack and my mz hack. They could start off with:

 $ mkdir myprog
 $ cd myprog
 $ git init
 $ git pull git://github.com/CatDancer/arc.git tag arc2.table-reader-writer0
 $ git pull git://github.com/CatDancer/arc.git tag arc2.mz0

[tag names are going to be changed to be globally unique]

Dependencies are automatically pulled in by this process. For example, table-reader-writer builds upon arc2, so arc2 is pulled in. table-reader-writer also needs the list-writer and scheme-values hacks, so those are pulled in as well.

Sometimes a hack will conflict with another hacks. To resolve the conflict, a programmer can share a commit which resolves the conflict.

In Git tag names are not usually globally unique; a typical tag name in normal Git usage will be something like “v4.01”. If we’re going to be using Git to combine hacks written by different people then we’ll need unique tag names. I propose using our Arc forum username as a prefix, which will be good enough for now.

Sharing a Library

Suppose you have some libraries that you’d like to share using this approach. You’d do each library separately, but you can do all your work within one Git repository if you wish. Start off with a clean copy of arc3 (or whatever is your base):

[these tag names don’t exist yet]

 $ git checkout pg.arc3

As “pg.arc3” is a tag, Git will warn you that you are no longer on a branch. This is OK, you don’t need a branch for the following, though you can work within a branch if you want to. Then pull in any hacks that your library depends on:

 $ git merge catdancer.arc3.table-reader-writer0
 $ git merge someone-else.some-other-library

Make the changes which implement your library in your working directory. You can do this by hand, or, if you’ve developed your library using Git in the normal way with branches and commits keeping track of your development history, you can use “git-apply” or “git-cherry-pick” with the “-no-commit” option to pull the changes into your working directory without committing them.

Be careful with Git commands which commit for you: that will automatically add your development history as ancestors to the commit. That’s OK if you want to publish your development history publicly, but there’s no going back: once you’ve published an ancestor of a commit, you need to keep it in order for the end-programmer to be able to merge in new versions of your library.

Now commit your changes:

 $ git commit -m 'my library v1'

You can see which ancestor commits were included with:

 $ git log --decorate --topo-order

The “decorate” option shows you the tag name associated with each commit, and “--topo-order” shows you the commits in dependency order instead of in order by the date in which they were created. What you want to see is that only your library and it’s prerequisites appear: that you don’t have other unrelated hacks or development history commits included that you don’t want.

If you make a mistake, just clean out your changes with “git-reset --hard” and start over with the checkout command.

Looks good? Now tag it:

 $ git tag yourname.library-name-v1

If you’re already set up with a remote public repository such as at Github, you can push your changes to share them:

 $ git push --tags

If you don’t have a public Git repository yet, you can go over to github.com, sign up for a free account, and create a new repository. Github will give you some instructions on what to do next, use the ones under “Existing Git Repo”:

 cd existing_git_repo
 git remote add origin git@github.com:...
 git push origin master

and then do the “git push --tags”.

If you accidentally share a tag to some bad code or a commit which has some unneeded ancestor commits, delete the tag locally:

 $ git tag -d yourname.library-name-v1

and then explicitly push that now non-existent tag in order to get it deleted on the remote public repository:

 $ git push origin :refs/tags/yourname.library-name-v1

Never reuse a tag name for a different commit after you’ve shared it. Instead, make up a new tag name, such as “yourname.library-name-v2”. If people ask you what happened to v1, just say it was a bad commit you shared accidentally.

OK! You’ve shared your commit, so now other people can get your library:

 $ git pull git://github.com/yourname/yourrepo.git yourname.library-name-v2

Resolving Conflicts

Sometimes two hacks will conflict, either in the source code (for example, two patches change the same section of code) or semantically (for example, two libraries try to define a function with the same name).

Two resolve the conflict, share a commit which has the two conflicting hacks as ancestor commits:

 $ git merge hack-a
 $ git merge hack-b
 [fix fix fix]
 $ git commit -m 'merge hack-a and hack-b'
 $ git tag myname.hack-a.hack-b
 $ git push --tags

Then when someone wants to use hack-a and hack-b, they can say, “hey! you've already fixed this!” and grab your myname.hack-a.hack-b.

You can do the same thing if you need to advertise that two hacks don’t conflict with each other. For example, I published a “toerr” hack which built upon arc2. As it happens, it merges fine against arc3:

 $ git checkout pg.arc3
 $ git merge catdancer.arc2.toerr0

But if people were worried if it worked with arc3, I could share that commit:

 $ git tag catdancer.arc3.toerr0

While in this case there are no code differences between the arc2 versions of toerr and the arc3 version, my sharing this commit with pg.arc3 and catdancer.arc2.toerr0 as ancestors is a way for me to say that they work together.

Including ancestor commits without their code

Sometimes we’ll want to include a commit as an ancestor of our commit, even though we're not using any code from it.

For example, suppose Alice shares a library L1, and Bob creates a bug fix for that library F1. Then Cindy writes a library L2 which uses L1 and needs the bug fix F1. The commit for L2 will have F1 as a parent commit, so that it gets pulled in automatically if someones uses L2.

Imagine that F1 is actually a pretty ugly fix, and we’re writing a fix F2 which is a better replacement. F2 fixes the problem that F1 fixed, but in a different and better way, and without using any of F1’s code.

We’d like F2 to have F1 as a parent commit, because then someone can merge F2 and L2 and everything will work. If F2 didn't have F1 as a parent commit, then when someone merged L2, Git would attempt to also merge in the F1 code.

If while preparing the F2 commit we were to type “git merge --no-commit F1”, this would do two things: it would add the F1 commit to our .git/MERGE_HEAD so that it will be included as a parent commit later when we run “git commit”, and it would pull in the F1 changes into our working directory. We’d then need to tediously remove the F1 changes.

We can actually just add the F1 commit to our MERGE_HEAD ourselves, without using git merge at all:

 $ git rev-parse F1 >>.git/MERGE_HEAD

Now F1 will be included as one of the parent commits when we “git commit”, without having any of F1’s code in our working directory.

Comment

Comment in the Arc Forum.