Arc is a highly hackable language, so we can expect that programmers will create many Arc hacks. Some will be good, some bad, some experimental, some very interesting. Some hacks may be highly useful for a particular project but a bad idea for Arc in general.
As an Arc programmer, I’d like to be able to easily use hacks that other hackers create, and to get a given hack without also getting a lot of other hacks that I don’t want or need. If I like a hack to Arc, or want to use it for a particular project, I’d like to be able to apply just that patch to Arc.
As someone who would like to use other people’s hacks, for me the ideal situation would be if every hack were available as a minimal set of differences from Arc.
This is a rather unusual notion, since most software is distributed in releases, with each release incorporating a bunch of patches. To work on the release, programmers will check in patches to a development branch. When everything is working, the branch will be tagged (“version 1.52”) and released.
Unless special care is taken to produce a clean patch series, this process mushes hacks together. I’ll be working on hack A, and check in a patch A0, and then other hacks B, C, D, E, and F will get checked in, and then I’ll check in some more patches A1 and A2 which finish up hack A. Now it’s tedious to pull out just hack A without getting the rest.
There is one common example where programmers do work at creating a “clean” patch set: when they are submitting work to a project’s coordinator, and want to make it easy for the coordinator to understand and accept the patches.
But this common scenario has a middleman: the coordinator who is collecting patches from developers and creating releases. I, the programmer using the code, get the patches from the developers through the coordinator. The coordinator is deeply knowledgeable about the hairy internal details of the software and makes sure that the submitted patches don’t unintentionally break things.
What if we don’t need a middleman? Arc today is remarkably free of “hairy internal details”. One of the results of refining code down to its most succinct representation is that you aren’t left with a lot of complicated structure that you need to be a guru to figure out.
What if it were as easy and comfortable to choose which patches you wanted in your Arc as it is in other languages to choose which libraries you wanted to use?
I suspect there may be some unexpected benefits. As one example, when code just does what it needs to do, like
(do (a) (b) (c))
it’s easy to see what it’s doing. But then some people need different things, and so we start getting configuration options like
(do (if config*!a (a)) (if config*!b (b)) ...)
I’ve seen libraries where there is more code dealing with configuration options specifying what to do than there is code to do the actual thing that the library is supposed to do. What if the code were so clear and patches so easy that it would be as easy to specify a patch to set the code to
(do (a) (c))
as it would be to write a configuration file?
I’ve been playing around with git, wondering if git was a good choice for sharing Arc hacks, and if so, which git entity (repositories, branches, tags...) would be best to use for one hack. From what I can tell so far, it looks like tags work well.
Suppose, for example, that you happened to want arc2 with my date and atomic fixes, my patch to read and write Arc tables, and nothing else that wasn’t needed for those hacks. Here’s how to do it with git (git’s output is not shown. Also, this assumes that you’re using my commit of arc2; see the next section if you’re starting with a different commit of arc2):
$ git clone git://github.com/CatDancer/arc.git
$ cd arc
$ git checkout arc2
$ git merge arc2.date0
$ git merge arc2.atomic-fix0
$ git merge arc2.table-reader-writer0
Your working directory will now contain a version of arc2 with those patches applied.
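Since every merge follows the same pattern, the command list can be generated from the names of the hacks you want. The following is a minimal sketch, assuming a POSIX shell; the function name print_merge_plan is made up for illustration and is not part of git. It only prints the commands, so they can be reviewed before anything is run.

```shell
# print_merge_plan BASE HACK...
# Print (but do not run) the git commands that check out BASE and then
# merge each BASE.HACK tag in the order given.
# (Hypothetical helper for illustration only.)
print_merge_plan() {
  base=$1
  shift
  printf 'git checkout %s\n' "$base"
  for hack in "$@"; do
    printf 'git merge %s.%s\n' "$base" "$hack"
  done
}

# Prints the checkout and the three merges used in the example above.
print_merge_plan arc2 date0 atomic-fix0 table-reader-writer0
```

Inside a clone of the repository, the printed commands could be piped to sh -e to run them and stop at the first failure.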
My naming convention for these tags is that the first part (“arc2”) is what is being patched, the second part is the name of the hack, and the final number is the version of the hack. Each hack contains the minimal number of changes needed to patch arc2 to implement just that hack, and nothing more.
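To make the convention concrete, here is a minimal sketch of taking a tag name apart again, assuming a POSIX shell with sed. The function name split_hack_tag is hypothetical. It also assumes that the base name contains no dot and that the hack name does not itself end in a digit, which holds for the tags shown here but is not something the convention guarantees.

```shell
# split_hack_tag TAG
# Split a tag such as arc2.table-reader-writer0 into
# "base hack-name version" and print the three parts on one line.
# (Hypothetical helper; plain POSIX sh has no `local`, so the
# variables below are globals.)
split_hack_tag() {
  tag=$1
  case $tag in
    *.*) ;;                                  # has a base and a hack part
    *) echo "not a hack tag: $tag" >&2; return 1 ;;
  esac
  base=${tag%%.*}                            # text before the first dot
  rest=${tag#*.}                             # text after the first dot
  # Assumption: the version is the run of digits at the end of the tag.
  name=$(printf '%s\n' "$rest" | sed 's/[0-9]*$//')
  version=${rest#"$name"}
  printf '%s %s %s\n' "$base" "$name" "$version"
}

split_hack_tag arc2.table-reader-writer0   # prints: arc2 table-reader-writer 0
```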
Some hacks do depend on other hacks, so the git commands above will have pulled in some other hacks as well. The --decorate option to git log will print out tags that point to included commits:
$ git log --decorate
...
commit bef6020695b2a4e7721e09d6833bbc2c1f512eae (refs/tags/arc2.table-reader-writer0)
Author: Cat Dancer <firstname.lastname@example.org>
Date:   Sun Apr 12 15:00:35 2009 -0400
...
commit 1f2243319651f5797ecb3f4e0166bfd5751af3b1 (refs/tags/arc2.scheme-values0)
...
To get just the tag names, the %d format string prints the “decorate” value:
$ git log --pretty=format:%d
(refs/tags/arc2.table-reader-writer0)
(refs/tags/arc2.scheme-values0)
(refs/tags/arc2.list-writer0)
(refs/tags/arc2.date0)
(refs/tags/arc2.atomic-fix0)
(refs/tags/arc2, refs/remotes/origin/master, refs/remotes/origin/HEAD, refs/heads/master)
(Merge commits that don’t have any tags pointing to them print as blank lines in this output.) The output in turn can be made easier to read by pulling out just the tags:
$ git log --pretty=format:%d | perl -ne 'm.refs/tags/([^),]+). && print "$1\n"'
arc2.table-reader-writer0
arc2.scheme-values0
arc2.list-writer0
arc2.date0
arc2.atomic-fix0
arc2
Now we can easily see that the “scheme-values” and “list-writer” hacks were dependencies and got pulled in as well.
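The same extraction can be done without perl. The following is a minimal sketch, assuming a POSIX shell and a grep that supports -o (both GNU and BSD grep do); the function name extract_tags is made up for illustration. Unlike the perl one-liner, it prints every tag on a line rather than only the first, and it also accepts the "tag: name" form that newer versions of git print in place of "refs/tags/name".

```shell
# extract_tags
# Read the output of `git log --pretty=format:%d` on stdin and print
# one tag name per line. Lines with no tag (such as the blank lines
# for untagged merge commits) produce no output.
# (Hypothetical helper, not part of git.)
extract_tags() {
  # -o prints each match on its own line; -E enables the alternation.
  grep -oE '(refs/tags/|tag: )[^),]+' |
    sed -e 's|^refs/tags/||' -e 's|^tag: ||'
}

# Demonstration on canned input, so no repository is needed.
printf '%s\n' \
  '(refs/tags/arc2.table-reader-writer0)' \
  '' \
  '(refs/tags/arc2.date0)' \
  '(refs/tags/arc2, refs/heads/master)' |
  extract_tags
```

On the canned input this prints arc2.table-reader-writer0, arc2.date0, and arc2, one per line. In a real checkout the input would come from git log --pretty=format:%d piped into extract_tags.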
Why tags instead of branches? I could have a “table-reader-writer” branch which contained the latest version of that hack, and I might do that for some of my larger hacks where I’m actually going through multiple revisions. However, we need tags anyway to keep track of which version of a patch we have (since version 1 of hack A might be based on version 4 of hack B, while version 2 of hack A is based on version 6 of hack B). Since branches evolve over time, git has extra machinery to keep track of branches and to allow local branches to track remote branches, etc., a complexity which isn’t needed if all you want is to grab version 0 of the arc2 date patch.
At the moment I have around twenty hacks to Arc checked in to my git repository on github, each one independently accessible as a separate tag.
Many of these hacks are tiny. For example, arc2.testify-iso0 lets you pass in a list as the test argument to the functions that use testify:
arc> (rem '(c 3) '((a 1) (b 2) (c 3) (d 4)))
((a 1) (b 2) (d 4))
The patch adds one letter to the Arc source code, changing a call to is into a call to iso:
 (def testify (x)
-  (if (isa x 'fn) x [is _ x]))
+  (if (isa x 'fn) x [iso _ x]))
You can see a pretty colored version of the commit on github.
The example above of merging patches went very smoothly, but it did assume that you were starting with my commit of arc2. Suppose instead you started with your own:
$ wget http://ycombinator.com/arc/arc2.tar
$ tar xf arc2.tar
$ cd arc2
$ git init
$ git add .
$ git commit -m 'initial version'
Now you can fetch a patch of mine:
$ git fetch git://github.com/CatDancer/arc.git tag arc2.date0
But now if you try to merge the patch you’ll get conflicts:
$ git merge arc2.date0
Auto-merging ac.scm
CONFLICT (add/add): Merge conflict in ac.scm
Auto-merging arc.arc
CONFLICT (add/add): Merge conflict in arc.arc
Automatic merge failed; fix conflicts and then commit the result.
The problem is that git knows that arc2.date0 is a patch to my commit of arc2, but it doesn’t know that your checkin of arc2 can be treated the same as my commit.
There’s an easy fix. First let’s get rid of the failed merge and revert back to your checkin of arc2 that you had before:
$ git reset --hard
By “merging” my arc2 into your checkin of arc2, git will know that they don’t conflict.
$ git merge arc2
None of the files actually change in this merge commit, since after all your checkin of arc2 and my commit of arc2 are the same. But now the merge of my patch goes smoothly:
$ git merge arc2.date0