Git

About

Git was developed by Linus Torvalds and the Linux kernel community in 2005; the name is British slang for a stupid or unpleasant person. It differs from traditional VCS (Version Control System) and CVCS (Centralised Version Control System) solutions, the most common being SVN (Subversion), in that every developer holds a full copy of the repository. SVN is the evolution of a revision tool while Git is a true version control system.

The goals of Git are:

  • Speed
  • Simplicity (of the underlying technology, not of how easy it is to use!)
  • Non-linear development (i.e. branches and merging)
  • Distributed (i.e. no single repository is more important than any other)
  • Data integrity (i.e. no file corruption)

Overview

Remote protocols: Git, HTTP(S), SSH
Default branch name: master
Default remote name: origin
Default pointer name: origin/master
Ignored list file: .gitignore
Current branch pointer name: HEAD

Basics

  • Git tracks content rather than files
  • Each developer has a local repository that they perform most tasks (e.g. branching, merging) against
  • The local repository can then be pushed to a remote (shared) repository
  • All committed objects are stored in a simple key-value store with a unique SHA-1 checksum (160 bits, written as 40 hexadecimal characters) as the key
  • Commit identity is made up of a name and an email address
  • Snapshots entire files, not just the differences between files (i.e. deltas)
  • Submodules are nested Git repositories (through pointers). Parents are called superprojects.
  • Hooks are triggers that run custom scripts when an action (e.g. John pushes a change) occurs
  • Can create branch tags
  • Every line of development (including the default ‘master’) is considered a branch
  • Switching branches swaps the entire context (i.e. resets the state of the working directory, which includes adding/removing files)
  • To work on multiple branches at the same time, make a second clone of the repository
  • Uses pointers (see below for detail)
  • It’s possible to edit (i.e. clean up) the local commit history before pushing to a remote repository
  • General commit best practices still apply
    • Should be easily digestible
    • Should deal with a single piece of functionality
    • Consistent messages (summary sentence followed by paragraphs/bullet points)
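
The key-value store mentioned in the list above can be sketched in a few lines of Python. This is only an illustration, assuming Git's default SHA-1 object format, in which a file's content is hashed together with a small "blob <size>" header. The names `git_blob_id` and `store` are made up for this example and are not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# A toy key-value object store: the checksum is the key, the content the value.
store = {}
for data in (b"hello world\n", b"hello world\n", b"something else\n"):
    store[git_blob_id(data)] = data

print(len(store))              # identical content is stored only once
print(len(git_blob_id(b"")))   # the key is 40 hexadecimal characters long
```

Because the key is derived from the content, two files with the same content share one stored object, which is one way to read "Git tracks content rather than files".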

Pointers

  

  • Uses pointers to snapshots rather than making copies
  • Each branch (local or remote) has a pointer to a snapshot
  • HEAD points to the local branch that is currently checked out
  • Cloning creates the origin/master pointer and a local pointer, which operate independently
  • As new commits are added, the pointer is updated to the most recent snapshot
    • Moving the pointer forward to newer commits, when no merge is needed, is called a “fast forward”
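
The pointer behaviour described above can be modelled with two dictionaries. This is a simplified sketch, not Git's implementation: the commit IDs (`c1`, `x1`, ...) and the helper names `is_ancestor` and `fast_forward` are invented for the example.

```python
# Toy commit graph: each commit ID maps to the list of its parent IDs.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "x1": ["c1"]}

# Branches are just names pointing at a commit; HEAD names the current branch.
branches = {"master": "c1", "origin/master": "c3", "topic": "x1"}
head = "master"


def is_ancestor(old: str, new: str) -> bool:
    """True if `old` is reachable from `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def fast_forward(branch: str, target: str) -> bool:
    """Move `branch` to `target` only if no history would be left behind."""
    if is_ancestor(branches[branch], target):
        branches[branch] = target
        return True
    return False


print(fast_forward(head, branches["origin/master"]))  # True: c1 is an ancestor of c3
print(branches["master"])                              # c3
print(fast_forward("topic", "c3"))                     # False: x1 is not in c3's history
```

Nothing is copied when a branch moves; only the value stored under the branch name changes. When the fast forward is refused, the two histories have diverged and a real merge (or rebase) is needed instead.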

Pros and Cons

Pros

  • Lightweight framework (i.e. branches are just 41-byte pointers to snapshots) encourages frequent branching and merging
  • Easily tweak a long list of changes (files or parts of files) into digestible remote commits right before pushing
  • Quick and easy merging makes people merge more frequently, which makes each merge easier
  • Determines best common ancestor (i.e. master) when merging, making it a lot easier (now in Subversion)
  • Less network overhead as most work is done locally
  • Can work offline
  • More redundancy as all workers have a full backup of the project
  • Emerging as the new standard
  • Synchronisation can happen over any protocol – a central repository isn’t necessary but most common
  • Branching and merging recorded in the history (now in Subversion)
  • GitHub
  • Perfectly suited to large open-source projects
  • Git only creates one folder as opposed to SVN folders in every directory
  • Automatically deals with moved or deleted files
  • Doesn’t impose any workflow (e.g. centralised)
  • Data integrity as objects cannot be broken or tampered with and have the checksum still match
  • No commit access issues (who can write to the centralised repository?) through distribution because everyone has their own copy. Choose when to pull edits rather than having everyone push.
  • No namespacing issues because not all branches live in the same repository
  • Small teams can commit, merge and run (e.g. test suites) amongst themselves outside of the main branch
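
The data integrity point in the list above follows from each commit's checksum covering its parent's checksum. The sketch below illustrates the idea with a deliberately simplified commit record; the record layout and the names `commit_id` and `build_history` are assumptions for this example, not Git's actual format.

```python
import hashlib


def commit_id(tree: str, parent: str, message: str) -> str:
    """Hash a simplified commit record (illustrative format, not Git's own)."""
    record = f"tree {tree}\nparent {parent}\n\n{message}\n"
    return hashlib.sha1(record.encode("utf-8")).hexdigest()


def build_history(snapshots):
    """Return the list of commit IDs for a sequence of (tree, message) pairs."""
    ids, parent = [], ""
    for tree, message in snapshots:
        parent = commit_id(tree, parent, message)
        ids.append(parent)
    return ids


original = build_history([("tree-a", "first"), ("tree-b", "second"), ("tree-c", "third")])
tampered = build_history([("tree-a", "first"), ("tree-X", "second"), ("tree-c", "third")])

# The first commit is untouched; every commit from the altered one onwards gets a new ID.
print([a == b for a, b in zip(original, tampered)])   # [True, False, False]
```

Changing anything in an old commit changes its ID and the ID of every later commit, so the tampering cannot go unnoticed by anyone holding the original IDs.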

Cons

  • Steeper learning curve; a single central repository is easier to understand (single source of truth) and to back up
  • Staging area is overkill
  • Must update before pushing – even if modified files don’t clash
  • Cloning a project includes the entire history, including that 100 MB video that was accidentally checked in 3 years ago!
  • Not great for large files (i.e. 100 MB+)
  • Can’t work on subfolders of a repository
  • Centralised repositories work well for tightly controlled corporate environments
  • Windows support as a second-class citizen
  • Worse user interface tools, primarily a command-line tool
  • Worse IDEA support
  • SVN has shorter and predictable revision numbers (e.g. 1, 2, 3) as opposed to a random hash
  • Can’t commit to two branches at once
  • Unfriendly error messages

Limbo

  • Git is faster by a few milliseconds/seconds (trivial for 80/20)
  • Git is smaller by tens of megabytes (trivial for 80/20)
  • Git has a simple codebase
  • Git has better compression (trivial for 80/20)
  • Context switching branches could be good (command prompt) or bad (multitasking environment)

Workflow

  1. Check out (clone) the repository
  2. Modify working directory files
  3. Stage files, adding snapshots to the staging area
  4. Commit staging area snapshots to local repository
  5. Push local repository to remote

File States

  • Tracked
    • Unmodified
    • Modified
      • Files changed but not committed locally
    • Staged
      • Create staging snapshot of modified files (at time of stage) before commit
      • It’s possible to skip this step by auto-staging modified tracked files at commit time (commit -a)
    • Committed
      • Files on the local repository change to unmodified
    • Conflict
      • Unmerged
      • Resolved
  • Untracked (files Git is not yet tracking, including those ignored via .gitignore)

Areas

  • Working directory
    • Checkout of a remote repository
  • Staging area (index)
    • Files ready to be committed
    • Set up what you want commits to look like before committing
  • Local repository (directory)
    • Files committed
  • Remote repository
    • Files pushed to a remote server
    • A push will be rejected if the remote has been modified since your last pull, even if only unrelated files changed. You need to pull first.
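
The relationship between the working directory, the staging area and the local repository can be sketched as three plain Python containers. This is a toy model under simplifying assumptions (whole-file contents, a single branch, no remote); the function names `add`, `commit` and `status` only mirror the Git commands in spirit.

```python
# Each area is modelled as a mapping of file name -> content.
working_dir = {}
index = {}        # the staging area
repository = []   # list of committed snapshots, oldest first


def add(name: str) -> None:
    """Stage a snapshot of the file as it is right now."""
    index[name] = working_dir[name]


def commit() -> None:
    """Record the staged snapshot in the local repository."""
    repository.append(dict(index))


def status(name: str) -> list:
    """List the states a tracked file is in, similar in spirit to `git status`."""
    committed = repository[-1].get(name) if repository else None
    states = []
    if index.get(name) != committed:
        states.append("staged")
    if working_dir.get(name) != index.get(name):
        states.append("modified")
    return states or ["unmodified"]


working_dir["foo.txt"] = "v1"
add("foo.txt")
commit()
print(status("foo.txt"))   # ['unmodified']

working_dir["foo.txt"] = "v2"
add("foo.txt")             # the index now holds the v2 snapshot
working_dir["foo.txt"] = "v3"
print(status("foo.txt"))   # ['staged', 'modified']

commit()                   # commits v2, not v3
print(repository[-1]["foo.txt"])   # v2
```

The second status call shows why a file can be staged and modified at the same time: staging takes a snapshot, and later edits to the working copy are not part of it.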

Porcelain Commands

User-friendly commands are referred to as “porcelain” while low-level commands are “plumbing” commands.

Command | Description | Comments
init | Creates a Git repository | Stored in a folder named ‘.git’
add | Track files, stage files, resolve conflicting files | Multiple uses!
clone | Check out a remote repository |
status | Find out what state files are in |
diff | Compare differences between working and staged | Does not compare with committed code!
diff --staged | Compare differences between staged and committed |
commit | Commits staged changes |
log | View commit history |
reset | Unstages a file |
checkout | Reverts a file | Local changes will be lost permanently
remote | List known remote repositories |
fetch | Get all remote data you don’t have | Updates your local copy of a remote branch without modifying your working copy. Use merge afterwards to use the latest files.
pull | Fetch and merge | Automatically merges without letting you review first!
push | Push committed changes to remote |
tag | List tags |
branch | List branches, create new branch, delete branch | Multiple uses! Creating a new branch does not automatically switch to it. Warns when attempting to delete a branch that has not been merged.
checkout | Switch to another branch | Cannot switch branches if uncommitted modifications would be overwritten. Workaround is stashing.
merge | Merge another branch into current | Merging uses a three-way merge between original, branch and latest, resulting in a new snapshot with 2 parents
tag | Create tag | Useful for specific release versions (e.g. v1.0). Use annotated tagging.
rebase | Rewrites commit history without changing actual files | Same as merge except changes occur serially (each change applied in order, like a patch) rather than in parallel. Used to clean up the commit history of local private work before pushing publicly. Once applied, need to “fast forward” the master to the rebased snapshot. Never rebase public repositories, as the result looks like new changes (the hashes have changed) and others will have to re-merge. Duplicate commit messages in the history are also confusing.
shortlog | Summary of commits | Good for sending around to the team
reflog | Log of recent HEAD pointer history | Good for seeing what has been recently worked on
stash | Sets aside uncommitted modifications | Allows switching branches without having to commit (useful for unfinished work). Can unstash later.
filter-branch | Modify entire history | Permanently remove files or alter commit messages
svn | Work as a Subversion client | Git locally, SVN remote. Need to follow some guidelines…
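
The three-way merge mentioned in the merge row can be illustrated with a small Python function. This sketch compares whole files, which is coarser than Git's line-level merging, and the function name `three_way_merge` and the sample file names are invented for the example.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two snapshots (file name -> content) using their common ancestor.

    Returns (merged, conflicts). Whole files are compared, which is coarser
    than Git's line-level merging. A value of None means the file is absent.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:          # both sides agree (including both deleting)
            result = o
        elif o == b:        # only their side changed the file
            result = t
        elif t == b:        # only our side changed the file
            result = o
        else:               # both sides changed it differently
            conflicts.append(name)
            continue
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "4", "d.txt": "new"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)      # {'a.txt': '2', 'b.txt': '3', 'd.txt': 'new'}
print(conflicts)   # ['c.txt']
```

Having the common ancestor is what lets the merge tell "changed on one side" apart from "changed on both sides"; only the latter needs a human to resolve.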

Workflow Paradigms

Centralised

  • Traditional CVCS with a single remote and multiple write-access users
  • Good for small teams

Long running

  • Branches represent different levels of stability
  • Release ready code in master branch
  • Active work on dev branch
  • Short-lived work on topic branches (single feature)

Integration/Manager

  • Developers fork a project, update, then request original author (“integration manager”) pull in the change
  • Author is the only person with write access to the repository (“blessed repository”)
  • Author can test locally first
  • Author can pull in changes at any time allowing everyone to work at their own pace
  • Used by GitHub
  • Similar to submitting patches – except forks can run standalone

Dictator/Lieutenants

  • Hierarchy of merging – similar to a traditional business structure (i.e. employees report to managers who report to their managers and so on…)
  • Good for huge projects as it delegates merging
  • Repository (“blessed repository”) read by all (“developers”), updated by “dictator”, changes filter up through “lieutenants”
  • Network of trust – pull from people you trust, who pull from people they trust etc…

Tools

Git for Windows

Offers a console implementation of the version control system for Windows.

TortoiseGit

A revision control client, implemented as a Microsoft Windows shell extension.

GitExtensions

The only graphical user interface for Git that allows you to control Git without using the command line.

GitHub

Web-based hosting service for software development projects that use the Git revision control system. GitHub offers both paid plans for private repositories, and free accounts for open source projects.

git cola

A sleek and powerful Git GUI. git-cola is Python-powered.

SmartGit Hg

A cross-platform client for Git, Mercurial and SVN. Costs $79.

Others

  • GitJungle (beta)
  • gitSafe
  • GitCheetah

IDEA

  • Context switching has overhead because IDEA needs to re-check everything
  • Will probably need to use the command line
  • Synchronise in IDEA when updating via command line
  • Smart checkout will automatically stash and unstash uncommitted changes when switching branches
  • Shows irrelevant commands like ‘git init’ when it has already been done
  • Have to manually set identity using the command line (e.g. git config --global user.name "Your Name")
  • Pull automatically performs the merge (if no conflicts) and shows you the list afterwards
    • Revert to previous if unhappy with pull
    • To view actual code changes, right-click each file and click ‘Show diff’ (can’t easily navigate diffs)

Weird Stuff

  • It’s possible for a file to be both staged and unstaged! (e.g. modify foo.txt, stage it, then further modify it – stage snapshot is not the same as later modified)
  • Can’t always revert undos
  • Fetch doesn’t automatically merge or create a locally editable copy – instead only an immutable pointer?
  • By default push doesn’t transfer tags to remote server
  • To delete a remote branch you need to push nothing to it! (e.g. git push origin :branch-name)
  • Must “fast forward” after rebasing (why not automatic?)
  • Add command to resolve conflicted files? Why not ‘resolve’?
  • Cryptic error messages
    • “master -> master (non-fast forward). error: failed to push some refs to…” = push failed because the remote has been changed

Conclusion

Git is not better than Subversion, but it is also not worse. It’s different.

Subversion is good if you have a centralised repository, consistent internal access, a small project, basic needs (not many branches) and users who aren’t interested in the details of a source control system. The simplicity and excellent tooling will shine.

Git is good if you need to work offline, work on open-source projects (i.e. GitHub), have advanced needs (lots of branches) and a willingness to spend the extra time learning. Once mastered, the decentralised framework is quicker, more efficient and more flexible.

Ideal VCS

A repository for 99% of all projects that exist. If you have tens of thousands of developers with a million files, decouple into smaller modules. KISS.

Goals

  1. Idiot proof
    1. No command line necessary (but useful as an API)
    2. Easy branching, merging, comparing differences, history and reverting
  2. Fast enough
  3. Small enough
  4. Thorough enough
    1. Checks integrity of files
    2. Don’t keep history of files that have been deleted long ago (occasional cleanup)
  5. Collaborative
  6. Not suitable for very large projects

Workflow

  1. Developer checks out the latest revision. Easy option to checkout entire repository after calculating the size.
  2. Modifies local version, changes are committed automatically
    1. Dropbox folder with OS integration and asynchronous remote backup
      1. Modified files are automatically saved at each point
      2. Can revert any file (or the whole project) to any point (e.g. yesterday morning)
      3. Optionally set ‘milestones’, which can be used for remote pushes or reverting
  3. Separate branches are treated as different projects (i.e. different folders)
    1. Context switching will confuse my IDE
    2. May want to work on more than one branch at the same time
    3. Easy merge
  4. Collaboration
    1. Easy to share your snapshots with others (i.e. pull request)
      1. Similar to Dropbox’s public link
      2. Push online (temporarily or permanently, like Awesome Screenshot)
      3. Include full history or just latest revision (smaller)
    2. Easy to merge others’ snapshots
    3. Snapshots are self-contained
  5. When ready, push local version to remote
    1. Lists changed files locally and from remote
    2. Add a comment
    3. If no conflicts, automatically pulls in changes then pushes to remote
    4. If conflicts, easy way to resolve
  6. Idiot proof
    1. Handful of logically named commands (init, checkout, commit, pull, push, revert, branch, merge, tag, history)
    2. Error messages that make sense
    3. GUI
  7. Advanced
    1. Each file and folder has a checksum that determines if it has changed (recursively checks from top-to-bottom)
    2. Allow repositories within repositories (so you can get everything)
