
Working with Git Submodules

Published by marco on

Introduction

The intended audience of this document is people interested in knowing which commands to execute to update submodules. The initial analysis section is intended for people interested in knowing how the commands work and what their strengths/weaknesses are.

The inspiration for this documentation was that I was wondering whether submodules were always cloned with detached heads and if there were some way to avoid that. The short answers to these questions are, respectively, “yes” and “no”.

Skip to the examples below to just see the commands and their effects.

At the end of the document are links to pages referenced to produce this documentation.

Terminology

In the discussion below, the term superproject refers to the repository that directly contains a submodule reference. It comes from the git documentation, which makes the distinction because submodules can be nested. Suppose we have multiple levels of nesting, as shown below.

📁 A
  📁 B
    📁 C
  • A is the root repository of both B and C
  • A is the superproject of B
  • B is the superproject of C

Where do submodules go?

Submodules are stored inside another repository.

For a simple case, we would see the following:

📁 A
  📁 .git
    📁 modules
      📁 B
        📄 config (worktree = ../../../B)
  📁 B
    📄 .git (points to ../.git/modules/B)

The submodule’s git folder is stored in the superproject’s git folder; in the submodule’s working tree, it is replaced by a file that references the new location. The submodule’s config uses the core.worktree setting to check out to a different folder.
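The layout above can be checked on a throwaway repository. The following is a minimal sketch, assuming a POSIX shell and a reasonably recent git; the directory names (B-origin, A) and the demo identity are made up for illustration. It should print the contents of the B/.git file and the worktree setting described above.

```shell
# Sketch: build a throwaway superproject A with a submodule B and look at
# where git keeps the submodule's repository data.
set -e

# Throwaway identity so that commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)   # hypothetical scratch directory

# Stand-in for the repository that will become submodule B.
git init -q "$work/B-origin"
git -C "$work/B-origin" commit -q --allow-empty -m "initial commit in B"

# Superproject A, with B added as a submodule.
# protocol.file.allow=always is only needed because the URL is a local path.
git init -q "$work/A"
cd "$work/A"
git -c protocol.file.allow=always submodule --quiet add "$work/B-origin" B
git commit -q -m "add submodule B"

# B/.git is a file that points at the real repository data ...
cat B/.git
# ... and that repository's config points back at the working tree.
git config -f .git/modules/B/config core.worktree
```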

Can I share a local copy of a submodule?

No. Storing the working tree of the submodule outside of the repository is not supported.

Why would you want to do that anyway?

One use case is that you have two repositories, each of which includes the same submodule, as shown below.

📁 A
  📁 B
📁 C
  📁 B

Instead of using two copies, you might think you could make the superprojects refer to the same copy of the submodule.

📁 A (refers to ../B)
📁 B
📁 C (refers to ../B)
  • The advantage would be that changes made to B from within A would immediately be available in C
  • However, it would no longer be possible to make A and C refer to different commits

While you can manually move a submodule outside of the repository after you’ve cloned it, you cannot configure a superproject’s submodules in a way that Git will be able to clone properly. If you try it, you’ll probably get an error message like:

fatal: No url found for submodule path 'SUBMODULE.NAME' in .gitmodules

The next section explains how you can share local commits for testing.

Testing submodule changes in multiple projects

Assume, as above, that there are two copies of the submodule: BA (the copy of B inside A) and BC (the copy of B inside C). Suppose there are commits in BA that have been tested with A, but should also be tested with C.

One way to test C would be to push the commits in BA and then pull them from BC. That involves a round-trip to the server, which is not optimal, but relatively straightforward.

Another way to test C would be to add the local BA as a remote to BC and then check out the commit from BA directly.

To set up a remote called B_A in BC, execute:

git remote add B_A ../../A/B

The testing flow would be, roughly,

  • Test changes to submodule BA in A
  • Create commit #1 in BA
  • Fetch from B_A into BC
  • Check out commit #1 in BC
  • Test changes in C
  • Repeat as needed
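The flow above can be sketched end to end on throwaway repositories. This is only an illustration under stated assumptions: the scratch directory, the B-origin repository, the demo identity, and the empty commits are all hypothetical stand-ins for real work.

```shell
# Sketch: share an unpushed commit between two local copies of submodule B.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)   # hypothetical scratch directory

# Stand-in for the shared submodule repository.
git init -q "$work/B-origin"
git -C "$work/B-origin" commit -q --allow-empty -m "initial commit in B"

# Two superprojects, A and C, each with its own copy of B.
# protocol.file.allow=always is only needed because the URL is a local path.
for super in A C; do
  git init -q "$work/$super"
  git -C "$work/$super" -c protocol.file.allow=always \
    submodule --quiet add "$work/B-origin" B
  git -C "$work/$super" commit -q -m "add submodule B"
done

# Create commit #1 in the copy of B inside A (never pushed anywhere).
git -C "$work/A/B" commit -q --allow-empty -m "commit #1 (local only)"
commit1=$(git -C "$work/A/B" rev-parse HEAD)

# In the copy of B inside C: add A's copy as a remote, fetch, and check out.
cd "$work/C/B"
git remote add B_A ../../A/B
git fetch -q B_A
git checkout -q --detach "$commit1"

git log --oneline -1
```

After this, C can be tested against commit #1 without a round-trip to the server. The checkout shows up in C as a changed submodule reference until it is either committed or reset.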

What to expect when cloning with submodules

A clone of a superproject (a repository with submodules) fetches submodules only when requested (e.g. when --recurse-submodules is included). If submodules are fetched, then git sets the checked-out commit in each submodule to the commit ID specified for that module in the superproject. This makes sense because that is the correct commit to use. However, this also means that, after a clone, all submodules will be in a detached head state.

On an initial clone, git creates a local branch in the superproject corresponding to the checked-out branch in the clone command (either the default branch or the branch specified in the -b option, if included).

Git does not create local branches in any of the submodules. Git assumes that you will be working in the root repository and not in the submodules. The checked-out branch in the submodule is irrelevant to the superproject.

If you want to work in (one or more of) the submodules anyway, then you have to create a local branch for yourself and check it out.

The detached head situation is not “weird” but “entirely expected” and “working as designed”. All detached head means is that a commit ID has been checked out rather than a named, local branch.

If, however, you want the submodule to be checked out to the same branch as that checked out in the superproject (e.g. main), then the way to address that is to call git switch main in the submodule repository.

This will have no effect on the superproject if the main branch in the submodule repository is at the same commit ID as the one pointed to by the superproject. If it is not, then switching to the main branch in the submodule repository will show up as a change in the superproject (the change being that the submodule repository is now pointing to a different commit). To accept that change in the superproject, simply git add the submodule folder and commit the change.
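As a minimal sketch of the second case (the branch in the submodule is ahead of the commit recorded by the superproject), the following builds throwaway repositories and then switches, stages, and commits. The repository names, the demo identity, and the empty commits are assumptions for illustration only.

```shell
# Sketch: switch a submodule to main and record the new commit in the superproject.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)   # hypothetical scratch directory

# Stand-in submodule repository with a main branch.
git init -q "$work/B-origin"
git -C "$work/B-origin" symbolic-ref HEAD refs/heads/main
git -C "$work/B-origin" commit -q --allow-empty -m "B: first commit"

# Superproject that records B at its first commit.
git init -q "$work/A-origin"
git -C "$work/A-origin" -c protocol.file.allow=always \
  submodule --quiet add "$work/B-origin" B
git -C "$work/A-origin" commit -q -m "add submodule B"

# main in B moves ahead of the commit recorded by the superproject.
git -C "$work/B-origin" commit -q --allow-empty -m "B: second commit"

# Clone with submodules; B starts out detached at the recorded (first) commit.
git -c protocol.file.allow=always clone -q --recurse-submodules \
  "$work/A-origin" "$work/A"
cd "$work/A"

# Switch the submodule to main; the superproject now sees a change in B.
git -C B switch -q main
git status --short

# Accept the change in the superproject.
git add B
git commit -q -m "Move submodule B to the tip of main"
```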

What does --remote-submodules do?

The --remote-submodules option does the following (according to the official documentation):

“Git will use the status of the submodule’s remote-tracking branch to update the submodule, rather than the superproject’s recorded SHA-1 (i.e. “commit ID”)”

That means that using this parameter may cause changes in the working tree of the superproject if the remote-tracking branch in the submodule repository does not point to the same commit as that referenced by the superproject.

“Tracking” a branch in a submodule

The basic submodule registration looks like this in the .gitmodules file.

[submodule "SharedRepo"]
    path = SharedRepo
    url = git@ssh.dev.azure.com:v3/ustertechnologies/uster.quantum/PoC.IMHSharedRepo

If you don’t plan on using --remote-submodules, then that’s all you need.

However, if you want to set up your git submodules so that the superproject knows which branch it should “track” in the submodule, use the following configuration:

[submodule "SharedRepo"]
    path = SharedRepo
    url = git@ssh.dev.azure.com:v3/ustertechnologies/uster.quantum/PoC.IMHSharedRepo
    branch = .
    update = rebase

Note that the branch name is “.”. This tells git to use the same branch name as that which is checked out in the superproject (if it exists; if it doesn’t, then git does nothing further). This allows you to set up the .gitmodules once and it works as expected for all branches. Otherwise, you run the risk of merging in a .gitmodules file that references a specific feature branch (for example) and you end up syncing with that feature branch by accident if you call submodule update with --remote.

The update action indicates how git should get to the desired commit if it needs to make a change. Again, this only applies if you explicitly tell git to use the head commit for the given branch on the remote instead of just using whichever commit is already referenced locally.
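Instead of editing the file by hand, the same entries can be written with git config. This is a small sketch that works on a scratch .gitmodules file; the URL is a placeholder, not the real repository address.

```shell
# Sketch: write the branch/update settings into a .gitmodules file with git config.
set -e

work=$(mktemp -d)   # hypothetical scratch directory
cd "$work"

# Placeholder URL; substitute the real submodule URL.
git config -f .gitmodules submodule.SharedRepo.path SharedRepo
git config -f .gitmodules submodule.SharedRepo.url "https://example.invalid/SharedRepo.git"

# "." means: use the branch name that is checked out in the superproject.
git config -f .gitmodules submodule.SharedRepo.branch .
git config -f .gitmodules submodule.SharedRepo.update rebase

cat .gitmodules
```

In a real superproject, the modified .gitmodules file still has to be added and committed like any other tracked file.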

A remote-update example

A superproject will see an update if it follows a branch in the submodule (as outlined in the preceding section) and that branch in the submodule has gained new commits since the last time the superproject was updated (i.e. the superproject still references a commit in the submodule that does not correspond to the current HEAD of the branch in the submodule).

Using the --remote-submodules option is a way of cloning a superproject, but also updating its submodules to the latest commits instead of just checking out whatever is referenced in the superproject. It is a useful way of cloning a superproject with the latest commits in not only the superproject’s repository, but also all submodules. However, you are then not only checking out the current state of the repository, but also requesting updates to the referenced submodules.

This only works if the submodule reference specifies a branch, though. If it doesn’t, then git has no way of knowing which branch in the submodule repository it should update to. As noted above, setting this branch doesn’t mean that git will create a local branch in the submodule with that name and check it out; it just means that it will change the commit ID referenced by the superproject for that submodule if the commit referenced by that branch in the submodule is different than the commit currently referenced by the superproject.

Phew! We now know enough to determine the commands to use.

Useful Commands

We now have the base knowledge to work with git and submodules using the command line. This will be useful for e.g. setting up agents.

Imagine we have two repositories:

  • Repository A has a main branch that tracks the main branch of submodule B (currently commit ID1)
  • The main branch in B points to commit ID1
  • Repository A has a feature/setup branch that tracks the feature/setup branch of submodule B (currently commit ID2)

The examples will use something like the following diagram to show results. The bold indicates the commit and branch that are checked out. A bold commit with a non-bold branch name indicates a detached head.

The diagram below shows the situation outlined above, with main checked out.

Clone with submodules

To clone a repository with submodules and check out the default branch in the superproject, execute the following:

git clone --recurse-submodules <URL>

This results in:

  • The superproject is cloned and checked out to the default branch
  • Each submodule is cloned and checked out to the commit referenced in the respective submodule definition
  • Submodules are in detached head state because git does not create local branches in submodules
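These results can be verified on a throwaway pair of repositories. The sketch below assumes a local stand-in for <URL> (A-origin) and a hypothetical submodule repository (B-origin); it compares the commit recorded in the superproject with the commit checked out in the submodule, and tests whether the submodule's HEAD is a branch.

```shell
# Sketch: clone with submodules and inspect the state of the submodule.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)   # hypothetical scratch directory

# Stand-in repositories: submodule B and superproject A.
git init -q "$work/B-origin"
git -C "$work/B-origin" commit -q --allow-empty -m "B: first commit"
git init -q "$work/A-origin"
git -C "$work/A-origin" -c protocol.file.allow=always \
  submodule --quiet add "$work/B-origin" B
git -C "$work/A-origin" commit -q -m "add submodule B"

# The clone itself; protocol.file.allow=always is only needed for local paths.
git -c protocol.file.allow=always clone -q --recurse-submodules \
  "$work/A-origin" "$work/A"
cd "$work/A"

# Commit recorded in the superproject vs. commit checked out in the submodule.
recorded=$(git rev-parse HEAD:B)
checked_out=$(git -C B rev-parse HEAD)

# symbolic-ref fails when HEAD is not a branch, i.e. when it is detached.
if git -C B symbolic-ref -q HEAD >/dev/null; then
  head_state=branch
else
  head_state=detached
fi

echo "recorded:    $recorded"
echo "checked out: $checked_out"
echo "HEAD state:  $head_state"
```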

Using the example from the start of this section, after executing this command, we will see:

No change from the example is expected.

Clone with submodules (and check out a branch)

To do the same as above, but check out a particular branch, execute the following:

git clone -b feature/setup --recurse-submodules <URL>

This results in the same as above, but the superproject is checked out to “feature/setup”. Using the example from the start of this section, after executing this command, we will see:

Update submodules after cloning

To update submodules after an initial clone (not necessary immediately after a clone, of course), execute the following:

git submodule update

This results in:

  • No changes to the superproject
  • Missing submodules are cloned
  • All submodules are checked out to the commit referenced in the respective submodule definition

  • Submodules where a change to the checked-out commit is required are in detached head state. If no change is made, then the submodule remains at whichever detached commit or branch was previously checked out.

As with an initial clone, this command does not update any references to submodule commits.
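To illustrate, the following sketch lets a submodule drift away from the recorded commit and then brings it back with git submodule update. All repository names, the demo identity, and the empty commits are assumptions made for the example.

```shell
# Sketch: git submodule update puts a drifted submodule back on the recorded commit.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)   # hypothetical scratch directory

# Stand-in submodule repository with a main branch.
git init -q "$work/B-origin"
git -C "$work/B-origin" symbolic-ref HEAD refs/heads/main
git -C "$work/B-origin" commit -q --allow-empty -m "B: first commit"

# Superproject records B at its first commit; B then gains a second commit.
git init -q "$work/A-origin"
git -C "$work/A-origin" -c protocol.file.allow=always \
  submodule --quiet add "$work/B-origin" B
git -C "$work/A-origin" commit -q -m "add submodule B"
git -C "$work/B-origin" commit -q --allow-empty -m "B: second commit"

git -c protocol.file.allow=always clone -q --recurse-submodules \
  "$work/A-origin" "$work/A"
cd "$work/A"

# Let the submodule drift to a commit other than the recorded one.
git -C B checkout -q --detach origin/main
drifted=$(git -C B rev-parse HEAD)

# Bring it back to the commit recorded in the superproject.
git submodule --quiet update

recorded=$(git rev-parse HEAD:B)
restored=$(git -C B rev-parse HEAD)
echo "drifted:  $drifted"
echo "recorded: $recorded"
echo "restored: $restored"
```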

Clone with submodules and update remote references

To not only clone a superproject and all of its submodules, but to also update references to those submodules’ latest HEADs (as outlined in the remote-submodules section above), execute the following:

git clone --recurse-submodules --remote-submodules <URL>

This results in:

  • The superproject is cloned and checked out to the default branch
  • Each submodule is cloned and checked out to the latest commit on the branch referenced in the respective submodule definition
  • Submodules are in detached head state because git does not create local branches in submodules

If, for example, the remote branch main in repository B had been updated to BID2, then the reference from A to B would also have been updated to BID2:

Update submodules to remote references

To update submodules after an initial clone and update references (as outlined in the remote-submodules section above), execute the following:

git submodule update --remote

This results in:

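  • No changes to the superproject’s own files or checked-out branch (a submodule that moves to a different commit shows up as an uncommitted change to its reference)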
  • Missing submodules are cloned
  • All submodules are checked out to the latest commit on the branch referenced in the respective submodule definition
  • Submodules where a change to the checked-out commit is required are in detached head state. If no change was made (i.e. the remote commit for that branch in the submodule is still the same commit as that referenced by the superproject), then the submodule remains either with a detached commit or whichever branch was already checked out

As when calling clone with --remote-submodules, this command updates submodule references. Therefore, if the remote branch main in repository B had been updated to ID3, then we would expect to see A referencing that commit in B.
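The effect can be reproduced with a small sketch. It assumes throwaway repositories (A-origin, B-origin), a demo identity, and a submodule entry that explicitly sets branch = main; the new reference appears in the superproject as an uncommitted change that still has to be added and committed.

```shell
# Sketch: git submodule update --remote moves the submodule to the branch tip.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)   # hypothetical scratch directory

# Stand-in submodule repository with a main branch.
git init -q "$work/B-origin"
git -C "$work/B-origin" symbolic-ref HEAD refs/heads/main
git -C "$work/B-origin" commit -q --allow-empty -m "B: first commit"

# Superproject that tracks main in B (branch entry in .gitmodules).
git init -q "$work/A-origin"
git -C "$work/A-origin" -c protocol.file.allow=always \
  submodule --quiet add "$work/B-origin" B
git -C "$work/A-origin" config -f .gitmodules submodule.B.branch main
git -C "$work/A-origin" add .gitmodules
git -C "$work/A-origin" commit -q -m "add submodule B, tracking main"

git -c protocol.file.allow=always clone -q --recurse-submodules \
  "$work/A-origin" "$work/A"

# main in B gains a new commit after the superproject was cloned.
git -C "$work/B-origin" commit -q --allow-empty -m "B: second commit"

cd "$work/A"
git -c protocol.file.allow=always submodule --quiet update --remote

# The submodule is now at the tip of main; the superproject sees a modified B.
git -C B log --oneline -1
git status --short
```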

Links

The following links were helpful in writing this documentation: