
Title

Working with Git Submodules

Description

<h>Introduction</h>
The intended audience of this document is people who want to know which commands to execute to update submodules. The initial analysis section is intended for people who want to know how the commands work and what their strengths and weaknesses are.

The inspiration for this documentation was that I was wondering whether submodules are always cloned with <i>detached heads</i> and whether there is some way to avoid that. The short answers to these questions are, respectively, "yes" and "no". Skip to the <a href="#useful-commands">examples below</a> to just see the commands and their effects. At the end of the document are <a href="#links">links to pages</a> referenced to produce this documentation.

<h>Terminology</h>
In the discussion below, the term <i>superproject</i> refers to a repository that contains submodule references. The term comes from the git documentation, which makes the distinction because submodules can be nested, so the superproject of a submodule is not necessarily the <i>root</i> repository. Suppose we have multiple levels of nesting, as shown below.
<code>
📁 A
    📁 B
        📁 C
</code>
<ul>
<c>A</c> is the <i>root</i> repository of both <c>B</c> and <c>C</c>
<c>A</c> is the <i>superproject</i> of <c>B</c>
<c>B</c> is the <i>superproject</i> of <c>C</c>
</ul>

<h>Where do submodules go?</h>
Submodules are stored <i>inside</i> another repository. For a simple case, in which repository <c>A</c> contains the submodule <c>B</c>, we would see the following:
<code>
📁 A
    📁 .git
        📁 modules
            📁 B
                📄 config (worktree = ../../../B)
    📁 B
        📄 .git (points to ../.git/modules/B)
</code>
The submodule's <c>.git</c> folder is stored in the superproject's <c>.git/modules</c> folder. In the submodule's working tree, it is replaced by a <c>.git</c> file that references the new location, and the <c>worktree</c> setting in the relocated <c>config</c> points back at the working tree. As with the <a href="/Documentation/Tools/Git/Git-Trainings#session-3%3A-submodules%2C-interactive-rebase%2C-forensics">worktrees</a> feature, the repository data and the checked-out files therefore live in different folders.

<h level="3">Can I share a local copy of a submodule?</h>
No. Storing the working tree of the submodule outside of the superproject's repository is not supported. Why would you want to do that anyway?
One use case is that you have two repositories, each of which includes the same submodule, as shown below.
<code>
📁 A
    📁 B
📁 C
    📁 B
</code>
Instead of using two copies, you might think you could make the superprojects refer to the <i>same</i> copy of the submodule.
<code>
📁 A (refers to ../B)
📁 B
📁 C (refers to ../B)
</code>
<ul>
The advantage would be that changes made to <c>B</c> while working in <c>A</c> would immediately be available in <c>C</c>
However, it would no longer be possible to make <c>A</c> and <c>C</c> refer to different commits
</ul>
Although you can <a href="https://stackoverflow.com/a/37151960/178874">manually move a submodule outside of the repository</a> <i>after you've cloned it</i>, you cannot configure a superproject's submodules in a way that Git will be able to <c>clone</c> properly. If you try it, you'll probably get an error message like this:
<code>
fatal: No url found for submodule path 'SUBMODULE.NAME' in .gitmodules
</code>
The next section explains how you can share local commits for testing.

<h level="3">Testing submodule changes in multiple projects</h>
Assume, as above, that there are two copies of the submodule, B<sup>A</sup> and B<sup>C</sup>. Suppose there are commits in B<sup>A</sup> that have been tested with <c>A</c> but should also be tested with <c>C</c>.

One way to test <c>C</c> would be to push the commits in B<sup>A</sup> and then pull them into B<sup>C</sup>. That involves a round-trip to the server, which is not optimal, but it is relatively straightforward.

Another way to test <c>C</c> would be to add the local B<sup>A</sup> as a <i>remote</i> to B<sup>C</sup> and then <i>check out</i> the commit from B<sup>A</sup> directly.
To set up a remote called <c>B_A</c> in B<sup>C</sup>, execute the following from within B<sup>C</sup> (i.e. in <c>C/B</c>):
<code>
git remote add B_A ../../A/B
</code>
The testing flow would be, roughly:
<ul>
Test changes to submodule B<sup>A</sup> in <c>A</c>
Create commit <c>#1</c> on a local branch in B<sup>A</sup> (a fetch only transfers commits reachable from branches or tags, not a commit on a detached head)
Fetch from <c>B_A</c> into B<sup>C</sup>
Check out commit <c>#1</c> in B<sup>C</sup>
Test changes in <c>C</c>
Repeat as needed
</ul>

<h>What to expect when cloning with submodules</h>
A clone of a superproject (a repository with submodules) fetches submodules only when asked to (e.g. when <c>--recurse-submodules</c> is included). If submodules are fetched, then git sets the checked-out commit in each submodule to the commit ID recorded for that submodule in the superproject. This makes sense because that is the commit the superproject expects. However, it also means that, after a clone, all submodules are in a <i>detached head</i> state.

On an initial clone, git creates a local branch in the superproject corresponding to the checked-out branch in the clone command (either the default branch or the branch specified with the <c>-b</c> option, if included). Git does not check out a local branch in any of the submodules. Git assumes that you will be working in the root repository and not in the submodules; the checked-out branch in a submodule is irrelevant to the superproject, which records only a commit ID. If you want to work in (one or more of) the submodules anyway, then you have to create or switch to a local branch yourself.

The <i>detached head</i> situation is not "weird" but "entirely expected" and "working as designed". All <i>detached head</i> means is that a commit ID has been checked out rather than a named, local branch.

If, however, you want the submodule to be checked out to the same branch as the one checked out in the superproject (e.g. <i>main</i>), then call <c>git switch main</c> in the submodule repository.
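The following shell script is a minimal sketch for observing this behavior locally. It assumes a POSIX shell and git 2.23 or later (for <c>git switch</c>), and it creates throwaway repositories with the hypothetical names <c>A</c>, <c>B</c> and <c>clone</c> in a temporary folder, so no real repository or server is involved. It clones <c>A</c> with <c>--recurse-submodules</c>, checks whether the submodule is on a branch, and then switches the submodule to <i>main</i>.

```shell
#!/bin/sh
# Sketch: a fresh clone leaves submodules on a detached head.
# Assumption: A, B and clone are throwaway local repositories.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Isolate git from your personal and system configuration.
export HOME="$work" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Example"
git config --global user.email "example@example.com"
# Recent git versions refuse local (file) submodule URLs unless allowed.
git config --global protocol.file.allow always

# Submodule repository B with a single commit on main.
git init -q B
git -C B symbolic-ref HEAD refs/heads/main
git -C B commit -q --allow-empty -m "ID1"

# Superproject A on main, referencing B.
git init -q A
git -C A symbolic-ref HEAD refs/heads/main
git -C A submodule add "$work/B" B
git -C A commit -q -m "Add submodule B"

# Clone the superproject together with its submodules.
git clone -q --recurse-submodules "$work/A" clone

# The superproject is on a local branch ...
super_branch=$(git -C clone symbolic-ref --short HEAD)
# ... but the submodule is not (symbolic-ref fails on a detached head).
if git -C clone/B symbolic-ref -q HEAD >/dev/null; then
  sub_state=branch
else
  sub_state=detached
fi
echo "superproject: $super_branch, submodule: $sub_state"

# Put the submodule on main. Because main points at the commit the
# superproject references, the superproject should report no changes.
git -C clone/B switch -q main
sub_branch=$(git -C clone/B symbolic-ref --short HEAD)
super_changes=$(git -C clone status --porcelain)
echo "submodule now on: $sub_branch"
```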
Switching branches in the submodule has no effect on the superproject if the <i>main</i> branch in the submodule repository is at the same commit ID as the one referenced by the superproject. If it is not, then switching to the <i>main</i> branch in the submodule repository shows up as a change in the superproject (the change being that the submodule repository now points to a different commit). To accept that change in the superproject, simply <c>git add</c> the submodule folder and commit the change.

<h><span id="what-does-%60--remote-submodules%60-do%3F">What does <c>--remote-submodules</c> do?</span></h>
The <c>--remote-submodules</c> option of <c>git clone</c> does the following (according to the official documentation):
<bq>Git will use the status of the submodule's remote-tracking branch to update the submodule, rather than the superproject's recorded SHA-1</bq>
Here, "SHA-1" means the commit ID. The documentation also notes that this is equivalent to passing <c>--remote</c> to <c>git submodule update</c>. Using this option may therefore cause changes in the working tree of the superproject if the remote-tracking branch in the submodule repository does not point to the same commit as the one referenced by the superproject.

<h level="3">"Tracking" a branch in a submodule</h>
The basic submodule registration looks like this in the <c>.gitmodules</c> file:
<code>
[submodule "SharedRepo"]
	path = SharedRepo
	url = git@ssh.dev.azure.com:v3/ustertechnologies/uster.quantum/PoC.IMHSharedRepo
</code>
If you don't plan on using <c>--remote-submodules</c> (or <c>git submodule update --remote</c>), then that's all you need. However, if you want the superproject to know which branch it should "track" in the submodule, use the following configuration:
<code>
[submodule "SharedRepo"]
	path = SharedRepo
	url = git@ssh.dev.azure.com:v3/ustertechnologies/uster.quantum/PoC.IMHSharedRepo
	branch = .
	update = rebase
</code>
Note that the branch name is ".". This tells git to use the same branch name as the one checked out in the superproject. If the submodule's remote has no branch with that name, git cannot update that submodule to a remote branch and reports an error for it instead.
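If you prefer not to edit <c>.gitmodules</c> by hand, the same entries can be written with <c>git config -f</c>. The sketch below writes to a throwaway file in a temporary folder (an assumption made so that it runs anywhere); in a real superproject you would run the <c>git config -f .gitmodules ...</c> commands from the superproject's root folder and then commit the file. Newer git versions also provide <c>git submodule set-branch</c> for the <c>branch</c> entry.

```shell
#!/bin/sh
# Sketch: write the "tracking" configuration with git config -f.
# Assumption: a throwaway file stands in for the superproject's .gitmodules.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
modules="$work/.gitmodules"

# Basic registration (normally written by "git submodule add").
git config -f "$modules" submodule.SharedRepo.path SharedRepo
git config -f "$modules" submodule.SharedRepo.url \
  git@ssh.dev.azure.com:v3/ustertechnologies/uster.quantum/PoC.IMHSharedRepo

# Track the branch with the same name as the superproject's current branch.
git config -f "$modules" submodule.SharedRepo.branch .
# When an update is needed, rebase the submodule's current branch
# instead of checking out a detached head.
git config -f "$modules" submodule.SharedRepo.update rebase

cat "$modules"
```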
Using "." allows you to set up the <c>.gitmodules</c> file once, and it works as expected for all branches. Otherwise, you run the risk of merging in a <c>.gitmodules</c> file that references a specific feature branch (for example), and you end up syncing with that feature branch by accident if you call <c>git submodule update</c> with <c>--remote</c>.

The <c>update</c> setting indicates how git should get to the target commit if it needs to make a change: <c>checkout</c> (the default) detaches the head at that commit, whereas <c>rebase</c> rebases the submodule's current branch onto it. The update mode is not tied to <c>--remote</c>; it also applies when a plain <c>git submodule update</c> moves a submodule to the commit recorded in the superproject. The <c>branch</c> setting, on the other hand, only matters if you explicitly tell git to use the head commit of the given branch on the remote instead of the commit already referenced by the superproject.

<h level="3">A remote-update example</h>
A superproject will see an update if it <i>follows</i> a branch in the submodule (as outlined in the preceding section) and that branch in the submodule has gained new commits since the last time the superproject was updated (i.e. the superproject still references a commit in the submodule that does not correspond to the current <c>HEAD</c> of the branch in the submodule).

Using the <c>--remote-submodules</c> option is a way of cloning a superproject while also updating its submodules to the latest commits, instead of just checking out whatever is referenced in the superproject. It is a useful way of getting the latest commits not only in the superproject's repository, but also in all submodules. Keep in mind that you are then not only checking out the <i>current</i> state of the superproject, but also changing its references to the submodules.

Git needs a branch to know which commits to update to. If the submodule definition specifies a branch, git uses that branch; if it doesn't, git falls back to the default branch of the submodule's remote (its remote <c>HEAD</c>).
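The following shell script is a minimal sketch of such a remote update, again using throwaway local repositories with the hypothetical names <c>A</c>, <c>B</c> and <c>clone</c> in a temporary folder instead of a server. It assumes git 2.22 or later (for <c>--remote-submodules</c>). <c>A</c> records commit <i>ID1</i> of <c>B</c> and tracks <c>branch = .</c>; after a new commit <i>ID3</i> lands on <i>main</i> in <c>B</c>, a clone with <c>--remote-submodules</c> should check out <i>ID3</i> in the submodule, while the committed reference in the superproject still points at <i>ID1</i>.

```shell
#!/bin/sh
# Sketch: cloning with --remote-submodules moves a submodule past the
# commit recorded in the superproject.
# Assumption: A, B and clone are throwaway local repositories.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Isolate git from your personal and system configuration.
export HOME="$work" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
git config --global user.name "Example"
git config --global user.email "example@example.com"
# Recent git versions refuse local (file) submodule URLs unless allowed.
git config --global protocol.file.allow always

# Submodule repository B with commit ID1 on main.
git init -q B
git -C B symbolic-ref HEAD refs/heads/main
git -C B commit -q --allow-empty -m "ID1"

# Superproject A on main, recording ID1 and tracking the branch
# with the same name as its own checked-out branch.
git init -q A
git -C A symbolic-ref HEAD refs/heads/main
git -C A submodule add "$work/B" B
git -C A config -f .gitmodules submodule.B.branch .
git -C A add .gitmodules
git -C A commit -q -m "Add submodule B"
recorded=$(git -C A rev-parse HEAD:B)

# Later, B's main gains a new commit (ID3) that A does not reference yet.
git -C B commit -q --allow-empty -m "ID3"
latest=$(git -C B rev-parse main)

# Clone and update the submodule to the tip of its tracked branch.
git clone -q --recurse-submodules --remote-submodules "$work/A" clone
checked_out=$(git -C clone/B rev-parse HEAD)
still_recorded=$(git -C clone rev-parse HEAD:B)

# The new reference is only a working-tree change in the superproject;
# "git add B" plus a commit would be needed to keep it.
git -C clone status --short
```

Running <c>git submodule update --remote</c> in an existing clone instead of cloning with <c>--remote-submodules</c> should have the same effect on the submodule.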
As noted above, specifying a branch doesn't mean that git will create a local branch in the submodule with that name and check it out; it just means that git will check out a different commit in the submodule (and thereby change the commit ID referenced by the superproject for that submodule) if the commit referenced by that branch differs from the commit currently referenced by the superproject. Phew! We now know enough to determine which commands to use.

<h><span id="useful-commands">Useful Commands</span></h>
With this background, we can work with git and submodules on the command line, which is useful, for example, when setting up agents. Imagine we have two repositories:
<ul>
Repository <i>A</i> has a <i>main</i> branch that tracks the <i>main</i> branch of submodule <i>B</i> (currently commit <i>ID1</i>)
The <i>main</i> branch in <i>B</i> points to commit <i>ID1</i>
Repository <i>A</i> has a <i>feature/setup</i> branch that tracks the <i>feature/setup</i> branch of submodule <i>B</i> (currently commit <i>ID2</i>)
</ul>
The examples use diagrams like the following to show results. Bold indicates the commit and branch that are checked out; a bold commit with a non-bold branch name indicates a <i>detached head</i>. The diagram below shows the situation outlined above, with <i>main</i> checked out.
<img src="{att_link}original.jpg" href="{att_link}original.jpg" align="none" scale="50%">

<h level="3">Clone with submodules</h>
To clone a repository with submodules and check out the default branch in the superproject, execute the following:
<code>
git clone --recurse-submodules <url>
</code>
This results in:
<ul>
The superproject is cloned and checked out to the default branch
Each submodule is cloned and checked out to the commit recorded for it in the superproject
Submodules are in <i>detached head</i> state because git checks out the recorded commit rather than a local branch in each submodule
</ul>
Using the example from the start of this section, after executing this command, we will see:
<img src="{att_link}clone_with_submodules.jpg" href="{att_link}clone_with_submodules.jpg" align="none" scale="50%">
No change from the example is expected.

<h level="3">Clone with submodules (and check out a branch)</h>
To do the same as above, but check out a particular branch, execute the following:
<code>
git clone -b feature/setup --recurse-submodules <url>
</code>
This results in the same as above, except that the superproject is checked out to <i>feature/setup</i> (and the submodule to the commit recorded on that branch, <i>ID2</i>). Using the example from the start of this section, after executing this command, we will see:
<img src="{att_link}clone_with_submodules_and_check_out_branch.jpg" href="{att_link}clone_with_submodules_and_check_out_branch.jpg" align="none" scale="50%">

<h level="3">Update submodules after cloning</h>
To update submodules after an initial clone (for example, after pulling new commits into the superproject; this is not necessary immediately after a clone with <c>--recurse-submodules</c>), execute the following:
<code>
git submodule update
</code>
This results in:
<ul>
No changes to the superproject
Missing submodules are cloned (only submodules that have already been initialized; add <c>--init</c> to also initialize and clone new ones)
All submodules are checked out to the commit recorded for them in the superproject
</ul>
Submodules where a change to the checked-out commit is required are in detached head state.
If no change is required, then the submodule remains at whichever detached commit or branch was previously checked out. As with an initial clone, this command <i>does not</i> update any references to submodule commits.
<img src="{att_link}clone_with_submodules.jpg" href="{att_link}clone_with_submodules.jpg" align="none" scale="50%">

<h level="3">Clone with submodules and update remote references</h>
To not only clone a superproject and all of its submodules, but also update the references to the latest commits on the submodules' tracked branches (as outlined in the <a href="#what-does-%60--remote-submodules%60-do%3F">remote-submodules section</a> above), execute the following:
<code>
git clone --recurse-submodules --remote-submodules <url>
</code>
This results in:
<ul>
The superproject is cloned and checked out to the default branch
Each submodule is cloned and checked out to the latest commit on the branch configured in the respective submodule definition
Submodules are in <i>detached head</i> state because git checks out that commit rather than a local branch in each submodule
</ul>
If, for example, the remote branch <i>main</i> in repository <i>B</i> had been updated to <i>ID3</i>, then the reference from <i>A</i> to <i>B</i> would also be updated to <i>ID3</i> (as an uncommitted change in the superproject):
<img src="{att_link}clone_with_submodules_and_update_references.jpg" href="{att_link}clone_with_submodules_and_update_references.jpg" align="none" scale="50%">

<h level="3">Update submodules to remote references</h>
To update submodules after an initial clone <i>and</i> update references (as outlined in the <a href="#what-does-%60--remote-submodules%60-do%3F">remote-submodules section</a> above), execute the following:
<code>
git submodule update --remote
</code>
This results in:
<ul>
No change to the superproject's checked-out branch (but updated submodules show up as modified in the superproject)
Missing submodules are cloned (as above, only initialized ones unless <c>--init</c> is added)
All submodules are checked out to the latest commit on the branch configured in the respective submodule definition
Submodules where a change to the checked-out commit is required are in detached head state
If no change was made (i.e. the remote commit for that branch in the submodule is still the same commit as the one referenced by the superproject), then the submodule remains at whichever detached commit or branch was already checked out
</ul>
As when cloning with <c>--remote-submodules</c>, this command updates submodule references in the superproject's working tree; to keep them, <c>git add</c> the submodule folders and commit. Therefore, if the remote branch <i>main</i> in repository <i>B</i> had been updated to <i>ID3</i>, then we would expect to see <i>A</i> referencing that commit in <i>B</i>.
<img src="{att_link}clone_with_submodules_and_update_references.jpg" href="{att_link}clone_with_submodules_and_update_references.jpg" align="none" scale="50%">

<h><span id="links">Links</span></h>
The following links were helpful in writing this documentation:
<ul>
<a href="https://stackoverflow.com/questions/18770545/why-is-my-git-submodule-head-detached-from-master">Why is my Git Submodule HEAD detached from master?</a>
<a href="https://stackoverflow.com/questions/20794979/git-submodule-is-in-detached-head-state-after-cloning-and-submodule-update">Git submodule is in "detached head" state after cloning and submodule update</a>
<a href="https://stackoverflow.com/questions/3965676/why-did-my-git-repo-enter-a-detached-head-state">Why did my Git repo enter a detached HEAD state?</a>
<a href="https://stackoverflow.com/questions/1777854/how-can-i-specify-a-branch-tag-when-adding-a-git-submodule">How can I specify a branch/tag when adding a Git submodule?</a>
<a href="https://git-scm.com/docs/git-clone">git clone</a>
<a href="https://git-scm.com/docs/git-submodule">git submodule</a>
</ul>