Why
Recently I decided to open source most of the projects I’ve worked on in Fall 1. For the classes that I’ve kept a private git repo on 2, I had about 4 repos, each with their own master branch, and another 1 group repo.
I could have easily pushed it all to GitHub but I would have separate individual repos, and I felt it would clutter up my repositories, since it’s unlikely I’ll work on those repositories again, and people who actually will look at it are most likely other students that take the class in the future.
Alternatively, copy pasting all the latest files would suffice, but I’d lose commit info, history, which doesn’t seem right to me.
Merging multiple project repos into one is however counterintuitive to the principle that for each distinct project, have a separate repo, but given this is all considered academia, I wanted to group it all together.
It took me more time than I’d care to admit, but I decided to share this here in case anyone else is looking to consolidate their repositories.
Prerequisites and Goals
Prerequisites for me currently are:
- 4 repos, all separate and isolated.
- Each with a
master
branch
Goals:
- Merge into one large repository
- Keep track of all commits
- Keep history intact
- Separate branches for each repo merged
One more prerequisite, have backups handy before beginning the merge, it’s too easy to fudge things up, I have, so always having a backup copy of your repositories never hurt. This is a disclaimer!
How to do it:
Step 0.
We need to first prepare the old repo for merging.
On the repo you want to merge, I’ll use old_A
as the repo name, create a directory with a unique name. It is important this directory name is not used in the other
repos you want to merge with, avoiding potential merge conflicts. Usually creating a directory with the name of the project and then moving it all down one level works.
On the current root directory of your git repository (named as a folder old_A
):
$ cd ..
$ mkdir project_A_repo
$ mv old_A/.git/ project_A_repo/
$ mv old_A/.gitignore project_A_repo (if you have this)
$ mv old_A/ project_A_repo/project_A
$ cd project_A_repo
$ git add project_A
$ git commit -am 'Preparing Project A repository for merging.'
$ git push (to sync commit with origin)
Here project_A
is the unique name of the folder. This is weird because we can’t move all files at the current directory one level down without going up one level.
You can also manually git mv subdir project_A/subdir
for each sub directory and file, I did that because it was simpler and didn’t have too many folders in each repo.
Or try something like this.
Step 1.
In a different directory that does not contain a git repo, initialize an empty repository in an empty directory with git init
. This will be our ‘master’ repository.
Step 2.
Add a readme and perform an initial commit.
$ touch README.md
$ git add README.md
$ git commit -am 'Inital Commit'
Step 3.
Create an empty branch named clean
. This will be the base branch we’ll merge other repositories into. This is necessary as the master
branch will no longer be empty later.
$ git branch clean
Step 4.
If we list the branches in our repo with git branch --all
. This is the output
clean
* master
Step 5.
Checkout the clean branch with git checkout clean
.
Notice that we’re now on the clean
branch.
(side note: if you’re now rereading this step merging in other repos; you must be on the clean
branch before proceeding!)
Step 6.
For this tutorial I’ll use old_A
as the old repo. We’ll add a remote branch to old_A
(named remote_A
, a remote repo can be a local dir with a .git
inside it, or a web address.
$ git remote add remote_A https://github.com/username/proj_A.git
*alternatively, git remote add remote_A ~/folder/to/project_A
Step 7.
Fetch all the branches!
$ git fetch --all
Here’s the sample output from me pulling in my legacy Github Pages repo.
Fetching remote_A
warning: no common commits
remote: Counting objects: 822, done.
remote: Total 822 (delta 0), reused 1 (delta 0), pack-reused 821
Receiving objects: 100% (822/822), 4.95 MiB | 1.94 MiB/s, done.
Resolving deltas: 100% (475/475), done.
From https://github.com/leewc/legacy
* [new branch] gh-pages -> remote_A/gh-pages
* [new branch] master -> remote_A/master
Step 8.
Time to create & checkout a branch we want to pull our old repository into. I’ll use project_A
as the ‘new’ branch we want to hold the old repo.
(Also, note that I’m using remote_A/master
as the branch I want pulled. You’ll need multiple branches for multiple branches (lol.. duh?) ).
$ git branch project_A remote_A/master
Branch project_A set up to track remote branch master from remote_A.
$ git checkout project_A
Switched to branch 'project_A'
Your branch is up-to-date with 'remote_A/master'.
Step 9.
If we do an ls
now, you’ll see your files from the old project A in remote_A
. If you didn’t you’re not on the right branch, or you messed up.
Also, if you ran git branch --all
you’ll get a master
branch, clean
branch, project_A
branch and all remote branches showing up.
Step 10.
Looks good, go back to the master branch.
$ git checkout master
Switched to branch 'master'
(some output regarding master branch)
Now we merge in the project_A
branch. We have a couple options here. If you do a regular git merge
, all commits will
show up on master, which is great if you want GitHub to track and show more commits, but this gets out of hand when you merge other projects, as the commits are
ordered by time, and you get nonsensical chronology, unrelated commits intersecting. It’s up to you, but let’s do git merge --squash
instead so we have one
squashed commit per project on master
.
$ git merge --squash project_A master
Auto-merging README.md
Squash commit -- not updating HEAD
Automatic merge went well; stopped before committing as requested
Step 11.
Make a commit showing what you did. (this won’t happen if you didn’t do squash
)
$ git commit -am 'Squashed proj_A'
[master 25af1be] Squashed proj_A
111 files changed, 9458 insertions(+)
Step 12.
Time to delete the remote branch url since we don’t need to keep a reference to it.
$ git remote remove remote_A
$ git branch --all
clean
* master
project_A
You have now removed the remote url
reference to proj_A
.
Step 13.
Just wanted to highlight at this point you can now delete your old repo, we don’t need it anymore. In this example I can delete the repo at https://github.com/username/proj_A.git
.
Of course if you’re messing with your one and only repo, it’s better you wait till we’re finished with the merge!
Step 14.
Add a new remote for the master repo, and then push the branch up to your remote! (E.g that GitHub url for your new repo you created, make sure you don’t initialize the remote repo with a README)
$ git remote add origin https://github.com/yourusername/master-repo-name.git
$ git branch --all
$ git push -u origin master
$ git push -u origin project_A
Step 15.
First Repeat Step 0. Then, repeat Step 5 to Step 13 for other repositories, pushing the repository to origin after with git push -u origin name_of_branch
Step 16.
You should now have a nice ‘master’ repo. With commits on the master being squashed commits, and each branch holding it’s own commit history. Rejoice!
Lastly, credit also goes to Julio’s post on merging repos here, it definitely helped.
Also, special thanks to Nate for beta testing this tutorial and correcting mistakes!