[llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Justin Bogner via llvm-dev
Hi all,

I've spent some time in the last couple of days trying to figure out how
to adopt the [LLVM git monorepo prototype] for an out of tree backend.
TLDR: I'm not convinced that this prototype is the right approach to
converting to the monorepo, and I have a possible alternative.

The main problems I'm running into stem from the fact that this
prototype rewrites all of history from scratch rather than leveraging the
existing [official git mirrors]. This makes migrating out-of-tree work
from the official git mirrors to this repo very difficult, since there
is no shared history. Some efforts have gone into [documenting how to
port in-progress patches], but this doesn't attempt to discuss how to
handle more substantial out of tree work.

Issues with integrating the prototype
-------------------------------------

As far as I can tell, my options for trying to integrate with this
monorepo are fairly limited.

If I merge my trees directly into the monorepo prototype at head, I end
up with two copies of every commit, one of which is a monorepo style
commit and one with the singular repo history. These commits are
completely unrelated to each other, and exist in two separate parallel
histories, making it difficult to correlate one to the other or even to
tell which is which.

An arguably cleaner solution would be to try to recreate all of my trees'
history artificially as if they were based on the monorepo prototype
history all along, but this has two problems. First, it's a very
significant tooling effort to do this - I'd need to match up several
years of merge points to their corresponding spots in the monorepo
prototype and somehow redo all of the merges in the same ways. Tools
like "rebase --preserve-merges" don't really help here, since they abort
on merge conflicts and ask a human to resolve them again. Second, even if
I were to come up with tooling that managed this, I'm still left with a
completely new set of hashes for commits and no easy way to map them to
existing references in emails, bug trackers, and release notes.

Finally, there's the option of throwing away all of my history and
applying my out of tree work in a single patch. This makes git-log and
git-blame useless for investigating issues in my codebase for a few
years. It also means that when fixes go into older branches they can't
be merged forward and need to be redone by hand.

All of these have very significant drawbacks, and none of them really
sounds like a good option at all.

An alternative approach
-----------------------

All of these problems could be mitigated if we could preserve the
history of the existing git mirrors when generating the monorepo. There
are two ways to do this.

1. Start the monorepo by subtree-merging the various repos together at
   an arbitrary point in time.

2. "Zip" together the commits in each official git mirror repo by
   merging them into a combined view after each commit.

While I personally don't see a problem with (1), I've heard people claim
that they want to use the monorepo to bisect arbitrarily far back into
history. If this is the case, we'd prefer an approach like (2).
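
To make option (2) concrete, here is a minimal Python sketch of the
"zipping" idea. It is only a toy model, not the scripts used for any
actual prototype: the `MirrorCommit` type, the `mono@rN` identifiers, and
the choice to order and group mirror commits by SVN revision are all
assumptions made for illustration. Each SVN revision yields one merge
record whose first parent is the previous combined commit and whose other
parents are the untouched mirror commits.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorCommit:
    """Hypothetical record for one commit in a per-project git mirror."""
    repo: str     # mirror the commit lives in, e.g. "llvm" or "clang"
    sha: str      # existing commit hash, left unchanged by zipping
    svn_rev: int  # SVN revision the mirror commit was converted from


def zip_mirrors(commits):
    """Return a list of (merge_id, parents), one entry per SVN revision.

    parents[0] is the previous combined commit (None for the first one);
    the remaining parents are the mirror commits made by that revision.
    A revision that touched several mirrors becomes a single N-way merge.
    """
    by_rev = defaultdict(list)
    for commit in commits:
        by_rev[commit.svn_rev].append(commit)

    merges = []
    previous = None
    for rev in sorted(by_rev):
        # Sort by repo name only to make the output deterministic.
        mirror_parents = [
            f"{c.repo}:{c.sha}"
            for c in sorted(by_rev[rev], key=lambda c: c.repo)
        ]
        merge_id = f"mono@r{rev}"  # hypothetical naming scheme
        merges.append((merge_id, [previous] + mirror_parents))
        previous = merge_id
    return merges
```

Because the mirror commits appear only as parents, their hashes never
change; all that is added is the chain of merge records.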

A zippered repository gives us a lot of the benefits of the prototype,
without a lot of the issues that are caused by rewriting history:

- The commits from the official git mirrors exist as they are now, and
  we don't need to deal with changing hashes.

- Out-of-tree branches have all of their history whether they opt in to
  creating a monorepo style history or not.

- All of the repo's history is visible as a monorepo by looking only at
  the merge commits. Bisect scripts can easily filter to these.

- The monorepo commits and individual repo commits are easily
  discernible and have a direct link between them in git's DAG, making
  it easy to find one from the other.
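
As a rough illustration of the "look only at the merge commits" point,
the sketch below walks a toy commit graph along first parents and keeps
only the commits with more than one parent. The graph, the commit names,
and the assumption that the combined history is exactly the first-parent
chain of merges are hypothetical; on a real repository something like
`git log --first-parent --merges` should give a comparable view.

```python
def monorepo_view(parents, head):
    """Return the merge commits on the first-parent chain, newest first.

    `parents` maps a commit id to its list of parent ids; a commit that
    is missing from the map, or has an empty list, is treated as a root.
    """
    view = []
    node = head
    while node is not None:
        node_parents = parents.get(node, [])
        if len(node_parents) > 1:
            view.append(node)  # a zipped, monorepo-style commit
        # Follow only the first parent, ignoring the mirror-side parents.
        node = node_parents[0] if node_parents else None
    return view


# Toy history: m1..m3 are combined merges, the rest are mirror commits.
history = {
    "m3": ["m2", "llvm-c2", "clang-c2"],
    "m2": ["m1", "clang-c1"],
    "m1": ["llvm-c1", "clang-c0"],
    "llvm-c2": ["llvm-c1"],
    "clang-c2": ["clang-c1"],
    "clang-c1": ["clang-c0"],
    "llvm-c1": [],
    "clang-c0": [],
}
bisect_candidates = monorepo_view(history, "m3")
```

A bisect script could then restrict itself to `bisect_candidates`, and
the non-first parents of each entry lead directly back to the matching
mirror commits.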

To demonstrate this approach, I've put up a snapshot of what LLVM might
look like if we did this, using some scripts that Duncan wrote a while
back to experiment with the idea:

  https://github.com/bogner/llvm-zipper-prototype

Note that this is just a demo/prototype. It has some minor issues, isn't
being automatically updated, and I may regenerate it at some point.

Thoughts?

Thanks,
-- Justin Bogner

[LLVM git monorepo prototype]: https://github.com/llvm-git-prototype/llvm
[official git mirrors]: https://git.llvm.org/git/llvm.git
[documenting how to port in-progress patches]: https://reviews.llvm.org/D53414
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Fedor via llvm-dev
 > Issues with integrating the prototype
 > ...
 > difficult to correlate one to the other or even to tell which is which.
 >...
 > new set of hashes for commits and no easy way to map them to
 > existing references

I would like to point out that while the majority of the issues described
here are very real, mapping two commits to each other seems rather
straightforward by means of the SVN revision, which is still available in
every git-svn conversion I've seen.

It might be considerably inconvenient when investigating history manually,
but all kinds of automation are possible (e.g. remapping commits in a
bug-tracking system, rewriting commit messages to refer to the proper
commit SHAs, etc.).

The only configuration where the SVN revision is not enough is when you need
to map commits coming from different "subrepos", where there is no
one-to-one correspondence between commits and SVN numbers.
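
A minimal sketch of that kind of automation, in Python. It assumes both
histories record the SVN revision in a `git-svn-id:` trailer of the form
`git-svn-id: <url>@<rev> <uuid>`; a given conversion may record it
differently, in which case the pattern would need adjusting. The function
names, sample URL, and sample UUID are hypothetical.

```python
import re

# Matches a trailer such as:
#   git-svn-id: https://example.org/svn/proj/trunk@345678 <uuid>
GIT_SVN_ID = re.compile(r"^\s*git-svn-id:\s+\S+@(\d+)\s", re.MULTILINE)


def svn_revision(message):
    """Return the SVN revision in a commit message, or None if absent."""
    match = GIT_SVN_ID.search(message)
    return int(match.group(1)) if match else None


def map_by_revision(old_commits, new_commits):
    """Map old hashes to new hashes that carry the same SVN revision.

    Both arguments are iterables of (sha, commit_message) pairs. Commits
    without a recognizable trailer (e.g. out-of-tree work) are skipped.
    """
    new_by_rev = {}
    for sha, message in new_commits:
        rev = svn_revision(message)
        if rev is not None:
            new_by_rev[rev] = sha

    mapping = {}
    for sha, message in old_commits:
        rev = svn_revision(message)
        if rev in new_by_rev:
            mapping[sha] = new_by_rev[rev]
    return mapping
```

This only covers commits that came from SVN and only within a single
project; as noted above, one revision that touches several "subrepos"
does not map one-to-one.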

regards,
   Fedor.

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Justin Lebar via llvm-dev
In reply to this post by Justin Bogner via llvm-dev
I'm going to try to stay out of the question of whether or not we should do it this way.  (We'll see if I succeed.  :)

But if we do decide to do it this way, it would be nice if we'd do an N-way merge when there's a single SVN commit that affects multiple git repos.

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Justin Bogner via llvm-dev
Justin Lebar <[hidden email]> writes:
> I'm going to try to stay out of the question of whether or not we should do
> it this way.  (We'll see if I succeed.  :)
>
> But if we do decide to do it this way, it would be nice if we'd do an N-way
> merge when there's a single SVN commit that affects multiple git repos.

The prototype I linked to does this. See for example:

  https://github.com/bogner/llvm-zipper-prototype/commit/6258012a126e2c9eecc6ae70eabec71bd8f6a8f5

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Tom Stellard via llvm-dev
In reply to this post by Justin Bogner via llvm-dev
On 10/31/2018 09:22 AM, Justin Bogner via llvm-dev wrote:
> Hi all,
>
> I've spent some time in the last couple of days trying to figure out how
> to adopt the [LLVM git monorepo prototype] for an out of tree backend.
> TLDR: I'm not convinced that this prototype is the right approach to
> converting to the monorepo, and I have a possible alternative.
>

I think it's too late at this point to start considering alternative
monorepo layouts.  We're already behind in getting the current monorepo
up and running, and I think discussing and implementing an alternative
will take too long and put our goal of moving off SVN by next year's
development meeting at risk.

Is it possible that the monorepo you have proposed could be used as an
aid to people trying to integrate out-of-tree branches into the current
monorepo? For example, would someone be able to merge their changes into
your monorepo and then cherry-pick them to the current monorepo?

-Tom

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Justin Bogner via llvm-dev
Tom Stellard <[hidden email]> writes:

> On 10/31/2018 09:22 AM, Justin Bogner via llvm-dev wrote:
>> Hi all,
>>
>> I've spent some time in the last couple of days trying to figure out how
>> to adopt the [LLVM git monorepo prototype] for an out of tree backend.
>> TLDR: I'm not convinced that this prototype is the right approach to
>> converting to the monorepo, and I have a possible alternative.
>>
>
> I think it's too late at this point to start considering alternative
> monorepo layouts.  We're already behind in getting the current monorepo
> up and running, and I think discussing and implementing an alternative
> will take too long and put our goal of moving off SVN by next year's
> development meeting at risk.

The layout here is not at all different, only the process by which the
repo is generated. I strongly believe that a history-preserving
conversion is very important if we want to avoid making the porting of
out-of-tree work horribly disruptive.

> Is it possible that the monorepo you have proposed could be used as an
> aid to people trying to integrate out-of-tree branches into the
> current monorepo?
> For example, would someone be able to merge their changes into your monorepo
> and then cherry-pick them to the current monorepo?

Cherry-picking out-of-tree branches is not at all practical. If I have a
backend that's been in development for several years and has many
merges, cherry-picking doesn't help. We'd probably need a tool that
regenerates the history "as-if" it had been done on the monorepo itself,
but besides being fairly difficult to do, that has its own problems, which
I described in my original message.

> -Tom
>
>> The main problems I'm running into stem from the fact that this
>> prototype rewrites all of history from scratch rather than leverage the
>> existing [official git mirrors]. This makes migrating out-of-tree work
>> from the official git mirrors to this repo very difficult, since there
>> is no shared history. Some efforts have gone into [documenting how to
>> port in-progress patches], but this doesn't attempt to discuss how to
>> handle more substantial out of tree work.
>>
>> Issues with integrating the prototype
>> -------------------------------------
>>
>> As far as I can tell, my options for trying to integrate with this
>> monorepo are fairly limited.
>>
>> If I merge my trees directly into the monorepo prototype at head, I end
>> up with two copies of every commit, one of which is a monorepo style
>> commit and one with the singular repo history. These commits are
>> completely unrelated to each other, and exist in two separate parallel
>> histories, making it difficult to correlate one to the other or even to
>> tell which is which.
>>
>> An arguably cleaner solution would be try to recreate all of my trees'
>> history artificially as if they were based on the monorepo prototype
>> history all along, but this has two problems. First, it's a very
>> significant tooling effort to do this - I'd need to match up several
>> years of merge points to their corresponding spots in the monorepo
>> prototype and somehow redo all of the merges in the same ways. Tools
>> like "rebase --preserve-merges" don't really help here, since they abort
>> on merge conflicts and ask a human to resolve them again. Even if I were
>> to come up with tooling that managed this, I'm still left with a
>> completely new set of hashes for commits and no easy way to map them to
>> existing references in emails, bug trackers, and release notes.
>>
>> Finally, there's the option of throwing away all of my history and
>> applying my out of tree work in a single patch. This makes git-log and
>> git-blame useless for investigating issues in my codebase for a few
>> years. It also means that when fixes go into older branches they can't
>> be merged forward and need to be redone by hand.
>>
>> All of these have very significant drawbacks, and none of them really
>> sounds like a good option at all.
>>
>> An alternative approach
>> -----------------------
>>
>> All of these problems could be mitigated if we could preserve the
>> history of the existing git mirrors when generating the monorepo. There
>> are two ways to do this.
>>
>> 1. Start the monorepo by subtree-merging the various repos together at
>>    an arbitrary point in time.
>>
>> 2. "Zip" together the commits in each official git mirror repo by
>>    merging them into a combined view after each commit.
>>
>> While I personally don't see a problem with (1), I've heard people claim
>> that they want to use the monorepo to bisect arbitrarily far back into
>> history. If this is the case, we'd prefer an approach like (2).
>>
>> A zippered repository gives us a lot of the benefits of the prototype,
>> without a lot of the issues that are caused by rewriting history:
>>
>> - The commits from the official git mirrors exist as they are now, and
>>   we don't need to deal with changing hashes.
>>
>> - Out-of-tree branches have all of their history whether they opt in to
>>   creating a monorepo style history or not
>>
>> - All of the repo's history is visible as a monorepo by looking only at
>>   the merge commits. Bisect scripts can easily filter to these.
>>
>> - The monorepo commits and individual repo commits are easily
>>   discernible and have a direct link between them in git's DAG, making
>>   it easy to find one from the other.
>>
>> To demonstrate this approach, I've put up a snapshot of what LLVM might
>> look like if we did this, using some scripts that Duncan wrote a while
>> back to experiment with the idea:
>>
>>   https://github.com/bogner/llvm-zipper-prototype
>>
>> Note that this is just a demo/prototype. It has some minor issues, isn't
>> being automatically updated, and I may regenerate it at some point.
>>
>> Thoughts?
>>
>> Thanks,
>> -- Justin Bogner
>>
>> [LLVM git monorepo prototype]: https://github.com/llvm-git-prototype/llvm
>> [official git mirrors]: https://git.llvm.org/git/llvm.git
>> [documenting how to port in-progress patches]: https://reviews.llvm.org/D53414
>>
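As an illustration of option (2) in the quoted proposal, here is a minimal
Python sketch of the "zip" step. It is not the script used to build the
prototype; the `Commit` record, the `zip_histories` function, and the `M1`,
`M2`, ... merge ids are hypothetical names chosen for this example, and it
assumes each mirror commit can be ordered by the SVN revision it was
imported from.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Commit:
    repo: str      # per-project mirror the commit lives in, e.g. "llvm"
    svn_rev: int   # SVN revision the mirror commit was imported from
    sha: str       # hash in the mirror; the zip step never rewrites it


def zip_histories(histories: Dict[str, List[Commit]]) -> List[dict]:
    """Interleave per-repo histories into one chain of combined commits.

    Each input list must already be sorted by svn_rev (oldest first).
    """
    merges = []
    heads: Dict[str, str] = {}   # latest mirror commit seen for each repo
    previous = None              # id of the previous combined commit
    ordered = heapq.merge(*histories.values(), key=lambda c: c.svn_rev)
    for n, commit in enumerate(ordered, start=1):
        heads[commit.repo] = commit.sha
        merge_id = f"M{n}"
        # First parent: the previous combined commit (absent for the very
        # first one). Last parent: the untouched mirror commit.
        parents = ([previous] if previous else []) + [commit.sha]
        merges.append({"id": merge_id, "parents": parents, "heads": dict(heads)})
        previous = merge_id
    return merges


if __name__ == "__main__":
    demo = {
        "llvm": [Commit("llvm", 1, "a1"), Commit("llvm", 4, "a4")],
        "clang": [Commit("clang", 2, "b2"), Commit("clang", 3, "b3")],
    }
    for merge in zip_histories(demo):
        print(merge["id"], merge["parents"], merge["heads"])
```

The returned list is the monorepo view that a bisect script would walk,
while every mirror commit stays reachable under its original hash as a
parent of the combined commit that introduced it.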
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
On 10/31/2018 10:39 AM, Justin Bogner wrote:

> Tom Stellard <[hidden email]> writes:
>> On 10/31/2018 09:22 AM, Justin Bogner via llvm-dev wrote:
>>> Hi all,
>>>
>>> I've spent some time in the last couple of days trying to figure out how
>>> to adopt the [LLVM git monorepo prototype] for an out of tree backend.
>>> TLDR: I'm not convinced that this prototype is the right approach to
>>> converting to the monorepo, and I have a possible alternative.
>>>
>>
>> I think it's too late at this point to start considering alternative
>> monorepo layouts.  We're already behind in getting the current monorepo
>> up and running, and I think discussing and implementing an alternative
>> will take too long and put our goal of moving off SVN by next year's
>> development meeting at risk.
>
> The layout here is not at all different, only the process by which the
> repo is generated. I strongly believe that a history preserving
> conversion is very important if we want to avoid making porting
> out-of-tree work horribly disruptive.
>

The process is actually what I'm concerned about here, much more so than
the physical layout of the repo.  It takes time to discuss, develop
and debug a new process for automatically syncing from SVN to a new git
repository.  We've already gone through all these steps with the existing
monorepo, so switching to something else at this point would be a step
backwards in my opinion.

-Tom

>> Is it possible that the monorepo you have proposed could be used as an
>> aid to people trying to integrate out-of-tree branches into the
>> current monorepo?
>> For example, would someone be able to merge their changes into your monorepo
>> and then cherry-pick them to the current monorepo?
>
> Cherry picking out of tree branches is not at all practical. If I have a
> backend that's been in development for several years and has many
> merges, cherry picking doesn't help. We'd probably need a tool that
> regenerates the history "as-if" it had been done on the monorepo itself,
> but besides being fairly difficult to do, that has its own problems that
> I described below.
>
>> -Tom
>>


Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev


On Wed, Oct 31, 2018 at 11:27, Tom Stellard via llvm-dev <[hidden email]> wrote:
On 10/31/2018 10:39 AM, Justin Bogner wrote:
> Tom Stellard <[hidden email]> writes:
>> On 10/31/2018 09:22 AM, Justin Bogner via llvm-dev wrote:
>>> Hi all,
>>>
>>> I've spent some time in the last couple of days trying to figure out how
>>> to adopt the [LLVM git monorepo prototype] for an out of tree backend.
>>> TLDR: I'm not convinced that this prototype is the right approach to
>>> converting to the monorepo, and I have a possible alternative.
>>>
>>
>> I think it's too late at this point to start considering alternative
>> monorepo layouts.  We're already behind in getting the current monorepo
>> up and running, and I think discussing and implementing an alternative
>> will take too long and put our goal of moving off SVN by next year's
>> development meeting at risk.
>
> The layout here is not at all different, only the process by which the
> repo is generated. I strongly believe that a history preserving
> conversion is very important if we want to avoid making porting
> out-of-tree work horribly disruptive.
>

The process is actually what I'm concerned about here, much more so than
the physical layout of the repo.  It takes time to discuss, develop
and debug a new process for automatically syncing from SVN to a new git
repository.  We've already gone through all these steps with the existing
monorepo, so switching to something else at this point would be a step
backwards in my opinion.

At this point we can still consider it. I highly doubt that waiting a few weeks would jeopardize the one-year deadline (which is really not that ambitious).
What we should do, though, in my opinion, is set strict deadlines: i.e. every stage of discussion should be open for a very limited time.
The current linear repo has the edge, but we could, for example, leave this "zipper" proposal open for the next week (or two if you want) as an RFC. Unless this alternative gets significant traction, we should close the discussion and move ahead with the linear-history repo.

After almost two years of more or less stagnation, I feel it'd be unfortunate to rush right now on what I perceive as an important design point (especially for downstream users) like this one.

Cheers,

-- 
Mehdi


 

-Tom

>> Is it possible that the monorepo you have proposed could be used as an
>> aid to people trying to integrate out-of-tree branches into the
>> current monorepo?
>> For example, would someone be able to merge their changes into your monorepo
>> and then cherry-pick them to the current monorepo?
>
> Cherry picking out of tree branches is not at all practical. If I have a
> backend that's been in development for several years and has many
> merges, cherry picking doesn't help. We'd probably need a tool that
> regenerates the history "as-if" it had been done on the monorepo itself,
> but besides being fairly difficult to do, that has its own problems that
> I described below.
>
>> -Tom
>>


Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
Tom Stellard <[hidden email]> writes:

>> On 10/31/2018 10:39 AM, Justin Bogner wrote:
>>> Tom Stellard <[hidden email]> writes:
>>>> On 10/31/2018 09:22 AM, Justin Bogner via llvm-dev wrote:
>>>>> Hi all,
>>>>>
>>>>> I've spent some time in the last couple of days trying to figure out how
>>>>> to adopt the [LLVM git monorepo prototype] for an out of tree backend.
>>>>> TLDR: I'm not convinced that this prototype is the right approach to
>>>>> converting to the monorepo, and I have a possible alternative.
>>>>>
>>>>
>>>> I think it's too late at this point to start considering alternative
>>>> monorepo layouts.  We're already behind in getting the current monorepo
>>>> up and running, and I think discussing and implementing an alternative
>>>> will take too long and put our goal of moving off SVN by next year's
>>>> development meeting at risk.
>>>
>>> The layout here is not at all different, only the process by which the
>>> repo is generated. I strongly believe that a history preserving
>>> conversion is very important if we want to avoid making porting
>>> out-of-tree work horribly disruptive.
>>
>> The process is actually what I'm concerned about here, much more so than
>> the physical layout of the repo.  It takes time to discuss, develop
>> and debug a new process for automatically syncing from SVN to a new git
>> repository.  We've already gone through all these steps with the existing
>> monorepo, so switching to something else at this point would be a step
>> backwards in my opinion.

I appreciate the amount of effort you and others have put in to get us
this far, but in my opinion these steps are not quite complete. A lot of
people have just started actually trying to merge with the monorepo
prototype since it was announced last week that it's intended to be the
official one. While there's certainly been a lot of discussion about
the monorepo in general in the last couple of years, I really hadn't
seen much serious discussion in public about the actual conversion until
the "New LLVM git repository conversion prototype" thread earlier this
month.

Just to elaborate a bit on why I think this is important, I think the
difference between the two approaches to conversion has to do with what
we consider the real source of truth in our repository history. The
current prototype rebuilds everything with SVN as a source of truth and
throws out the official git mirrors, which sounds nice in theory, but
has pragmatic problems. The reality is that a lot of people have been
basing work off the git mirrors for a number of years now, so throwing
away that history causes real world problems.

Mehdi AMINI <[hidden email]> writes:

> At this point we can still consider it. I highly doubt that waiting a few
> weeks would jeopardize the one-year deadline (which is really not that
> ambitious).
>
> What we should do, though, in my opinion, is set strict deadlines: i.e.
> every stage of discussion should be open for a very limited time.
> The current linear repo has the edge, but we could, for example, leave
> this "zipper" proposal open for the next week (or two if you want) as an
> RFC. Unless this alternative gets significant traction, we should close
> the discussion and move ahead with the linear-history repo.
>
> After almost two years of more or less stagnation, I feel it'd be
> unfortunate to rush right now on what I perceive as an important design
> point (especially for downstream users) like this one.

I agree with this. I'd appreciate it if we gave this a bit of time for
other people to weigh in - I suspect others are hitting the same issues
as I am in trying to integrate with this version of the monorepo.

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
Justin,

Could you show me an example of a longer tree to be migrated?
It's okay if it is not yours, as long as it is public on GitHub.

I suggest we provide a script to migrate deep trees:

1) Generate svnrev-to-hash maps for both the monorepo and each individual.git.
  (This may be delayed until (3).)
2) Run git-fast-export on the branch.
3) Run git-fast-import, substituting the out-of-branch hashes.

I am not certain that git-fast-export is mature.
In contrast, I am certain git-fast-import is mature.
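
The three steps above could be sketched roughly as follows. This is only an
illustration under stated assumptions: that mirror commits carry a
`git-svn-id:` trailer and monorepo commits carry an `llvm-svn:` trailer naming
the SVN revision, and that the fast-export stream names out-of-branch parents
by full 40-character hash. The function names are hypothetical, and the
sketch does not move files under a per-project subdirectory, which a real
migration would also need.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed trailer formats; adjust to whatever the repositories really use.
GIT_SVN_ID = re.compile(r"^\s*git-svn-id: \S+@(\d+) ", re.MULTILINE)
LLVM_SVN = re.compile(r"^\s*llvm-svn: (\d+)\s*$", re.MULTILINE)

FULL_HASH = re.compile(r"\b[0-9a-f]{40}\b")


def svn_rev_map(commits: Iterable[Tuple[str, str]], trailer) -> Dict[int, str]:
    """Step 1: map SVN revision -> commit hash from (hash, message) pairs."""
    revs: Dict[int, str] = {}
    for sha, message in commits:
        match = trailer.search(message)
        if match:
            revs[int(match.group(1))] = sha
    return revs


def hash_translation(mirror: Dict[int, str], monorepo: Dict[int, str]) -> Dict[str, str]:
    """Pair mirror hashes with monorepo hashes that share an SVN revision."""
    return {sha: monorepo[rev] for rev, sha in mirror.items() if rev in monorepo}


def substitute_hashes(stream: str, table: Dict[str, str]) -> str:
    """Step 3 preprocessing: rewrite known mirror hashes in an export stream.

    Unknown hashes are left alone. Replacements have the same length as
    the originals, so byte counts in `data` headers stay valid.
    """
    return FULL_HASH.sub(lambda m: table.get(m.group(0), m.group(0)), stream)
```

The (hash, message) pairs for step 1 would come from walking each
repository's log, and the rewritten stream would then be fed to
git-fast-import. Note that every commit on the migrated branch still gets a
new hash, which is the drawback raised elsewhere in this thread.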

ps. I tried the zipper layout several years ago and concluded that it was not useful.
That is why, in my monorepo, I grafted some commits onto the corresponding commits of each individual.git.
That only guaranteed that my monorepo isn't an orphan.
Note that I don't think such grafts were really useful.


Takumi



Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
On Wed, Oct 31, 2018 at 3:02 PM Justin Bogner via llvm-dev <[hidden email]> wrote:
> Just to elaborate a bit on why I think this is important, I think the
> difference between the two approaches to conversion has to do with what
> we consider the real source of truth in our repository history. The
> current prototype rebuilds everything with SVN as a source of truth and
> throws out the official git mirrors, which sounds nice in theory, but
> has pragmatic problems. The reality is that a lot of people have been
> basing work off the git mirrors for a number of years now, so throwing
> away that history causes real world problems.

When you say SVN is the source of truth, I agree: it is. The official git mirrors were never the source of truth. Any time you have to upstream a patch with the current SVN system, you end up rebasing it and losing the merge history. I don't see how moving to the monorepo is different. It'll only be painful for a few years, as you've said, which is kind of to be expected.

That said, I haven't dug into the zipper proposal; maybe it's not a big imposition. I just never felt that the "official" git mirrors were particularly official. They were always just something for developers to use to get work done, available without too many implied promises of stability.


Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
NAKAMURA Takumi <[hidden email]> writes:
> Justin,
>
> Could you show me an example of a longer tree to be migrated?
> It's okay if it is not yours, as long as it is public on GitHub.

Unfortunately my work is a proprietary backend, so I can't share it. It
would take quite a bit of effort to make something artificial that was
realistic.

You could perhaps look at something like Swift if you wanted to
experiment, but I don't know how complex their branching structure is.

> I suggest we provide a script to migrate deep trees:
>
> 1) Generate svnrev-to-hash maps for both the monorepo and each individual.git.
>   (This may be delayed until (3).)
> 2) Run git-fast-export on the branch.
> 3) Run git-fast-import, substituting the out-of-branch hashes.
>
> I am not certain that git-fast-export is mature.
> In contrast, I am certain that git-fast-import is mature.

I have doubts about how effective this would be, and even if it works it
means every hash that's recorded in my bug tracker, in my commit
messages, and in release notes becomes invalid.

This seems much worse than the zipper layout to me.

> ps. I tried the zipper layout several years ago and concluded that it was
> not useful.
> That is why, in my monorepo, I grafted some commits onto the
> corresponding commits of each individual.git.
> That only guaranteed that my monorepo isn't an orphan.
> Note that I don't think such grafts were really useful.

I'm not sure I understand what problems you found. Have you looked at
the repo with zipper layout I've prototyped at
https://github.com/bogner/llvm-zipper-prototype ?

>
> Takumi
>
>
> On Thu, Nov 1, 2018 at 1:22 AM Justin Bogner via llvm-dev <
> [hidden email]> wrote:
>
>> Hi all,
>>
>> I've spent some time in the last couple of days trying to figure out how
>> to adopt the [LLVM git monorepo prototype] for an out of tree backend.
>> TLDR: I'm not convinced that this prototype is the right approach to
>> converting to the monorepo, and I have a possible alternative.
>>
>> The main problems I'm running into stem from the fact that this
>> prototype rewrites all of history from scratch rather than leverage the
>> existing [official git mirrors]. This makes migrating out-of-tree work
>> from the official git mirrors to this repo very difficult, since there
>> is no shared history. Some efforts have gone into [documenting how to
>> port in-progress patches], but this doesn't attempt to discuss how to
>> handle more substantial out of tree work.
>>
>> Issues with integrating the prototype
>> -------------------------------------
>>
>> As far as I can tell, my options for trying to integrate with this
>> monorepo are fairly limited.
>>
>> If I merge my trees directly into the monorepo prototype at head, I end
>> up with two copies of every commit, one of which is a monorepo style
>> commit and one with the singular repo history. These commits are
>> completely unrelated to each other, and exist in two separate parallel
>> histories, making it difficult to correlate one to the other or even to
>> tell which is which.
>>
>> An arguably cleaner solution would be try to recreate all of my trees'
>> history artificially as if they were based on the monorepo prototype
>> history all along, but this has two problems. First, it's a very
>> significant tooling effort to do this - I'd need to match up several
>> years of merge points to their corresponding spots in the monorepo
>> prototype and somehow redo all of the merges in the same ways. Tools
>> like "rebase --preserve-merges" don't really help here, since they abort
>> on merge conflicts and ask a human to resolve them again. Even if I were
>> to come up with tooling that managed this, I'm still left with a
>> completely new set of hashes for commits and no easy way to map them to
>> existing references in emails, bug trackers, and release notes.
>>
>> Finally, there's the option of throwing away all of my history and
>> applying my out of tree work in a single patch. This makes git-log and
>> git-blame useless for investigating issues in my codebase for a few
>> years. It also means that when fixes go into older branches they can't
>> be merged forward and need to be redone by hand.
>>
>> All of these have very significant drawbacks, and none of them really
>> sounds like a good option at all.
>>
>> An alternative approach
>> -----------------------
>>
>> All of these problems could be mitigated if we could preserve the
>> history of the existing git mirrors when generating the monorepo. There
>> are two ways to do this.
>>
>> 1. Start the monorepo by subtree-merging the various repos together at
>>    an arbitrary point in time.
>>
>> 2. "Zip" together the commits in each official git mirror repo by
>>    merging them into a combined view after each commit.
>>
>> While I personally don't see a problem with (1), I've heard people claim
>> that they want to use the monorepo to bisect arbitrarily far back into
>> history. If this is the case, we'd prefer an approach like (2).
>>
>> A zippered repository gives us a lot of the benefits of the prototype,
>> without a lot of the issues that are caused by rewriting history:
>>
>> - The commits from the official git mirrors exist as they are now, and
>>   we don't need to deal with changing hashes.
>>
>> - Out-of-tree branches have all of their history whether they opt in to
>>   creating a monorepo style history or not
>>
>> - All of the repo's history is visible as a monorepo by looking only at
>>   the merge commits. Bisect scripts can easily filter to these.
>>
>> - The monorepo commits and individual repo commits are easily
>>   discernible and have a direct link between them in git's DAG, making
>>   it easy to find one from the other.
>>
>> To demonstrate this approach, I've put up a snapshot of what LLVM might
>> look like if we did this, using some scripts that Duncan wrote a while
>> back to experiment with the idea:
>>
>>   https://github.com/bogner/llvm-zipper-prototype
>>
>> Note that this is just a demo/prototype. It has some minor issues, isn't
>> being automatically updated, and I may regenerate it at some point.
>>
>> Thoughts?
>>
>> Thanks,
>> -- Justin Bogner
>>
>> [LLVM git monorepo prototype]: https://github.com/llvm-git-prototype/llvm
>> [official git mirrors]: https://git.llvm.org/git/llvm.git
>> [documenting how to port in-progress patches]:
>> https://reviews.llvm.org/D53414
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
In reply to this post by Alberto Barbaro via llvm-dev
I'd suggest trying to script something like the following.

For each svn commit:
$ git replace <old> <new>

$ git filter-branch --index-filter '<script to drop projects archived from the monorepo>' HEAD..branch1 HEAD..branch2 ...
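
As a rough illustration of the first step, the per-commit `git replace` calls could be generated from two lookup tables keyed by SVN revision. This is only a sketch: the function name and both dictionaries are hypothetical, and it assumes one mirror commit per revision, so a real script would need one table per mirror repository.

```python
def replace_commands(mirror_by_rev, monorepo_by_rev):
    """Emit one `git replace <old> <new>` line per shared SVN revision.

    mirror_by_rev:   {svn_revision: commit hash in an individual mirror}
    monorepo_by_rev: {svn_revision: commit hash in the monorepo}
    Revisions present in only one of the tables are skipped.
    """
    shared = sorted(mirror_by_rev.keys() & monorepo_by_rev.keys())
    return [
        f"git replace {mirror_by_rev[rev]} {monorepo_by_rev[rev]}"
        for rev in shared
    ]


# Hypothetical data: revision 102 exists only in the monorepo table and
# revision 103 only in the mirror table, so neither produces a command.
mirror = {100: "a" * 40, 101: "b" * 40, 103: "c" * 40}
monorepo = {100: "d" * 40, 101: "e" * 40, 102: "f" * 40}

for command in replace_commands(mirror, monorepo):
    print(command)
```

Printing the commands rather than running them keeps the step reviewable before anything in the repository is touched.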


On 1 November 2018 at 08:40, NAKAMURA Takumi via llvm-dev <[hidden email]> wrote:
Justin,

Could you show me an example of a longer tree to be migrated?
It's fine if it is not yours, as long as it is public on GitHub.

I suggest we provide a script to migrate deep trees.

1) Generate svnrev-hash maps for each of the monorepo and the other individual.git repos.
  (This may be delayed until (3).)
2) Run git-fast-export on the branch.
3) Run git-fast-import, substituting out-of-branch hashes.

I am not certain git-fast-export is mature.
In contrast, I am certain git-fast-import is mature.
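
The steps above can be sketched roughly as follows. This is not the script being proposed, only an illustration under stated assumptions: that mirror commits carry the usual `git-svn-id: <url>@<rev> <uuid>` trailer written by git-svn, and that out-of-branch parents appear in the fast-export stream as `from <sha1>` or `merge <sha1>` lines. The helper names are hypothetical, and the trailer used by the monorepo prototype may differ and would need its own pattern.

```python
import re

# Trailer that git-svn appends to commit messages (assumed present).
SVN_ID = re.compile(r"^\s*git-svn-id: \S+@(\d+) ", re.MULTILINE)
# Parent references written as raw 40-hex hashes in a fast-export stream.
PARENT = re.compile(r"^(from|merge) ([0-9a-f]{40})$")


def svn_revision(message):
    """Return the SVN revision from a commit message, or None if absent."""
    match = SVN_ID.search(message)
    return int(match.group(1)) if match else None


def rewrite_parents(stream_lines, old_to_new):
    """Swap mirror hashes for monorepo hashes on from/merge lines.

    Caveat: this naive version inspects every line, so a commit message
    or blob containing a line that looks like `from <hash>` would also
    be rewritten. A real tool has to honor the `data <n>` sections.
    """
    rewritten = []
    for line in stream_lines:
        match = PARENT.match(line)
        if match and match.group(2) in old_to_new:
            line = f"{match.group(1)} {old_to_new[match.group(2)]}"
        rewritten.append(line)
    return rewritten


message = (
    "Fix a bug\n\n"
    "git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345678 "
    "91177308-0d34-0410-b5e6-96231b3b80d8\n"
)
print(svn_revision(message))

stream = ["commit refs/heads/mybranch", "from " + "a" * 40, "from :12"]
print(rewrite_parents(stream, {"a" * 40: "c" * 40}))
```

Mark references such as `from :12` are left alone, since they point at commits inside the exported range rather than at the upstream history.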

ps. I tried the zipper layout several years ago and I concluded it was not useful.
It's the reason why, in my monorepo, I grafted some commits to each corresponding commits of individual.git.
It just guaranteed my monorepo isn't orphan.
Note, I don't think such grafts were really useful.


Takumi



Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
In reply to this post by Alberto Barbaro via llvm-dev
Hi,

Thanks for starting this discussion Justin!

On 10/31/18 5:22 PM, Justin Bogner via llvm-dev wrote:

> Hi all,
>
> I've spent some time in the last couple of days trying to figure out how
> to adopt the [LLVM git monorepo prototype] for an out of tree backend.
> TLDR: I'm not convinced that this prototype is the right approach to
> converting to the monorepo, and I have a possible alternative.
>
> The main problems I'm running into stem from the fact that this
> prototype rewrites all of history from scratch rather than leverage the
> existing [official git mirrors]. This makes migrating out-of-tree work
> from the official git mirrors to this repo very difficult, since there
> is no shared history. Some efforts have gone into [documenting how to
> port in-progress patches], but this doesn't attempt to discuss how to
> handle more substantial out of tree work.
>
> Issues with integrating the prototype
> -------------------------------------
>
> As far as I can tell, my options for trying to integrate with this
> monorepo are fairly limited.
>
> If I merge my trees directly into the monorepo prototype at head, I end
> up with two copies of every commit, one of which is a monorepo style
> commit and one with the singular repo history. These commits are
> completely unrelated to each other, and exist in two separate parallel
> histories, making it difficult to correlate one to the other or even to
> tell which is which.
>
> An arguably cleaner solution would be try to recreate all of my trees'
> history artificially as if they were based on the monorepo prototype
> history all along, but this has two problems. First, it's a very
> significant tooling effort to do this - I'd need to match up several
> years of merge points to their corresponding spots in the monorepo
> prototype and somehow redo all of the merges in the same ways. Tools
> like "rebase --preserve-merges" don't really help here, since they abort
> on merge conflicts and ask a human to resolve them again. Even if I were
> to come up with tooling that managed this, I'm still left with a
> completely new set of hashes for commits and no easy way to map them to
> existing references in emails, bug trackers, and release notes.
>
> Finally, there's the option of throwing away all of my history and
> applying my out of tree work in a single patch. This makes git-log and
> git-blame useless for investigating issues in my codebase for a few
> years. It also means that when fixes go into older branches they can't
> be merged forward and need to be redone by hand.
>
> All of these have very significant drawbacks, and none of them really
> sounds like a good option at all.
>

We're in this situation. We have over 7 years of git history for our
out-of-tree target and it would be a huge pain and drawback if we were
to lose that history by e.g. needing to apply all our changes as a
single patch to the new monorepo.

We haven't started moving to the monorepo yet so while we haven't hit
the issues in practice yet, we will. Preserving the history from the git
mirrors would surely be beneficial.

> An alternative approach
> -----------------------
>
> All of these problems could be mitigated if we could preserve the
> history of the existing git mirrors when generating the monorepo. There
> are two ways to do this.
>
> 1. Start the monorepo by subtree-merging the various repos together at
>     an arbitrary point in time.
>
> 2. "Zip" together the commits in each official git mirror repo by
>     merging them into a combined view after each commit.
>
> While I personally don't see a problem with (1), I've heard people claim
> that they want to use the monorepo to bisect arbitrarily far back into
> history. If this is the case, we'd prefer an approach like (2).
>
> A zippered repository gives us a lot of the benefits of the prototype,
> without a lot of the issues that are caused by rewriting history:
>
> - The commits from the official git mirrors exist as they are now, and
>    we don't need to deal with changing hashes.
>
> - Out-of-tree branches have all of their history whether they opt in to
>    creating a monorepo style history or not
>
> - All of the repo's history is visible as a monorepo by looking only at
>    the merge commits. Bisect scripts can easily filter to these.
>
> - The monorepo commits and individual repo commits are easily
>    discernible and have a direct link between them in git's DAG, making
>    it easy to find one from the other.
>
> To demonstrate this approach, I've put up a snapshot of what LLVM might
> look like if we did this, using some scripts that Duncan wrote a while
> back to experiment with the idea:
>
>    https://github.com/bogner/llvm-zipper-prototype

I took a quick look at the zipper prototype and I think it looks awesome!

(Then unfortunately gitk flipped out and after 40 minutes it ate 42GB of
memory (and continued grabbing more) but I don't know if that's a
problem that is perhaps solved in a more recent git version than I'm
running or what the problem really is.)

Thanks,
Mikael

>
> Note that this is just a demo/prototype. It has some minor issues, isn't
> being automatically updated, and I may regenerate it at some point.
>
> Thoughts?
>
> Thanks,
> -- Justin Bogner
>
> [LLVM git monorepo prototype]: https://github.com/llvm-git-prototype/llvm
> [official git mirrors]: https://git.llvm.org/git/llvm.git
> [documenting how to port in-progress patches]: https://reviews.llvm.org/D53414

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
On Thu, 1 Nov 2018 at 08:45, Mikael Holmén via llvm-dev
<[hidden email]> wrote:

>
> Hi,
>
> Thanks for starting this discussion Justin!
>
> On 10/31/18 5:22 PM, Justin Bogner via llvm-dev wrote:
> > Hi all,
> >
> > I've spent some time in the last couple of days trying to figure out how
> > to adopt the [LLVM git monorepo prototype] for an out of tree backend.
> > TLDR: I'm not convinced that this prototype is the right approach to
> > converting to the monorepo, and I have a possible alternative.
> >
> > The main problems I'm running into stem from the fact that this
> > prototype rewrites all of history from scratch rather than leverage the
> > existing [official git mirrors]. This makes migrating out-of-tree work
> > from the official git mirrors to this repo very difficult, since there
> > is no shared history. Some efforts have gone into [documenting how to
> > port in-progress patches], but this doesn't attempt to discuss how to
> > handle more substantial out of tree work.
> >
> > Issues with integrating the prototype
> > -------------------------------------
> >
> > As far as I can tell, my options for trying to integrate with this
> > monorepo are fairly limited.
> >
> > If I merge my trees directly into the monorepo prototype at head, I end
> > up with two copies of every commit, one of which is a monorepo style
> > commit and one with the singular repo history. These commits are
> > completely unrelated to each other, and exist in two separate parallel
> > histories, making it difficult to correlate one to the other or even to
> > tell which is which.
> >
> > An arguably cleaner solution would be try to recreate all of my trees'
> > history artificially as if they were based on the monorepo prototype
> > history all along, but this has two problems. First, it's a very
> > significant tooling effort to do this - I'd need to match up several
> > years of merge points to their corresponding spots in the monorepo
> > prototype and somehow redo all of the merges in the same ways. Tools
> > like "rebase --preserve-merges" don't really help here, since they abort
> > on merge conflicts and ask a human to resolve them again. Even if I were
> > to come up with tooling that managed this, I'm still left with a
> > completely new set of hashes for commits and no easy way to map them to
> > existing references in emails, bug trackers, and release notes.
> >
> > Finally, there's the option of throwing away all of my history and
> > applying my out of tree work in a single patch. This makes git-log and
> > git-blame useless for investigating issues in my codebase for a few
> > years. It also means that when fixes go into older branches they can't
> > be merged forward and need to be redone by hand.
> >
> > All of these have very significant drawbacks, and none of them really
> > sounds like a good option at all.
> >
>
> We're in this situation. We have over 7 years of git history for our
> out-of-tree target and it would be a huge pain and drawback if we were
> to lose that history by e.g. needing to apply all our changes as a
> single patch to the new monorepo.
>
> We haven't started moving to the monorepo yet so while we haven't hit
> the issues in practice yet, we will. Preserving the history from the git
> mirrors would surely be beneficial.
>

We are also in the same situation for our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm,
https://github.com/CTSRD-CHERI/clang,
https://github.com/CTSRD-CHERI/lld). I am aware there were some
attempts at converting our repos to a monorepo structure a few years
ago according to
<http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html>.
However, I'm not sure whether the script mentioned there can be reused with
the new git monorepo, and it seems that it only handles clang. We would
also have to include our forks of llvm, lld, libunwind, and libc++.

Thanks,
Alex

> > An alternative approach
> > -----------------------
> >
> > All of these problems could be mitigated if we could preserve the
> > history of the existing git mirrors when generating the monorepo. There
> > are two ways to do this.
> >
> > 1. Start the monorepo by subtree-merging the various repos together at
> >     an arbitrary point in time.
> >
> > 2. "Zip" together the commits in each official git mirror repo by
> >     merging them into a combined view after each commit.
> >
> > While I personally don't see a problem with (1), I've heard people claim
> > that they want to use the monorepo to bisect arbitrarily far back into
> > history. If this is the case, we'd prefer an approach like (2).
> >
> > A zippered repository gives us a lot of the benefits of the prototype,
> > without a lot of the issues that are caused by rewriting history:
> >
> > - The commits from the official git mirrors exist as they are now, and
> >    we don't need to deal with changing hashes.
> >
> > - Out-of-tree branches have all of their history whether they opt in to
> >    creating a monorepo style history or not
> >
> > - All of the repo's history is visible as a monorepo by looking only at
> >    the merge commits. Bisect scripts can easily filter to these.
> >
> > - The monorepo commits and individual repo commits are easily
> >    discernible and have a direct link between them in git's DAG, making
> >    it easy to find one from the other.
> >
> > To demonstrate this approach, I've put up a snapshot of what LLVM might
> > look like if we did this, using some scripts that Duncan wrote a while
> > back to experiment with the idea:
> >
> >    https://github.com/bogner/llvm-zipper-prototype
>
> I took a quick look at the zipper prototype and I think it looks awesome!
>
> (Then unfortunately gitk flipped out and after 40 minutes it ate 42GB of
> memory (and continued grabbing more) but I don't know if that's a
> problem that is perhaps solved in a more recent git version than I'm
> running or what the problem really is.)
>
> Thanks,
> Mikael
>
> >
> > Note that this is just a demo/prototype. It has some minor issues, isn't
> > being automatically updated, and I may regenerate it at some point.
> >
> > Thoughts?
> >
> > Thanks,
> > -- Justin Bogner
> >
> > [LLVM git monorepo prototype]: https://github.com/llvm-git-prototype/llvm
> > [official git mirrors]: https://git.llvm.org/git/llvm.git
> > [documenting how to port in-progress patches]: https://reviews.llvm.org/D53414

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
I just want to point out that the issue of incompatible history is not new. It has been under discussion since at least July 2016.


As James said in that email:

That we'll be getting incompatible history has been glossed over, and it is
indeed really important to make it clear and have a good plan there. This
doesn't only affect actual "forks", it also affects every single developer
with a local git clone which contains unfinished work.

So, what is the plan with the existing mono-repo implementation? If there isn't one, then we should strongly consider alternative implementations of the mono-repo.

I also strongly believe we should not allow a schedule to force us to ignore significant problems in the proposals and implementations. Especially ones that we've known about for years.

-Chris



Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev

While my team doesn't have one, it's clear that out-of-tree backends are an important, long-standing, and valuable use case for downstream consumers of LLVM, and the new monorepo should try very hard NOT to make their lives difficult.

--paulr

 

From: llvm-dev [mailto:[hidden email]] On Behalf Of Chris Bieneman via llvm-dev
Sent: Thursday, November 01, 2018 1:27 PM
To: llvm-dev
Subject: Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

 

I just want to point out that the issue of incompatible history is not new. This has been getting discussed all the way back in July 2016.

As James said in that email:

That we'll be getting incompatible history has been glossed over, and it is
indeed really important to make it clear and have a good plan there. This
doesn't only affect actual "forks", it also affects every single developer
with a local git clone which contains unfinished work.

So, what is the plan with the existing mono-repo implementation? If there isn't one, then we should strongly consider alternative implementations of the mono-repo.

I also strongly believe we should not allow a schedule to force us to ignore significant problems in the proposals and implementations. Especially ones that we've known about for years.

-Chris



On Nov 1, 2018, at 6:27 AM, Alexander Richardson via llvm-dev <[hidden email]> wrote:

 

On Thu, 1 Nov 2018 at 08:45, Mikael Holmén via llvm-dev <[hidden email]> wrote:


Hi,

Thanks for starting this discussion Justin!

On 10/31/18 5:22 PM, Justin Bogner via llvm-dev wrote:

Hi all,

I've spent some time in the last couple of days trying to figure out how
to adopt the [LLVM git monorepo prototype] for an out of tree backend.
TLDR: I'm not convinced that this prototype is the right approach to
converting to the monorepo, and I have a possible alternative.

The main problems I'm running into stem from the fact that this
prototype rewrites all of history from scratch rather than leveraging the
existing [official git mirrors]. This makes migrating out-of-tree work
from the official git mirrors to this repo very difficult, since there
is no shared history. Some efforts have gone into [documenting how to
port in-progress patches], but this doesn't attempt to discuss how to
handle more substantial out of tree work.

Issues with integrating the prototype
-------------------------------------

As far as I can tell, my options for trying to integrate with this
monorepo are fairly limited.

If I merge my trees directly into the monorepo prototype at head, I end
up with two copies of every commit, one of which is a monorepo style
commit and one with the singular repo history. These commits are
completely unrelated to each other, and exist in two separate parallel
histories, making it difficult to correlate one to the other or even to
tell which is which.

An arguably cleaner solution would be to try to recreate all of my trees'
history artificially as if they were based on the monorepo prototype
history all along, but this has two problems. First, it's a very
significant tooling effort to do this - I'd need to match up several
years of merge points to their corresponding spots in the monorepo
prototype and somehow redo all of the merges in the same ways. Tools
like "rebase --preserve-merges" don't really help here, since they abort
on merge conflicts and ask a human to resolve them again. Even if I were
to come up with tooling that managed this, I'm still left with a
completely new set of hashes for commits and no easy way to map them to
existing references in emails, bug trackers, and release notes.
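
To illustrate why rewritten history shares nothing with the mirrors, here
is a minimal sketch in Python. It assumes a toy stand-in for git's commit
hashing (the real object format also covers author, committer, and dates),
and the tree names and helper functions are hypothetical. Because each id
covers the tree and the parent ids, moving the same changes under a
subdirectory changes every id in the chain.

```python
import hashlib


def commit_id(tree, parents, message):
    """Toy stand-in for a git commit hash: covers tree, parents, and message."""
    body = "tree {}\n".format(tree)
    body += "".join("parent {}\n".format(p) for p in parents)
    body += "\n" + message
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


def build_chain(trees_and_messages):
    """Return the commit ids of a linear history, oldest first."""
    ids = []
    parents = []
    for tree, message in trees_and_messages:
        cid = commit_id(tree, parents, message)
        ids.append(cid)
        parents = [cid]
    return ids


# The same three changes, once as a standalone mirror and once rewritten
# so that the content lives under an "llvm/" subdirectory.
mirror = build_chain([("tree-a", "first"), ("tree-b", "second"), ("tree-c", "third")])
rewritten = build_chain(
    [("llvm/tree-a", "first"), ("llvm/tree-b", "second"), ("llvm/tree-c", "third")]
)

# An out-of-tree branch forked from the mirror's second commit.
fork_point = mirror[1]

shared = set(mirror) & set(rewritten)
print("shared commits:", len(shared))
print("fork point exists in rewritten history:", fork_point in rewritten)
```

In this model the out-of-tree branch's fork point simply does not exist in
the rewritten history, which is the situation described above.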

Finally, there's the option of throwing away all of my history and
applying my out of tree work in a single patch. This makes git-log and
git-blame useless for investigating issues in my codebase for a few
years. It also means that when fixes go into older branches they can't
be merged forward and need to be redone by hand.

All of these have very significant drawbacks, and none of them really
sounds like a good option at all.


We're in this situation. We have over 7 years of git history for our
out-of-tree target and it would be a huge pain and drawback if we were
to lose that history by e.g. needing to apply all our changes as a
single patch to the new monorepo.

We haven't started moving to the monorepo yet so while we haven't hit
the issues in practice yet, we will. Preserving the history from the git
mirrors would surely be beneficial.


We are also in the same situation for our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm,
https://github.com/CTSRD-CHERI/clang, and
https://github.com/CTSRD-CHERI/lld). I am aware there were some
attempts at converting our repos to a monorepo structure a few years
ago according to
<http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html>.
However, I'm not sure if the script mentioned there can be reused with
the new git monorepo, and it seems that it only handles clang. We would
also have to include our forks of llvm, lld, libunwind, and libc++.

Thanks,
Alex


An alternative approach
-----------------------

All of these problems could be mitigated if we could preserve the
history of the existing git mirrors when generating the monorepo. There
are two ways to do this.

1. Start the monorepo by subtree-merging the various repos together at
   an arbitrary point in time.

2. "Zip" together the commits in each official git mirror repo by
   merging them into a combined view after each commit.

While I personally don't see a problem with (1), I've heard people claim
that they want to use the monorepo to bisect arbitrarily far back into
history. If this is the case, we'd prefer an approach like (2).

A zippered repository gives us a lot of the benefits of the prototype,
without a lot of the issues that are caused by rewriting history:

- The commits from the official git mirrors exist as they are now, and
  we don't need to deal with changing hashes.

- Out-of-tree branches have all of their history whether they opt in to
  creating a monorepo style history or not.

- All of the repo's history is visible as a monorepo by looking only at
  the merge commits. Bisect scripts can easily filter to these.

- The monorepo commits and individual repo commits are easily
  discernible and have a direct link between them in git's DAG, making
  it easy to find one from the other.
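
The properties listed above can be sketched with a small Python model of
a zippered history. This is only an illustration under stated assumptions,
not the scripts used for the prototype: the `Commit` type, the helper
names, and the integer timestamps used for ordering are all hypothetical.
Each zipper commit takes the previous zipper commit as its first parent
and one untouched per-repo commit as its last parent.

```python
import hashlib
from typing import Dict, List, NamedTuple, Tuple


class Commit(NamedTuple):
    cid: str
    parents: Tuple[str, ...]
    repo: str  # e.g. "llvm" or "clang"; "monorepo" marks a zipper commit
    time: int


def make_id(*parts) -> str:
    """Toy commit id derived from the given parts (not git's real format)."""
    return hashlib.sha1("\n".join(map(str, parts)).encode()).hexdigest()[:12]


def linear_repo(name: str, times: List[int]) -> List[Commit]:
    """Build a linear per-repo history, standing in for one git mirror."""
    commits: List[Commit] = []
    parent: Tuple[str, ...] = ()
    for t in times:
        c = Commit(make_id(name, t, *parent), parent, name, t)
        commits.append(c)
        parent = (c.cid,)
    return commits


def zip_repos(repos: Dict[str, List[Commit]]) -> List[Commit]:
    """Create one zipper commit per original commit, in timestamp order."""
    ordered = sorted(
        (c for commits in repos.values() for c in commits),
        key=lambda c: (c.time, c.repo),
    )
    merges: List[Commit] = []
    for c in ordered:
        # First parent: previous zipper commit (absent for the very first
        # one). Last parent: the original commit, referenced unchanged.
        first = (merges[-1].cid,) if merges else ()
        parents = first + (c.cid,)
        merges.append(Commit(make_id("zip", *parents), parents, "monorepo", c.time))
    return merges


repos = {"llvm": linear_repo("llvm", [1, 3, 5]), "clang": linear_repo("clang", [2, 4])}
merges = zip_repos(repos)

for m in merges:
    print(m.cid, "<-", m.parents)
```

Following only the first parents gives the monorepo view, while the last
parent of each zipper commit links directly to an original mirror commit
whose id is unchanged.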

To demonstrate this approach, I've put up a snapshot of what LLVM might
look like if we did this, using some scripts that Duncan wrote a while
back to experiment with the idea:

  https://github.com/bogner/llvm-zipper-prototype


I took a quick look at the zipper prototype and I think it looks awesome!

(Unfortunately, gitk then flipped out: after 40 minutes it had eaten 42GB
of memory and was still grabbing more. I don't know whether that is a
problem that has been solved in a more recent git version than the one
I'm running, or what the problem really is.)

Thanks,
Mikael



Note that this is just a demo/prototype. It has some minor issues, isn't
being automatically updated, and I may regenerate it at some point.

Thoughts?

Thanks,
-- Justin Bogner

[LLVM git monorepo prototype]: https://github.com/llvm-git-prototype/llvm
[official git mirrors]: https://git.llvm.org/git/llvm.git
[documenting how to port in-progress patches]: https://reviews.llvm.org/D53414

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
Agreed. I also would argue that this problem isn't unique to out-of-tree backends. Generally it could impact any fork that has out-of-tree changes. I think out-of-tree backends is probably the most common type of use case for that, however it will also likely impact a variety of forks of LLVM projects. For example this will likely have impact on the Swift project's forks of LLVM & Clang which have out-of-tree modifications.

-Chris


Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
Justin Bogner via llvm-dev <[hidden email]> writes:

> The layout here is not at all different, only the process by which the
> repo is generated. I strongly believe that a history preserving
> conversion is very important if we want to avoid making porting
> out-of-tree work horribly disruptive.

How would an out-of-tree branch be ported with this new approach?  Do
you have scripts to do it?

                          -David

Re: [llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Alberto Barbaro via llvm-dev
Justin,

I thought we might provide yet another zipper repo that contains both the individual git repos and the monorepo.
I guess it would make it easier to interact between your repo and the monorepo.

As others mentioned, a zipper repo is useful for migration, but I think it would be hard to live with in my daily development.

I will experiment.

Takumi
