I'm not sure I understand what the problem is. I'm only talking about the logical objects, not the physical representation. You can still store diffs and you can still have history hashes if that helps you with storage, processing, etc. -- that's perfectly fine. The storage optimizations should be independent of the logical structure. I'm just saying the logical identity of a commit shouldn't depend on its history. For example, if someone removes a commit from the history, that shouldn't have to trash anyone's repo and be such a massively destructive operation. It should only cause a client (in the worst case) to resync its history hashes from that point onward the next time it pulls -- which is quite a cheap, fast, and non-intrusive operation. (Say, 100k commits with 20B SHA-1 hashes would just be ~2MB.)
I'm not sure I understand. How could removing a commit from the history not be a destructive operation? It would necessarily affect every commit after it, hashes or no hashes, because for each commit following, the state of the tree would change, and thus so would the commit.
To my mind, it would be akin to walking across the room and then somehow changing things such that you took one step fewer than required.
I'm not sure what kind of implementation you're envisioning that could work the way you seem to describe. Or do you mean that git should save the entire state of the repository as independent blobs every time you commit something? I don't think you could do that with any hope of reasonable performance.
If you instead just allowed "removing" commits logically without actually physically altering the data structure on disk, there's no point in providing the functionality in the first place.
> If you instead just allowed "removing" commits logically without actually physically altering the data structure on disk
Yes
> there's no point in providing the functionality in the first place.
Why so dismissive? Wouldn't it make sense to give me the benefit of the doubt here and ask me what the point of something like this might be, instead of just shutting it down as pointless? Unless you think I'm just dumb, or otherwise trying to troll here by asking for something pointless?
I am not being dismissive. If you provide functionality that allows the user to delete something without actually deleting it, what's the point of pretending that you can delete things? Usually when people want to delete commits, it's because they committed something like a secret, and really do want to delete it.
Git doesn't try to hide the fact that the committed data is immutable, and to accomplish "deletion" the only option is to rewrite the entire affected part of the data structure and garbage-collect anything that's unreferenced. You cannot modify a commit. You can only create new commits and manipulate references to them.
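To make that concrete, here is a rough sketch of why "deleting" a commit forces a rewrite of everything after it. It is a simplified model, not git's actual object format: `commit_id` and `build_history` are made-up helpers, the tree names are placeholders, and real git also hashes author, committer, and timestamps.

```python
import hashlib


def commit_id(parent, tree, message):
    # Simplified model: the ID covers the parent's ID, and therefore
    # (transitively) the whole history before this commit.
    payload = f"parent {parent}\ntree {tree}\n\n{message}".encode()
    return hashlib.sha1(payload).hexdigest()


def build_history(changes):
    # Compute commit IDs for a linear sequence of (tree, message) pairs.
    ids, parent = [], None
    for tree, message in changes:
        parent = commit_id(parent, tree, message)
        ids.append(parent)
    return ids


changes = [
    ("tree-a", "init"),
    ("tree-b", "add secret"),
    ("tree-c", "add docs"),
    ("tree-d", "fix typo"),
]
original = build_history(changes)

# "Deleting" the second commit means replaying the later ones onto a new parent.
# The trees are kept unchanged here for simplicity; in practice they would
# usually change as well, which only adds to the difference.
rewritten = build_history(changes[:1] + changes[2:])

print(original[0] == rewritten[0])  # commit before the removal keeps its ID
print(original[2] == rewritten[1])  # "add docs" gets a new ID
print(original[3] == rewritten[2])  # and so does everything after it
```

Nothing in the old chain is modified. The rewritten commits are new objects, and the old ones simply become unreferenced once the branch points at the new chain.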
This is fundamentally what enables git to function in a distributed manner, since the only state between repositories that needs special logic is the references; the actual data could be blindly synced with rsync or something, because, practically speaking, it can't ever conflict.
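A minimal sketch of that property, assuming a toy object store that is just a dict keyed by the SHA-1 of the content (`put` and `sync_objects` are hypothetical helpers, not anything git exposes):

```python
import hashlib


def put(store, content: bytes) -> str:
    # Store content under the hash of its own bytes and return the key.
    key = hashlib.sha1(content).hexdigest()
    store[key] = content
    return key


def sync_objects(a, b):
    # Blind union: an identical key implies identical content (barring hash
    # collisions), so there is never anything to reconcile.
    merged = dict(a)
    merged.update(b)
    return merged


alice, bob = {}, {}
put(alice, b"shared commit")
put(bob, b"shared commit")
a_tip = put(alice, b"alice's work")
b_tip = put(bob, b"bob's work")

merged = sync_objects(alice, bob)
print(len(merged))  # 3 objects: the shared one is stored once

# References are the mutable part, and the only place a conflict can show up.
alice_refs = {"main": a_tip}
bob_refs = {"main": b_tip}
conflicting = {
    name
    for name in alice_refs.keys() & bob_refs.keys()
    if alice_refs[name] != bob_refs[name]
}
print(conflicting)  # {'main'}
```

The object union comes out the same in either direction, whereas deciding what `main` should point to needs actual logic (fast-forward, merge, or reject).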
In order to have useful global non-hash commit identifiers, you would need a separate data structure of references that somehow decides which commits are identical, and is capable of reconciling conflicts globally across all clones of a git repository. I'm pretty sure that this isn't even in theory possible for the general case.
As for signoffs, a change in history might make a change you signed off broken or completely irrelevant, so yes, I do think that a change in history can invalidate a signoff on a commit.