Hacker News

It’s interesting that they prefer to develop such a tool rather than give up on the monorepo concept.


One very counterintuitive truth of large scale software development is that as you scale to multiple services, you are gravitationally pulled into a monorepo. There are different forces at work but the strongest at this scale are your data models and API definitions.


The problems with data models can appear even at small scale. I remember the first large-ish project I built: it had a few different components, and the trouble started when I tried to introduce data models into the Java and Python parts (I had it in my head, after reading a book, that I needed domain objects or some other nonsense in each language). That was a mistake: the data was still changing, every change took forever to propagate, and while nothing was critical, I learned my lesson very quickly.

One perspective on this is that many ORM libraries don't treat the DB as the source of truth (one very good ORM library that does, and which saved me in this case, was JOOQ). I think a lot of small-scale problems could be solved this way; a monorepo is just another variation of the same solution: moving the source of truth into the repo.
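The same "DB as source of truth" idea can be sketched in Python with SQLAlchemy's schema reflection instead of JOOQ/Java (table and column names here are invented for illustration):

```python
# Sketch: reflect the schema from the live database instead of hand-writing
# a domain class per language, so the DB stays the single source of truth.
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")  # in-memory DB standing in for the real one
with engine.begin() as conn:
    conn.execute(sa.text(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"))

metadata = sa.MetaData()
# autoload_with asks SQLAlchemy to read the table definition from the DB itself
users = sa.Table("users", metadata, autoload_with=engine)

print([c.name for c in users.columns])  # ['id', 'email']
```

When the table changes, the reflected model changes with it; there is no second, hand-maintained copy of the schema to drift out of sync.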

It is surprising to me how often variations of this problem come up. Obviously, there are solutions from multiple directions: cross-language definitions (ProtoBuf), Arrow (zero-copy abstractions suitable for high performance), maybe even Swagger, which comes at the problem from the documentation side. But the problem still comes up everywhere, and the DB approach with a decent ORM is, imo, a very strong one at smaller scale.


Does your schema for data models (inception!) have a revision associated with it? If not, deployments are going to be spicy. If so, you end up having to deal with version rot. Part of why putting this in the repo with your source code is a winning solution is that when you're working off HEAD, you naturally pick up and test the latest thing, and in most cases your next deploy will also just naturally roll forward.
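A toy sketch of the revision idea (table name and version numbers invented): the deployable artifact carries the schema revision it was built against and refuses to start against one it doesn't know.

```python
# The build bakes in the expected schema revision alongside the model code;
# at startup the app compares it with what the database reports.
import sqlite3

EXPECTED_SCHEMA_REV = 7  # hypothetical value baked into this deploy

def check_schema(conn: sqlite3.Connection) -> None:
    (actual,) = conn.execute("SELECT rev FROM schema_rev").fetchone()
    if actual != EXPECTED_SCHEMA_REV:
        raise RuntimeError(
            f"schema rev {actual}, app expects {EXPECTED_SCHEMA_REV}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_rev (rev INTEGER)")
conn.execute("INSERT INTO schema_rev VALUES (7)")
check_schema(conn)  # passes; would raise after an unapplied migration
print("schema ok")
```

The check only catches mismatches; the version-rot problem (old deploys pinned to old revisions) still has to be managed, which is the point of the comment above.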


Is it? I'm working for a small company and even at our scale, when the workflow is centralized, I find git a bit painful at times. I mean it's still an amazing tool, don't get me wrong, but when you have to deal with several sub-projects that you have to keep in sync and need to evolve together, I find that it gets messy real fast.

I think the core issue with git is that the submodule thing is obviously an afterthought that's cobbled together on top of the preexisting SCM instead of something that was taken into account from the start. It's better than nothing, but it's probably the one aspect of git that sometimes makes me long for SVN.

At the scale of something like Facebook you'd either have to pay a team to implement and support your split-repo framework, or pay a team to implement and support your monorepo framework. I don't have enough experience with such large codebases to claim expertise, but based on my personal experience I would probably go for a monorepo as well. It seems like the most straightforward, flexible, and scalable approach.


If your company is small, I don't think you should be using git submodules at all.

My last place was about 10 years young, 150 engineers, and was still working within a single git repo without submodules.

There is a non-zero amount of discoverable config that goes into managing a repo like that, but it's trivial compared to the ongoing headaches of managing submodules, as you suggest.


We need to track large external projects (buildroot and the Linux kernel, for instance), so the ability to include them as submodules and update them fairly easily is worth it IMO. If you're at the scale of Google, it probably makes vastly more sense to include the code in your monorepo, pay a bunch of engineers to merge back and forth with upstream, and let the rest of your team not worry about it. But for us it would take a lot of time and effort to maintain a clone of these projects in a bespoke repository.


We have customer IP that not everyone is allowed to access and that has to be deleted after the project is done. We use submodules, and IMO it sucks, but given the restrictions I don't see a way around it.


For extremely small companies (N == 1) git submodules can be neat though. It’s a great way to create small libs without having to bother with distribution through LuaRocks, npm, RubyGems and the like.
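A minimal sketch of that workflow, using throwaway local repos under /tmp (all paths and names invented) in place of a registry-hosted lib:

```shell
# Vendor a tiny in-house lib as a submodule instead of publishing it.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
rm -rf /tmp/subdemo && mkdir -p /tmp/subdemo && cd /tmp/subdemo

git init -q mylib                                   # the small lib
git -C mylib commit -q --allow-empty -m "lib init"

git init -q app && cd app                           # the consuming project
git commit -q --allow-empty -m "app init"
# Recent git disables file-protocol submodules by default; allow it for
# this local demo.
git -c protocol.file.allow=always submodule --quiet add /tmp/subdemo/mylib vendor/mylib
git commit -q -m "vendor mylib as a submodule"
cat .gitmodules    # records the path and URL of the vendored lib
```

Later, `git submodule update --remote vendor/mylib` pulls in newer lib commits, and the superproject pins the exact commit it was tested against.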


Submodules are a great way to break out libraries in a language-agnostic way without having them really be broken out. This is independent of team size.


Dan Luu wrote about monorepos; it's worth a read: https://danluu.com/monorepo/


You need good tooling to work with large monorepos, you need good tooling to work with large multirepos. Neither option is easy at that scale.


Do Facebook and Google literally have repos with everything they write in there available to everyone that works there (modulo privileged stuff)?


For a little more color on your modulo, the major omission in google3 I can recall from ~9 years ago was Android. For Reasons, I think legal.

The others weren’t “oh huh” enough to be easily recalled writing this comment, which probably speaks to their interestingness. But yes, you can chdir from search to calendar to borg and their dependencies, internal and vendored. It’s pretty much all there. It was pretty splendid, actually, and influences my thoughts on monos to this day.


Not quite, but almost.


The monorepo is handy: it simplifies dependency management.


But it adds complexity and creates its own issues.


Monorepo (on git) has been awesome for us the last 5 years or so.



