A crash course in package management, Node, and Yarn v2
I've been writing some guides at my startup, Nerve, for newer engineers. My aim is to cover things you don't necessarily have to know to do your job, but which will probably come in handy at some point in your career.
This is a guide to package management, which I always found to be tricky and mysterious until I took the time to learn about it a little. At Nerve we use Typescript on the frontend and backend, so about half of this guide is Typescript-specific.
Package managers are an indispensable part of any software project, large or small. However, it's not always clear what they're doing behind the scenes or why - especially when things go wrong. This guide is for you if you want to understand more about how package management works, whether you're fixing a bug, adding new dependencies, or are just curious. We'll start by taking a look at the general design of a package manager, then discuss some of the idiosyncrasies of Yarn, the package manager we use at Nerve.
Why are there package managers?
All but the very earliest software has been built on code written by other people, and we call that "other people's code" a library. You use libraries, and so does every other developer; that implies that every time someone downloads and uses a piece of software they also need to get the libraries it requires to run - its dependencies. How should you go about distributing your software's dependencies? One option is to bundle your program with its dependencies and have the user download everything at once. This will work, and is done in practice, especially for consumer software, but in certain situations it can be pretty wasteful. In particular, if we use many pieces of software that depend on the same library, we can save disk space and bandwidth by having them all share a copy, instead of each of them having their own. This leads to the second approach to libraries, which is to bundle each library into its own downloadable unit (aka a package) and have the user download libraries individually. In this scenario the user is responsible for making sure they have the correct libraries available for their programs to use. This, for example, is what `/usr/lib` is for on Linux systems - it's your 'library of libraries', so to speak.
Managing your own libraries turns out to be very hard, for many reasons. For example, consider that libraries can change over time, and any program you use has only been tested with a particular version of its dependencies. If you have a library that is a later version than the one your program has been tested with, and that library behaves differently than it did before, your program could break. So you have to find out what version of the library your program needs, and make sure that's the version you have. If you upgrade your program you have to check all the versions again! To make matters worse, sometimes two programs can depend on different versions of the same library. If that happens, you need to have two copies of your library around, and you have to find a way to make sure that the right version is exposed to the right program (sometimes this is possible and sometimes it isn't.) Libraries can use other libraries too, so you also need to manage your dependencies' dependencies, and so on. Take all this together and you can spend days trying to download just the right packages before anything will run at all (this is the infamous state known as dependency hell.)
Finding and downloading all the right dependencies is boring, fiddly work, and not something humans are really cut out for. It's exactly the sort of thing computers are cut out for, though, so it was only a matter of time before someone wrote a program to do this work for us. Now these programs are everywhere, and we call them package managers!
The structure of a package manager
To better understand how package managers work, let's consider what we'd want to put in our own hypothetical package manager. We'll call it `mpm`, for My Package Manager.
A quick aside: the universe of package managers can be roughly divided into two camps, which we'll refer to as user-centric package managers and developer-centric package managers. User-centric package managers are, unsurprisingly, mainly concerned with installing packages on your computer. Usually these packages are executable - so, for example, if you ask the package manager to install the `git` package, it will install it and its entire dependency tree, taking care to make sure all the versions are correct, etc. After the package manager completes successfully, you can run the `git` package and it will just work! Usually the package manager will, as a courtesy, symlink the appropriate part of the package to `/usr/bin` so you can e.g. just type `git` at the command line and have that work too. Examples of user-centric package managers include `apt-get`, `dnf`, and `homebrew`.
Developer-centric package managers, on the other hand, are typically used by a developer to manage the dependencies of the program they're developing while they're developing it. This may sound like an easier task than managing packages for multiple programs, but it's not; all the problems mentioned above re: versioning and conflicts can and do happen between the dependencies of a single project. Developer-centric package managers mainly work the same way user-centric package managers do, but they have a couple of extra features to make them suited to a development environment. Developer-centric package managers include `pipenv` for Python, `bundler` for Ruby, `maven` for Java, and `npm` and `yarn` for Javascript. Sometimes developer-centric package managers can be made to act like user-centric package managers - see, for example, `npm`'s `-g` option.
We're going to design `mpm` to be a developer-centric package manager. Just remember that a lot of what we're going to learn applies to user-centric package managers too!
Versioning and manifests
What do we need from `mpm`? Well, for one, it should be able to install dependencies in such a way that we can run our program without it breaking due to version mismatches. So, if package A depends on package B, `mpm` needs to know which version of package B package A has been tested with. Package A needs to declare this somewhere, in a format that `mpm` can understand! We'll solve this the way most other package managers do - by requiring every package (including the program or library under development, which is technically a top-level package of its own) to have a manifest file that lives next to the code. A manifest includes, at minimum, the name and version of the package, and the names and versions of every other package that package depends on (in `yarn` and `npm`, the manifest file is called `package.json`.) Now when `mpm` is installing the dependencies for a package, it can check the manifest to ensure that it's getting all the right versions. Starting at the root package, `mpm` goes down and installs each package's dependencies, then the dependencies of those dependencies, and so on, until every package has what it needs to run!
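For concreteness, here's a minimal sketch of what such a manifest might look like, in the `package.json` style - the package names and versions below are made up for illustration:

```json
{
  "name": "my-app",
  "version": "1.0.0",
  "dependencies": {
    "left-pad": "1.3.0",
    "some-http-client": "2.4.1"
  }
}
```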
This is a good start, but what happens when two packages depend on different versions of the same dependency? To make sure nothing breaks, we need to have two copies of the same dependency. We can distinguish between them by looking at the version specified in the manifest, or by looking at the directory or archive name if the version number is included there. We would still like to save as much space as possible by ensuring we don't have multiple copies of the same dependency with the same version number - some package managers are better about this than others (Node package managers have historically had trouble with this; hence all the memes about giant `node_modules` directories.)
Importing packages
Our newest version of `mpm` can now handle version conflicts, but now it is no longer possible to identify a unique dependency from its name alone - we need the name and the version number. This raises a question - how does our code know how to load the correct dependency? After all, in Javascript you say `import * as React from 'react'`, not `import * as React from 'react-17.0.2'`, even if `react-17.0.2` is the dependency specified in your manifest. We need a way to point the code to the right version of the dependency.
Sometimes the runtime has its own special logic for where to look during imports, and this usually goes hand in hand with a special format for structuring packages. Node, for example, mandates that all of a package's dependencies be put in a special `node_modules` directory inside the package. Each dependency should have its dependencies in its own `node_modules`, and so on. Now there is guaranteed to be only one copy of each dependency in a given `node_modules` folder, since each `node_modules` folder only contains the dependencies of a single package. This means Node can look dependencies up by name alone - it just looks in the `node_modules` folder of the package that made the `import` call (the drawback to this approach, as mentioned above, is that we end up with more package duplication than we need.) `npm`, the most popular package manager for Node/Javascript, installs dependencies by setting up nested `node_modules` directories in a way Node can understand.
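Laid out on disk, a fully nested layout looks roughly like this (the package names are hypothetical; the point is that every package carries its own copies of its dependencies):

```
my-app/
├── package.json
└── node_modules/
    ├── A/
    │   ├── package.json
    │   └── node_modules/
    │       └── C/          <- the version of C that A was tested with
    └── B/
        ├── package.json
        └── node_modules/
            └── C/          <- a separate copy, at the version B needs
```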
Some package managers handle package imports themselves, instead of letting the runtime do it. The specific details of how this is done vary across package managers and are beyond the scope of this guide, but in general they do it by patching your running code to add a mechanism that takes a bare dependency name at runtime, consults the appropriate manifest, and loads the correct version of the dependency (some package managers may achieve the same effect by changing the environment variables your code references, or by changing where your code is run.) This is why, for example, you sometimes need to run `bundle exec` (or require `bundler/setup`) to start your Ruby project, or `yarn run` to start your Node project - `bundler` needs to patch `require` and `yarn` needs to patch `import` (well, `yarn` v2 does - see the section on `yarn` below.)
No matter which of these two approaches is used, the outcome is the same: all version information is kept in your package manifest, instead of in your code. You can simply import using bare dependency names, and the right version will be loaded!
Dependency ranges
It looks like `mpm` is now doing what we need it to. It makes sure all the dependencies for our app are correctly installed and versioned, and provides our running code with the correct version of each dependency when requested. We're finished, right?
Well, you'll have to take our word for it, but using `mpm` on a real-world project would reveal a few more things we need to change. For example, if every dependency is locked to exactly one version, then the odds of two dependencies with the same name also having the same version are fairly low, which means many copies of each dependency must be kept around. This can bloat a project to an unworkable size as the number of dependencies grows. What we need is a little more flexibility in our manifests - a way for packages to declare that they work with many different versions of a package, instead of just one. For example, perhaps package A works with any version of package B lower than 2.1.0, or package C works with any version of package D that starts with 1.3. This is harder for the package owner to guarantee, but it gives our package manager some more leeway in determining how to get everybody's dependencies satisfied with as few versions of each package as possible. Most modern package managers use range specifiers in their manifests to let a package depend on a range of versions instead of a single one. If you look in any of our `package.json` files, you'll see characters like `^` (meaning any version compatible with the specified one - same major version, at or above it) and `*` (meaning anything at all!)
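As a sketch, a `dependencies` block using ranges might look like this (the package names are made up; the ranges mirror the examples above):

```json
{
  "dependencies": {
    "package-b": "<2.1.0",
    "package-d": "1.3.*",
    "some-ui-library": "^17.0.2",
    "anything-goes": "*"
  }
}
```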
Semantic versioning (semver)
Now a package can depend on a range of versions, but how can package authors really validate that their software works across the whole range? If you wanted to declare that, for example, you could handle version 2.* of a certain package, you'd have to run a test for every sub-version of version 2 - and there could be hundreds! What's worse, if version 2 were under active development, you'd have to run a new test every time a new sub-version was released. The problem here is that you can't really make any guarantees just by knowing the package version; even though version 2.1.2 of a dependency works with your code, 2.1.3 might break it. The dev community ran into this same problem, and decided to imbue version numbers with a certain amount of meaning - a practice called Semantic Versioning, or semver. Under semver, each version has three parts, with each part corresponding to a certain guarantee. The general format is `<MAJOR_VERSION>.<MINOR_VERSION>.<PATCH_VERSION>`. The biggest distinction is between `MAJOR_VERSION` and everything else. Changing the major version of a package means breaking API changes are allowed, and that means all bets are off - if you depend on major version 2 of a package and upgrade to major version 3, you should expect things to break. A change to a minor version, meanwhile, means some functionality has changed, but in a backwards-compatible way. If you depend on version 2.1 of a package and version 2.2 breaks you, then the package author has violated semver (there's no penalty per se for violating semver, but your users may be mad at you.) A change to a patch version is almost the same as a change to a minor version, but a new patch version signifies only a bugfix - i.e. no functionality was changed.
Semver makes dependency ranges a little easier to deal with. Now you can depend on `MAJOR_VERSION.*` or `MAJOR_VERSION.MINOR_VERSION.*` and be fairly confident that your package won't break unexpectedly, and that you'll get important bug fixes and security updates automatically without having to fiddle with your manifest.
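To make that concrete, here's how a couple of ranges play out against specific versions (a quick worked example, not an exhaustive tour of range syntax):

```
^2.1.2  matches 2.1.3 and 2.4.0 (minor and patch bumps are backwards-compatible)
        but not 3.0.0 (a major bump means all bets are off)
2.1.*   matches 2.1.3 and 2.1.9, but not 2.2.0
```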
Dependency resolution
Dependency ranges are a trade-off. They let package managers satisfy everyone with fewer total packages - but figuring out how to do it optimally (a problem we call dependency resolution) is hard. In fact, it's NP-hard, which means there is no known algorithm that can quickly resolve dependencies for every project! Luckily, by using a cocktail of tricks and heuristics, package managers can perform dependency resolution pretty well on most projects. A dependency resolution algorithm finds the smallest set of dependencies that need to be installed for your program to run correctly, and then installs them. This is what's going on when you type `bundle` or `yarn install`.
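Here's a deliberately naive TypeScript sketch of the idea - a toy resolver over a made-up in-memory registry. It only understands `^` ranges, keeps the first version it picks for a name, and does no backtracking, all of which real resolvers handle far more carefully:

```typescript
// A toy dependency resolver (nothing like yarn's real one): given a made-up
// in-memory registry and a set of root ranges, pick a version for every
// package reachable from the root.
type Deps = Record<string, string>;                    // name -> range, e.g. { B: "^1.2.0" }
type Registry = Record<string, Record<string, Deps>>;  // name -> version -> that version's deps

// Hypothetical registry contents - the names and versions are invented.
const registry: Registry = {
  B: { "1.2.0": {}, "1.3.0": {} },
  A: { "2.0.0": { B: "^1.2.0" } },
};

// Extremely simplified range check: only supports "^major.minor.patch".
function satisfies(version: string, range: string): boolean {
  const [wMaj, wMin, wPat] = range.replace("^", "").split(".").map(Number);
  const [vMaj, vMin, vPat] = version.split(".").map(Number);
  return vMaj === wMaj && (vMin > wMin || (vMin === wMin && vPat >= wPat));
}

function resolve(deps: Deps, picked = new Map<string, string>()): Map<string, string> {
  for (const [name, range] of Object.entries(deps)) {
    if (picked.has(name)) continue;                    // naive: first pick wins, no conflict handling
    const candidates = Object.keys(registry[name] ?? {}).filter((v) => satisfies(v, range));
    const best = candidates.sort().pop();              // "highest" version (lexicographic - also naive!)
    if (!best) throw new Error(`cannot satisfy ${name}@${range}`);
    picked.set(name, best);
    resolve(registry[name][best], picked);             // recurse into the chosen version's own deps
  }
  return picked;
}

console.log(resolve({ A: "^2.0.0" }));                 // Map(2) { 'A' => '2.0.0', 'B' => '1.3.0' }
```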
Lockfiles
In our earlier version of `mpm`, every package specified exactly one version for each of its dependencies, which meant there was only one solution to the dependency resolution problem. No matter when or where you ran the package manager, it would install the exact same set of packages. Introducing dependency ranges makes dependency resolution non-deterministic; there may be many ways to satisfy the same set of dependencies, and you may end up with a different solution even if all your manifests stay the same! For example, say package A depends on version ^1.2.0 of package B. Alice and Bob are working on two copies of the same project; Alice runs `yarn install` on her copy and it installs `B@1.2.0`. Then package B's author publishes a new version of package B, say `B@1.2.1`. After that, Bob runs `yarn install` on his copy of the project, and it installs `B@1.2.1` (since, all else being equal, package managers tend to install the latest versions they can). Alice and Bob got different results, just because they ran their package managers at different times. This may seem benign, but it can cause some big headaches. If there's a bug in `B@1.2.1`, for example, then Bob's environment will break, but Alice's won't, even though all of their code and configuration is identical (a classic 'works on my machine' scenario). Things could break in CI that can't be replicated on any developer machine, etc. etc. - it's just a bad time in general. We want to return to having our package manager produce deterministic results; if two people run a package manager on identical copies of a project, it should always download and install the same set of packages. But how can we make this work with dependency ranges?
To make dependency resolution deterministic again, package managers use something called a lockfile (`yarn`'s lockfile, for example, is named `yarn.lock`). Every time the package manager does dependency resolution, it uses the lockfile to record the exact version it calculated for every dependency. After that, whenever you use the package manager to install dependencies, it skips dependency resolution and just installs exactly what's specified in the lockfile, no matter where or when you run the command (if you change your package's manifest, of course, the lockfile has to change too. That's why commands that change the manifest, e.g. `yarn add` or `yarn upgrade`, usually do dependency resolution again and update the lockfile accordingly.) The lockfile is typically checked in to version control so it's consistent for all members of a team. If two people have the same lockfile, their package managers will install exactly the same packages - guaranteed!
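An entry in a lockfile records both the range that was requested and the exact version that resolution settled on. The snippet below is only an approximation of the classic `yarn.lock` style - the exact fields and layout differ between yarn versions, and the URL and hash here are placeholders:

```
lodash@^4.17.0:
  version "4.17.21"
  resolved "https://registry.yarnpkg.com/lodash/-/lodash-4.17.21.tgz"
  integrity sha512-<hash elided>
```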
Where do all these packages come from?
We've talked a lot about downloading packages, but not at all about where the packages are coming from. Usually, a package manager looks for packages in a package repository (not to be confused with a Git repository!) A package repository is (usually) some centralized website that package authors can upload their packages to (and package managers can download packages from.) `bundler` uses rubygems.org as its repository, for example, and `npm` and `yarn` use npmjs.com. Usually a package manager has a special protocol that it uses to ask the repository which packages and package versions it has, or to request to download a specific package. Most package managers are configured by default to talk to the 'main' repository for their language, but you can point them at a different repository if you want. For example, some companies that need to run their package managers in production host a 'mirror repository' on their internal network. The mirror repository is manually curated, and only contains copies of the packages needed by the company's software. In production, the package manager is configured to use the mirror repository instead of the default one. This is mainly a security measure, since an intruder can't sneak in malicious code that depends on something not already hosted by the mirror repository.
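Pointing a package manager at a different repository is usually a one-line config change. As a sketch (the registry URL is made up, and `npmRegistryServer` is my understanding of the relevant yarn v2 setting - treat it as an assumption and check the docs):

```
# .npmrc - tell npm to use a mirror repository
registry=https://packages.internal.example.com/

# .yarnrc.yml - the yarn v2 equivalent
npmRegistryServer: "https://packages.internal.example.com/"
```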
Problem solved?
Gone are the days of messing around with dependencies by hand - now we have sophisticated, robust software to figure things out for us. So, no more fretting about dependencies! That's the computer's job, right?
Well, there's been a lot of great work on dependency managers, and we've come quite far in just a few decades, but despite our best efforts dependency hell still occasionally rears its ugly head. Here are a few things that can still go wrong:
- Package authors make mistakes: Correctly versioning your package and all of its dependencies is really hard! Semver helps a little, but it's a (kind of fuzzy) recommendation, not a standard, and there's still no widespread agreement on what it means or how to use it. The bottom line is that package manifests are basically working on the honor system - if you say your package works with a certain version of a dependency that it really doesn't work with, no one will second-guess you. Obviously most people don't want to publish broken packages, but everyone makes mistakes and dependency ranges can be tough to reason about. This all means that if you, as a developer, have a dependency A that depends on version "*" of package B, when you upgrade package B package A might break because it's not really compatible with all versions of B, or at least not this new latest version of B. Luckily if package A is popular enough you will not be the first person to encounter this problem - check GitHub! Odds are someone else has already opened an issue about this, and if you're lucky the package authors will already be working on a fix.
- 'accidental' dependencies: Some package managers are looser than others in enforcing what packages you can actually import in your code vs what's declared in your manifest. If you have a package manager that's particularly lax, you may find yourself able to import packages you don't explicitly depend on - which is a recipe for trouble! See below for context.
- repository reliability issues: The package repository itself is a web service, and like all other web services, it can be down, or can hang mysteriously, or a whole host of other weird behaviors. This can spell trouble if your CI or deploy processes depend on being able to run the package manager. To make their infrastructure more robust, some companies aggressively use a mirror repository for everything besides the development environment - since they own the mirror repository, they can ensure that it stays up!
This concludes our quick overview of package managers in general; next we'll take a closer look at the specific package manager we use here.
Yarn
We use `yarn` at Nerve. More specifically, we use `yarn` v2 (codenamed "berry"). `yarn` v2 is advertised as a more advanced version of `yarn` v1, which is advertised as a more advanced version of `npm` (to be fair, `yarn` v1 was a more advanced version of `npm` 4. At the time of this writing we are at `npm` 11, which has pretty much caught up to `yarn` v1. `npm`'s answer to `yarn` v2 was an experimental package manager called tink, which looks to have stalled out. It's also worth noting that projects that aim to replace the Node runtime wholesale, like Bun and Deno, usually come with their own package managers.) `yarn` and `npm` are both open-source; `yarn` used to be maintained by a team at Facebook (its lead maintainer is now at Datadog) and `npm` is maintained by the company that shares its name. To understand what `yarn` v2 offers and why we're using it, it helps to know a bit more about what it's improving upon - so let's take a look at `yarn` v1!
Yarn v1
`yarn` v1 has been around for a while, and is now considered one of the main competitors to `npm` in the Node package management ecosystem. The development of `yarn` was at least partly a reaction to some problems people were running into when using `npm` 4 on big projects with big teams.
`npm` 4 did not have mandatory lockfiles. You could generate a lockfile if you wanted to with `npm shrinkwrap`, but you needed to remember to run it whenever you updated your `package.json`. If you forgot, it was back to non-deterministic installs until someone remembered to regenerate it! `yarn`, by contrast, always has a lockfile and always keeps it up to date. You need to check it in, of course, but that's a much lower bar to clear than remembering to regenerate it (`yarn` also makes some smaller quality-of-life changes around lockfiles - for example, it uses a custom format for `yarn.lock` that makes changes to the lockfile easier to view in a GitHub diff. The new format also makes merge conflicts in the lockfile easier to resolve, and decreases the likelihood that they'll show up in the first place.)

`npm` 4 was also quite slow, but that wasn't entirely its own fault! The part that was its fault had to do with parallelism: because of the way `npm` was architected, it could only install a single package at a time. `yarn`'s architecture, on the other hand, lets it install packages in parallel - so if you have a multi-core machine (which you probably do; most machines are multi-core at this point) `yarn` will install many packages for you at the same time. `yarn` also added some improvements to the way packages were cached locally, so that it could avoid re-downloading identical versions of a package whenever possible. These improvements combined made `yarn` much faster than `npm` at the time it was released (to reiterate, `npm` has caught up since then and now the performance of the two is roughly comparable.)
The other part of `npm`'s performance woes - the part that wasn't its fault - was actually Node's fault. Recall that Node expects a package's dependencies to be in a `node_modules` folder in the package itself. That means, if you're developing a package (which you technically always are when using a package manager), there needs to be a `node_modules` folder in your repo root. Also recall that each of your dependencies needs its own `node_modules` folder, and so on - in other words, your whole dependency tree has to be contained in that top-level `node_modules`! And that means that, if you don't have `node_modules` checked in (which you shouldn't - checking in `node_modules` would make your git repo huge, and git does not do well with big repos), you need to recreate the `node_modules` folder in your repository every time you e.g. check out a new branch. This may not seem like a big deal if you have all your dependencies cached, but you still have to copy packages from the cache to `node_modules`. This can take a long time, especially on bigger projects with multiple GB of dependencies. All told, dealing with `node_modules` generally adds a big penalty to install times, and there's not much the package managers can do about it - in fact, `npm` still has this problem, and so does `yarn` v1. The only way out is to override Node's own import mechanism - which is what `yarn` v2 does!
Yarn v2 ('berry')
`yarn` v2 is pretty new, and in some ways still experimental. It makes a couple of bold, disruptive changes to Node package management:
- PnP: PnP stands for Plug'n'Play, and is perhaps the biggest change that `yarn` v2 introduces. PnP is `yarn`'s own package import mechanism - `yarn` v2 wrests control of package imports away from Node and handles them itself! If you're wondering why you've never seen a `node_modules` directory while working at Nerve, this is why. `yarn` v2 uses the trick we discussed earlier - when you use `yarn run` to kick off a Node process, `yarn` patches Node's `import` with its own version. There's some tricky indirection at this point, but it all boils down to `yarn` checking its own records to figure out the correct version of the package, and then loading the requested package directly from the package cache and handing it back to Node. No `node_modules` required!
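Mechanically, `yarn install` under PnP writes a resolver file at the project root (`.pnp.cjs` in current releases, `.pnp.js` in older ones), and `yarn run` makes sure Node loads it before your code. Conceptually it amounts to something like the following - treat the exact flags and filenames as an approximation rather than gospel:

```
# roughly what "yarn run start" boils down to:
node --require ./.pnp.cjs src/index.js
# .pnp.cjs knows, for every package in the project, which versions of its
# dependencies it should see and where they live in yarn's cache, so bare
# imports like `import 'react'` resolve without any node_modules folder.
```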
That's PnP in a nutshell - it's a fairly simple change with a lot of ramifications. For one, we don't have to copy dependencies from the cache anymore, and as mentioned earlier that saves a lot of time. This alone makes `yarn` v2 a good bit faster than `yarn` v1. Install performance isn't the only thing that PnP improves, in fact, but to understand why we must return once again to the Node import algorithm.
Earlier we said that when Node sees an import statement, it looks in the `node_modules` folder of the currently running package to try and find the correct dependency. This is true, but we left an important part out - if Node doesn't find any package with the correct name in the current `node_modules`, it looks in the parent directory of the directory that contains `node_modules`. If that parent directory has a `node_modules`, it looks in there for the package. If the parent directory doesn't have a `node_modules`, or if that `node_modules` doesn't have the correct package, it looks in the parent's parent for a `node_modules`, and so on, until it gets to the root of the filesystem, at which point it blows up. Depending on how all the different `node_modules` folders are set up, Node may have to do a lot of searching before it finds what it's looking for. This means many filesystem calls (which are fairly slow) just to load a single package. But we shouldn't have that problem, right? After all, we said earlier that the entire job of package managers like `npm` was to populate each `node_modules` folder with the correct dependencies. If we're using a package manager, shouldn't Node always find the package it's looking for on the first try?
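In TypeScript, the upward search described above looks roughly like this - a sketch of the behavior, not Node's actual implementation, which also handles things like file extensions and `exports` maps:

```typescript
import * as fs from "fs";
import * as path from "path";

// Walk up from the importing package's directory, checking each node_modules
// folder for the requested package, until we hit the filesystem root.
function findPackageDir(packageName: string, startDir: string): string {
  let dir = startDir;
  while (true) {
    const candidate = path.join(dir, "node_modules", packageName);
    if (fs.existsSync(candidate)) return candidate;  // found it - load from here
    const parent = path.dirname(dir);
    if (parent === dir) {                            // reached "/", nowhere left to look
      throw new Error(`Cannot find module '${packageName}'`);
    }
    dir = parent;                                    // otherwise, go up one level and retry
  }
}
```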
Well, we told a bit of a half-truth. It's true that `npm` used to put each package's dependencies into that package's `node_modules` - but there were serious drawbacks to that approach. First of all, dependency trees can be very deep, and filepaths to deeply nested dependencies started to exceed operating system limits (this was mainly a problem on Windows.) Second, there was the aforementioned problem with duplicate dependencies - these took up a ton of disk space, and this eventually became untenable. So, starting with `npm` 3, Node dependency managers began to use a trick called hoisting. The idea was to take advantage of the way Node searches directories and place common dependencies higher in the filesystem than their dependents. So, for example, if your project depends on package A and package B, and they both depend on v1.0.1 of package C, you could put package C in the top-level `node_modules`, next to package A and package B! Then, if Node tried to load package C for package A to use, it would look in package A's `node_modules`, wouldn't find anything, then look up a level in the root `node_modules` and find package C there (and of course it would go through the same process if it was loading package C for package B). Now you've satisfied both package A and package B with only one copy of package C instead of two. Hoisting is a common tactic in the Node package manager world; both `yarn` v1 and `npm` do it. More info on hoisting is available here. Two important things to note from the linked article are a) hoisting can only get rid of some duplication, not all of it, and b) the choice of which packages to hoist depends on installation order, which can change from run to run, even with a lockfile!
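On disk, the hoisted version of the A/B/C example above looks something like this:

```
my-app/
└── node_modules/
    ├── A/          <- depends on C@1.0.1, but has no nested node_modules of its own
    ├── B/          <- same
    └── C/          <- one shared, "hoisted" copy at 1.0.1
```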
Hoisting brings the size of `node_modules` down, but it does mean Node potentially has to do some searching to find a particular package. For very deep dependency trees this can add up to a lot of searching, which means imports in Node can be slow. With PnP, `yarn` gets rid of this problem entirely, which surprisingly means that `yarn` v2 can actually improve the runtime performance of our app! The key improvement is that `yarn` maintains a single package cache, and all packages in the cache are labeled with both a name and a version. So when `yarn` needs a specific package version (remember, `yarn`'s version of import has access to `package.json`, so it already knows what version of the package to pull) it just looks in the cache once - if the package is there, it loads it; if it isn't, it throws an error.
There are actually benefits to letting the package manager handle imports beyond just performance. `yarn` knows what all of the dependencies are for each package, and Node doesn't. In fact, Node doesn't know that a package manager was involved at all; all it knows about is `node_modules`. This can lead to some weird behavior from Node's default import, especially when hoisting gets into the mix. For example, say A and B both depend on D, but C doesn't. D will get hoisted to the level of A and B - i.e. it will be in the same `node_modules` folder as A, B, and C. Now let's say C has an errant `import * as d from 'D'` somewhere - Node will dutifully look in C's `node_modules` folder, where it won't find D, and then in the `node_modules` folder above, where it will. Since Node doesn't know anything about what C actually depends on, it will treat this as a success and return D to C, at which point C will go on its way. The end result is that C successfully imported a package it didn't explicitly depend on! This may not seem like a big deal, but because C doesn't explicitly depend on D in its manifest, there's no guarantee from the package manager that D will be available to C at runtime. Instead, we are depending on the hoisting behavior of the package manager, which if you'll recall can change based on installation order. This could potentially mean that C will sometimes be able to import D and sometimes not, seemingly at random; our old friend non-determinism has crept back in. The solution, of course, is to always ensure that you only import packages you actually declare in your `package.json`, but Node can't enforce this, so it's up to the developer not to make mistakes. `yarn` v2, on the other hand, can and does enforce this. If you try to import a package you don't have in your manifest, you'll get a runtime error. Things are deterministic again, hooray!
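In code, the failure mode is just this (a sketch, using the hypothetical A/B/C/D packages from above):

```typescript
// Somewhere inside package C, which does NOT list D in its package.json:
import * as d from "D";
// Under a hoisted node_modules layout this usually works by accident, because
// Node finds D one directory level up. Under yarn v2's PnP it fails loudly,
// because yarn knows D is not one of C's declared dependencies.
```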
- workspaces: This is a feature that `yarn` v1 technically supports, but `yarn` v2 has it as a first-class feature and it's a little bit richer. Many projects (including ours) are structured as a monorepo: one big project that has many separate sub-packages in it. It's not a monolith per se - in a monolith the whole project would be bundled into a single gigantic package - but development on it is a bit similar to development on a monolith. There's usually one build process, one CI process, and one deployment process - each one just operates on many packages. Package management can be tricky in a monorepo, since the monorepo's constituent packages will probably depend on each other pretty heavily. `npm` and `yarn` have a command that lets you take a dependency on a local package (usually by specifying a relative path to the dependency), but it's a little clunky to use and quickly becomes a pain if you have a lot of packages. Additionally, running `yarn install` on every subpackage is time-consuming and goes against the whole "unified development experience" philosophy of a monorepo. Lucky for us, `yarn` has a workspaces feature that solves both these problems - in `yarn` v2 a project can optionally have many "workspaces" (yarn parlance for sub-packages). Each workspace has a `package.json`, and there's additionally a top-level `package.json` that doesn't correspond to a package itself, but lets `yarn` coordinate between the workspaces. `yarn` workspaces lets you, for example, add a dependency on workspace B from workspace A by just going to the root of workspace A and doing `yarn add B`. Additionally, there's only one place you can run `yarn install` in a project with workspaces, and that's the root of the monorepo. `yarn install` will install dependencies for every workspace and keep them in the same cache. There's only one lockfile, and it lives at the very top of the project. There are other benefits to using workspaces too (sometimes we use `yarn` to install some executable tools like `tsc` and `jasmine`, and thanks to workspaces those dependencies can live in the root `package.json`), but the general theme is that it keeps the repo more organized and eliminates messy duplication. A sketch of a minimal workspace setup follows below.
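Here's roughly what that minimal setup looks like - the names and paths are invented, and in `yarn` v2 the root manifest typically needs to be marked private:

```json
{
  "name": "my-monorepo",
  "private": true,
  "workspaces": ["packages/*"]
}
```

Each directory matching `packages/*` then carries its own `package.json`, and one workspace can reference another with the `workspace:` protocol (e.g. `"B": "workspace:*"`), so the dependency always points at the local copy rather than a published version.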
Why Yarn v2?
As mentioned several times already, we use `yarn` v2 at Nerve. The primary reason is performance - performance is important to us, especially in our development tools, and the performance gains from getting rid of `node_modules` are enough to justify using a new tool that many consider cutting-edge. Better determinism and support for workspaces are secondary, but they are definitely nice to have as well!
It's important to note that working around Node's module loader was not `yarn` v2's idea! It's something that the Node community has been wanting to do for a while, and many different package managers have aspirations to do it (e.g. tink). However, `yarn` v2 is the only one so far (that I know of!) to ship it in something reasonably suitable for production use.
Why not Yarn v2?
Believe it or not, when `yarn` v2 launched it caused a bit of a backlash. Some people were upset about the new feature set, and various companies announced they had no plans to upgrade. So what happened?
Many of the issues revolved around PnP. If you'll remember, `yarn` v2 patches Node's `import` method to implement PnP. This should just work - in theory, any third-party code that calls `import` will get the right module, whether or not the method has been patched. In theory... but other tools in the Javascript ecosystem patch `import` too, or use their own custom `import` methods to load modules. A lot of these (e.g. `babel`, `webpack`, `jest`) have worked with the `yarn` team to resolve the issue, so `yarn` could run them natively. Others needed to use a shim layer which allowed `yarn` to directly simulate the `node_modules` directory. Still others couldn't be used with PnP at all, like Angular, Flow, and React Native. Getting rid of `node_modules` was a bit controversial in its own right, since the standard Node module loading behavior is, after all, a standard - one that many parts of the Javascript ecosystem have come to depend on. Although PnP is getting more popular, it does still represent a significant breaking change, and some people in the community were frustrated about how this change was communicated and rolled out.
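Relatedly, if PnP is a dealbreaker for part of your toolchain, `yarn` v2 can be told to fall back to building a conventional `node_modules` tree instead. This is my understanding of the relevant berry setting - treat the exact key as an assumption and check the docs for your version:

```
# .yarnrc.yml - opt out of PnP and have yarn v2 lay out a regular node_modules tree
nodeLinker: node-modules
```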
Another issue with PnP is the hard-line stance it takes on dependency enforcement. Recall that PnP will not let you import a package you have not explicitly declared as a dependency. In general this is a good thing! However, if you have a large project that you've been working on for a while, odds are you have a few of these 'accidental dependencies' that have slipped under the radar, and as soon as you upgrade to `yarn` v2 all of them become hard errors. This can really weigh down a migration process that already suffers a lot of friction with many standard toolchains. To make matters worse, sometimes there are accidental dependencies somewhere in your software supply chain, and even though you didn't author those packages they still break under `yarn` v2 - even if you add them after you migrate! This is rough, but fortunately `yarn` v2 offers some escape hatches for when it happens (and it still beats debugging randomly missing dependencies in production.)
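One such escape hatch lets you patch up a third-party package's manifest from your own project config, declaring the dependency it forgot to list. Roughly, it looks like this - the package names are made up, and the syntax is from my memory of the berry docs, so verify it before relying on it:

```
# .yarnrc.yml
packageExtensions:
  "some-broken-package@*":
    dependencies:
      lodash: "^4.17.0"
```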
Hope you found this helpful! If anything in this article is incorrect or out of date, please drop me a line at mprast@get-nerve.com and I'll issue a correction.