Things about a major React version upgrade you won't hear about

[Header image: a sparrow migrating through a desert]

How we got here

At AirHelp, our frontends are almost exclusively React-based. React was adopted a few years ago, and the time had come to move from version 17 to 18. Multiple libraries had started to drop support for older React versions, so from a maintenance and security perspective, the clock was ticking. Even more importantly, because of some technical issues, the app my team was maintaining was late to the upgrade party. Our other services had already migrated, so we started to see a compatibility drift - internal packages could not support every project without maintaining two versions of the code, one per React version.

Problems and solutions

As a foreword, let me point out an important thing - the story of any upgrade, refactor, or rewrite is different for each project. My team delayed the migration because there were bigger priorities at the time, and we knew that the change would not be easy. Others had done the upgrade in a matter of days, but it took us almost a month with a team of three. We were not working on it full-time, and some of the research had started as far back as half a year earlier.

The single point of failure

Migrating a solution that is the backbone of your project can be quite painful. To start moving to a newer React version in the repository, you must first upgrade React! And because many packages, tools, and most of the code depend on it, there is a high chance that trying to start the project right after the upgrade will result in a beautiful nothingness and a console full of errors. This is an unavoidable truth, and any discussion about being independent of the framework is a pointless undertaking - right now, React is used more like a platform that we build upon than a swappable library.

Such a situation is difficult, plain difficult. You do not know what is broken, and tests don't tell you much either, as most of them are broken or may not run at all. We put a lot of hope into the e2e tests - they were a colossal help further into the process because they encode the behavior of the app with a black-box approach. But until the app starts, they are useless too. So, what to do?

An attack plan

This was more of a people problem. The dread that comes with the amount of work ahead of you, with no clear estimate of when it will be finished, is a tough pill to swallow. But doing something is better than doing nothing. All we needed was an anchor to focus on, and a batch of work that was manageable without our brains going haywire.

We had a head start because my colleague had already taken the time to identify the problematic parts of the app that we had to refactor first for it to start at all. A technical meeting followed, where we structured the work in the form of a waterfall - we chose which tasks blocked others, defined their scope, and decided how much of the work could be done in parallel. Then we created loose estimates and split each task into smaller parts so it became manageable. What we did not consider was how many people needed to work on the topic. That was a mistake, but I will return to it later.

Then, we clearly saw that making the app render was the crucial thing to do. An important moment for me came when I acknowledged that when something is this broken, you cannot neatly order the work ahead of you. The waterfall and the planning were there to prepare a vision and create a measurable plan with units of work that gave us direction. But the code was totally unpredictable, and many bugs would surely show up only after we fixed other things. Being adaptable and doing whatever work increased our confidence in the project was the way to go.

Stability can be a myth

After some time, we got the app to a point where it was clickable. The next step was to clear most of the errors, which mainly came from broken vendor code. Other things mostly worked, aside from some optimization problems, but we could fix those later.

With a major version upgrade, most projects provide a migration guide. If the change is in the form of "rename this to that" or "you can remove this entirely", it is a perfect scenario. But in our case, some APIs changed completely (React-Router 5->6, and the compatibility library did not help), a few libraries stopped working altogether as their maintainers evaporated, and multiple projects were dropped because new browser capabilities had replaced them.
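To give a flavor of the scale of that change, here is a minimal sketch of what the React-Router 5 to 6 move looks like in practice - the route and component names are made up, not taken from our codebase:

  // Before, with React-Router v5 (for comparison):
  //   <Switch>
  //     <Route exact path="/claims" component={ClaimsList} />
  //   </Switch>
  //   const history = useHistory(); history.push('/claims');

  // After, with React-Router v6: Switch becomes Routes, the component prop
  // becomes an element, and useHistory is replaced by useNavigate.
  import { Routes, Route } from 'react-router-dom';
  import { ClaimsList } from './ClaimsList'; // hypothetical component

  // Rendered somewhere under a <BrowserRouter>
  export function AppRoutes() {
    return (
      <Routes>
        <Route path="/claims" element={<ClaimsList />} />
      </Routes>
    );
  }

Each such rename sounds trivial in isolation; multiplied across an application, it adds up to days of work.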

Somehow, I feel that these problems are especially visible on the web. We are in a constrained environment that for a long time lived without standardization. It gives us enormous flexibility in how we implement our sites and applications, but with the trade-off of being chaotic. Truly, a box of Skittles. For me, this is a good thing, and we are approaching a point where things start to cool down a bit; however, there is no denying that the environment is still ever-changing.

To add insult to injury, anyone who was there when React announced the changes between versions 17 and 18 remembers how heated the discussion around the topic was, and the burden that the transition introduced. In our case, it was a good thing overall. Our app was written when there was not much React experience in the team, and both the technology and the ecosystem were not as mature. Some parts of the app just stopped working - parts that arguably should never have worked in the first place, but earlier versions of the framework permitted some shenanigans (mostly thanks to race conditions). An old codebase that had been a learning battleground showed its fangs; sometimes fixing a problem took a whole day just to replace a single line of code. A horror on one hand, a true debugging-skills level-up on the other - pick your poison.
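For context, even the smallest, fully documented piece of the 17-to-18 transition - the new root API - already touches the entry point of the app. A sketch below, with the container id being whatever your index.html uses:

  import { createRoot } from 'react-dom/client';
  import { App } from './App'; // hypothetical entry component

  // React 17: ReactDOM.render(<App />, container);
  // React 18: the new root API, which also opts the tree into the new
  // rendering behavior (automatic batching, concurrent features).
  const container = document.getElementById('root');
  if (container) {
    createRoot(container).render(<App />);
  }

The undocumented part - the race conditions that the old renderer happened to tolerate - is where those one-line, one-day fixes came from.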

The job was tedious and manual: we changed things in hundreds of places and rewrote some parts of the code that had to be replaced entirely - those moments were a form of a treat, a chance to implement something new altogether. After about two weeks, most of the work was done and we could move forward.

Overall, I do not think that every migration must look like this. If I were asked what predicts that such a situation will occur, I would, as always, probably pick high coupling and too much customization. Convention over configuration helps; otherwise, you have a maintenance problem at the core. Back then, we used Webpack through create-react-app with some added Webpack and Babel plugins. With meta-frameworks like Next, you do not configure things like that - most of the tooling is abstracted away for you. And when an upgrade comes, the responsibility for updating those components does not fall on you. There is also a community that does things the same way you do, so solutions to your problems are quickly available on the Internet.

When tests fail to test...

At this point, the application was functioning, and there were no visible issues. We ran the tests and about 80% of them failed. There was also a lot of unnecessary output in the form of warnings, with a long-standing Testing Library issue at the forefront. Getting everything green again was challenging. Snapshots failed intermittently after the package updates, and some issues were interconnected. The output did not provide much assistance, and even though individual runs were fast, rerunning so many tests added up to a lot of development time.
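Much of that warning noise, in our experience and in most React 18 write-ups, comes from state updates that finish after a test has already asserted. A minimal sketch of the usual remedy, assuming Jest and React Testing Library - the component and text are hypothetical, and this is not necessarily the exact fix we applied everywhere:

  import { render, screen } from '@testing-library/react';
  import { ClaimStatus } from './ClaimStatus'; // hypothetical component that fetches data

  test('shows the claim status once it has loaded', async () => {
    render(<ClaimStatus claimId="123" />);

    // findBy* queries wait for the element to appear, which lets pending
    // state updates flush instead of firing after the test has finished -
    // the typical source of act(...)-style warnings.
    expect(await screen.findByText(/approved/i)).toBeTruthy();
  });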

An incremental approach yielded positive results. We identified linked tests and fixed those in batches. Snapshots and warnings were ignored until we had everything under control. Unfortunately, in the end, the Testing Library warnings I mentioned were difficult to fix due to our own, not-so-ideal code structure, so over 200 files had to be manually modified. Everything took days, but the progress began to accumulate. When the non-snapshot tests were all green, I delved into the snapshots themselves.

I will pause here and elaborate, because snapshots left me with a bitter experience. Unless you do everything correctly, they are challenging to work with. Add a CSS-in-JS solution that generates CSS classes and DOM attributes dynamically to the mix, and things get complicated. Inspecting hundreds of snapshots is not enjoyable, and believe it or not - you will struggle to reason about them if they are longer than a few lines. They are worthwhile, but use them sparingly. Visual snapshots in e2e tests and expectations in the form of element.toExist().contains(...) are a better and more flexible choice (in my opinion - though they are more difficult to set up).
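To make the comparison concrete, here is a made-up example of the two styles in a unit test; the e2e flavor with element.toExist().contains(...) follows the same idea, just against a real browser. The component and props are hypothetical:

  import { render, screen } from '@testing-library/react';
  import { ClaimSummary } from './ClaimSummary'; // hypothetical component

  // Snapshot style: any change to markup or generated CSS class names breaks
  // the test, and a multi-hundred-line snapshot is hard to reason about.
  test('renders the claim summary (snapshot)', () => {
    const { container } = render(<ClaimSummary amount={250} />);
    expect(container).toMatchSnapshot();
  });

  // Explicit style: asserts only the behavior we actually care about,
  // so cosmetic or class-name changes do not produce noise.
  test('renders the claim summary (explicit)', () => {
    render(<ClaimSummary amount={250} />);
    expect(screen.getByText(/250/)).toBeTruthy();
  });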

Speaking of end-to-end tests, we were prepared to handle them. Let's say that this part went reasonably well. It was a good opportunity to revisit some of the older test cases, and we also discovered tests that were no longer relevant. The work was somewhat tedious, but it increased our knowledge of application internals that had not been touched for some time (I had joined not long before, so many things were still new to me).

I must say, Cypress has shown that it is far from ideal when it comes to developer experience. When everything is failing, you would hope for better manual control. If not for the Chrome debugger, pausing the tests and gaining control over the browser would be much harder, and there is no simple way to just say - pause here and let me do what I want. Hit save at a bad moment and the tests rerun. Want to run a single nested test case? You had better add those skip() calls everywhere. We had a lot of assertions in some tests, hidden under abstractions - good luck inserting pause() at the correct moment. It's like riding a wild horse - you get where you want, but boy, what a ride it is.
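For reference, the escape hatches we leaned on look roughly like this - the spec itself is made up, but it.only(), it.skip() and cy.pause() are real Mocha/Cypress tools:

  // Run only this case instead of sprinkling skip() over every sibling test.
  it.only('submits a claim', () => {
    cy.visit('/claims/new');

    // cy.pause() stops the runner here so you can poke around in the browser,
    // but you can only resume from the Cypress UI - there is no free-form
    // "let me drive for a while" mode.
    cy.pause();

    cy.get('[data-testid="submit-claim"]').click(); // hypothetical selector
    cy.contains('Claim submitted').should('exist');
  });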

After all the hard work and three weeks into the migration, we were ready to push the app into the staging environment.

Code long forgotten

The testing before release followed a standard procedure. An additional manual testing plan was prepared, a team-wide bug hunt was announced, and e2e tests were run in a production-like environment. I must say that the process went well and we caught most of the problems before production. After the release, there were perhaps 2 or 3 bugs that had to be hotfixed, which I consider a success given the size of the change surface.

However, these bugs brought something else into the limelight. They occurred because the tests did not cover some parts of the app, and even more interestingly, in some cases we did not even know that the application behaved that way. There were no resources that explained such behavior (besides Git history, of course), and that demonstrated how valuable more comprehensive internal documentation could be. We entered into a discussion about past design decisions and had to decide what to do. This would probably have been avoidable, had we had the proper knowledge in place.

After monitoring the new production release, it was time to celebrate (remember to always do so after a big win, by the way!). The project became future-proof and we got a huge weight off our shoulders.

Lessons learned

The process itself was interesting, but I think what we learned is far more valuable. This migration was a one-of-a-kind experience. Let's go over some conclusions.

How many people are enough

Two. This time, when you work on something that changes this quickly, communication is far more valuable than the number of hands on deck. Working from the office for most of the project was a huge win. Live pair programming and coordinating fixes helped a lot; the feedback loop was tightened considerably.

I mentioned that we worked in a team of three. We did, but even with a great separation of work, I think it would have been better if the third person had focused on the product work that was going on in parallel. The waiting time was not insignificant, as we blocked each other from time to time, and while someone was working on unblocking the rest of us, a second person could handle mundane tasks like fixing types or warnings. However, the secret sauce lies in how we worked this time. There were no pull requests to the feature branch. We simply discussed what would be pushed and when, and it was savage. With no approval delays, improvements shipped like crazy. With a third person on board, it was still possible, but collisions happened. With just the two of us, we rocked.

I should really emphasize this. Sometimes, you have to work differently. And teams can work differently. Do not fear going unconventional from time to time. It will work. It will feel good and it will be fresh. But do remember that we did not push to the main branch, so these kinds of parlor tricks were allowed. If you think this way of working looks somehow familiar to you, you are probably right.

Shift most of the responsibility away

I mentioned this at the beginning of the article. If you are responsible for controlling every tool and abstraction in your codebase, expect to maintain it. This isn't much of a surprise, but because we are bad at predicting the future, it will come back to haunt you in situations like this one. If your framework handles most of the abstractions, managing upgrades often boils down to bumping package versions and running an npm install. Your libraries change, but your code does not, so let vendors handle the vendor code if possible, and avoid overcomplicating things yourself. Having too little control has its own set of issues, but we often deceive ourselves into believing that having everything under our control is what we truly want, especially in the realm of consumer applications. There is also a significant difference depending on the level of abstraction you operate at.

The choice is highly context-dependent. But if there is one thing that I advocate like some kind of gospel, it is standard browser APIs. Use them, and you will drastically reduce your headaches. Those APIs won't disappear (and even if they do, you will probably be retired by then).

How to estimate a migration

This will be a strictly personal opinion. I will try to make it look professional by introducing math! Without further ado, the following estimation formula seems to work fine for a migration of this kind (a major version change of a core library or platform):

estimate = predicted_time * (1 + number_of_large_unknowns) + 2 weeks

An explanation is due. You think something will take predicted_time, but there are unknowns, such as bugs that will only show up later, so at least double the estimate if you expect something to surface. If you can identify more unknowns, it is probably worthwhile to triple or even quadruple it (but let's stop there to mitigate Parkinson's Law - at that point, it becomes a management issue). The multiplier exists because of blockers: an unknown will probably prevent further work until it is resolved. The two weeks are for slack, testing, and the release. Adjust to your taste and the size of your company, perhaps?
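To make it concrete with made-up numbers: if the hands-on work looks like two weeks and you can already name two large unknowns (say, a router rewrite and a pile of failing snapshots), the formula gives 2 weeks * (1 + 2) + 2 weeks = 8 weeks.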

A final word

My conclusions are based primarily on a sample of one (okay, maybe two and a half, but I will not delve into that). I have conducted refactorings of a similar scale in the past, but mostly by myself. When everything is said and done, you are the engineer who decides what is optimal. Still, I hope that the rant proved useful to you :)


Maciej Ładoś

Software Engineer