What is Refactoring and why does it matter?

For someone with no programming experience, the word “refactor” can be confusing at first, because it’s mostly used in coding. Put simply, refactoring is making something better or clearer without changing what it does.

Why refactor when nothing changes?

“If it works, don’t touch it” is a principle that is still valid in the real world. It’s true, pragmatic, and recommended to some extent when we do not understand the system we are working on well enough. From a business perspective, refactoring feels unproductive because no new features are added to the system. But just as a business sometimes needs to restructure processes, reorganize people, and rearrange tasks to maximize outcomes, the programming process also needs refactoring to improve the coding experience, which makes source code more readable, maintainable, and scalable. These benefits, in turn, speed developers up when they add new features or fix bugs later on.

What is Readable code?

Readable code is code written clearly so that another developer, or future you, can read it quickly and know what’s going on, sometimes just by guessing from variable names and function names. Some tactics that help keep code readable are:

  • Clear naming: Variables, functions, and classes have names that explain their purpose.
  • Short, focused functions: Each function does one thing, not many things.
  • Consistent formatting: Proper indentation, spacing, and line breaks.
  • Avoids unnecessary complexity: No overly clever tricks, just straightforward logic.
  • Helpful comments: Explain why, not what.
  • Use of standard patterns: Code follows common conventions so others instantly recognize the structure.
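
To make the tactics above concrete, here is a small sketch of a refactor for readability. The function names and the sample data are made up for illustration. The point is only that both versions return the same result while the second one explains itself.

```python
# Before: it works, but the names hide the intent.
def f(a, b):
    r = []
    for x in a:
        if x[1] >= b:
            r.append(x[0])
    return r


# After: same behavior, clearer names, one obvious job.
def names_of_adults(people, minimum_age):
    """Return the names of people whose age is at least minimum_age."""
    return [name for name, age in people if age >= minimum_age]


# Hypothetical sample data: (name, age) pairs.
people = [("An", 17), ("Binh", 30), ("Chi", 18)]

# Refactoring must not change behavior: both versions agree.
assert f(people, 18) == names_of_adults(people, 18)
```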

What is Maintainable code?

Maintainable code is code that is:

  • Readable: As explained above.
  • Well-organized: Code is structured logically into modules, functions, or classes.
  • Consistent: Follows the same style, naming, and patterns everywhere.
  • Well-tested: Covered by tests to catch bugs early and safely.
  • Documented: Has comments or docs explaining why and how things work.
  • Flexible: Easy to modify, extend, or adapt without breaking existing code.

What is Scalable code?

Scalable code typically has these traits:

  • Efficient: Uses memory and CPU wisely and avoids unnecessarily heavy operations.
  • Modular: Pieces of code can be separated or replicated easily.
  • Asynchronous / non-blocking when needed: Doesn’t freeze the whole system while waiting for one slow task.
  • Uses good architecture: Clear layers that can be split into microservices or separate components if needed.
  • Uses proper data structures: For example, using a Map instead of a List for fast lookups.
  • Database scalability: Indexes, caching, batching queries, sharding, etc.
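
As a small illustration of the data structure point, here is a sketch in Python, where the Map is a dict. The user records and function names are assumptions made up for this example.

```python
# Hypothetical data: 1000 user records.
users = [{"id": i, "name": f"user{i}"} for i in range(1000)]


def find_in_list(user_id):
    # Scans records one by one, so lookup time grows with the list size.
    for user in users:
        if user["id"] == user_id:
            return user
    return None


# Build the index once. Each lookup afterwards takes roughly constant
# time on average, no matter how many users there are.
users_by_id = {user["id"]: user for user in users}


def find_in_dict(user_id):
    return users_by_id.get(user_id)
```

Both functions return the same record. The difference only shows up as the data grows or the lookup runs many times.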

Refactoring safely

Because the goal is to keep the system working the same while rewriting code, there must be a measure that indicates sameness, or detects differences in system behavior early. This is where Test-Driven Development shines.

Writing tests is often overlooked by inexperienced developers. Beginners usually think the programming job is just to write code, see it run, then move on to writing more code. Writing tests looks like extra work or an annoying requirement. This is okay, just as a young person does not yet understand “karma”. And the karma for overlooking tests usually is:

  • Bugs keep coming back.
  • Bugs evolve as more code is added.
  • Debugging takes so much time.
  • Source code becomes a mess, and a small change can take months to add.

When bugs bring enough pain, developers become more experienced.

Test-Driven Development (aka TDD)

Test-Driven Development (TDD) is a software development process where you write tests before writing the actual code.

TDD follows a repeating 3-step loop:

  1. Write a failing test (yes, it always fails first!)
    • The test describes what the code should do.
    • It fails because the feature doesn’t exist yet, or the bug is not fixed yet.
  2. Write the minimum code needed to make the test pass
    • Not perfect code, just enough to pass the test.
  3. Refactor – Clean up the code
    • Improve readability, maintainability, and scalability.
    • Keep the tests passing.

Then repeat the cycle for the next feature or bug fix.
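
Here is a minimal sketch of one pass through the loop, using Python’s built-in unittest module. The slugify function is a hypothetical feature invented for this example. In real TDD the test class is written and run first, and it fails until the function below it exists.

```python
import unittest


# Step 1: write the tests first. Before slugify exists, running them fails.
class SlugifyTest(unittest.TestCase):
    def test_replaces_spaces_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_trims_surrounding_whitespace(self):
        self.assertEqual(slugify("  Hello  "), "hello")


# Step 2: write the minimum code that makes the tests pass.
# Step 3: refactor freely, as long as the tests stay green.
def slugify(title):
    return "-".join(title.lower().split())


# Run the tests and keep the result instead of exiting the program.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
result = unittest.TextTestRunner().run(suite)
```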

Ideally, tests simulate the UX that users will experience in the real product. This cannot be achieved 100%, but keeping this principle in mind helps a lot when writing good tests. Depending on how close a test is to real-world UX, tests can be classified into 3 levels: Unit Test, Integration Test, and E2E Test.

Unit Test

A Unit Test is ideal for testing the behavior of a single function or class. In each unit test, we check the output of a function for a particular input. We can anticipate what the inputs might be, even unrealistic ones (hackers usually send unrealistic ones), to ensure our functions keep working regardless of the input. Unit tests can also serve as a debugging tool, because we can test one part of the system directly without trying to reproduce the problem through the UI/UX. When functions are well guarded by unit tests, developers can feel more confident changing or refactoring them because bugs are caught early.
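
As a sketch of what this can look like, the unit test below checks one normal input and several unrealistic ones. The parse_quantity function and its 1 to 100 range are assumptions made up for this example.

```python
import unittest


def parse_quantity(text):
    """Hypothetical function under test: parse a cart quantity typed by a user."""
    value = int(text.strip())  # int() raises ValueError for non-numeric text
    if not 1 <= value <= 100:
        raise ValueError(f"quantity out of range: {value}")
    return value


class ParseQuantityTest(unittest.TestCase):
    def test_normal_input(self):
        self.assertEqual(parse_quantity(" 3 "), 3)

    def test_unrealistic_inputs_are_rejected(self):
        # Inputs a normal user would never type, but an attacker might.
        for bad in ["", "abc", "-1", "0", "1e9", "999999999999"]:
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError):
                    parse_quantity(bad)


# Run the tests and keep the result instead of exiting the program.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseQuantityTest)
result = unittest.TextTestRunner().run(suite)
```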

Integration Test

Integration tests check how multiple parts of a system work together. Functions, classes, and flows can be tested on how they interact inside the system. This ensures that all the “pieces” of the system integrate properly. As with Unit Tests, we can anticipate and simulate flows to catch bugs early.
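
The sketch below tests a small flow (register, then log in) across two pieces that must cooperate: a service and a store. All class and function names are hypothetical, and the hashing is for illustration only, not a recommendation for storing real passwords.

```python
import hashlib


class InMemoryUserStore:
    """Stand-in for a real database table."""

    def __init__(self):
        self._users = {}

    def save(self, email, password_hash):
        self._users[email] = password_hash

    def get(self, email):
        return self._users.get(email)


def hash_password(password):
    # Illustration only: real systems should use a dedicated password hash.
    return hashlib.sha256(password.encode()).hexdigest()


class AccountService:
    def __init__(self, store):
        self.store = store

    def register(self, email, password):
        if self.store.get(email) is not None:
            raise ValueError("email already registered")
        self.store.save(email, hash_password(password))

    def login(self, email, password):
        return self.store.get(email) == hash_password(password)


def test_register_then_login():
    # The flow crosses the service, the hashing function, and the store.
    service = AccountService(InMemoryUserStore())
    service.register("a@example.com", "s3cret")
    assert service.login("a@example.com", "s3cret")
    assert not service.login("a@example.com", "wrong")
    assert not service.login("nobody@example.com", "s3cret")


test_register_then_login()
```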

E2E Test

E2E (End-to-End) tests simulate a real user using the real app, by actually clicking buttons and typing text.

This is the ultimate form of test, and it can catch bugs that unit or integration tests cannot. E2E Tests run the app in an environment closest to production and validate the entire system, from UI/UX to data storage. But they are the hardest tests to build, because a real system needs to be deployed before E2E tests can execute, and simulating user behavior in code requires more effort. This is why many teams stop at Integration Tests, which is totally okay when the majority of bugs can be caught at the Integration Test level. Writing E2E Tests is time-consuming, so we should only write them for bugs that cannot be reproduced at the Integration Test level. These are high-level bugs that should be addressed by high-level tests, and they are mostly about concurrency, timing, and resources:

  • Race Condition: A situation where the correct behavior of a program depends on the relative timing or interleaving of multiple threads or processes. It’s about uncontrolled timing causing wrong results.
  • Deadlock: Two or more threads/processes wait on each other indefinitely, preventing progress. As a result, the system freezes because resources are locked in a circular wait.
  • Livelock: Threads or processes keep changing state in response to each other but make no actual progress. As a result, the CPU or threads are active, but nothing gets done.
  • Starvation: A thread never gets access to a needed resource or the CPU because other threads dominate it. As a result, the resource exists, but some threads never get a chance to execute.
  • Atomicity violation: A set of operations that should be executed as a single, indivisible unit is interrupted, causing incorrect results.
  • Order violation: Correct behavior depends on operations happening in a specific order, but that order is not guaranteed, which eventually leads to incorrect results.
  • Heisenbug: A bug that disappears or changes when you try to observe it (e.g., by debugging, logging, or adding print statements). This sounds like quantum physics, but yes, it does exist. These bugs are often caused by concurrency, timing, or memory issues.
  • Data corruption: Shared data is modified concurrently without proper synchronization, resulting in invalid or inconsistent values.
  • Lost update: Two concurrent operations overwrite each other’s results, causing data loss.
  • Dirty read / inconsistent read: A thread reads a partially updated or uncommitted value from another thread or transaction and then produces wrong results.
  • Priority inversion: A low-priority thread holds a resource needed by a high-priority thread, causing the high-priority thread to wait unnecessarily.
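
To make one item from this list concrete, here is a sketch of a lost update. Real races depend on timing, so the first part replays the bad interleaving by hand to keep the outcome deterministic. The second part shows one common fix, a lock around the read-modify-write. The Account class is made up for this example.

```python
import threading

# Lost update, replayed step by step: both "threads" read before either writes.
balance = 100
read_a = balance        # thread A reads 100
read_b = balance        # thread B reads 100
balance = read_a + 10   # thread A writes 110
balance = read_b + 20   # thread B writes 120, so A's update is lost
lost_update_result = balance  # 120 instead of the expected 130


# One common fix: make the read-modify-write atomic with a lock.
class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        with self._lock:  # only one thread at a time may update the balance
            self.balance += amount


account = Account(100)
threads = [threading.Thread(target=account.deposit, args=(1,)) for _ in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, all 200 deposits are applied.
```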

In conclusion, to refactor code safely, we need a lot of tests, and good ones!

How to avoid Merge Conflicts in software development

Besides Ambiguous Requirements, Tight Deadlines, and Unstable Legacy Codebases, Merge Conflict is another nagging fear that annoys and disrupts developers while making software.

As discussed in this post on how Merge Conflicts appear, long-lived branches should be avoided as much as possible to reduce the chance of conflict. Here are some guidelines to help any software team deal with this fear.

User Story based Task Description

The human brain is designed to consume stories, not ambiguity. In software development, a User Story is a short, simple description of a feature or requirement told from the perspective of the end user. It’s a fundamental element of Agile and Scrum methods, meant to capture what the user wants and why, without prescribing how developers should implement it. A user story usually follows this format: As a [type of user], when [something happens], I want [some goal] so that [some reason] (the “when” part is optional). For example: As a registered user, I want to reset my password so that I can access my account if I forget it. This helps teams understand who needs something, what they need, and why it matters.

Depending on how big the user’s goal is, a User Story can be split into simpler stories. We don’t have to write an essay in a task description because we are not at school. Clarity is the top priority when writing task descriptions, so don’t overthink it, just tell stories.

When a User Story is simple enough, a task derived from it has a small scope of change that affects the codebase in a limited and predictable way. A small scope of change helps reduce the chance of conflicts. Even when conflicts happen, resolving them is easier because they occur in fewer places.

Merge/Rebase daily, resolve early

When tasks are well defined and the scope of change is limited, the branch can be short-lived, which suits the Rebase tactic. Depending on how the team prefers the commit history tree to look, either Merge or Rebase is okay. The recommended practice is to do it daily: merge the main branch into new branches, or rebase new branches onto the main branch. This practice makes us aware early of places that might cause conflicts, so that we can adjust our coding tactics or sync up with other developers about the changes being made.

FIFO Merging

It is obviously right to prioritize tasks, but do not apply that priority to the order of merging code, because it can turn some branches into long-lived ones when higher-priority tasks keep being merged first. Once a task that was in progress on, for example, your Kanban board reaches the Done column, it should be merged as soon as possible regardless of its priority. What is completed first should be merged first (First-In-First-Out order). To achieve this, using a tool that does so automatically is highly recommended. Of course, completing a task here includes the testing phase as well.
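
The rule is simple enough to sketch in a few lines. The Task fields below (a priority number and a done_at timestamp) are assumptions made up for this example and do not come from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    priority: int  # lower number means higher priority
    done_at: int   # hypothetical timestamp of reaching the Done column


def merge_order(done_tasks):
    """Order finished tasks for merging by completion time, ignoring priority."""
    return [task.task_id for task in sorted(done_tasks, key=lambda task: task.done_at)]


done = [
    Task("#3", priority=1, done_at=30),
    Task("#1", priority=3, done_at=10),
    Task("#2", priority=2, done_at=20),
]
order = merge_order(done)  # "#1" goes first even though its priority is lowest
```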

When a task is well defined with limited scope in User Story format, yet it somehow gets stuck and turns into a long-lived branch, we need to review its necessity:

  • If we don’t need this feature anymore, discard the task and close the branch.
  • If we still need this feature, but it contains risky changes and that is why no one dares to merge it into the main branch, it is time to make simpler stories from the risky parts. Don’t let tasks get stuck. Keep feature branches short-lived.

Commit with Task ID

A commit message is supposed to describe what the changes are, but nothing enforces that consistently; it depends on how well a developer can explain things. So to keep it simple and scalable, it is recommended to begin a commit message with the Task ID, for example: #1234 fix things, so whenever anyone wonders why a commit was added, they can trace it back to the related task description and let the User Story explain.
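
If the team wants to check this convention automatically, a small script is enough. The sketch below assumes the exact format from the example above (a “#”, digits, a space, then some text). The function name is hypothetical, and how the check gets wired into the team’s workflow is left open.

```python
import re

# Assumed convention: "#<digits> <description>", e.g. "#1234 fix things".
TASK_ID_PATTERN = re.compile(r"^#(\d+) \S")


def task_id_of(commit_message):
    """Return the Task ID as a string, or None when the message lacks one."""
    match = TASK_ID_PATTERN.match(commit_message)
    return match.group(1) if match else None
```

For example, task_id_of("#1234 fix things") gives "1234", while a message such as "fix things" gives None and can be rejected.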

Long-lived branches as Microservices

If, for any reason, a branch is planned to be long-lived, such as when building a challenging feature that inevitably requires a long development time, consider turning this big part into a microservice. A microservices architecture can keep the new code in a separate repository, which in turn reduces the risk of conflicts with the existing main branch. The main system can communicate with the new microservice in whatever way fits to get things done, without worrying about large changes from new features. The new microservice can have its own task board with its own User Stories, and the User here is the main system.

Time is money, so don’t waste development time resolving merge conflicts!

Merge vs Rebase: which is better?

I usually prefer Merge to Rebase, for safety first.

Merge and Rebase are 2 ways of combining changes from different branches in Git, the version control system behind platforms such as Github. Since Merge seems to be enough to get things done in every case, why does Git include Rebase?

The answer seems related to the team’s preference for the commit history. Git maintains a tree of commits per repository, and each commit is a snapshot of all files. It is important to notice that Git stores project snapshots, not the diffs that we see with the command git diff. Diffs are calculated on the fly when we compare 2 commits. This nature of Git affects how Merge and Rebase actually behave under the hood:

How does Merge actually work?

When using git merge, for example, to merge branch A into branch B, given that branch B was created from branch A, Git performs the steps below:

  1. Finds the common ancestor snapshot, aka the commit that branch B was created from.
  2. Compares the latest snapshot of branch A (aka MERGE_HEAD) to the ancestor snapshot to get the diffs D1.
  3. Compares the latest snapshot of branch B (aka HEAD) to the ancestor snapshot to get the diffs D2.
  4. Applies diffs D1 & D2 to the ancestor snapshot, then outputs a new merged snapshot, stored in a new commit on branch B.

Because commits are snapshots:

  • Git doesn’t need to replay all intermediate diffs.
  • It just looks at 3 snapshots: ancestor, HEAD, and MERGE_HEAD.

That’s why merging large histories is fast and doesn’t rewrite old commits: the snapshots are stable and immutable. When conflicts happen during a Merge, because only 3 snapshots are taken into account and the output is always 1 new snapshot, resolving them likely happens only once.
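
The idea of merging from 3 snapshots can be sketched as a toy model. This is not how Git is implemented: here a snapshot is just a dict from file path to content, and files are compared as a whole, while real Git also merges line by line inside a file. The function and file names are made up for illustration.

```python
def three_way_merge(ancestor, head, merge_head):
    """Merge two snapshots (dicts of path -> content) against their ancestor."""
    merged, conflicts = {}, []
    for path in sorted(set(ancestor) | set(head) | set(merge_head)):
        base = ancestor.get(path)
        ours = head.get(path)
        theirs = merge_head.get(path)
        if ours == theirs:
            result = ours            # both sides agree
        elif ours == base:
            result = theirs          # only MERGE_HEAD changed this file
        elif theirs == base:
            result = ours            # only HEAD changed this file
        else:
            conflicts.append(path)   # both sides changed it differently
            continue
        if result is not None:       # None means the file was deleted
            merged[path] = result
    return merged, conflicts


ancestor = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
head = {"a.txt": "2", "b.txt": "1", "c.txt": "ours"}
merge_head = {"a.txt": "1", "b.txt": "2", "c.txt": "theirs", "d.txt": "new"}
merged, conflicts = three_way_merge(ancestor, head, merge_head)
```

However many commits each branch has, the function only ever sees 3 snapshots, so the conflict on c.txt is reported once.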

How does Rebase actually work?

When using git rebase, for example, to rebase branch B onto branch A, given that branch B was created from branch A, Git performs the steps below:

  1. Calculates the diff between each commit (aka snapshot) of branch B and its parent commit. This creates a series of “patches” telling step by step how the changes were made on branch B.
  2. Reapplies those diffs (patches) on top of the latest snapshot of branch A.
  3. Creates new commits with new IDs (aka new snapshots).

So each time we rebase branch B onto branch A, new commits (or snapshots) are added as if we had just made those changes on top of the latest snapshot of branch A. Because diffs are reapplied every time we rebase, if there are conflicts, it is likely we have to resolve the same conflicts again and again. And this is why I prefer Merge to Rebase.
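
The replay can also be sketched as a toy model. Again this is not Git’s real implementation: a snapshot is a dict from file path to content, deletions are ignored, and a “manual resolution” is faked by joining both versions. All names are made up for illustration.

```python
def diff(parent, child):
    """Patch from parent to child: path -> (old, new). Deletions are ignored."""
    return {
        path: (parent.get(path), content)
        for path, content in child.items()
        if parent.get(path) != content
    }


def apply_patch(snapshot, patch):
    """Apply a patch; return the new snapshot and the conflicting paths."""
    result, conflicts = dict(snapshot), []
    for path, (old, new) in sorted(patch.items()):
        current = snapshot.get(path)
        if current in (old, new):
            result[path] = new
        else:
            conflicts.append(path)
            result[path] = f"{current}+{new}"  # stand-in for a manual resolution
    return result, conflicts


def rebase(branch_commits, fork_point, new_base):
    """Replay each commit of a branch as a patch on top of new_base."""
    patches, parent = [], fork_point
    for commit in branch_commits:
        patches.append(diff(parent, commit))
        parent = commit
    new_commits, conflict_stops, tip = [], 0, new_base
    for patch in patches:
        tip, conflicts = apply_patch(tip, patch)
        if conflicts:
            conflict_stops += 1  # each stop needs someone to resolve it
        new_commits.append(tip)
    return new_commits, conflict_stops


fork_point = {"f": "v0"}
branch_b = [{"f": "b1"}, {"f": "b2"}]   # two commits on B touching the same file
branch_a_tip = {"f": "a1"}              # A changed that file too
new_commits, conflict_stops = rebase(branch_b, fork_point, branch_a_tip)
```

In this toy run both replayed commits stop on a conflict, while a merge of the same branches would compare only 3 snapshots and ask once.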

So, why does Rebase exist?

Rebase is mostly used when we have a reason to control how the commit history looks on a branch. This can be useful when a team prefers a linear commit history that is easier to read and does not care about what actually happened, such as when a branch was created and what was merged. Because it rewrites the commit history of a branch, Rebase is not recommended on the main branch due to the risk of losing commits and resolving conflicts multiple times. Rebase is safer only on a feature branch created from the main branch and, most importantly, one with a short development time. On a feature branch that lives long enough, re-resolving conflicts might happen frequently, which can slow down development and even frustrate developers.

Conclusion

In summary, my suggestion on Merge vs Rebase is:

  1. Always use Merge, for safety first.
  2. If we are working on a feature branch (NOT the main or master one), want a nicer commit history on this branch, and the development time for this branch is short, then we can use Rebase.