Expert
Smells Catalog and possible refactoring
Deciding when to start refactoring — and when to stop — is just as important to refactoring as knowing how to operate the mechanics of it.
When:
- MYSTERIOUS NAME
- DUPLICATED CODE
- LONG FUNCTION
- LONG PARAMETER LIST
- GLOBAL DATA
- MUTABLE DATA
- DIVERGENT CHANGE
- SHOTGUN SURGERY
- FEATURE ENVY
- DATA CLUMPS
- PRIMITIVE OBSESSION
- REPEATED SWITCHES
- LOOPS
- LAZY ELEMENT
- SPECULATIVE GENERALITY
- TEMPORARY FIELD
- MESSAGE CHAINS
- MIDDLE MAN
- INSIDER TRADING
- LARGE CLASS
- ALTERNATIVE CLASSES WITH DIFFERENT INTERFACES
- DATA CLASS
- REFUSED BEQUEST
- COMMENTS
Composing Methods
EXTRACT FUNCTION
from:
function printOwing(invoice) {
printBanner();
let outstanding = calculateOutstanding();
//print details
console.log(`name: ${invoice.customer}`);
console.log(`amount: ${outstanding}`);
}
to:
function printOwing(invoice) {
printBanner();
let outstanding = calculateOutstanding();
printDetails(outstanding);
function printDetails(outstanding) {
console.log(`name: ${invoice.customer}`);
console.log(`amount: ${outstanding}`);
}
}
Extract Function is one of the most common refactorings I do. (Here, I use the term “function” but the same is true for a method in an objectoriented language, or any kind of procedure or subroutine.) I look at a fragment of code, understand what it is doing, then extract it into its own function named after its purpose.
During my career, I’ve heard many arguments about when to enclose code in its own function. Some of these guidelines were based on length: Functions should be no larger than fit on a screen. Some were based on reuse: Any code used more than once should be put in its own function, but code only used once should be left inline. The argument that makes most sense to me, however, is the separation between intention and implementation. If you have to spend effort looking at a fragment of code and figuring out what it’s doing, then you should extract it into a function and name the function after the “what.” Then, when you read it again, the purpose of the function leaps right out at you, and most of the time you won’t need to care about how the function fulfills its purpose (which is the body of the function).
Once I accepted this principle, I developed a habit of writing very small functions— typically, only a few lines long. To me, any function with more than halfadozen lines of code starts to smell, and it’s not unusual for me to have functions that are a single line of code. The fact that size isn’t important was brought home to me by an example that Kent Beck showed me from the original Smalltalk system. Smalltalk in those days ran on blackandwhite systems. If you wanted to highlight some text or graphics, you would reverse the video. Smalltalk’s graphics class had a method for this called highlight, whose implementation was just a call to the method reverse. The name of the method was longer than its implementation—but that didn’t matter because there was a big distance between the intention of the code and its implementation.
INLINE FUNCTION
from:
function getRating(driver) {
return moreThanFiveLateDeliveries(driver) ? 2 : 1;
}
function moreThanFiveLateDeliveries(driver) {
return driver.numberOfLateDeliveries > 5;
}
to:
function getRating(driver) {
return (driver.numberOfLateDeliveries > 5) ? 2 : 1;
}
One of the themes of this book is using short functions named to show their intent, because these functions lead to clearer and easier to read code. But sometimes, I do come across a function in which the body is as clear as the name. Or, I refactor the body of the code into something that is just as clear as the name. When this happens, I get rid of the function. Indirection can be helpful, but needless indirection is irritating. I also use Inline Function is when I have a group of functions that seem badly factored. I can inline them all into one big function and then reextract the functions the way I prefer.
I commonly use Inline Function when I see code that’s using too much indirection— when it seems that every function does simple delegation to another function, and I get lost in all the delegation. Some of this indirection may be worthwhile, but not all of it. By inlining, I can flush out the useful ones and eliminate the rest.
EXTRACT VARIABLE
from:
return order.quantity * order.itemPrice -
Math.max(0, order.quantity - 500) * order.itemPrice * 0.05 +
Math.min(order.quantity * order.itemPrice * 0.1, 100);
to:
const basePrice = order.quantity * order.itemPrice;
const quantityDiscount = Math.max(0, order.quantity - 500) * order.itemPrice * 0.05;
const shipping = Math.min(basePrice * 0.1, 100);
return basePrice - quantityDiscount + shipping;
Expressions can become very complex and hard to read. In such situations, local variables may help break the expression down into something more manageable. In particular, they give me an ability to name a part of a more complex piece of logic. This allows me to better understand the purpose of what’s happening.
Such variables are also handy for debugging, since they provide an easy hook for a debugger or print statement to capture.
If I’m considering Extract Variable, it means I want to add a name to an expression in my code. Once I’ve decided I want to do that, I also think about the context of that name. If it’s only meaningful within the function I’m working on, then Extract Variable is a good choice—but if it makes sense in a broader context, I’ll consider making the name available in that broader context, usually as a function. If the name is available more widely, then other code can use that expression without having to repeat the expression, leading to less duplication and a better statement of my intent.
INLINE VARIABLE
let basePrice = anOrder.basePrice;
return (basePrice > 1000);
return anOrder.basePrice > 1000;
Variables provide names for expressions within a function, and as such they are usually a Good Thing. But sometimes, the name doesn’t really communicate more than the expression itself. At other times, you may find that a variable gets in the way of refactoring the neighboring code. In these cases, it can be useful to inline the variable.
CHANGE FUNCTION DECLARATION
from:
function circum(radius) {...}
to:
function circumference(radius) {...}
Functions represent the primary way we break a program down into parts. Function declarations represent how these parts fit together—effectively, they represent the joints in our software systems. And, as with any construction, much depends on those joints. Good joints allow me to add new parts to the system easily, but bad ones are a constant source of difficulty, making it harder to figure out what the software does and how to modify it as my needs change. Fortunately, software, being soft, allows me to change these joints, providing I do it carefully.
The most important element of such a joint is the name of the function. A good name allows me to understand what the function does when I see it called, without seeing the code that defines its implementation. However, coming up with good names is hard, and I rarely get my names right the first time. When I find a name that’s confused me, I’m tempted to leave it—after all, it’s only a name. This is the work of the evil demon Obfuscatis; for the sake of my program’s soul I must never listen to him. If I see a function with the wrong name, it is imperative that I change it as soon as I understand what a better name could be. That way, the next time I’m looking at this code, I don’t have to figure out again what’s going on. (Often, a good way to improve a name is to write a comment to describe the function’s purpose, then turn that comment into a name.)
ENCAPSULATE VARIABLE
from:
let defaultOwner = {firstName: "Martin", lastName: "Fowler"};
to:
let defaultOwnerData = {firstName: "Martin", lastName: "Fowler"};
export function defaultOwner() {return defaultOwnerData;}
export function setDefaultOwner(arg) {defaultOwnerData = arg;}
This principle is why the objectoriented approach puts so much emphasis on keeping an object’s data private. Whenever I see a public field, I consider using Encapsulate Variable (in that case often called Encapsulate Field) to reduce its visibility. Some go further and argue that even internal references to fields within a class should go through accessor functions—an approach known as selfencapsulation. On the whole, I find selfencapsulation excessive—if a class is so big that I need to selfencapsulate its fields, it needs to be broken up anyway. But selfencapsulating a field is a useful step before splitting a class.
Keeping data encapsulated is much less important for immutable data. When the data doesn’t change, I don’t need a place to put in validation or other logic hooks before updates. I can also freely copy the data rather than move it—so I don’t have to change references from old locations, nor do I worry about sections of code getting stale data. Immutability is a powerful preservative.
RENAME VARIABLE
from:
let a = height * width;
to:
let area = height * width;
Naming things well is the heart of clear programming. Variables can do a lot to explain what I’m up to—if I name them well. But I frequently get my names wrong—sometimes because I’m not thinking carefully enough, sometimes because my understanding of the problem improves as I learn more, and sometimes because the program’s purpose changes as my users’ needs change.
Even more than most program elements, the importance of a name depends on how widely it’s used. A variable used in a oneline lambda expression is usually easy to follow—I often use a single letter in that case since the variable’s purpose is clear from its context. Parameters for short functions can often be terse for the same reason, although in a dynamically typed language like JavaScript, I do like to put the type into the name (hence parameter names like aCustomer).
Persistent fields that last beyond a single function invocation require more careful naming. This is where I’m likely to put most of my attention.
INTRODUCE PARAMETER OBJECT
from:
function amountInvoiced(startDate, endDate) {...}
function amountReceived(startDate, endDate) {...}
function amountOverdue(startDate, endDate) {...}
to:
function amountInvoiced(aDateRange) {...}
function amountReceived(aDateRange) {...}
function amountOverdue(aDateRange) {...}
I often see groups of data items that regularly travel together, appearing in function after function. Such a group is a data clump, and I like to replace it with a single data structure.
Grouping data into a structure is valuable because it makes explicit the relationship between the data items. It reduces the size of parameter lists for any function that uses the new structure. It helps consistency since all functions that use the structure will use the same names to get at its elements.
But the real power of this refactoring is how it enables deeper changes to the code. When I identify these new structures, I can reorient the behavior of the program to use these structures. I will create functions that capture the common behavior over this data—either as a set of common functions or as a class that combines the data structure with these functions. This process can change the conceptual picture of the code, raising these structures as new abstractions that can greatly simplify my understanding of the domain. When this works, it can have surprisingly powerful effects—but none of this is possible unless I use Introduce Parameter Object to begin the process.
COMBINE FUNCTIONS INTO CLASS
from:
function base(aReading) {...}
function taxableCharge(aReading) {...}
function calculateBaseCharge(aReading) {...}
to;
class Reading {
base() {...}
taxableCharge() {...}
calculateBaseCharge() {...}
}
Classes are a fundamental construct in most modern programming languages. They bind together data and functions into a shared environment, exposing some of that data and function to other program elements for collaboration. They are the primary construct in objectoriented languages, but are also useful with other approaches too.
When I see a group of functions that operate closely together on a common body of data (usually passed as arguments to the function call), I see an opportunity to form a class. Using a class makes the common environment that these functions share more explicit, allows me to simplify function calls inside the object by removing many of the arguments, and provides a reference to pass such an object to other parts of the system.
In addition to organizing already formed functions, this refactoring also provides a good opportunity to identify other bits of computation and refactor them into methods on the new class.
COMBINE FUNCTIONS INTO TRANSFORM
from:
function base(aReading) {...}
function taxableCharge(aReading) {...}
to:
function enrichReading(argReading) {
const aReading = _.cloneDeep(argReading);
aReading.baseCharge = base(aReading);
aReading.taxableCharge = taxableCharge(aReading);
return aReading;
}
Software often involves feeding data into programs that calculate various derived information from it. These derived values may be needed in several places, and those calculations are often repeated wherever the derived data is used. I prefer to bring all of these derivations together, so I have a consistent place to find and update them and avoid any duplicate logic.
One way to do this is to use a data transformation function that takes the source data as input and calculates all the derivations, putting each derived value as a field in the output data. Then, to examine the derivations, all I need do is look at the transform function.
An alternative to Combine Functions into Transform is Combine Functions into Class (144) that moves the logic into methods on a class formed from the source data. Either of these refactorings are helpful, and my choice will often depend on the style of programming already in the software. But there is one important difference: Using a class is much better if the source data gets updated within the code. Using a transform stores derived data in the new record, so if the source data changes, I will run into inconsistencies.
One of the reasons I like to do combine functions is to avoid duplication of the derivation logic. I can do that just by using Extract Function (106) on the logic, but it’s often difficult to find the functions unless they are kept close to the data structures they operate on. Using a transform (or a class) makes it easy to find and use them.
SPLIT PHASE
from:
const orderData = orderString.split(/\s+/);
const productPrice = priceList[orderData[0].split("-")[1]];
const orderPrice = parseInt(orderData[1]) * productPrice;
to:
const orderRecord = parseOrder(order);
const orderPrice = price(orderRecord, priceList);
function parseOrder(aString) {
const values = aString.split(/\s+/);
return ({
productID: values[0].split("-")[1],
quantity: parseInt(values[1]),
});
}
function price(order, priceList) {
return order.quantity * priceList[order.productID];
}
When I run into code that’s dealing with two different things, I look for a way to split it into separate modules. I endeavor to make this split because, if I need to make a change, I can deal with each topic separately and not have to hold both in my head together. If I’m lucky, I may only have to change one module without having to remember the details of the other one at all.
One of the neatest ways to do a split like this is to divide the behavior into two sequential phases. A good example of this is when you have some processing whose inputs don’t reflect the model you need to carry out the logic. Before you begin, you can massage the input into a convenient form for your main processing. Or, you can take the logic you need to do and break it down into sequential steps, where each step is significantly different in what it does.
The most obvious example of this is a compiler. It’s a basic task is to take some text (code in a programming language) and turn it into some executable form (e.g., object code for a specific hardware). Over time, we’ve found this can be usefully split into a chain of phases: tokenizing the text, parsing the tokens into a syntax tree, then various steps of transforming the syntax tree (e.g., for optimization), and finally generating the object code. Each step has a limited scope and I can think of one step without understanding the details of others.
Splitting phases like this is common in large software; the various phases in a compiler can each contain many functions and classes. But I can carry out the basic splitphase refactoring on any fragment of code—whenever I see an opportunity to usefully separate the code into different phases. The best clue is when different stages of the fragment use different sets of data and functions. By turning them into separate modules I can make this difference explicit, revealing the difference in the code.
Moving Features Between Objects
MOVE FUNCTION
from:
class Account {
get overdraftCharge() {...}
}
to:
class AccountType {
get overdraftCharge() {...}
}
The heart of a good software design is its modularity—which is my ability to make most modifications to a program while only having to understand a small part of it. To get this modularity, I need to ensure that related software elements are grouped together and the links between them are easy to find and understand. But my understanding of how to do this isn’t static—as I better understand what I’m doing, I learn how to best group together software elements. To reflect that growing understanding, I need to move elements around.
All functions live in some context; it may be global, but usually it’s some form of a module. In an objectoriented program, the core modular context is a class. Nesting a function within another creates another common context. Different languages provide varied forms of modularity, each creating a context for a function to live in.
One of the most straightforward reasons to move a function is when it references elements in other contexts more than the one it currently resides in. Moving it together with those elements often improves encapsulation, allowing other parts of the software to be less dependent on the details of this module.
Similarly, I may move a function because of where its callers live, or where I need to call it from in my next enhancement. A function defined as a helper inside another function may have value on its own, so it’s worth moving it to somewhere more accessible. A method on a class may be easier for me to use if shifted to another.
Deciding to move a function is rarely an easy decision. To help me decide, I examine the current and candidate contexts for that function
MOVE FIELD
from:
class Customer {
get plan() {return this._plan;}
get discountRate() {return this._discountRate;}
}
to:
get plan() {return this._plan;}
get discountRate() {return this.plan.discountRate;}
}
As soon as I realize that a data structure isn’t right, it’s vital to change it. If I leave my data structures with their blemishes, those blemishes will confuse my thinking and complicate my code far into the future.
I may seek to move data because I find I always need to pass a field from one record whenever I pass another record to a function. Pieces of data that are always passed to functions together are usually best put in a single record in order to clarify their relationship. Change is also a factor; if a change in one record causes a field in another record to change too, that’s a sign of a field in the wrong place. If I have to update the same field in multiple structures, that’s a sign that it should move to another place where it only needs to be updated once.
I usually do Move Field in the context of a broader set of changes. Once I’ve moved a field, I find that many of the users of the field are better off accessing that data through the target object rather than the original source. I then change these with later refactorings. Similarly, I may find that I can’t do Move Field at the moment due to the way the data is used. I need to refactor some usage patterns first, then do the move.
MOVE STATEMENTS INTO FUNCTION
result.push(`<p>title: ${person.photo.title}</p>`);
result.concat(photoData(person.photo));
function photoData(aPhoto) {
return [
`<p>location: ${aPhoto.location}</p>`,
`<p>date: ${aPhoto.date.toDateString()}</p>`,
];
}
to:
result.concat(photoData(person.photo));
function photoData(aPhoto) {
return [
`<p>title: ${aPhoto.title}</p>`,
`<p>location: ${aPhoto.location}</p>`,
`<p>date: ${aPhoto.date.toDateString()}</p>`,
];
}
Removing duplication is one of the best rules of thumb of healthy code. If I see the same code executed every time I call a particular function, I look to combine that repeating code into the function itself. That way, any future modifications to the repeating code can be done in one place and used by all the callers. Should the code vary in the future, I can easily move it (or some of it) out again with Move Statements to Callers (217).
I move statements into a function when I can best understand these statements as part of the called function. If they don’t make sense as part of the called function, but still should be called with it, I’ll simply use Extract Function (106) on the statements and the called function. That’s essentially the same process as I describe below, but without the inline and rename steps.
MOVE STATEMENTS TO CALLERS
from:
emitPhotoData(outStream, person.photo);
function emitPhotoData(outStream, photo) {
outStream.write(`<p>title: ${photo.title}</p>\n`);
outStream.write(`<p>location: ${photo.location}</p>\n`);
}
to:
emitPhotoData(outStream, person.photo);
outStream.write(`<p>location: ${person.photo.location}</p>\n`);
function emitPhotoData(outStream, photo) {
outStream.write(`<p>title: ${photo.title}</p>\n`);
}
Functions are the basic building block of the abstractions we build as programmers. And, as with any abstraction, we don’t always get the boundaries right. As a code base changes its capabilities—as most useful software does—we often find our abstraction boundaries shift. For functions, that means that what might once have been a cohesive, atomic unit of behavior becomes a mix of two or more different things.
One trigger for this is when common behavior used in several places needs to vary in some of its calls. Now, we need to move the varying behavior out of the function to its callers. In this case, I’ll use Slide Statements (223) to get the varying behavior to the beginning or end of the function and then Move Statements to Callers. Once the varying code is in the caller, I can change it when necessary.
Move Statements to Callers works well for small changes, but sometimes the boundaries between caller and callee need complete reworking. In that case, my best move is to use Inline Function (115) and then slide and extract new functions to form better boundaries.
REPLACE INLINE CODE WITH FUNCTION CALL
from:
let appliesToMass = false;
for(const s of states) {
if (s === "MA") appliesToMass = true;
}
to:
appliesToMass = states.includes("MA");
Functions allow me to package up bits of behavior. This is useful for understanding—a named function can explain the purpose of the code rather than its mechanics. It’s also valuable to remove duplication: Instead of writing the same code twice, I just call the function. Then, should I need to change the function’s implementation, I don’t have to track down similarlooking code to update all the changes. (I may have to look at the callers, to see if they should all use the new code, but that’s both less common and much easier.)
If I see inline code that’s doing the same thing that I have in an existing function, I’ll usually want to replace that inline code with a function call. The exception is if I consider the similarity to be coincidental—so that, if I change the function body, I don’t expect the behavior in this inline code to change. A guide to this is the name of the function. A good name should make sense in place of inline code I have. If the name doesn’t make sense, that may be because it’s a poor name (in which case I use Rename Function (124) to fix it) or because the function’s purpose is different to what I want in this case—so I shouldn’t call it.
I find it particularly satisfying to do this with calls to library functions—that way, I don’t even have to write the function body.
SLIDE STATEMENTS
formerly: Consolidate Duplicate Conditional Fragments
from:
const pricingPlan = retrievePricingPlan();
const order = retreiveOrder();
let charge;
const chargePerUnit = pricingPlan.unit;
to:
const pricingPlan = retrievePricingPlan();
const chargePerUnit = pricingPlan.unit;
const order = retreiveOrder();
let charge;
Code is easier to understand when things that are related to each other appear together. If several lines of code access the same data structure, it’s best for them to be together rather than intermingled with code accessing other data structures. At its simplest, I use Slide Statements to keep such code together. A very common case of this is declaring and using variables. Some people like to declare all their variables at the top of a function. I prefer to declare the variable just before I first use it.
Usually, I move related code together as a preparatory step for another refactoring, often an Extract Function (106). Putting related code into a clearly separated function is a better separation than just moving a set of lines together, but I can’t do the Extract Function (106) unless the code is together in the first place.
SPLIT LOOP
from:
let averageAge = 0;
let totalSalary = 0;
for (const p of people) {
averageAge += p.age;
totalSalary += p.salary;
}
averageAge = averageAge / people.length;
to:
let totalSalary = 0;
for (const p of people) {
totalSalary += p.salary;
}
let averageAge = 0;
for (const p of people) {
averageAge += p.age;
}
averageAge = averageAge / people.length;
You often see loops that are doing two different things at once just because they can do that with one pass through a loop. But if you’re doing two different things in the same loop, then whenever you need to modify the loop you have to understand both things. By splitting the loop, you ensure you only need to understand the behavior you need to modify.
Splitting a loop can also make it easier to use. A loop that calculates a single value can just return that value. Loops that do many things need to return structures or populate local variables. I frequently follow a sequence of Split Loop followed by Extract Function (106).
Many programmers are uncomfortable with this refactoring, as it forces you to execute the loop twice. My reminder, as usual, is to separate refactoring from optimization (Refactoring and Performance (64)). Once I have my code clear, I’ll optimize it, and if the loop traversal is a bottleneck, it’s easy to slam the loops back together. But the actual iteration through even a large list is rarely a bottleneck, and splitting the loops often enables other, more powerful, optimizations.
REPLACE LOOP WITH PIPELINE
from:
const names = [];
for (const i of input) {
if (i.job === "programmer")
names.push(i.name);
}
to:
const names = input
.filter(i => i.job === "programmer")
.map(i => i.name)
;
Like most programmers, I was taught to use loops to iterate over a collection of objects. Increasingly, however, language environments provide a better construct: the collection pipeline. Collection Pipelines allow me to describe my processing as a series of operations, each consuming and emitting a collection. The most common of these operations are map, which uses a function to transform each element of the input collection, and filter which uses a function to select a subset of the input collection for later steps in the pipeline. I find logic much easier to follow if it is expressed as a pipeline—I can then read from top to bottom to see how objects flow through the pipeline.
REMOVE DEAD CODE
if(false) {
doSomethingThatUsedToMatter();
}
When we put code into production, even on people’s devices, we aren’t charged by weight. A few unused lines of code don’t slow down our systems nor take up significant memory; indeed, decent compilers will instinctively remove them. But unused code is still a significant burden when trying to understand how the software works. It doesn’t carry any warning signs telling programmers that they can ignore this function as it’s never called any more, so they still have to spend time understanding what it’s doing and why changing it doesn’t seem to alter the output as they expected.
Once code isn’t used any more, we should delete it. I don’t worry that I may need it sometime in the future; should that happen, I have my version control system so I can always dig it out again. If it’s something I really think I may need one day, I might put a comment into the code that mentions the lost code and which revision it was removed in—but, honestly, I can’t remember the last time I did that, or regretted that I hadn’t done it.
Commenting out dead code was once a common habit. This was useful in the days before version control systems were widely used, or when they were inconvenient. Now, when I can put even the smallest code base under version control, that’s no longer needed.
Organizing Data
SPLIT VARIABLE
formerly: Remove Assignments to Parameters
from:
let temp = 2 * (heigh + width)
console.log(temp);
temp = height * width;
console.log(temp);
to:
const perimeter = 2 * (heigh + width)
console.log(perimeter);
const area = height * width;
console.log(area);
Variables have various uses. Some of these uses naturally lead to the variable being assigned to several times. Loop variables change for each run of a loop (such as the i in for (let i=0; i<10; i++)). Collecting variables store a value that is built up during the method.
Many other variables are used to hold the result of a longwinded bit of code for easy reference later. These kinds of variables should be set only once. If they are set more than once, it is a sign that they have more than one responsibility within the method.
Any variable with more than one responsibility should be replaced with multiple variables, one for each responsibility. Using a variable for two different things is very confusing for the reader.
RENAME FIELD
from
class Organization {
get name() {}
}
to
class Organization {
get title() {}
}
Names are important, and field names in record structures can be especially important when those record structures are widely used across a program. Data structures play a particularly important role in understanding. Many years ago Fred Brooks said, “Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.” While I don’t see many people drawing flowcharts these days, the adage remains valid. Data structures are the key to understanding what’s going on.
Since these data structures are so important, it’s essential to keep them clear. Like anything else, my understanding of data improves the more I work on the software, so it’s vital that this improved understanding is embedded into the program. You may want to rename a field in a record structure, but the idea also applies to classes. Getter and setter methods form an effective field for users of the class. Renaming them is just as important as with bare record structures.
REPLACE DERIVED VARIABLE WITH QUERY
from:
get discountedTotal() {
return this._discountedTotal;
}
set discount(aNumber) {
const old = this._discount;
this._discount = aNumber;
this._discountedTotal += old - aNumber;
}
to:
get discountedTotal() {return this._baseTotal - this._discount;}
set discount(aNumber) {this._discount = aNumber;}
One of the biggest sources of problems in software is mutable data. Data changes can often couple together parts of code in awkward ways, with changes in one part leading to knockon effects that are hard to spot. In many situations it’s not realistic to entirely remove mutable data—but I do advocate minimizing the scope of mutable data at much as possible.
One way I can make a big impact is by removing any variables that I could just as easily calculate. A calculation often makes it clearer what the meaning of the data is, and it is protected from being corrupted when you fail to update the variable as the source data changes.
A reasonable exception to this is when the source data for the calculation is immutable and we can force the result to being immutable too. Transformation operations that create new data structures are thus reasonable to keep even if they could be replace with calculations. Indeed, there is a duality here between objects that wrap a data structure with a series of calculated properties and functions that transform one data structure into another. The object route is clearly better when the source data changes and you would have to manage the lifetime of the derived data structures. But if the source data is immutable, or the derived data is very transient, then both approaches are effective.
CHANGE REFERENCE TO VALUE
from:
class Product {
applyDiscount(arg) {
this._price.amount -= arg;
}
}
to:
class Product {
applyDiscount(arg) {
this._price = new Money(this._price.amount - arg, this._price.currency);
}
}
When I nest an object, or data structure, within another I can treat the inner object as a reference or as a value. The difference is most obviously visible in how I handle updates of the inner object’s properties. If I treat it as a reference, I’ll update the inner object’s property keeping the same inner object. If I treat it as a value, I will replace the entire inner object with a new one that has the desired property.
If I treat a field as a value, I can change the class of the inner object to make it a Value Object. Value objects are generally easier to reason about, particularly because they are immutable. In general, immutable data structures are easier to deal with. I can pass an immutable data value out to other parts of the program and not worry that it might change without the enclosing object being aware of the change. I can replicate values around my program and not worry about maintaining memory links. Value objects are especially useful in distributed and concurrent systems.
This also suggests when I shouldn’t do this refactoring. If I want to share an object between several objects so that any change to the shared object is visible to all its collaborators, then I need the shared object to be a reference.
CHANGE VALUE TO REFERENCE
from:
let customer = new Customer(customerData);
to:
let customer = customerRepository.get(customerData.id);
A data structure may have several records linked to the same logical data structure. I might read in a list of orders, some of which are for the same customer. When I have sharing like this, I can represent it by treating the customer either as a value or as a reference. With a value, the customer data is copied into each order; with a reference, there is only one data structure that multiple orders link to.
If the customer never needs to be updated, then both approaches are reasonable. It is, perhaps, a bit confusing to have multiple copies of the same data, but it’s common enough to not be a problem. In some cases, there may be issues with memory due to multiple copies—but, like any performance issue, that’s relatively rare.
The biggest difficulty in having physical copies of the same logical data occurs when I need to update the shared data. I then have to find all the copies and update them all. If I miss one, I’ll get a troubling inconsistency in my data. In this case, it’s often worthwhile to change the copied data into a single reference. That way, any change is visible to all the customer’s orders.
Changing a value to a reference results in only one object being present for an entity, and it usually means I need some kind of repository where I can access these objects. I then only create the object for an entity once, and everywhere else I retrieve it from the repository.
Simplifying Conditional Expressions
DECOMPOSE CONDITIONAL
from:
if (!aDate.isBefore(plan.summerStart) && !aDate.isAfter(plan.summerEnd))
charge = quantity * plan.summerRate;
else
charge = quantity * plan.regularRate + plan.regularServiceCharge;
to:
if (summer())
charge = summerCharge();
else
charge = regularCharge();
One of the most common sources of complexity in a program is complex conditional logic. As I write code to do various things depending on various conditions, I can quickly end up with a pretty long function. Length of a function is in itself a factor that makes it harder to read, but conditions increase the difficulty. The problem usually lies in the fact that the code, both in the condition checks and in the actions, tells me what happens but can easily obscure why it happens.
As with any large block of code, I can make my intention clearer by decomposing it and replacing each chunk of code with a function call named after the intention of that chunk. With conditions, I particularly like doing this for the conditional part and each of the alternatives. This way, I highlight the condition and make it clear what I’m branching on. I also highlight the reason for the branching.
CONSOLIDATE CONDITIONAL EXPRESSION
from:
if (anEmployee.seniority < 2) return 0;
if (anEmployee.monthsDisabled > 12) return 0;
if (anEmployee.isPartTime) return 0;
to:
if (isNotEligableForDisability()) return 0;
function isNotEligableForDisability() {
return ((anEmployee.seniority < 2)
|| (anEmployee.monthsDisabled > 12)
|| (anEmployee.isPartTime));
}
Sometimes, I run into a series of conditional checks where each check is different yet the resulting action is the same. When I see this, I use and and or operators to consolidate them into a single conditional check with a single result.
Consolidating the conditional code is important for two reasons. First, it makes it clearer by showing that I’m really making a single check that combines other checks. The sequence has the same effect, but it looks like I’m carrying out a sequence of separate checks that just happen to be close together. The second reason I like to do this is that it often sets me up for Extract Function (106). Extracting a condition is one of the most useful things I can do to clarify my code. It replaces a statement of what I’m doing with why I’m doing it.
The reasons in favor of consolidating conditionals also point to the reasons against doing it. If I consider it to be truly independent checks that shouldn’t be thought of as a single check, I don’t do the refactoring.
REPLACE NESTED CONDITIONAL WITH GUARD CLAUSES
from:
function getPayAmount() {
let result;
if (isDead)
result = deadAmount();
else {
if (isSeparated)
result = separatedAmount();
else {
if (isRetired)
result = retiredAmount();
else
result = normalPayAmount();
}
}
return result;
}
to:
function getPayAmount() {
if (isDead) return deadAmount();
if (isSeparated) return separatedAmount();
if (isRetired) return retiredAmount();
return normalPayAmount();
}
I often find that conditional expressions come in two styles. In the first style, both legs of the conditional are part of normal behavior, while in the second style, one leg is normal and the other indicates an unusual condition.
These kinds of conditionals have different intentions—and these intentions should come through in the code. If both are part of normal behavior, I use a condition with an if and an else leg. If the condition is an unusual condition, I check the condition and return if it’s true. This kind of check is often called a guard clause.
The key point of Replace Nested Conditional with Guard Clauses is emphasis. If I’m using an ifthenelse construct, I’m giving equal weight to the if leg and the else leg.
This communicates to the reader that the legs are equally likely and important. Instead, the guard clause says, “This isn’t the core to this function, and if it happens, do something and get out.”
I often find I use Replace Nested Conditional with Guard Clauses when I’m working with a programmer who has been taught to have only one entry point and one exit point from a method. One entry point is enforced by modern languages, but one exit point is really not a useful rule. Clarity is the key principle: If the method is clearer with one exit point, use one exit point; otherwise don’t.
REPLACE CONDITIONAL WITH POLYMORPHISM
from:
switch (bird.type) {
case 'EuropeanSwallow':
return "average";
case 'AfricanSwallow':
return (bird.numberOfCoconuts > 2) ? "tired" : "average";
case 'NorwegianBlueParrot':
return (bird.voltage > 100) ? "scorched" : "beautiful";
default:
return "unknown";
to:
class EuropeanSwallow {
get plumage() {
return "average";
}
}
class AfricanSwallow {
get plumage() {
return (this.numberOfCoconuts > 2) ? "tired" : "average";
}
}
class NorwegianBlueParrot {
get plumage() {
return (this.voltage > 100) ? "scorched" : "beautiful";
}
]
Complex conditional logic is one of the hardest things to reason about in programming, so I always look for ways to add structure to conditional logic. Often, I find I can separate the logic into different circumstances—highlevel cases—to divide the conditions. Sometimes it’s enough to represent this division within the structure of a conditional itself, but using classes and polymorphism can make the separation more explicit.
A common case for this is where I can form a set of types, each handling the conditional logic differently. I might notice that books, music, and food vary in how they are handled because of their type. This is made most obvious when there are several functions that have a switch statement on a type code. In that case, I remove the duplication of the common switch logic by creating classes for each case and using polymorphism to bring out the typespecific behavior.
Another situation is where I can think of the logic as a base case with variants. The base case may be the most common or most straightforward. I can put this logic into a superclass which allows me to reason about it without having to worry about the variants. I then put each variant case into a subclass, which I express with code that emphasizes its difference from the base case.
Polymorphism is one of the key features of objectoriented programming—and, like any useful feature, it’s prone to overuse. I’ve come across people who argue that all examples of conditional logic should be replaced with polymorphism. I don’t agree with that view. Most of my conditional logic uses basic conditional statements—if/else and switch/case. But when I see complex conditional logic that can be improved as discussed above, I find polymorphism a powerful tool.
INTRODUCE SPECIAL CASE
from:
if (aCustomer === "unknown") customerName = "occupant";
to:
class UnknownCustomer {
get name() {return "occupant";}
}
A common case of duplicated code is when many users of a data structure check a specific value, and then most of them do the same thing. If I find many parts of the code base having the same reaction to a particular value, I want to bring that reaction into a single place.
A good mechanism for this is the Special Case pattern where I create a specialcase element that captures all the common behavior. This allows me to replace most of the specialcase checks with simple calls.
A special case can manifest itself in several ways. If all I’m doing with the object is reading data, I can supply a literal object with all the values I need filled in. If I need more behavior than simple values, I can create a special object with methods for all the common behavior. The specialcase object can be returned by an encapsulating class, or inserted into a data structure with a transform.
A common value that needs specialcase processing is null, which is why this pattern is often called the Null Object pattern. But it’s the same approach for any special case—I like to say that Null Object is a special case of Special Case.
INTRODUCE ASSERTION
from:
if (this.discountRate)
base = base - (this.discountRate * base);
}
to:
assert(this.discountRate >= 0);
if (this.discountRate)
base = base - (this.discountRate * base);
Often, sections of code work only if certain conditions are true. This may be as simple as a square root calculation only working on a positive input value. With an object, it may require that at least one of a group of fields has a value in it.
Such assumptions are often not stated but can only be deduced by looking through an algorithm. Sometimes, the assumptions are stated with a comment. A better technique is to make the assumption explicit by writing an assertion.
An assertion is a conditional statement that is assumed to be always true. Failure of an assertion indicates a programmer error. Assertion failures should never be checked by other parts of the system. Assertions should be written so that the program functions equally correctly if they are all removed; indeed, some languages provide assertions that can be disabled by a compiletime switch.
I often see people encourage using assertions in order to find errors. While this is certainly a Good Thing, it’s not the only reason to use them. I find assertions to be a valuable form of communication—they tell the reader something about the assumed state of the program at this point of execution. I also find them handy for debugging, and their communication value means I’m inclined to leave them in once I’ve fixed the error I’m chasing. Selftesting code reduces their value for debugging, as steadily narrowing unit tests often do the job better, but I still like assertions for communication.
Making Method Calls Simpler
Rename Method
- Problem: The name of a method does not explain what the method does.
- Solution: Rename the method.
Add Parameter
- Problem: A method does not have enough data to perform certain actions.
- Solution: Create a new parameter to pass the necessary data.
Remove Parameter
- Problem: A parameter is not used in the body of a method.
- Solution: Remove the unused parameter.
Separate Query from Modifier
- Problem: Do you have a method that returns a value but also changes something inside an object?
- Solution: Split the method into two separate methods. As you would expect, one of them should return the value and the other one modifies the object.
Parameterize Method
- Problem: Multiple methods perform similar actions that are different only in their internal values, numbers or operations.
- Solution: Combine these methods by using a parameter that will pass the necessary special value.
Replace Parameter with Explicit Methods
- Problem: A method is split into parts, each of which is run depending on the value of a parameter.
- Solution: Extract the individual parts of the method into their own methods and call them instead of the original method.
Preserve Whole Object
- Problem: You get several values from an object and then pass them as parameters to a method.
- Solution: Instead, try passing the whole object.
Replace Parameter with Method Call
- Problem: Before a method call, a second method is run and its result is sent back to the first method as an argument. But the parameter value could have been obtained inside the method being called.
- Solution: Instead of passing the value through a parameter, place the value-getting code inside the method.
Introduce Parameter Object
- Problem: Your methods contain a repeating group of parameters.
- Solution: Replace these parameters with an object.
Remove Setting Method
- Problem: The value of a field should be set only when it is created, and not change at any time after that.
- Solution: So remove methods that set the field’s value.
Hide Method
- Problem: A method is not used by other classes or is used only inside its own class hierarchy.
- Solution: Make the method private or protected.
Dealing with Generalization
Generalization produces its own batch of refactorings, mostly dealing with moving methods around a hierarchy of inheritance. Pull Up Field (320) and Pull Up Method (322) both promote function up a hierarchy, and Push Down Method (328) and Push Down Field (329) push function downward. Constructors are a little bit more awkward to pull up, so Pull Up Constructor Body (325) deals with those issues. Rather than pushing down a constructor, it is often useful to use Replace Constructor with Factory Method (304).
Pull Up Method
from:
class Employee {...} // Java
class Salesman extends Employee {
private String name;
}
class Engineer extends Employee {
private String name;
}
to:
class Employee {
protected String name;
}
class Salesman extends Employee {...}
class Engineer extends Employee {...}
Eliminating duplicate code is important. Two duplicate methods may work fine as they are, but they are nothing but a breeding ground for bugs in the future. Whenever there is duplication, there is risk that an alteration to one copy will not be made to the other. Usually, it is difficult to find the duplicates. The easiest case of using Pull Up Method is when the methods have the same body, implying there’s been a copy and paste. Of course it’s not always as obvious as that. I could just do the refactoring and see if the tests croak—but that puts a lot of reliance on my tests. I usually find it valuable to look for the differences—often, they show up behavior that I forgot to test for.
Push Down Method
from:
class Employee {
get quota {...}
}
class Engineer extends Employee {...}
class Salesman extends Employee {...}
to:
class Employee {...}
class Engineer extends Employee {...}
class Salesman extends Employee {
get quota {...}
}
If a method is only relevant to one subclass (or a small proportion of subclasses), removing it from the superclass and putting it only on the subclass(es) makes that clearer. I can only do this refactoring if the caller knows it’s working with a particular subclass—otherwise, I should use Replace Conditional with Polymorphism (272) with some placebo behavior on the superclass.
Pull Up Constructor Body
from:
class Party {...}
class Employee extends Party {
constructor(name, id, monthlyCost) {
super();
this._id = id;
this._name = name;
this._monthlyCost = monthlyCost;
}
}
to:
class Party {
constructor(name){
this._name = name;
}
}
class Employee extends Party {
constructor(name, id, monthlyCost) {
super(name);
this._id = id;
this._monthlyCost = monthlyCost;
}
}
Constructors are tricky things. They aren’t quite normal methods—so I’m more restricted in what I can do with them.
If I see subclass methods with common behavior, my first thought is to use Extract Function (106) followed by Pull Up Method (350), which will move it nicely into the superclass. Constructors tangle that—because they have special rules about what can be done in what order, so I need a slightly different approach. If this refactoring starts getting messy, I reach for Replace Constructor with Factory Function (334).
Replace Constructor with Factory Method
from:
leadEngineer = new Employee(document.leadEngineer, 'E');
to:
leadEngineer = createEngineer(document.leadEngineer);
Many objectoriented languages have a special constructor function that’s called to initialize an object. Clients typically call this constructor when they want to create a new object. But these constructors often come with awkward limitations that aren’t there for more general functions. A Java constructor must return an instance of the class it was called with, which means I can’t replace it with a subclass or proxy depending on the environment or parameters. Constructor naming is fixed, which makes it impossible for me to use a name that is clearer than the default. Constructors often require a special operator to invoke (“new” in many languages) which makes them difficult to use in contexts that expect normal functions.
A factory function suffers from no such limitations. It will likely call the constructor as part of its implementation, but I can freely substitute something else.
Refactoring to Patterns
Refactoring to Patterns is an interesting merger of two of the most important concepts in software engineering that have arisen in the last ten years. However, it does not belong on every software engineer’s bookshelf. For those who would expect it to be a comprehensive reference tool, the text will be disappointing. However, for those with the ability, time, and desire to extend the ideas presented, it will be a very useful read. The existing pattern literature provides us with the what, the when, the why, and the where of a pattern’s use, but not the how. The major contribution of Refactoring to Patterns is that it provides a solid explanatory framework for describing how one can introduce and remove patterns from code. Just as patterns and refactorings were introduced in their respective texts, Refactoring to Patterns does not purport to be a complete volume, but a starting point. To Kerievsky’s credit, he does acknowledge that the book is a work in progress that is being published now because of its usefulness, as opposed to its completeness. While some books fulfill their mission as a reference, Refactoring to Patterns should be read through before serving as a tool. It is, in many ways, a series of case studies, as opposed to a catalog of refactorings.
- Chain Constructors
- Replace Multiple Constructors with Creation Methods
- Encapsulate Subclasses with Creation Methods
- Extract Creation Class
- Replace Conditional Calculations with Strategy
- Replace Implicit Tree with Composite
- Encapsulate Composite with Builder
- Extract Special-Case Logic into Decorators
- Replace Hard-Coded Notifications with Observer
- Move Accumulation to Collecting Parameter
- Replace One/Many Distinctions with Composite
- Compose Method
- Separate Versions with Adapters
- Adapting Legacy Systems
- Replace Hard-Coded Notifications with Observer
- Move Accumulation to Collecting Parameter