Stop Pasting Code
This topic seems almost too basic to even discuss but I've run into these issues fairly often so appearently they do need discussion

When coding somethings to keep in mind, how can we code both efficiently and defensively. In other words, how can we get things done faster and with less bugs.

Here's a few ideas. They might seem obvious but like I said, I see lots of code that doesn't follow these ideas and so I'm bringing them up.

#1) If you do something a lot, figure out a way to make it easier to do.

Here's two examples from my own experience. I have a library to read .ini files. .Ini files are something that Windows first introduced me to. They are a very simple text based format that look something like this

; ---- game settings ----
[player]
numlives=5
 
[ogre]
hitpoints=20
weapon=sword
speed=1.34
 
[dragon]
hitpoints=300
weapon=firebreath
speed=6.5
I wrote a library to read these files. It parses the file into sections and label-value pairs. Let's assume I started off with an interface something like this

// some global vars because it's simple to understand
int    g_playerNumLives;
int    g_ogreHitPoints;
string g_ogreWeapon;
float  g_ogreSpeed;
int    g_dragonHitPoints;
string g_dragonWeapon;
float  g_dragonSpeed;
 
// ...
 
const IniFile* pIni = ReadIniFile("c:/test/game_setttings.ini");
const IniSection* pSection = pIni->GetFirstSection();
while (pSection != NULL)
{
    if (strcmp(pSection->GetName(), "player") == 0)
    {
        const IniLine* pLine = pSection->GetFirstLine();
        while (pLine != NULL)
        {
            if (strcmp(pLine->GetId(), "numlives") == 0)
            {
               g_playerNumLives = atol(pLine->GetValue());
            }
            pLine = pSection->GetNextLine(pLine);
        }
    }
    else if (strcmp(pSection->GetName(), "ogre") == 0)
    {
        const IniLine* pLine = pSection->GetFirstLine();
        while (pLine != NULL)
        {
            if (strcmp(pLine->GetId(), "hitpoints") == 0)
            {
               g_orgeHitPoints = atol(pLine->GetValue());
            }
            else if (strcmp(pLine->GetId(), "weapon") == 0)
            {
               g_orgeWeapon = pLine->GetValue();
            }
            else if (strcmp(pLine->GetId(), "speed") == 0)
            {
               g_orgeSpeed = (float)atof(pLine->GetValue());
            }
            pLine = pSection->GetNextLine(pLine);
        }
    }
    else if (strcmp(pSection->GetName(), "dragon") == 0)
    {
        const IniLine* pLine = pSection->GetFirstLine();
        while (pLine != NULL)
        {
            if (strcmp(pLine->GetId(), "hitpoints") == 0)
            {
               g_dragonHitPoints = atol(pLine->GetValue());
            }
            else if (strcmp(pLine->GetId(), "weapon") == 0)
            {
               g_dragonWeapon = pLine->GetValue();
            }
            else if (strcmp(pLine->GetId(), "speed") == 0)
            {
               g_dragonSpeed = (float)atof(pLine->GetValue());
            }
            pLine = pSection->GetNextLine(pLine);
        }
    } 
    pSection = pIni->GetNextSection(pSection);
}
That seems fine and I've seen lots of code like that but let's look at ways to improve it both for efficiency and for preventing bugs. And, as a someone who creates lots of tools to support less techie users, let's try to make it more user friendly.

Some of the things that come to mind.

We could add a way to find a specific section instead of having to loop through all of them. We could provide functions for getting values by name within a section. If we did that the code would then look something like this.

// ...
 
const IniFile* pIni = ReadIniFile("c:/test/game_setttings.ini");
const IniSecton* pSection;
pSection          = pIni->FindSection("player");
g_playerNumLives  = pSection->GetInt("numlives");
pSection          = pIni->FindSection("ogre");
g_orgeHitPoints   = pSection->GetInt("hitpoints");
g_orgeWeapon      = pSection->GetString("weapon");
g_orgeSpeed       = pSection->GetFloat("speed");
pSection          = pIni->FindSection("dragon");
g_dragonHitPoints = pSection->GetInt("hitpoints");
g_dragonWeapon    = pSection->GetString("weapon");
g_dragonSpeed     = pSection->GetFloat("speed");
Of course it took a little more thought and time to write IniFile::FindSection() and the IniSection::Get???() functions but in the end, all the new code I write is much smaller and much simpler.

Once I've changed to this style of API it's easier to add new functionally to make it more robust. For example, originally when I read "numlives" the code used atol(). Well, the code in IniSection::GetInt handles not only formats like 123, -456 but it also handles hex formats like 0xABCD or $abcd or even ABCDh as well as binary formats like 0101010b. That means I only have to change that code in the library and all programs that use the library get that new functionality.

Another is the find function. It doesn't just use strcmp but instead uses stricmp making it case-insensitive which is better for less techie users. Because it's in one place, all code that uses the library gets the same treatment. If we leave the compare external like before, then it's up to the whim of each programmer if they call strcmp or stricmp or something else. Since we put it in a library we get more consistancy. We don't have to give some rules to try to get all programmers to conform. Conformancy and consistency become free and it's even less work.

Other things we could do. We could change the interface so that each Get___ function takes a reference and returns a bool if success or failure. We could also supply a default and maybe print an error if the value is not found.

// ...
 
const IniFile* pIni = ReadIniFile("c:/test/game_setttings.ini");
const IniSection* pSection;
pSection = pIni->FindSection("player");
pSection->GetInt(g_playerNumLives, "numlives", 10);
pSection = pIni->FindSection("ogre");
pSection->GetInt(g_orgeHitPoints, "hitpoints", 10);
pSection->GetString(g_orgeWeapon, "weapon", "club");
pSection->GetFloat(g_orgeSpeed, "speed", 1.0f);
pSection = pIni->FindSection("dragon");
bool error = pSection->GetInt(g_dragonHitPoints, "hitpoints", 100) |
             pSection->GetString(g_dragonWeapon, "weapon", "fire") |
             pSection->GetFloat(g_dragonSpeed, "speed", 1.0f)      ;
if (error)
{
    printf ("dragon was missing settings\n");
}
I only showed an example of the dragon checking for errors where it only cares if any field was missing where as the other ones don't care. Of course if the library itself printed the errors that saves us the trouble of doing it ourselves. That could be problematic since every program usually has a different way of reporting errors but putting the error reporting in will save time most of the time and in the worst case you could design your library to take an error reporting callback that you could use to get it to report errors to fit whatever program you are using the library in.

In another example. We could do a similar thing with IniFile::FindSection. Make it take a reference to a section pointer that it fills in and have it return a bool for errors.

bool IniFile::FindSection(const IniSection*& pSection, const char* pSectionName)
{
   bool status = true; // assume success;
   // find the section.
   ...
   // if the section does not exist set pSection to a fake section with no fields
   if (sectionNotFound)
   {
      static IniSection bogus;
 
      pSection = &bogus;
      status   = false; // return failure
      printf ("couldn't find section (%s)\n", pSectionName);
   }
   return status;
}
Now, even if the section does not exist your code doesn't have to check for it but it still can if it wants to. This is important, at least in game development, the game needs to run at all times. With teams of 50 to 200 people you can't have the game fail to run when one person puts in bad data but at the same time you don't want errors to go un-noticed. With a design like this the game will continue to run but errors will be printed for any section not found and for any fields missing and resonable defaults will be supplied if they are missing.

We've made the code easier to use. We've made it harder to use wrong. You don't have to check for errors if you don't want but they will be reported. You can still check though if you absolutely need to.

You can apply similar ideas to other areas of your code.

One example, don't hand parse command line arguments. Languages like Perl or Python come with command line parsing libaries. Use them. For C/C++, write one or use one. Stop copying and pasting code. Stop writing more code than you need to. Stop making your code have to work hard.

The standard parsers deal with all kinds of issues and they make your code consistant. You don't have worry if some command format is
command -arg1 -arg2=param
and another is
command -arg1 -arg2 param
and another is
command /arg /arg2 param
You don't have to wonder if
command "thing with space"
will work and even if you wrote your own library, if you find a bug it's fixed in one place and that bug gets fixed in all your tools. My library also handles responses files so if I need to supply 1000 files names or something I can put them in a file can do
command @listoffiles.txt
Anyway, maybe I shouldn't have used the commandline example because I know the majority of programs I've seen the source too don't use libraries or the standards. But, those same issues above are so common in other areas as well. I've seen this before.

void FunctionThatEffectsAGlobal()
{
   ... // effect global here
}
Later it's decided to make the code threaded the globals need to be locked for thread safe access. I've seen this alot.

FunctionToLock();
FunctionThatEffectsAGlobal()
FunctionToUnlock();
 
...
 
FunctionToLock();
FunctionThatEffectsAGlobal()
FunctionToUnlock();
 
...
 
FunctionToLock();
FunctionThatEffectsAGlobal()
FunctionToUnlock();
where as wouldn't this be better?

void FunctionThatEffectsAGlobal()
{
   FunctionToLock();
   ... // effect global here
   FunctionToUnlock();
}
Not only is it less code, I only had to change one place, not every place that called the function that effects a global. It also means you can't call that function incorrectly. You can't accidentally call it without locking which would be a really hard to find bug in a multi threaded program. Sure, the first time you do it it might surround some global access with a lock. But the second time you find yourself locking the same variable somewhere else in the code should trigger the thought that maybe you should turn it into a function.

The title of this article is "Stop Pasting Code". The reason is I see so many programmers code something once and then copy and paste it again and again in the same program. Sure, at first it seems like the easiest thing to do but in the long run it makes your code harder to maintain, harder to know what the right thing to do is and harder to make global changes. And, it's even in the end more work for you. So please, give it a little more thought and see if it wouldn't be better to refactor a little if you find yourself copying and pasting code.



Pass it on

Comments:
I agree with the title the post; in fact its so natural for me to turn reuseable code into functions, I hadn't really thought about it explicitly before, but I guess some people don't code like this.

For performance reasons, I don't agree with your Locking functionality. What if you need to call FunctionThatEffectsAGlobal() hundreds of times in rapid succession? The overhead of all those locks and unlocks would grind you to a halt; its much more efficient to Lock(), do the work (100s), Unlock().
posted by mrdonutJune 9, 2008 at 20:42 [ e ]

So put a count your locks.

static int lockCount;

void Lock()
{
  if (lockCount == 0)
  {
     ReallyLock();
  }
  ++lockCount;
}

void Unlock()
{
  --lockCount;
  if (lockCount == 0)
  {
    ReallyUnlock();
  }
}

void FuncThatAccessesGlobal()
{
   Lock();
   ...access global here...
   Unlock();
}

void CodeThatAccessesGlobalsALot(
)
{
   Lock();
   FuncThatAccessesGlobal();
   FuncThatAccessesGlobal();
   FuncThatAccessesGlobal();
   FuncThatAccessesGlobal();
   FuncThatAccessesGlobal();
   FuncThatAccessesGlobal();
   FuncThatAccessesGlobal();
   Unlock();
}

Put something to lock/unlock in the function and if you have a bunch of calls all together put a lock/unlock around all of them. You won't waste time relocking/unlocking but you still make an API that is hard to use wrong.

Of course if even that is too slow then optimize your special case but it's almost always better to code for safety and easy of use and then profile later to get rid of bottlenecks than it is to code fast and lose and lose hours, days and more because of hidden bugs that could have been prevented by better code design.

posted by greggmanJune 9, 2008 at 23:15 [ e ]
I'm afraid that doesn't work. Taking the Lock() function, its possible for two threads to simultaneously pass the

if (lockCount == 0)

conditional and then both execute the Lock() before setting lockCount. What you need is an atomic test-and-set, which can't be done in straight C/C++; you need either assembly or semaphores.
posted by mrdonutJune 10, 2008 at 12:07 [ e ]

Fine. Make lockcount thread specific. Sheesh!

I still stand by my point. Copying and pasting code is for bad and desiging APIs to be as error proof as possible is good. Premature optimization is also bad.

Picking nits is like correcting someone's spelling.

posted by greggmanJune 11, 2008 at 0:11 [ e ]
Haha, well, yes I do agree with the points you're making, its just the locking example I don't agree with But that discussion probably belongs under "multi-threaded programming techniques" rather than this topic...

I'll add that I don't think this is *premature* optimization; I prefer to write the API so that it can be optimized 'behind the scenes' without changing the API after implementation. Changing the way the user has to lock/unlock a global would be an API change, and undesirable in my belief -- coz normally  once you've chosen an API for your code, its very hard to change it.
posted by mrdonutJune 11, 2008 at 12:20 [ e ]
Gregg, the code blocks are going far on left - with no line return, and the whole view is broken here. I can't read this post. Using Firefox 2.0.0.14.
posted by omarJuly 1, 2008 at 8:46 [ e ]
refresh. It should be fixed.
posted by greggmanJuly 2, 2008 at 0:49 [ e ]
Thanks! it works.
posted by omarJuly 2, 2008 at 1:10 [ e ]