Archive | Writing Code

Announcing Castalia 2014.1, with Function Prototype Synchronization

I’m excited to announce the first release of Castalia for 2014.

The best part is an exciting new feature: Function Prototype Synchronization.

Here’s the deal: Most functions (or procedures) that you write in Delphi have their prototype in two places: One in the interface section of the unit, and the other in the implementation section. If you want to change the prototype (add a parameter, change from a procedure to a function, etc…), you have to change it in two places. It gets annoying.

So, I added a “Function Prototype Synchronization” tool to Castalia. Here’s how it works:

When you want to change the prototype, make the change, in either the implementation or interface prototype.

After you’ve made your changes, hit a hotkey (Ctrl+Shift+S by default), and Castalia will automatically synchronize the two prototypes.

It’s that simple.

Here’s a quick little video showing it in action:


Function Prototype Synchronization is available in the Professional and Suite editions of Castalia.

(A couple notes: Prototype Synchronization only changes the method you’re working on. It won’t change ancestor or descendant classes. Prototype Synchronization does not [currently] work with overloaded methods.)

Here’s what else is new in Castalia 2014.1:

  • Completely redesigned options dialog
  • Fixed: Edit cursor doesn’t appear correctly when the file being edited has syntax errors
  • Fixed: Error in unit resolution which caused “Eliminate With” refactoring to not work properly
  • Several other small bugfixes and performance improvements

Users with a current maintenance subscription can download Castalia 2014.1 right now at the customer site. Everyone else can try it free.


Castalia 2013.4.1 is now Available

Merry Christmas!

I hope everyone has been enjoying time with loved ones for the holidays. I certainly have, but today I put the finishing touches on a Castalia bugfix release, version 2013.4.1.

Here’s what’s new:

  • Fixed: Access violation when switching between form/code in a datamodule under certain conditions.
  • Fixed: Parameter field width modifier triggers a syntax error if the parameter is a function call or typecast.

Also, I’ve begun a “user experience program” for Castalia. You’ve probably seen things like this before. It’s a system that collects some anonymous data about how you use Castalia, and sends it back to me so I can analyze the data and continue to improve it for you. It’s anonymous, and none of the code you write or anything else you produce is ever collected. If you’re uncomfortable with it, you don’t have to participate, but I hope you will.

Users with a current maintenance subscription can download Castalia 2013.4.1 at the customer site.

Everyone else can try a free trial of Castalia 2013.4.1.

Happy Holidays!


Making Slow Code go Faster: Measuring Speed to Improve Performance


Last week, I wrote a somewhat lengthy blog post describing a number of things I’ve done recently to improve Castalia’s speed and responsiveness. Several of you asked for more information about the instrumentation that I used to measure performance times. In this post, I’ll describe two techniques for measuring the speed of a segment of code.

The very easy way

In a nutshell, measuring the speed of a segment of code is incredibly easy. You look at what time it is when the code starts running, then you look again when it’s done, and the difference tells you how long it took the code to run.

So, the only thing we need to figure out how to do is how to “look at what time it is.” I used two mechanisms to do this. The first one is the GetTickCount function. GetTickCount is a Windows API, not part of the VCL. It simply returns the number of milliseconds that have elapsed since the computer was started up. Here’s an example of how to use GetTickCount:

  var
    StartTime, EndTime, Delta: Cardinal; //GetTickCount returns a Cardinal (DWORD)
  begin
    StartTime := GetTickCount;
    //Code you want to measure here
    EndTime := GetTickCount;
    Delta := EndTime - StartTime;
    //Show Delta so you know the elapsed time
  end;

It’s really that simple. This will tell you how many milliseconds it took to execute the code being measured.

There is one problem with GetTickCount, however: It’s not reliably accurate for timespans of less than about 15 milliseconds. Above that, it’s fine, and is OK as an approximation to figure out where the big bottlenecks might be, but what if you want to measure the execution of a function that’s already a lot faster than 15ms?

The slightly less easy way

The next way takes just a little bit more work, but not much. Windows includes a mechanism called Performance Counters, intended for exactly this purpose. In working with Castalia, I used the QueryPerformanceCounter function exactly the same way as GetTickCount: Get the time at the start of the code I’m measuring, get the time again at the end, and record the difference.

While GetTickCount measures time in milliseconds, QueryPerformanceCounter’s unit of time is hardware-dependent. In fact, QueryPerformanceCounter relies on the CPU having a high-resolution performance counter, and won’t work if your CPU doesn’t have one (as far as I’m aware, all modern desktop CPUs have one, so you probably don’t need to worry about that).

Technical historical aside: At its inception, QueryPerformanceCounter was intended to be an easy way to access the Pentium CPU’s RDTSC instruction, which returns the number of CPU cycles that have elapsed. Multiprocessor, multicore, and hyperthreaded architectures have made counting CPU cycles unreliable, so QueryPerformanceCounter now measures some very small, hardware-dependent unit of time instead.

Because it’s measuring much faster units of time, and needs much larger numbers, QueryPerformanceCounter requires an Int64. Here’s an example of using QueryPerformanceCounter to measure the time of some code:

  var
    StartTime, EndTime, Delta: Int64;
  begin
    QueryPerformanceCounter(StartTime);
    //Code you want to measure here
    QueryPerformanceCounter(EndTime);
    Delta := EndTime - StartTime;
    //Show Delta so you know the elapsed time
  end;

That’s very simple too. We can use those numbers for comparisons as-is, but I like to translate them into milliseconds in order to get a more familiar unit of measure that’s also comparable between different systems. To do that, we need to know exactly what unit of time QueryPerformanceCounter is measuring. To get that information, we use the function QueryPerformanceFrequency.

QueryPerformanceFrequency tells us the number of units per second that the high-resolution performance counter is using. It also takes an Int64. We only need to call it once per application, since it doesn’t really change (actually, it can if the CPU gets throttled for power/temperature reasons, but let’s assume that it won’t happen while you’re running your tests). To find out how many high-resolution units are in a millisecond, do something like this:

  var
    Frequency, UnitsPerMS: Int64;
  begin
    QueryPerformanceFrequency(Frequency);
    UnitsPerMS := Frequency div 1000;
  end;

Then you can divide your performance Delta by UnitsPerMS to find out how many milliseconds your code is actually taking to run. For example, in a VM on my Sandy Bridge i7, Frequency = 3,579,545. That means there are approximately 3,580 “units” per millisecond. If the QueryPerformanceCounter delta on some code is 385, that means the code took 0.1 milliseconds to run.
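Putting those pieces together, a complete high-resolution measurement might look something like this (a sketch; the variable names are my own, and the floating-point math is just one way to preserve the sub-millisecond precision):

  var
    Frequency, StartTime, EndTime: Int64;
    ElapsedMS: Double;
  begin
    QueryPerformanceFrequency(Frequency);
    QueryPerformanceCounter(StartTime);
    //Code you want to measure here
    QueryPerformanceCounter(EndTime);
    //Multiply before dividing so fractions of a millisecond survive
    ElapsedMS := ((EndTime - StartTime) * 1000) / Frequency;
  end;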

That’s a MUCH higher resolution than GetTickCount.

Why not use QueryPerformanceCounter all the time?

There’s one problem with QueryPerformanceCounter: It can be very slow compared to GetTickCount. On the same Sandy Bridge i7 as above, with Windows 7, QueryPerformanceCounter is well over 100 times slower than GetTickCount.

This probably only matters if you’re measuring the speed of code that includes calls to QueryPerformanceCounter. This is easy to do (and not realize you’re doing) if you measure the time of a function that calls another function which you’re also instrumenting. That’s just something to keep in mind and be careful about.

If you’re wanting to ship instrumented code to users, be very careful about overusing QueryPerformanceCounter to do it; it can actually negatively affect the speed of your code.

Showing the information

In the code examples above, I sort of skipped an important step, which is actually displaying the delta values. There are a few ways to do that, each with their own set of caveats.

For a VCL Forms application, the easiest is probably with a call to ShowMessage(IntToStr(Delta)). This shows you the number quickly, but it also blocks the application, and would be downright impossible to manage if you were measuring each iteration of a large loop or each level of a deeply nested set of function calls.

Another alternative would be the OutputDebugString API. OutputDebugString logs any string to the debugger. In Delphi, this string will appear in the “Messages” window in the IDE. Use this when you want to record a lot of data as you use your application, then analyze it later. Beware, however, that this has the same caveat as QueryPerformanceCounter: It’s not a very fast function, and overusing it can have negative effects on your code’s performance. If you use calls to OutputDebugString in your performance-sensitive code, you’ll definitely want to remove them in your release build.
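For example, a small helper along these lines (the name and format string are just my illustration; it needs the Windows and SysUtils units) logs a labeled timing to the debugger:

  procedure LogTiming(const Operation: string; Delta: Int64);
  begin
    //Shows up in the IDE's "Messages" window while debugging
    OutputDebugString(PChar(Format('%s: %d', [Operation, Delta])));
  end;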

Finally, you could store the data internally, then show the result once using one of the mechanisms above. For example, if you were testing the speed of each iteration of a loop, and the loop was running 10,000 times, you could create an array of 10,000 integers, put the Delta from each loop in each one, and then display them all at once however you like (or, calculate an average and display that). This is the approach I used most with my recent work on Castalia, and it worked very well.
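A sketch of that store-then-analyze approach, assuming a loop of 10,000 iterations (the names here are my own):

  const
    IterationCount = 10000;
  var
    Deltas: array[0..IterationCount - 1] of Int64;
    StartTime, EndTime, Total: Int64;
    i: Integer;
  begin
    for i := 0 to IterationCount - 1 do
    begin
      QueryPerformanceCounter(StartTime);
      //One iteration of the code being measured here
      QueryPerformanceCounter(EndTime);
      Deltas[i] := EndTime - StartTime; //Store now, analyze later
    end;
    //Afterward, show an average (or dump the whole array)
    Total := 0;
    for i := 0 to IterationCount - 1 do
      Total := Total + Deltas[i];
    ShowMessage('Average delta: ' + IntToStr(Total div IterationCount));
  end;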

Final thoughts

If you want to improve performance, you have to first measure performance. Using GetTickCount or QueryPerformanceCounter, you can measure the speed of your code with varying degrees of reliability and resolution. Choose the one that best meets your needs.

Appendix A (Added about 7 hours after this post was initially published)

Reader Alan C posted a comment below pointing out the TStopWatch utility class, which encapsulates all of the above into a Delphi-friendly class. Because Castalia is compatible with Delphi versions back to Delphi 5, and TStopWatch was added in Delphi 2010, I didn’t use it in the work I’ve done with Castalia this year.

TStopWatch does indeed encapsulate the functionality described above, defaulting to use the high resolution timer if it’s available, and using GetTickCount otherwise. Still, even if you use TStopWatch, it’s good to understand the limitations and performance impact that it can have. Thanks Alan for pointing this out!

This post was written by Jacob Thurman, creator of Castalia, the Smart Code Editor for Delphi Programmers. Check out Castalia at


Making Slow Code go Fast

This year’s Castalia releases have seen huge improvements in performance.  Because performance optimization is important to Delphi programmers, and something we all have to face at some point, I’ve decided to write about what I did to get those performance gains. I hope this will help you to improve the code you write too.

Identifying the Problem

The Qualitative Problem

There had been “qualitative” concerns about Castalia’s performance for a while. Users would write things like “it’s slow” or “it feels sluggish.” Talking to users who had these complaints, I learned that the performance complaints were in two areas:

1. The code editor would lag behind when scrolling using the mouse scroll wheel

2. The code editor could lag behind when typing code, if you typed fast enough.

So, at the beginning of 2013, I determined that I was going to get to the bottom of these issues and fix them once and for all, even if I had to rewrite the whole parser in hand-optimized assembler code.

The Quantitative Problem

The secret of real performance improvement is measurement. I needed some way to compare Castalia’s performance to the performance of Delphi without Castalia, and then to measure whether the improvements I would make were actually having an impact. Qualitative descriptions like “slow” or “sluggish” aren’t things you can compare; I needed real numeric data.

I spent a day writing some code instrumentation. In other words, I wrote a simple framework that would allow me to measure the exact time that certain operations took. I could time the execution of a function, a single line of code, or response to an event like a key press, mouse click, or a paint event in the code editor.

Next, I gathered data on current performance. First, I ran Delphi with only my instrumentation enabled, and no other Castalia features. This allowed me to see how fast Delphi responds to key presses, scroll wheel motion, and other events.

Then, I enabled all of Castalia’s features, and ran the exact same tests, collecting data to see how Delphi responded to those same events, but with Castalia doing its thing along the way.

Comparing these two data sets gave me the difference in performance between Delphi by itself and Delphi with Castalia. Now, instead of words like “slow” or “sluggish,” I had actual numbers. This turned a qualitative problem into a quantitative problem, with good measurements, and things that can be measured can be improved!

Finding Bottlenecks

The next step, now that I had this data, was to figure out where the problem actually was. The question I needed to answer was “What code is actually creating the problem?”

I have to admit, I started down the wrong path on this. I had gotten a lot of emails from users saying things like “It’s slow because the parser runs after every keystroke,” and “The parser isn’t fast enough for this enormous file I’m working on.” Those are very reasonable complaints, and since certain parts of the Castalia parser do need to run for every keystroke in the editor, it had better be incredibly fast, even on very large files.

After several days of work, I had to come to the conclusion that the parser wasn’t the problem. The parser was already blindingly fast. It could churn through huge files in single-digit milliseconds. I even wrote a quick program to generate syntactically correct gibberish source code files, and had it generate a file of over 300,000 lines of code, which the Castalia parser handled in just a few milliseconds. The parser was plenty fast. The problem had to be somewhere else.

In hindsight, my next strategy was much smarter. I commented out the initialization code for every Castalia feature except the code that was necessary for the performance measurements. Then, I gradually re-enabled features one at a time, running the performance tests and collecting data again.

When enabling a feature created a spike in the response time to some event, I would add more and more instrumentation code as needed, until I had narrowed it down to some specific thing that I could work on optimizing. Then, I would work through that bottleneck, bringing the performance time down as low as possible.

Fixing the Problem(s)

Sometimes, when we’re fixing bugs or improving performance, we find a single fix that solves the problem. There have been many times where a seemingly complex bug was, in fact, the result of a single-line error that only needed a single-line fix.

Unfortunately, there was no single fix that solved all of Castalia’s performance problems. The huge gains in performance that have happened this year are the cumulative result of a number of big improvements. There was one that was bigger than all the others, but it certainly wasn’t a small single-line fix. More on that later…

For the remainder of this post, I’m going to outline a few specific bottlenecks that existed, and how I fixed them.

Getting Source Code from the Editor

Castalia has a function called GetSourceFromModule that gets source code out of the Delphi editor so Castalia can do things with it. This routine was written years ago, before the IDE supported Unicode, and Unicode support had been rather inelegantly shoved in. Since this routine gets called fairly frequently, and wasn’t as fast as it could be, I rewrote the whole function, making a number of changes to make it faster:

  • First, I learned that it’s faster to declare a variable of type AnsiString and call SetLength multiple times than to allocate a new PAnsiChar each time you need a string of a different size.
  • I wrote smarter string manipulation code to eliminate a memory allocation and a string concatenation.
  • Finally, some smarter coding eliminated a string conversion from UTF8 to WideString for the Unicode-Enabled versions of Delphi.

These changes made the GetSourceFromModule routine 20% faster, and made overall keystroke response 7% faster.


Caching

Caching is a large and complex subject, about which entire books (and several dissertations and theses) have been written. The basic idea of caching is this: If generating data is expensive, but figuring out whether it needs to be regenerated is cheap, then put the generated data in a cache that’s cheap, and only re-generate it when it has changed, instead of every time it’s needed.

The GetSourceFromModule routine described above is a good example of this. Getting the source code out of the editor is actually fairly expensive (where “expensive” means time-consuming).  But figuring out if the code has changed since the last time you read it is much cheaper. So, instead of every Castalia feature that wants the source code calling GetSourceFromModule, there’s now a cache layer that takes care of calling GetSourceFromModule once for each time the code changes, and caching it into a string that it can give to Castalia’s various sub-modules much more quickly.
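A cache layer like that might be sketched as follows. Note that TSomeModule and its ChangeCount property are hypothetical stand-ins for however the editor reports changes; the real Castalia code is more involved:

  var
    CachedSource: string;
    CachedVersion: Integer = -1;

  function GetCachedSource(Module: TSomeModule): string;
  begin
    //Checking the change counter is cheap; re-reading the editor is not
    if Module.ChangeCount <> CachedVersion then
    begin
      CachedSource := GetSourceFromModule(Module); //Expensive call, once per change
      CachedVersion := Module.ChangeCount;
    end;
    Result := CachedSource; //Every other caller gets the cached copy
  end;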

There were several more places where I implemented a caching strategy. The net result was an 11% improvement in keystroke response time.

Record Pooling

Castalia’s structural highlighting requires a fairly complex data structure to be generated. This data structure contains info about all of the extra syntax highlighting that Castalia needs to draw into the code editor. Because this structure has to be generated every time the code changes (because a code change could be a structural change), generating the structure needs to be incredibly fast.

This data structure consists of a large number of records, allocated with a call to New and destroyed with a call to Dispose. Calling New and Dispose thousands of times for each regeneration of the tree proved to be a bottleneck, so I implemented a “pooling” strategy.

In a nutshell, now Castalia’s structural highlighting sort of has its own special memory manager. When a record in the data structure is freed, instead of releasing the memory back to the actual memory manager, it remains in a “pool” of already-allocated but currently-unused data structures. When a new one is needed, instead of allocating one through the memory manager, the next empty one from the pool is used. If the pool is “empty,” then more records are allocated by the real memory manager and thrown into the pool.
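In code, a minimal record pool might look something like this (a sketch of the idea, not Castalia’s actual implementation; the record’s fields are omitted):

  type
    PHighlightRecord = ^THighlightRecord;
    THighlightRecord = record
      //...fields describing one highlight element...
    end;

  var
    Pool: TList; //Already-allocated, currently-unused records

  function AcquireRecord: PHighlightRecord;
  begin
    if Pool.Count > 0 then
    begin
      //Reuse a pooled record instead of asking the memory manager
      Result := PHighlightRecord(Pool[Pool.Count - 1]);
      Pool.Delete(Pool.Count - 1);
    end
    else
      New(Result); //Pool is empty, so really allocate
  end;

  procedure ReleaseRecord(Rec: PHighlightRecord);
  begin
    Pool.Add(Rec); //Keep the memory around for reuse instead of calling Dispose
  end;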

Unfortunately, I seem to have misplaced the spreadsheet that had the performance numbers for this improvement, so I can’t tell you what the improvement was exactly, but I remember that it was significant. If I find that document in the future, I’ll update this post with the right numbers.

Navigation Toolbar Repopulation

This was the huge one. While it seemed at first glance that the big bottleneck in performance was the parser, it was, in fact, the navigation toolbar. Both the scroll wheel and keystroke response were being affected by this (though the keystroke response issue was only noticeable if you typed fast – considerably faster than I usually do while coding).

The culprit was repopulating the dropdown boxes for classes and methods. In a unit of significant size, there would be a noticeable “pause” every few keys (for an extremely fast typist) or every few scroll lines while the navigation toolbar updated.

There were several fixes (and a few missteps) that contributed to the resolution of this problem:

First, caching the data: There’s no need to repopulate the toolbar if nothing has changed. This largely eliminated the scrolling issue, but not the keystroke issue, since a keystroke would usually trigger a change, which would render the cache invalid, triggering the slow repopulation.

Next, get rid of the timer that was periodically looking to see if it needed to update. Instead, make the update event-driven. This is basically a tweak on the caching fix above, so it suffers from the same problem: The problem only occurs when the data is in a state of change anyway.

Now for a misstep: The obvious instinct here is to make the repopulation faster. Faster sorting, build a faster data structure, etc… Part way into this effort, I realized that there’s no need to populate the dropdowns every time the file changes. Most of the time, they only need to show the current class and method, and I could rewrite them to do that without having to fully repopulate all of the data. So…

New plan. Don’t REpopulate the dropdowns at all. Now, the dropdowns are only populated with all of their items on demand, when they’re dropped down. I ended up writing a new dropdown control specifically for this (they were just TComboBoxes before). The new navigation toolbar doesn’t keep a data structure of all the classes/methods in your file. It only knows the name of the current class/method, and then only populates the data in those dropdowns when you click on them.

THEN, with the improved dropdown in place, I made the population very fast, using some very different techniques than I would have used if I had just been trying to make the old TComboBoxes repopulate faster. (I also wrote some major search improvements along the way, since I was rewriting the control anyway).

The final performance improvement with the navigation toolbar changes was HUGE. For many cases, the navigation toolbar now refreshes so fast that its new performance can’t even be measured as a fraction of the old time (that is, what used to take 400ms now takes <1ms). That’s a big improvement, right?

Final Thoughts

Performance optimization can be both frustrating and fun. Next time you’re tasked with an optimization project, remember a few key things:

  • There is important value in both qualitative and quantitative data. Quantitative data lets you measure and improve, but it’s the qualitative that makes users happy.
  • When it comes to performance optimization, never assume that the bottleneck is in any particular place. I was surprised over and over again when I found the real issues. I thought I would spend most of my time improving the parser, and that turned out not to be an issue at all. Meanwhile, the biggest issue was the navigation toolbar, which I didn’t even consider as a possible culprit when I first started doing this.
  • Sometimes the performance problem isn’t because your code is slow, but because your architecture is less than ideal in the first place (for example, the navigation toolbar repopulating with data that will never be seen).

Since I started this off with the qualitative comments of Castalia users, I’ll finish with a quote from an email I got just a few days ago, from a long-time Castalia user who said….

“The new version is even ‘Snappier’!!!! Great job!” – Raymond S., Castalia user since 2008

Come try the new, faster-than-ever-before Castalia at



Announcing Castalia 2013, with Smart Symbol Search

I’m very excited to announce the release of Castalia 2013, which is quite simply the best Castalia I’ve ever produced.

Smart Symbol Search

First, Castalia 2013 includes a brand new code navigation tool called “Smart Symbol Search.” Smart Symbol Search opens a window in which you can type all or part of the name of ANY class name, method name, field, property, or variable, and navigate quickly to the declaration of that symbol in your code.


Here’s my favorite part: Smart Symbol Search recognizes your “camel case” symbol names, so if you have a method called FormCreate, you just type “fc,” and Smart Symbol Search will take you to the FormCreate method. It works with underscores too. To go to the declaration of a field called number_of_items, just type “noi.”
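An abbreviation matcher along these lines could be sketched as follows. This is my own illustration of the idea, not Castalia’s actual code: it treats the first character, any uppercase letter, or any letter following an underscore as a “hump,” and tries to consume the query one hump at a time.

  function MatchesAbbreviation(const Query, Symbol: string): Boolean;
  var
    q, s: Integer;
  begin
    q := 1;
    for s := 1 to Length(Symbol) do
    begin
      if q > Length(Query) then
        Break;
      //A "hump" is the first character, an uppercase letter,
      //or a letter that follows an underscore
      if (s = 1) or (Symbol[s] in ['A'..'Z']) or (Symbol[s - 1] = '_') then
        if UpCase(Symbol[s]) = UpCase(Query[q]) then
          Inc(q);
    end;
    Result := q > Length(Query); //True only if every query character matched a hump
  end;

With this sketch, both MatchesAbbreviation('fc', 'FormCreate') and MatchesAbbreviation('noi', 'number_of_items') return True.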

Smart Symbol Search is available in Castalia Professional and Castalia Suite (see below).

For more details about Smart Symbol Search, visit


Performance Improvements

Next, Castalia 2013 has some HUGE speed improvements in structural highlighting and the navigation toolbar. For people who have used Castalia before, you need to see it to believe it.

Fixed Bugs

Castalia 2013 has a number of bugfixes. Here are the major ones:

  • Fixed: Memory Leak in Structural Highlighting that can cause Delphi to crash when editing very large files
  • Fixed: Free-floating text search toolbar causes Access Violation if closed with no files open
  • Fixed: Access Violations when closing Delphi under certain circumstances
  • Fixed: Order of structural highlighting colors in options doesn’t match its use in the code editor
  • Fixed: Access Violation when using “Rename Method” refactoring under certain conditions
  • …and several more smaller fixes.

New Editions

Starting with Castalia 2013, Castalia is available in three editions: Castalia Essentials, Castalia Professional, and Castalia Suite.

Castalia Essentials includes Castalia’s famous Advanced Syntax Highlighting, a basic Navigation Toolbar, Modeless Text Search, and Smart Keys.

Castalia Professional includes everything in Castalia Essentials, and adds the new Smart Symbol Search, Code Templates, IDE Macro Scripting, Selection Expansion, Live Syntax Checking, Favorites, AutoSave, the Split Editor, Inline Variable Declaration, Automatic Line Wrap, Automatic Case Correction, and Project Statistics.

Finally, Castalia Suite includes all of the above, plus Refactoring and Code Metrics tools.

For Current Customers

Current Castalia users who have licenses for the pre-2013 Castalia Essentials have been upgraded to Castalia Professional, effective immediately.

Try Castalia 2013 Today

Download Castalia 2013, the smarter code editor for Delphi Programmers:

P.S. Did I mention that Castalia 2013 has improved memory usage too? I wrote a new code synchronization subsystem that manages memory better, which is just one more reason why this is the best Castalia I’ve ever released. Enjoy!


For programmers, by a programmer

Hi. My name is Jacob, and I'm the creator of Castalia.

I started programming in 1986, learning Lightspeed Pascal on a Mac Classic. Today, I'm a professional programmer, teacher, and entrepreneur.

I have a Master's Degree in Computer Science, and I still love Pascal and Delphi.

I believe that writing code is the heart and soul of software development, and I love helping programmers write code more effectively.