Searching in Delphi Part 1: Regular Expressions

Being able to find elements in your code quickly and easily is critical to being productive in any IDE. Spend too long looking for things and you start to lose your train of thought. Over the years Delphi has introduced lots of different ways to search your code, some of them simple text-based matching, some of them much more capable search engines that actually understand the structure of your code. However, I regularly meet developers who aren’t aware of many of them, beyond doing a simple search using the Search | Find (Ctrl-F) menu option, or the same across multiple files using Search | Find in Files (Shift-Ctrl-F).

Starting with this post I’m going to try and address some of that. Each post will focus on a different way to find things in your project. I’m going to assume that everyone can use the basic Find and Find in Files functionality, so I won’t cover that. I will, though, start by covering one feature of both of those text-based searches that seems to be underused: Regular Expressions.

When doing a text-based search, usually the more specific you can be with your search string, the fewer false matches (i.e. matches that you are not actually looking for) you’ll get. The problem is that you often either don’t know enough of the exact text around your match, or you’re trying to match multiple lines with different text around your string. So you end up with a not very specific search string, and so many matches that you waste a bunch of time trying to find the relevant ones or, worse, miss some relevant ones amongst the deluge.

You’re probably never going to get the perfect result of just the items you’re looking for. It ends up being a trade-off between the time spent creating the search string and the time spent wading through the results, and often a good enough balance is, well, good enough. This is where regular expressions can help.

Let me give you a concrete example.

The DUnit framework makes heavy use of an interface called ITest. Let’s say you want to find all places in TestFramework.pas where an ITest is passed in as a parameter to a method.

Well, we can start by hitting Ctrl-F, typing in ITest and hitting Enter. As you can see in the screenshot to the left, we get 143 results for just this one unit.

As you should also be able to see, there are a whole bunch of results that aren’t actually what we’re looking for. It’s matching ITestDecorator, ITestListener, even comments. Further, even where it has found ITest, it isn’t just showing us where it’s been used as a method parameter, it’s showing us everything.

So, how can we try to narrow down our search results? Well, we could try searching for “: ITest” and this gets us fewer results, but it still has problems. First, it misses places where there is no space between the colon and ITest. It also has false positives on field and variable declarations of type ITest. We could search on “: ITest)” which removes the field and variable declarations, but we’ve now lost the instances where there are additional parameters in the method signature after the ITest parameter. Just to complicate matters further, what if the parameter is an array of ITest? We’ve missed those too.
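To make those trade-offs concrete, here is a small Python sketch of the two literal searches. The declarations are hypothetical lines written for illustration, not copied from TestFramework.pas, and the `literal_hits` helper is an assumption that only mimics a case-sensitive plain-text Find.

```python
# Hypothetical declarations, written for illustration only
# (they are not taken from TestFramework.pas).
lines = [
    "procedure AddTest(const ATest: ITest);",              # parameter, space after the colon
    "procedure Run(ATest:ITest);",                         # parameter, no space after the colon
    "procedure AddTests(const Tests: array of ITest);",    # array-of parameter
    "procedure Wrap(ATest: ITest; const AName: string);",  # ITest followed by another parameter
    "FTest: ITest;",                                       # field declaration, not a parameter
]


def literal_hits(needle):
    """Return the lines that contain needle as a plain, case-sensitive substring."""
    return [line for line in lines if needle in line]


loose = literal_hits(": ITest")    # finds two parameters plus the field declaration
strict = literal_hits(": ITest)")  # finds only the first declaration

print(len(loose), "hits for ': ITest'")
print(len(strict), "hits for ': ITest)'")
```

Neither literal string finds the declaration without a space or the array parameter, and the looser one also picks up the field, which is the gap the rest of this post tries to close.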

OK, now that my strawman is suitably set up, think about how we could more accurately identify just the ITest method parameters. Perhaps if we look for a colon followed by zero or more characters, followed by ITest, followed by zero or more characters, followed by a closing bracket.

I’m a long, long way from being a regex expert. I can fairly easily remember 3 or 4 syntax elements without looking at a reference; beyond that I’m struggling. However, even these few can be pretty powerful. In fact, in this example I only need to remember three things to construct my search:

  • the dot or period character in a regex matches any single character (except line breaks). So c.t would match cat and cut, but not cart.
  • the asterisk character in a regex tries to match the previous token zero or more times. t* would match t, tt, ttt, etc. (and, because zero repetitions are allowed, nothing at all). Plus you can use them together, so .* tells it to match zero or more instances of any character. So c.*t would match cat, cut and also cart.
  • Lastly, you escape characters that have special meaning in a regex (such as the asterisk and dot above) by using the backslash. Let’s say you actually want to match on a dot, you’d need to specify that as \.
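The three elements above can be tried out quickly outside the IDE. This is a minimal sketch using Python’s `re` module, which is an assumption on my part: it is not the IDE’s regex engine, although the dot, asterisk and backslash behave this way in most regex flavours. The `matches_whole` helper is a hypothetical name introduced for the example.

```python
import re


def matches_whole(pattern, text):
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# The dot matches exactly one character, so "cart" is one character too long.
dot_results = {word: matches_whole(r"c.t", word) for word in ["cat", "cut", "cart"]}

# .* matches zero or more of any character, so "cart" and even "ct" are accepted.
star_results = {word: matches_whole(r"c.*t", word) for word in ["cat", "cut", "cart", "ct"]}

# An escaped dot matches only a literal dot.
escaped_results = {text: matches_whole(r"a\.b", text) for text in ["a.b", "axb"]}

print(dot_results)
print(star_results)
print(escaped_results)
```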

That’s almost everything I remember about regular expressions, but it’s enough for a lot of cool searches.

Back to my example. I said I wanted to search for:

  1. a colon
  2. followed by zero or more characters
  3. followed by ITest
  4. followed by zero or more characters
  5. followed by a closing bracket

Based on what we’ve just discussed, the regex for each of the above would look like:

  1. :
  2. .*
  3. ITest
  4. .*
  5. \)

Note that I had to escape the closing bracket, as it has a special meaning in a regex. (I’ve just looked it up as I didn’t remember what it was for. It’s for grouping multiple tokens.)

Put together, the actual regex (including the leading colon) looks like this: :.*ITest.*\)
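As a rough check of how that pattern behaves, here is a sketch that runs it over a few hypothetical declarations using Python’s `re` module. Both are assumptions: the lines are invented for illustration rather than taken from TestFramework.pas, and Python’s engine is only a stand-in for the IDE’s, so details such as case sensitivity may differ.

```python
import re

# The regex from the post: colon, anything, ITest, anything, closing bracket.
pattern = re.compile(r":.*ITest.*\)")

# Hypothetical declarations, written for illustration only.
lines = [
    "procedure AddTest(const ATest: ITest);",                  # plain parameter
    "procedure Run(ATest:ITest);",                             # no space after the colon
    "procedure AddTests(const Tests: array of ITest);",        # array-of parameter
    "procedure Wrap(ATest: ITest; const AName: string);",      # more parameters after ITest
    "FTest: ITest;",                                           # field: no closing bracket
    "procedure AddListener(const AListener: ITestListener);",  # type that starts with ITest
    "// ITest is the core interface",                          # comment: no colon
]

# Keep the lines where the pattern is found anywhere in the line.
matched = [line for line in lines if pattern.search(line)]

for line in matched:
    print(line)
```

In this sample the field and the comment drop out, while all four genuine parameters survive. The ITestListener line still matches, because the pattern only asks for the text ITest somewhere between the colon and the bracket.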

To use it, you need to enable Regular Expressions in your searches. It’s not shown in the screenshot above, but if you make your edit window wide enough (or click on the >> image next to Case Sensitive) you’ll get the option to turn on Regular Expressions. (Prior to Delphi 2010 it’s a checkbox on the modal dialog that pops up when you hit Ctrl-F.) Then just type :.*ITest.*\) into the same search window you normally would.

We’ve now got dramatically fewer results (around half). Note this isn’t perfect, but that’s not what my aim is. I was aiming for better search results, not perfect search results. In this example I’ve counted 4 false positives (it still matches parameters whose type starts with ITest, such as ITestListener), but it seems to cover all the other cases I mentioned, and 69 out of 73 is a much better score than 69 out of 143.

I could no doubt get rid of the remaining false positives, and if you’re keen to try, there’s a good short regex reference and also a tutorial up on http://www.regular-expressions.info. There’s also a list of the syntax supported by the IDE’s regex engine here. However what I have is a pretty good trade-off between results and effort. I often find if I try and make my regexes too complex, I start to spend more time debugging the search string than looking at the result. That might just be my hazy knowledge of regex syntax, however.
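For anyone keen to try, one possible tightening is to require a word boundary straight after ITest. The sketch below uses `\b` from Python’s `re` module. Whether the IDE’s regex engine accepts `\b` is an assumption that should be checked against its syntax list, and the declarations are hypothetical lines written for illustration.

```python
import re

loose = re.compile(r":.*ITest.*\)")    # the regex used in this post
tight = re.compile(r":.*ITest\b.*\)")  # same, but ITest must end at a word boundary

# Hypothetical declarations, written for illustration only.
lines = [
    "procedure AddTest(const ATest: ITest);",
    "procedure AddListener(const AListener: ITestListener);",
    "procedure Decorate(ATest: ITest; const ADecorator: ITestDecorator);",
    "procedure Notify(const Listeners: array of ITestListener);",
]

loose_hits = [line for line in lines if loose.search(line)]
tight_hits = [line for line in lines if tight.search(line)]

print(len(loose_hits), "lines match the original regex")
print(len(tight_hits), "lines match the tightened regex")
```

In this sample the tightened pattern keeps only the lines that really contain an ITest parameter, at the cost of one more piece of syntax to remember.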

How often do I use a regex in my searches? Honestly, not often. However, like a lot of things, once you get comfortable with it, you find yourself using it more often. Next time you find yourself with a search that is returning a very inaccurate result set, maybe it’s worth considering a regex. I find that on those occasions, with very little effort, I can find what I’m looking for much more quickly and accurately than otherwise.

Next up, I’ll start looking at some of the other, less well known options you have for searching your source.

2 Comments

  • Just to complete your wonderful post: if you only want to match the “ITest” interface and not ITestListener, as you mentioned in your example, just check the “Whole Words” option of the search and it should solve your problem.

    Kind regards

  • Oops!!!!
    After a better test I should apologize. It doesn’t solve the problem. Sorry for that. ;-(
