Comparing Regex.Replace with explicit string operations
Category 'perfo test', language C#, created 26-Jun-2009, version V1.0, by Luc Pattyn
License: The author hereby grants you a worldwide, non-exclusive license to use and redistribute the files and the source code in the article in any way you see fit, provided you keep the copyright notice in place; when code modifications are applied, the notice must reflect that. The author retains copyright to the article, you may not republish or otherwise make available the article, in whole or in part, without the prior written consent of the author. Disclaimer: This work is provided “as is”, without any express or implied warranties or conditions or guarantees. You, the user, assume all risk in its use. In no event will the author be liable to you on any legal theory for any special, incidental, consequential, punitive or exemplary damages arising out of this license or the use of the work or otherwise. |
The Regex class is a powerful tool for performing string operations; however, I have always been suspicious of its performance. This little experiment compares several ways of stripping a set of characters from a given string.
The environment used is the Microsoft .NET Framework (version 2.0 or above) and the C# programming language.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexTest {
    class Program {
        private static int REPEAT=1000000;
        private static Stopwatch sw=new Stopwatch();
        private static List<string> logs=new List<string>();

        static void Main(string[] args) {
            string s="3232323 sdsadsd 171617181 sddsddfe 323243";
            string sremove="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

            // static Regex.Replace, character class of 26 letters
            sw.Reset();
            sw.Start();
            for (int i=0; i<REPEAT; i++) Regex.Replace(s, "[a-z]", "");
            sw.Stop();
            log("regex(26)="+sw.ElapsedMilliseconds);

            // static Regex.Replace, 52 letters
            sw.Reset();
            sw.Start();
            for (int i=0; i<REPEAT; i++) Regex.Replace(s, "[A-Za-z]", "");
            sw.Stop();
            log("regex(52)="+sw.ElapsedMilliseconds);

            // static Regex.Replace, 52 letters plus 3 symbols
            sw.Reset();
            sw.Start();
            for (int i=0; i<REPEAT; i++) Regex.Replace(s, "[A-Za-z!@#]", "");
            sw.Stop();
            log("regex(55)="+sw.ElapsedMilliseconds);

            // compiled regex, constructor time included in the measurement
            sw.Reset();
            sw.Start();
            Regex regex=new Regex("[A-Za-z!@#]", RegexOptions.Compiled);
            for (int i=0; i<REPEAT; i++) regex.Replace(s, "");
            sw.Stop();
            log("regex(compiled-incl)="+sw.ElapsedMilliseconds);

            // compiled regex, constructed before the stopwatch starts
            // so that the constructor time is excluded
            Regex regex2=new Regex("[A-Za-z!@#]", RegexOptions.Compiled);
            sw.Reset();
            sw.Start();
            for (int i=0; i<REPEAT; i++) regex2.Replace(s, "");
            sw.Stop();
            log("regex(compiled-excl)="+sw.ElapsedMilliseconds);

            // explicit loop using char.IsLetter
            sw.Reset();
            sw.Start();
            for (int i=0; i<REPEAT; i++) RemoveLetters(s);
            sw.Stop();
            log("for+isalpha="+sw.ElapsedMilliseconds);

            // explicit loop using string.IndexOf on the set of characters to remove
            sw.Reset();
            sw.Start();
            for (int i=0; i<REPEAT; i++) RemoveLetters(s, sremove);
            sw.Stop();
            log("for+IndexOf="+sw.ElapsedMilliseconds);

            File.WriteAllLines("RegexTest.txt", logs.ToArray());
            log("Done (hit ENTER to exit)");
            Console.ReadLine();
        }

        // writes a line to the console and remembers it for the log file
        public static void log(string s) {
            Console.WriteLine(s);
            logs.Add(s);
        }

        // removes every character for which char.IsLetter returns true
        public static string RemoveLetters(string original) {
            StringBuilder sb=new StringBuilder();
            for (int i=0; i<original.Length; i++) {
                if (!char.IsLetter(original, i)) sb.Append(original[i]);
            }
            return sb.ToString();
        }

        // removes every character that occurs in the string "remove"
        public static string RemoveLetters(string original, string remove) {
            StringBuilder sb=new StringBuilder();
            foreach (char c in original) {
                if (remove.IndexOf(c)<0) sb.Append(c);
            }
            return sb.ToString();
        }
    }
}
This is what got logged; all times are in milliseconds:
regex(26)=8620
regex(52)=8698
regex(55)=8820
regex(compiled-incl)=5845
regex(compiled-excl)=5751
for+isalpha=1158
for+IndexOf=3500
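Removing all letters with explicit code based on char.IsLetter (1158 ms) is about five times faster than the compiled regex and about 7.5 times faster than the static Regex.Replace calls; removing an arbitrary set of characters using IndexOf (3500 ms) is roughly 1.6 times faster than the compiled regex and about 2.5 times faster than the static calls. That holds even against the compiled variant that did not measure the constructor time.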
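One caveat on the comparison: the approaches are not strictly equivalent. char.IsLetter returns true for any Unicode letter, whereas [A-Za-z] and the sremove string cover ASCII letters only. The following is a minimal sketch of a check for this; the class and method names (StripCheck, ByRegex, ByIsLetter, ByIndexOf) are made up for illustration and are not part of the test program.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class StripCheck {
    // Regex variant, ASCII letters only.
    public static string ByRegex(string original) {
        return Regex.Replace(original, "[A-Za-z]", "");
    }

    // Same logic as the IsLetter-based RemoveLetters: drops any Unicode letter.
    public static string ByIsLetter(string original) {
        StringBuilder sb = new StringBuilder();
        foreach (char c in original) {
            if (!char.IsLetter(c)) sb.Append(c);
        }
        return sb.ToString();
    }

    // Same logic as the IndexOf-based RemoveLetters: drops only the listed characters.
    public static string ByIndexOf(string original, string remove) {
        StringBuilder sb = new StringBuilder();
        foreach (char c in original) {
            if (remove.IndexOf(c) < 0) sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main() {
        string ascii = "3232323 sdsadsd 171617181 sddsddfe 323243";
        string letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

        // On the ASCII-only test string all three variants agree.
        Console.WriteLine(ByRegex(ascii) == ByIsLetter(ascii));          // True
        Console.WriteLine(ByRegex(ascii) == ByIndexOf(ascii, letters));  // True

        // U+00E9 (e with acute accent) is a letter for char.IsLetter
        // but lies outside [A-Za-z], so the results differ here.
        string accented = "caf\u00E9 42";
        Console.WriteLine(ByRegex(accented) == ByIsLetter(accented));    // False
    }
}
```

For the test string used here the difference does not matter, since it contains only digits, spaces and lowercase ASCII letters; for other input, pick the variant whose notion of "letter" matches the requirement.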
When performance is key, I will consider writing some C# code rather than using Regex, unless the job at hand is sufficiently complex to benefit from its expressive power.
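As an illustration of what such explicit code could look like for an arbitrary character set, here is a sketch that replaces the IndexOf scan with a precomputed lookup table. It was not part of the experiment and has not been timed, so whether it actually beats the IndexOf variant is an assumption to verify by measurement; the names CharSetStripper, BuildTable and Strip are hypothetical.

```csharp
using System;
using System.Text;

static class CharSetStripper {
    // Builds a table with one flag per possible char value (65536 entries);
    // table[c] is true when character c must be removed.
    public static bool[] BuildTable(string remove) {
        bool[] table = new bool[char.MaxValue + 1];
        foreach (char c in remove) table[c] = true;
        return table;
    }

    // Copies every character whose flag is not set; the StringBuilder is
    // pre-sized to the input length, which is an upper bound on the result.
    public static string Strip(string original, bool[] table) {
        StringBuilder sb = new StringBuilder(original.Length);
        foreach (char c in original) {
            if (!table[c]) sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main() {
        // Build the table once, then reuse it for every string to process.
        bool[] table = BuildTable("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
        Console.WriteLine(Strip("3232323 sdsadsd 171617181 sddsddfe 323243", table));
    }
}
```

The table costs 65536 flags up front, so this trade-off only makes sense when the same character set is applied to many strings, as in the test loop above.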
Perceler |
Copyright © 2012, Luc Pattyn |
Last Modified 02-Sep-2013 |