Modern C++ as a Better Compiler

Last week I made the case that Standard C++ offers both productivity and performance, and showed how C++ can be just as concise and elegant as C#, and indeed more so. But I didn’t really address performance, so let’s do so now. It’s trendy today to refer to your platform or framework of choice as being native. Everyone’s native these days. Even managed code is native. It wasn’t long ago that such a statement would have been greeted with incredulity, but today it seems as if the marketing folks have hijacked the word and we’re all one happy native family. But performance doesn’t lie, and when it comes to native code there’s nothing quite like Standard C++.

Here’s a question for you. Can a library outperform a compiler? With all of the talk about native code generation, it’s helpful to remember that modern C++ is in many ways a code generator. The C++ community is full of impressive examples of developers programming C++ at compile time, a practice often called metaprogramming or generic programming.

But how can a library outperform a compiler? Standard C++ attempts to be the best possible language in which to write very efficient libraries. While parts of the language may be complex, libraries written in C++ should be easy to use. Modern C++ for the Windows Runtime is just such a library.

But how can a library outperform a compiler?! Doesn’t a library still need to be compiled? Well, let’s run a little experiment. The Windows Runtime provides a good environment in which to compare libraries and compilers. You see, the Windows Runtime defines a binary platform that’s intended to be projected into different programming languages. When you write a Windows app in C#, you are using the language projection provided by the C# compiler. The same goes for JavaScript and other language projections that I’ve heard of. But when it comes to C++, it’s a different story. Standard C++ is a different kind of language. Although it offers multiple programming paradigms, it has a certain bias toward systems programming. This is why it’s so popular among operating system developers. It’s also not the kind of language that can support a language projection directly via the compiler unless you go and change the C++ language itself. This is precisely what the Visual C++ compiler attempts to do with its C++/CX language extension.

But that’s not how C++ was meant to be used. If the language doesn’t provide what you need then you can write a library. There’s no need to invent a new language or change the fundamental structure of the C++ language itself. Still, because the Visual C++ compiler offers up a compiler-based implementation of a Windows Runtime language projection we can now go ahead and compare the compiler’s performance against that of a Windows Runtime language projection implemented as a library using only Standard C++.

I’ll begin with a simple Windows Runtime component that offers up a class called Sample with a single static property returning an IVectorView of strings. An IVectorView is just a read-only vector with a portable ABI that is understood by different language projections. Using Modern C++ for the Windows Runtime I’m left simply having to implement this Strings method, which represents that static Strings property within the component:

class SampleFactory : public SampleFactoryT<SampleFactory>
{
public:
    
    IVectorView<String> Strings()
    {
        // code goes here
    }
};
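
Deriving from SampleFactoryT<SampleFactory> follows the shape of the curiously recurring template pattern, where a base class template is handed the type of the class deriving from it. The article doesn’t show how SampleFactoryT is defined, so the following is only a minimal sketch of the pattern in portable C++. The names FactoryBase, MyFactory and InvokeStrings are hypothetical and are not part of the library:

```cpp
#include <string>

// Hypothetical stand-in for a library-provided base such as SampleFactoryT.
// It only illustrates the pattern; it is not the library's definition.
template <typename Derived>
struct FactoryBase
{
    // Forwards to the derived class. The derived type is known at compile
    // time, so this is an ordinary (inlinable) call, not a virtual one.
    std::wstring InvokeStrings()
    {
        return static_cast<Derived *>(this)->Strings();
    }
};

// The developer writes only the method; the base supplies the plumbing.
struct MyFactory : FactoryBase<MyFactory>
{
    std::wstring Strings()
    {
        return L"hello";
    }
};
```

Calling InvokeStrings on a MyFactory object ends up in MyFactory::Strings, and because the target is resolved at compile time the optimizer is free to inline it.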

Since I’m implementing this component in modern C++, I can use whichever modern or standard libraries I’m most familiar with as a C++ developer. Let’s use a few standard containers to build a really big vector of strings:

vector<String> values;
wstring const value = L"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

for (unsigned i = 0; i != 10000000; ++i)
{
    values.emplace_back(value.c_str(), i % value.size() + 1);
}

I’m using the standard string to act as a sort of template and then I’m filling a vector of Windows Runtime strings with a range of values that look something like this:

A
AB
ABC
ABCD
ABCDE
ABCDEF
ABCDEFG
...

And this will simply repeat until the vector contains ten million strings. So I have a really big collection of strings. I now need to pass it back to the caller and I can do that by simply wrapping it inside an implementation of IVectorView as follows:

return VectorView(move(values));

So my component’s static property boils down to this:

IVectorView<String> Strings()
{
    vector<String> values;
    // pump full of values ...
    return VectorView(move(values));
}

The VectorView function will create a COM object that implements the necessary interfaces such that this standard vector of strings may be transported across ABI boundaries very efficiently.
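
To give a feel for why this can be cheap, here is a minimal sketch of a read-only view that takes ownership of a standard vector by moving it. It leaves out COM and the ABI entirely and is not the library’s implementation of VectorView. The names ReadOnlyView and MakeView are hypothetical, and Size and GetAt are only modeled on the read-only vector idea:

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical read-only view; not the library's VectorView implementation.
template <typename T>
class ReadOnlyView
{
    std::vector<T> m_values; // owned storage, never modified after construction

public:

    // Moving the vector transfers its buffer; no elements are copied.
    explicit ReadOnlyView(std::vector<T> && values) noexcept :
        m_values(std::move(values))
    {
    }

    std::size_t Size() const noexcept
    {
        return m_values.size();
    }

    // Bounds-checked, read-only element access.
    T const & GetAt(std::size_t const index) const
    {
        if (index >= m_values.size())
        {
            throw std::out_of_range("index");
        }

        return m_values[index];
    }

    // begin/end allow the view to be used in a range-based for statement.
    auto begin() const noexcept { return m_values.begin(); }
    auto end() const noexcept { return m_values.end(); }
};

// Hypothetical helper that mirrors the shape of the VectorView call above.
template <typename T>
ReadOnlyView<T> MakeView(std::vector<T> && values)
{
    return ReadOnlyView<T>(std::move(values));
}
```

The point of the sketch is that wrapping ten million strings costs a single buffer handoff rather than ten million copies, and that exposing only const access is what makes the view read-only.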

With the component implemented I can return to app development and see what this looks like from various language projections. First let’s look at C#:

var timer = new Stopwatch();
IReadOnlyList<string> strings = Sample.Strings;
long sum = 0;
timer.Start();

foreach (string s in strings)
{
    sum += s.Length;
}

timer.Stop();
m_sum = sum;
m_elapsed = (long)timer.Elapsed.TotalMilliseconds;

I call Sample.Strings to retrieve the collection of strings and then use a C# foreach statement to iterate over the collection. The sum is just a sanity check to confirm that each implementation walked the same collection of strings. In this case I’m using the .NET Framework’s Stopwatch class to measure how long it takes to iterate over the collection.

This is a very simple and unscientific test but it should give us a good idea of how efficiently each language projection can iterate over a collection expressed in terms of Windows Runtime collection interfaces. There’s a lot of memory involved and a lot of virtual function calls. A language projection is going to have to be very careful to manage this efficiently, but I’m sure any decent compiler can figure it out.
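
The sum is only a useful sanity check if the expected value is known. As a rough cross-check that doesn’t depend on the Windows Runtime at all, the total can be worked out from the fill loop in the component: string i has a length of i % 52 + 1, since the template string has 52 characters. The function names below are hypothetical:

```cpp
#include <cstdint>

// Closed-form total of the lengths i % size + 1 for i in [0, count).
std::uint64_t ExpectedSum(std::uint64_t const count, std::uint64_t const size)
{
    std::uint64_t const cycles = count / size; // complete passes over lengths 1..size
    std::uint64_t const rest = count % size;   // final partial pass covers lengths 1..rest

    return cycles * (size * (size + 1) / 2) + rest * (rest + 1) / 2;
}

// Brute-force version of the same calculation, for comparison.
std::uint64_t BruteForceSum(std::uint64_t const count, std::uint64_t const size)
{
    std::uint64_t sum = 0;

    for (std::uint64_t i = 0; i != count; ++i)
    {
        sum += i % size + 1;
    }

    return sum;
}
```

For ten million strings and a 52-character template, both functions return 264999712: there are 192307 complete passes worth 1378 characters each, plus a partial pass over lengths 1 to 36 worth 666.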

I ran a release build of this C# version a few times and it consistently calculated a sum of 264999712 characters in around 2619 milliseconds. Now let’s take a look at C++/CX:

IVectorView<String ^> ^ strings = Sample::Strings;
long long sum = 0;
auto start = Now();

for (String ^ s : strings)
{
    sum += s->Length();
}

m_elapsed = Elapsed(start);
m_sum = sum;

In this case I’m using a pair of functions that use the operating system’s high resolution performance counter to measure milliseconds. Other than that, the samples are equivalent and the sums match, but the elapsed time is around 628 milliseconds. And finally we come to the standard C++ approach:

IVectorView<String> strings = Sample::Strings();
long long sum = 0;
auto start = Now();

for (String const & s : strings)
{
    sum += s.Length();
}

m_elapsed = Elapsed(start);
m_sum = sum;

Here again you’ll notice that the Strings property is projected as a method and it returns a vector view of strings without any hats. From a performance perspective, the ‘const &’ in the range-based for statement is purely a matter of style and convention; omitting it would make no difference at run time. Again the sums match, but the elapsed time is even faster at 447 milliseconds!
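
The Now and Elapsed helpers used in the two C++ samples aren’t shown. They are described as wrapping the operating system’s high resolution performance counter, so the following is only a portable stand-in built on std::chrono::steady_clock, offered as an assumption about their shape rather than the actual implementation:

```cpp
#include <chrono>

// A monotonic clock, so the measurement is unaffected by wall-clock adjustments.
using Clock = std::chrono::steady_clock;

// Captures the starting point of a measurement.
inline Clock::time_point Now() noexcept
{
    return Clock::now();
}

// Returns the whole milliseconds elapsed since the given starting point.
inline long long Elapsed(Clock::time_point const start) noexcept
{
    auto const elapsed = Clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

Used as in the samples above, Now is called immediately before the loop and Elapsed immediately after it, so that retrieving the collection is not included in the measurement.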

Can a library outperform a compiler? It’s perhaps a bit of a philosophical question, but it’s clear that the C++ compiler is insanely good at optimizing Standard C++. The library developer is also in the driver’s seat and is able to optimize everything from resource management and algorithms to iterators, adapters, and so much more. Clearly C# does not provide ‘native’ performance: at around 2619 milliseconds it took nearly six times as long as the 447 milliseconds of the Standard C++ version. Although C++/CX gets you a lot closer at around 628 milliseconds, it does so at the cost of productivity, and you lose the essence of the C++ language. I could go on to explain why C# is so much slower, but the bottom line is that only Standard C++ allows you to do anything about it. And that’s the point. If you’re using C# or C++/CX, you’re at the mercy of the compiler. Only Standard C++ lets you go beyond the compiler. Modern C++ for the Windows Runtime is for those of you who love C++ but also want to create Windows apps.