Protocol Buffer Message Reflection API in C#

TL;DR

This blog post shows how to replace .NET reflection with the reflection API exposed through the IMessage interface when working with C# classes generated by Protobuf's protoc compiler.

IMessage.Descriptor is the entry point to the Protobuf reflection API. It lets you inspect a Protobuf object's metadata (its message types, fields and their values) with less reliance on the more expensive .NET reflection.

A benchmark at the end shows that going through IMessage is considerably faster than .NET reflection (about 12 ns versus 78 ns per call without caching), because the metadata exposed through IMessage is generated by protoc ahead of time and made available on the object itself.

How does Protobuf Reflection API work?

Let's look at what protoc generates for the two simple messages below, defined in a greeter.proto file:

syntax = "proto3";

message HelloRequest {
  string name = 1;
}

message HelloReply {
  string message = 1;
}

Running $./protoc --csharp_out=./ greeter.proto produces three classes:

  1. public sealed partial class HelloRequest

  2. public sealed partial class HelloReply

  3. public static partial class GreeterReflection

As you probably guessed from the name, the GreeterReflection class contains all the logic for reflection. The protoc compiler generates one such reflection class per .proto file:

public static partial class GreeterReflection
{
    public static FileDescriptor Descriptor
    {
        get { return descriptor; }
    }
    private static FileDescriptor descriptor;
    static GreeterReflection()
    {
        byte[] descriptorData = System.Convert.FromBase64String(
            string.Concat(
              "Cg1ncmVldGVyLnByb3RvIhwKDEhlbGxvUmVxdWVzdBIMCgRuYW1lGAEgASgJ",
              "Ih0KCkhlbGxvUmVwbHkSDwoHbWVzc2FnZRgBIAEoCWIGcHJvdG8z")
            );

        descriptor = FileDescriptor.FromGeneratedCode(
            descriptorData,
            new FileDescriptor[] { },
            new GeneratedClrTypeInfo(null, null, new GeneratedClrTypeInfo[] {
                new GeneratedClrTypeInfo(
                    typeof(HelloRequest), HelloRequest.Parser, 
                    new[]{ "Name" },
                    null, 
                    null,
                    null, 
                    null
                ),
                new GeneratedClrTypeInfo(
                    typeof(HelloReply), 
                    HelloReply.Parser, 
                    new[]{ "Message" }, 
                    null, 
                    null, 
                    null, 
                    null
                )
            }));
    }
}

The class contains not only the reflection logic but also the actual data. You can see two hardcoded Base64 strings, and you can see the property names Name and Message being passed to the GeneratedClrTypeInfo constructor (one entry per message, listing the properties of every message defined in the greeter.proto file).

What is inside the Base64 strings? Let's decode them and see what we get:

var data = System.Convert.FromBase64String
(
    string.Concat
    (
        "Cg1ncmVldGVyLnByb3RvIhwKDEhlbGxvUmVxdWVzdBIMCgRuYW1lGAEgASgJ",
        "Ih0KCkhlbGxvUmVwbHkSDwoHbWVzc2FnZRgBIAEoCWIGcHJvdG8z"
    )
);
FileDescriptorProto desc = FileDescriptorProto.Parser.ParseFrom(data);

The string (JSON) representation of the FileDescriptorProto object, formatted here for readability, is shown below:

{
    "name": "greeter.proto",
    "messageType": [
        {
            "name": "HelloRequest",
            "field": [
                {
                    "name": "name",
                    "number": 1,
                    "label": "LABEL_OPTIONAL",
                    "type": "TYPE_STRING"
                }
            ]
        },
        {
            "name": "HelloReply",
            "field": [
                {
                    "name": "message",
                    "number": 1,
                    "label": "LABEL_OPTIONAL",
                    "type": "TYPE_STRING"
                }
            ]
        }
    ],
    "syntax": "proto3"
}
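
To reproduce the output above, a minimal sketch could look like the following. It assumes the Google.Protobuf NuGet package is referenced and uses JsonFormatter, which emits compact single-line JSON rather than the indented form shown above. The loop at the end is only an illustration of navigating the parsed descriptor directly.

```csharp
using System;
using Google.Protobuf;
using Google.Protobuf.Reflection;

// The same Base64 payload that protoc embedded in GreeterReflection.
byte[] data = Convert.FromBase64String(
    "Cg1ncmVldGVyLnByb3RvIhwKDEhlbGxvUmVxdWVzdBIMCgRuYW1lGAEgASgJ" +
    "Ih0KCkhlbGxvUmVwbHkSDwoHbWVzc2FnZRgBIAEoCWIGcHJvdG8z");

// The payload is a serialized FileDescriptorProto message.
FileDescriptorProto desc = FileDescriptorProto.Parser.ParseFrom(data);

// Print the whole descriptor as compact JSON.
string json = JsonFormatter.Default.Format(desc);
Console.WriteLine(json);

// The parsed object can also be walked like any other message.
foreach (DescriptorProto message in desc.MessageType)
{
    foreach (FieldDescriptorProto field in message.Field)
    {
        Console.WriteLine($"{message.Name}.{field.Name} = {field.Number} ({field.Type})");
    }
}
```

In other words, the embedded string is nothing more than the .proto file itself, serialized as a Protobuf message.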

The above data is eventually used to construct a MessageDescriptor collection which contains descriptors for all the messages in the .proto file. The order of descriptors in the collection is the same as the order they are declared inside the .proto file.
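
The ordering can be observed on any generated reflection class. The sketch below uses StructReflection, the reflection class that ships with Google.Protobuf for the well-known struct.proto file, purely so that it runs without generating code first; with the greeter example you would use GreeterReflection instead.

```csharp
using System;
using Google.Protobuf.Reflection;
using Google.Protobuf.WellKnownTypes;

// StructReflection is the generated reflection class for struct.proto.
FileDescriptor file = StructReflection.Descriptor;

// MessageTypes lists the top-level messages in .proto declaration order.
for (int i = 0; i < file.MessageTypes.Count; i++)
{
    MessageDescriptor message = file.MessageTypes[i];
    Console.WriteLine($"[{i}] {message.FullName} -> {message.ClrType.Name}");
}
```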

Each message class then implements the generic IMessage<T> interface (which in turn extends its non-generic IMessage counterpart). The HelloRequest message is declared first, so its Descriptor property returns the first descriptor in the collection, and the one for HelloReply returns the second:

public sealed partial class HelloRequest : IMessage<HelloRequest>
{
    /* Rest of the class is omitted for brevity */
    public static MessageDescriptor Descriptor
    {
        get { return GreeterReflection.Descriptor.MessageTypes[0]; }
    }
    MessageDescriptor IMessage.Descriptor
    {
        get { return Descriptor; }
    }
}
public sealed partial class HelloReply : IMessage<HelloReply>
{
    /* Rest of the class is omitted for brevity */
    public static MessageDescriptor Descriptor
    {
        get { return GreeterReflection.Descriptor.MessageTypes[1]; }
    }
    MessageDescriptor IMessage.Descriptor
    {
        get { return Descriptor; }
    }
}
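
A quick way to convince yourself that the static property and the explicit IMessage implementation hand out the same descriptor is sketched below. It uses the well-known Duration type (bundled with Google.Protobuf) as a stand-in for HelloRequest, which is an assumption made only to keep the snippet runnable on its own.

```csharp
using System;
using Google.Protobuf;
using Google.Protobuf.Reflection;
using Google.Protobuf.WellKnownTypes;

// Static access: no instance required.
MessageDescriptor fromStatic = Duration.Descriptor;

// Instance access through the non-generic interface.
IMessage instance = new Duration();
MessageDescriptor fromInstance = instance.Descriptor;

// Both paths should resolve to the very same descriptor object.
Console.WriteLine(ReferenceEquals(fromStatic, fromInstance));

// The descriptor also links back to the file-level descriptor it came from.
Console.WriteLine(ReferenceEquals(fromInstance.File, DurationReflection.Descriptor));
```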

As the official demo shows, we can enumerate and retrieve the values of HelloRequest object properties by relying only on the IMessage interface:

IMessage message = new HelloRequest() { Name = "Bob" };
var descriptor = message.Descriptor;
foreach (var field in descriptor.Fields.InDeclarationOrder())
{
    Console.WriteLine(
        "Field {0} ({1}): {2}",
        field.FieldNumber,
        field.Name,
        field.Accessor.GetValue(message));
}
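
Besides enumerating fields, a descriptor can look up a single field by its number or by its .proto name, and the accessor can write as well as read. The sketch below again uses the well-known Duration type (fields seconds = 1 and nanos = 2) as a stand-in for a generated message.

```csharp
using System;
using Google.Protobuf;
using Google.Protobuf.Reflection;
using Google.Protobuf.WellKnownTypes;

IMessage message = new Duration { Seconds = 5, Nanos = 10 };
MessageDescriptor descriptor = message.Descriptor;

// Look up by the field number written in the .proto file.
FieldDescriptor seconds = descriptor.FindFieldByNumber(1);

// Look up by the .proto field name (not the C# property name).
FieldDescriptor nanos = descriptor.FindFieldByName("nanos");

Console.WriteLine($"{seconds.Name} = {seconds.Accessor.GetValue(message)}");
Console.WriteLine($"{nanos.Name} = {nanos.Accessor.GetValue(message)}");

// Accessors can also write; the boxed value must match the field's CLR type (int here).
nanos.Accessor.SetValue(message, 20);
Console.WriteLine(((Duration)message).Nanos);

// Both Find* methods return null when no such field exists.
Console.WriteLine(descriptor.FindFieldByNumber(99) is null);
```

Looking a field up by number is usually more robust than relying on its position, because field numbers are part of the wire contract while declaration order is not.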

Use case: ASP.NET gRPC server side & Interceptors

Let's say every one of your Protobuf message contracts declares a Header message as its first field, and you would like to access this information inside your Interceptor pipeline (or inside your MediatR Pipeline Behaviour):

syntax = "proto3";
service Greeter {
  rpc SayHello (HelloRequest) returns (HelloReply);
}
message HelloRequest {
  Header header = 1;
  string name = 2;
}
message HelloReply {
  Header header = 1;
  string message = 2;
}
message Header {
  string id = 1;
}

In the interceptor below we know nothing about the concrete types. We only know that TRequest and TResponse both implement IMessage and that each declares a field of type Header as its first field, so let's use that knowledge to extract the Header value:

public class MyInterceptor : Interceptor
{
    public override Task<TResponse>
        UnaryServerHandler<TRequest, TResponse>(
            TRequest request,
            ServerCallContext context,
            UnaryServerMethod<TRequest, TResponse> continuation)
    {
        var message = request as IMessage;
        var header = (Header) message
                .Descriptor
                .Fields
                .InDeclarationOrder()[0]
                .Accessor
                .GetValue(message);
        Console.WriteLine(header.Id);
        return base.UnaryServerHandler(request, context, continuation);
    }
}
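
The interceptor above throws if the request is not an IMessage or if its first field is not a Header. A more defensive variant is sketched below as a small helper. FieldReader and TryGetMessageField are hypothetical names, not part of Google.Protobuf, and the demo uses the well-known Option message (which declares `Any value = 2`) as a stand-in for a request carrying a Header.

```csharp
using System;
using Google.Protobuf;
using Google.Protobuf.Reflection;
using Google.Protobuf.WellKnownTypes;

// Demo with well-known types: Option declares `string name = 1` and `Any value = 2`.
IMessage option = new Option { Name = "demo", Value = new Any { TypeUrl = "example" } };

Any value = FieldReader.TryGetMessageField<Any>(option, 2);
Console.WriteLine(value?.TypeUrl);

// Field 1 is a string, not a message, so the helper yields null instead of throwing.
Console.WriteLine(FieldReader.TryGetMessageField<Any>(option, 1) is null);

// Hypothetical helper, not part of Google.Protobuf.
public static class FieldReader
{
    // Returns the singular message field with the given number, or null when the
    // message is null, the field is missing, unset, repeated, or of another type.
    public static T TryGetMessageField<T>(IMessage message, int fieldNumber)
        where T : class, IMessage
    {
        FieldDescriptor field = message?.Descriptor.FindFieldByNumber(fieldNumber);
        if (field is null || field.IsRepeated || field.FieldType != FieldType.Message)
        {
            return null;
        }

        // GetValue returns null for an unset message field; `as` guards the type.
        return field.Accessor.GetValue(message) as T;
    }
}
```

Inside the interceptor this would read something like `FieldReader.TryGetMessageField<Header>(request as IMessage, 1)`, leaving the caller to decide what to do when no header is present.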

Benchmarking against .NET Reflection

Let's now benchmark four ways of extracting the Header object:

  1. .NET Reflection

  2. IMessage Reflection

  3. .NET Reflection with caching*

  4. IMessage Reflection with caching*

*Caching simply means resolving the PropertyInfo or IFieldAccessor once, in a static constructor, and storing it in a static field. This is safe because both are immutable and won't change at runtime.
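
One possible shape for such a cache is sketched below. FirstField<T> and DurationSecondsProperty are hypothetical names, and the well-known Duration type stands in for a message whose first field is a Header. This is not necessarily how the benchmark project implements caching.

```csharp
using System;
using System.Reflection;
using Google.Protobuf;
using Google.Protobuf.Reflection;
using Google.Protobuf.WellKnownTypes;

var message = new Duration { Seconds = 42 };

// Both lookups are resolved once per type and reused on every call.
Console.WriteLine(FirstField<Duration>.Accessor.GetValue(message));
Console.WriteLine(DurationSecondsProperty.Info.GetValue(message));

// Hypothetical cache: one IFieldAccessor per message type T.
public static class FirstField<T> where T : IMessage<T>, new()
{
    // The static initializer runs once per closed generic type.
    public static readonly IFieldAccessor Accessor =
        new T().Descriptor.Fields.InDeclarationOrder()[0].Accessor;
}

// .NET reflection counterpart: the PropertyInfo is looked up once.
public static class DurationSecondsProperty
{
    public static readonly PropertyInfo Info =
        typeof(Duration).GetProperty(nameof(Duration.Seconds));
}
```

A generic static class gives each message type its own cached accessor without needing a dictionary lookup on the hot path.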

The benchmark project is available on GitHub.

|                                        Method |      Mean |
|---------------------------------------------- |----------:|
|            Benchmark_Using_Dot_Net_Reflection | 77.929 ns |
|     Benchmark_Using_Cached_Dot_Net_Reflection |  7.734 ns |
|                 Benchmark_IMessage_Reflection | 11.651 ns |
|    Benchmark_Using_Cached_IMessage_Reflection |  3.218 ns |
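
For readers who want to try a comparison of this kind themselves, a rough BenchmarkDotNet sketch follows. It is not the author's benchmark project: the class and method names are made up, and it reads the first field of the well-known Duration type (a long, which gets boxed) rather than a Header, so absolute numbers will differ from the table above.

```csharp
using System.Reflection;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Google.Protobuf;
using Google.Protobuf.Reflection;
using Google.Protobuf.WellKnownTypes;

// Run in a Release build, e.g. `dotnet run -c Release`.
BenchmarkRunner.Run<FirstFieldBenchmarks>();

// Hypothetical benchmark class comparing the four approaches.
public class FirstFieldBenchmarks
{
    private static readonly PropertyInfo CachedProperty =
        typeof(Duration).GetProperty(nameof(Duration.Seconds));

    private static readonly IFieldAccessor CachedAccessor =
        Duration.Descriptor.Fields.InDeclarationOrder()[0].Accessor;

    private readonly IMessage _message = new Duration { Seconds = 42 };

    [Benchmark(Baseline = true)]
    public object DotNetReflection() =>
        _message.GetType().GetProperty("Seconds").GetValue(_message);

    [Benchmark]
    public object CachedDotNetReflection() => CachedProperty.GetValue(_message);

    [Benchmark]
    public object MessageReflection() =>
        _message.Descriptor.Fields.InDeclarationOrder()[0].Accessor.GetValue(_message);

    [Benchmark]
    public object CachedMessageReflection() => CachedAccessor.GetValue(_message);
}
```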

Summary

Even though this is a micro-optimization, it is worth considering if you are running .NET reflection on your hot path, for example in a gRPC interceptor.