Getting started with protobuf (Protocol Buffer)

Lizen Shakya
5 min read · Jan 31, 2021

Protocol Buffers, also known as protobuf, is the IDL (Interface Definition Language) most commonly used with gRPC. It is also a high-performance, compact binary wire format developed by Google, which uses it internally so that its network services can communicate at very high speed.

Why use protobuf instead of JSON and XML?

JSON and XML are the formats most commonly used to send and receive messages in REST APIs and RPC methods. Of the two, JSON is the more popular, as it is flexible, efficient, platform-neutral, and human-readable. In some cases, however, these formats are not fast enough or lightweight enough for transmitting data between systems. XML in particular is verbose: serialized messages are large, bloated, and slow to parse.

When you serialize (encode) a protobuf message, it is converted into a binary format, which is typically much smaller than the equivalent JSON.
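To make the size difference concrete, here is a minimal sketch that hand-encodes one small record following the protobuf wire format (a tag byte, then a varint or length-prefixed bytes) and compares it with the JSON text. The record, its field numbers, and the helper names encodeVarint and encodeRecord are assumptions made for illustration; real code would use protoc-generated classes rather than a hand-written encoder.

```javascript
// Illustrative only: a hand-written encoder for one hypothetical record,
// assuming field 1 = string name, field 2 = int32 id, field 3 = bool isEmployed.

// Encode a non-negative integer as a base-128 varint: 7 bits per byte,
// with the high bit set on every byte except the last.
function encodeVarint(value) {
  const bytes = [];
  while (value > 0x7f) {
    bytes.push((value % 128) | 0x80);
    value = Math.floor(value / 128);
  }
  bytes.push(value);
  return bytes;
}

// Encode the record; like proto3 encoders, skip fields that hold default values.
// Negative ids are not handled in this sketch.
function encodeRecord(record) {
  const out = [];
  const nameBytes = new TextEncoder().encode(record.name);
  if (nameBytes.length > 0) {
    out.push(0x0a, ...encodeVarint(nameBytes.length), ...nameBytes); // field 1, length-delimited
  }
  if (record.id !== 0) {
    out.push(0x10, ...encodeVarint(record.id)); // field 2, varint
  }
  if (record.isEmployed) {
    out.push(0x18, 1); // field 3, varint (true)
  }
  return Uint8Array.from(out);
}

const record = { name: 'Ada', id: 150, isEmployed: true };
const binarySize = encodeRecord(record).length;
const jsonSize = new TextEncoder().encode(JSON.stringify(record)).length;
console.log(`protobuf: ${binarySize} bytes, JSON: ${jsonSize} bytes`);
// protobuf: 10 bytes, JSON: 41 bytes
```

Most of the saving comes from sending small field numbers instead of field names, and from writing numbers as varints instead of decimal text.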

In addition, it is much faster to parse and encode, and the generated code gives us strongly typed objects, which are easier to work with.

Other advantages include:

  • less ambiguous, thanks to explicit data types
  • smaller
  • faster
  • serializes and deserializes structured data for communication in binary form

The trade-off is that, as a compact binary format, it does not reach JSON’s level of human-readability.

What is in a proto file…

The first step when working with protocol buffers is to define the structure for the data you want to serialize in a proto file: this is an ordinary text file with a .proto extension.

The first line of a proto file declares the version of the syntax.

syntax = "proto3";

This specifies that you are using proto3 syntax. If you leave it out, the protobuf compiler assumes you are using proto2. The declaration must be the first non-empty, non-comment line of the file.
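The rule can be sketched as a small check over the text of a .proto file. This is only an illustration of the rule as described here, not how protoc parses files: detectSyntax is a hypothetical helper, and it only skips // line comments, not /* block */ comments.

```javascript
// Hypothetical helper: report which syntax a .proto source declares.
// Falls back to 'proto2' when the first meaningful line is not a syntax
// declaration, mirroring the compiler behaviour described above.
function detectSyntax(protoSource) {
  for (const rawLine of protoSource.split('\n')) {
    const line = rawLine.trim();
    // Skip blank lines and line comments.
    if (line === '' || line.startsWith('//')) continue;
    // The first remaining line decides the result.
    const match = /^syntax\s*=\s*["'](proto[23])["']\s*;/.exec(line);
    return match ? match[1] : 'proto2';
  }
  return 'proto2';
}

console.log(detectSyntax('// login messages\n\nsyntax = "proto3";\nmessage Login {}'));
// proto3
console.log(detectSyntax('message Login {}\nsyntax = "proto3";'));
// proto2
```

The second call shows why the position matters: a declaration that comes after a message definition is not picked up by this check.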

You can add an optional package specifier to a .proto file to prevent name clashes between protocol message types.

package foo.bar;

Here, foo.bar is the package name; you can choose your own.

Then, the next step will be defining the messages. Protocol buffer data is structured as messages, where each message is a small logical record of information containing a series of name-value pairs called fields.

A scalar message field is declared with one of the scalar types available in a .proto file, and each of these maps to a corresponding type in the automatically generated class. The full list of scalar types can be found in the official Protocol Buffers language guide.

message Person {
  string name = 1;
  int32 id = 2;
  bool isEmployed = 3;
}
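The numbers after the equals signs are field numbers, and they, not the field names, are what identify each field in the encoded message. A rough sketch of that idea follows, assuming a hand-written byte sequence for a Person with the hypothetical values name "Ada", id 150, and isEmployed true. readVarint and readFields are illustrative helpers, and only the two wire types that this message needs (0 for varints, 2 for length-delimited data) are handled.

```javascript
// Hand-written wire-format bytes for a hypothetical
// Person { name: "Ada", id: 150, isEmployed: true }.
const personBytes = Uint8Array.from([
  0x0a, 0x03, 0x41, 0x64, 0x61, // field 1 (name): length 3, then "Ada"
  0x10, 0x96, 0x01,             // field 2 (id): varint 150
  0x18, 0x01,                   // field 3 (isEmployed): varint 1 (true)
]);

// Read a base-128 varint starting at `offset`; returns [value, nextOffset].
function readVarint(bytes, offset) {
  let value = 0;
  let scale = 1;
  while (true) {
    const byte = bytes[offset++];
    value += (byte & 0x7f) * scale;
    scale *= 128;
    if ((byte & 0x80) === 0) return [value, offset];
  }
}

// Walk the bytes and list every field as { fieldNumber, wireType, value }.
function readFields(bytes) {
  const fields = [];
  let offset = 0;
  while (offset < bytes.length) {
    let tag;
    [tag, offset] = readVarint(bytes, offset);
    const fieldNumber = Math.floor(tag / 8); // upper bits of the tag
    const wireType = tag % 8;                // lowest three bits of the tag
    if (wireType === 0) {
      let value;
      [value, offset] = readVarint(bytes, offset);
      fields.push({ fieldNumber, wireType, value });
    } else if (wireType === 2) {
      let length;
      [length, offset] = readVarint(bytes, offset);
      // This sketch assumes every length-delimited field is UTF-8 text.
      const value = new TextDecoder().decode(bytes.subarray(offset, offset + length));
      offset += length;
      fields.push({ fieldNumber, wireType, value });
    } else {
      throw new Error(`wire type ${wireType} is not handled in this sketch`);
    }
  }
  return fields;
}

console.log(readFields(personBytes));
```

The output lists fields 1, 2, and 3 with the values "Ada", 150, and 1. The names name, id, and isEmployed never appear in the bytes, which is why both sides need the same proto file to interpret them.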

Then, once you’ve specified your data structures, you use the protocol buffer compiler protoc to generate data access classes in your preferred language(s) from your proto definition. These provide simple accessors for each field, like name() and set_name(), as well as methods to serialize/parse the whole structure to/from raw bytes. So, for instance, if your chosen language is C++, running the compiler on the example above will generate a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages.

Because the proto file is a contract, both the client and the server need to have the same proto file. It acts as the intermediary contract that lets the client call any function the server makes available.

Service definition

If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file and the protocol buffer compiler will generate service interface code and stubs in your chosen language. So, for example, if you want to define an RPC service with a method that takes your HelloRequest and returns a HelloResponse, you can define it in your .proto file as follows:

service HelloService {
  rpc SayHello (HelloRequest) returns (HelloResponse);
}

Protobuf compiler

To generate the Java, Python, C++, Go, Ruby, Objective-C, or C# code you need to work with the message types defined in a .proto file, you need to run the protocol buffer compiler protoc on the .proto file.

Installation of protoc (protobuf compiler)

Linux, using apt or apt-get, for example:

$ apt install -y protobuf-compiler
$ protoc --version # Ensure compiler version is 3+

macOS, using Homebrew:

$ brew install protobuf
$ protoc --version # Ensure compiler version is 3+

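The protocol buffer compiler, protoc, compiles .proto files, which contain the service and message definitions, and generates source files in the language selected by its arguments, in this case JavaScript. Note that from protobuf release 21 onward, the JavaScript generator is no longer bundled with protoc and has to be installed separately as the protoc-gen-js plugin.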

protoc --proto_path=protos --js_out=import_style=commonjs,binary:build/ fileName.proto

By default, the compiler generates code with Closure-style imports. If you specify a library option when running the compiler, the compiler creates a single .js file with your specified library name. Otherwise the compiler generates a .js file for each message in your .proto file. The names of the output files are computed by taking the library value or message name (lowercased), with the following changes:

  • The extension .js is added.
  • The proto path (specified with the --proto_path= or -I command-line flag) is replaced with the output path (specified with the --js_out= flag).

In the command above, fileName.proto is the name of the proto file to compile.
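For the CommonJS import style used in this article, where login.proto turns into login_pb.js, the path rule can be sketched roughly as follows. outputPathFor is a hypothetical helper that only mimics the naming rule with forward-slash paths; it is not part of protoc.

```javascript
// Hypothetical helper: predict where the CommonJS-style output for a .proto
// file lands, given the --proto_path value and the --js_out directory.
function outputPathFor(protoFile, protoPath, outDir) {
  const stripSlash = (p) => p.replace(/\/+$/, '');
  const root = stripSlash(protoPath);
  // Path of the .proto file relative to the proto path.
  const relative = protoFile.startsWith(root + '/')
    ? protoFile.slice(root.length + 1)
    : protoFile;
  // Swap the .proto extension for the _pb.js suffix.
  const jsName = relative.replace(/\.proto$/, '_pb.js');
  // Re-root the result under the output directory.
  return `${stripSlash(outDir)}/${jsName}`;
}

console.log(outputPathFor('protos/fileName.proto', 'protos', 'build/'));
// build/fileName_pb.js
console.log(outputPathFor('login.proto', '.', '.'));
// ./login_pb.js
```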

Getting it all together

Let us define a proto file, login.proto:

syntax = "proto3";

message Login {
  string userName = 1;
  string password = 2;
}

Now let us compile this with the protobuf compiler:

$ protoc --js_out=import_style=commonjs,binary:. login.proto

protoc has generated login_pb.js from login.proto for you. Now you can use it anywhere you want, like this:

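// Requires the google-protobuf runtime: npm install google-protobuf
const pb = require('./login_pb')

// Serialization
const data = { userName: 'Issac', password: 'Newton' }
const msg = new pb.Login()
// The JS generator lowercases a field name before camel-casing it,
// so the accessors for userName should be setUsername/getUsername.
msg.setUsername(data.userName)
msg.setPassword(data.password)
const bytes = msg.serializeBinary()

// Deserialization
const msg2 = pb.Login.deserializeBinary(bytes)
console.log(msg2.getUsername(), msg2.getPassword())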

The serialized data you get back is a Uint8Array.
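Because the result is raw bytes, it usually needs an encoding step before it can be logged or embedded in text. Here is a minimal sketch, assuming Node.js (for Buffer) and a hand-written byte array standing in for the output of serializeBinary(). The bytes follow the wire-format layout for the two string fields of Login with the values used above; they were written out by hand, not captured from an actual protoc run.

```javascript
// Stand-in for msg.serializeBinary(): wire-format bytes for
// Login { userName: "Issac", password: "Newton" }, written out by hand.
const loginBytes = Uint8Array.from([
  0x0a, 0x05, ...Buffer.from('Issac'),  // field 1: length 5, then UTF-8 text
  0x12, 0x06, ...Buffer.from('Newton'), // field 2: length 6, then UTF-8 text
]);

// Hex is handy for logs; base64 is handy for embedding in JSON or URLs.
const hex = Buffer.from(loginBytes).toString('hex');
const base64 = Buffer.from(loginBytes).toString('base64');
console.log(loginBytes.length, hex);
// 15 0a05497373616312064e6577746f6e

// Decoding the base64 text gives back the same bytes, which could then be
// passed to pb.Login.deserializeBinary().
const restored = new Uint8Array(Buffer.from(base64, 'base64'));
console.log(restored.length === loginBytes.length);
// true
```

Note that the password travels as plain UTF-8 inside the bytes: protobuf is an encoding, not encryption, so transport security still has to come from elsewhere.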

Summary:

Protobuf is a strong choice for data serialization: its messages are much smaller than the equivalent JSON, and it allows interfaces to be defined explicitly. A small investment of time in learning it pays off quickly.

Thank You :)
