
Lightning - The Lightning Fast Java Serializer

It's been a pretty long time since the last blog entry. A lot of things happened that held me back from posting updates.

Hopefully that time is over now, and I have started a new project.

Currently I'm working on cleaning up the API my serializer uses to define serializable classes and members, prior to releasing the first milestone.

But before asking my actual question, I think some things need to be clarified about Lightning.

Lightning is a high-speed, low-latency serializer for value or transport objects (or whatever you want to call them). So it is not meant to cover every case that Java's standard serialization does, but since most transferable objects can be implemented as plain value objects, you can use Lightning in most cases where you want to serialize or transfer data.

A second important point is that Lightning works around a problem many people run into with standard serialization. Problems with (de-)serialization often show up hours, maybe days, after server startup, when data is read using incompatible class versions. Lightning uses ClassDefinitions to hold information about serializable properties and distributes them, for example, inside a cluster. So the master node transfers its own ClassDefinitionContainer to newly connecting nodes, and each new node compares it with its own container. If there is a non-matching class definition, the node can be disconnected before it corrupts the cluster data.

This means Lightning takes a fail-fast approach to make sure that class definitions are consistent.
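
To make the idea a bit more concrete, here is a minimal sketch of such a fail-fast check. It is not Lightning's actual implementation: the class name ClassDefinitionCheck, its methods and the CRC32 checksum over "name:type" strings are assumptions made purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch, not Lightning's real API.
final class ClassDefinitionCheck {

    /** Checksum over a class's serializable properties, given as "name:type" strings. */
    static long checksum(String... properties) {
        CRC32 crc = new CRC32();
        for (String property : properties) {
            crc.update(property.getBytes(StandardCharsets.UTF_8));
            crc.update(0); // separator byte, so ("ab", "c") differs from ("a", "bc")
        }
        return crc.getValue();
    }

    /** Names of classes whose definition differs from, or is missing on, the joining node. */
    static List<String> mismatches(Map<String, Long> master, Map<String, Long> joining) {
        List<String> result = new ArrayList<>();
        // TreeMap only gives the report a stable, sorted order.
        for (Map.Entry<String, Long> entry : new TreeMap<>(master).entrySet()) {
            if (!entry.getValue().equals(joining.get(entry.getKey()))) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Fail fast: reject the node before any payload data is exchanged. */
    static void verifyOrReject(Map<String, Long> master, Map<String, Long> joining) {
        List<String> incompatible = mismatches(master, joining);
        if (!incompatible.isEmpty()) {
            throw new IllegalStateException("Incompatible class definitions: " + incompatible);
        }
    }
}
```

The point of the design is that the comparison happens once, at connect time, instead of surfacing as a deserialization error much later.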

The most visible difference between Lightning and standard serialization is that you have to define which properties will be serialized. By default none of the class members are serialized, whereas standard serialization serializes all non-transient members by default.
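
The difference between the two defaults can be sketched with plain reflection. Everything here is hypothetical and only illustrates the opt-in versus opt-out idea: the nested Attribute annotation is a local stand-in, and OptInSketch, Account, optIn and optOut are made-up names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Lightning's real API.
final class OptInSketch {

    /** Local stand-in for an opt-in marker annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Attribute {}

    static final class Account {
        @Attribute long id;    // explicitly opted in
        String cachedLabel;    // not marked at all
        transient int hits;    // opted out the standard-serialization way
    }

    /** Opt-in: only explicitly marked fields are selected. */
    static List<String> optIn(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(Attribute.class)) {
                names.add(field.getName());
            }
        }
        Collections.sort(names); // getDeclaredFields() guarantees no order
        return names;
    }

    /** Opt-out: every non-static, non-transient field is selected. */
    static List<String> optOut(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (!field.isSynthetic() && !Modifier.isStatic(modifiers) && !Modifier.isTransient(modifiers)) {
                names.add(field.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

For Account, optIn selects only id, while optOut selects cachedLabel and id.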

I started Lightning as a proof-of-concept implementation because my old employer needed a really fast, simple-to-configure serialization with a small byte size (for the data stream) for distributing value objects inside a JGroups-based cluster.

The project name was chosen as an incentive to tell myself "if the name does not match the behavior, forget about it". I was impressed that the name seems to fit in every way, as shown by small benchmarks (yeah I know, microbenchmarking ;-)):

Lightning Serializer build time: 199 ms
Lightning Serialization Avg: 1023,97 ns, runs: 800000, size: 40 bytes
Lightning Deserialization Avg: 1097,72 ns, runs: 800000, size: 45 bytes

Java Serialization Avg: 4069,38 ns, runs: 800000, size: 375 bytes
Java Deserialization Avg: 20290,70 ns, runs: 800000, size: 375 bytes

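In this example you can see the difference in byte size (the small differences in size stem from how the microbenchmark is implemented to prevent HotSpot optimizations) and in speed: Lightning is about 4 times faster when serializing and about 18 times faster when deserializing, with a stream roughly 8 to 9 times smaller.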

What is "build time"? How does Lightning work internally? And last but not least, why is Lightning that fast?

Those are probably the three questions that come to most people's minds when reading the benchmark results.

Let's start with a question that applies to other serializers (like Kryo) as well: "why is the byte stream that small?" It can be answered in two sentences. As mentioned above, Lightning uses a ClassDefinitionContainer to know about the internals of the byte stream, so nearly nothing besides the actual values needs to be in the stream. In addition, every class in the ClassDefinitionContainer has a unique ClassId, so canonical class names are only distributed when transferring the ClassDefinitionContainers.
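
A small sketch shows how much a numeric id saves compared to a class name in every record. This is not Lightning's wire format: ClassIdHeaderSketch, the 8-byte id and the single int payload are assumptions for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch, not Lightning's real stream layout.
final class ClassIdHeaderSketch {

    /** Record that carries the class name in the stream itself. */
    static byte[] withClassName(String className, int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(className); // 2-byte length prefix plus the encoded name
            out.writeInt(value);     // 4 bytes of actual payload
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Record that only carries an id both sides already agreed on. */
    static byte[] withClassId(long classId, int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(classId);  // fixed 8 bytes, independent of the name length
            out.writeInt(value);     // 4 bytes of actual payload
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With the made-up name "com.example.Foo", withClassName produces 21 bytes and withClassId produces 12, and the gap grows with longer package names.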

I want to move the first of the three questions to the end of the description, since it will answer itself :-)

The other two questions, "How does Lightning work internally?" and "Why is Lightning that fast?", can be answered together. Lightning uses a lot of tricks to speed up the serialization process, which fall into 4 categories.

  1. Lightning knows different strategies to serialize object trees.
    This means Lightning can be instructed either to track multiple references to the same object in an object tree (and serialize it only once) or to ignore such multi-referenced objects, possibly serializing them multiple times.

  2. Lightning uses bytecode generation.
    After the serializable class members have been defined, Lightning switches to a build phase that creates a bytecode implementation of every marshaller. These marshallers do not use any reflection and can be HotSpot-optimized later on.

  3. Lightning uses direct memory access via sun.misc.Unsafe.
    Normally, on supported runtime environments, Lightning uses sun.misc.Unsafe for direct memory access to properties. This means reading from and writing to the memory areas of the objects on the heap directly. This technique avoids a lot of memory and bounds checks, resulting in much higher speed (this access is even faster than accessing direct ByteBuffers - ByteBuffer.allocateDirect(...)). Even though using a class called Unsafe sounds pretty messy, a lot of the internal implementation of Sun / Oracle JVMs uses this class for high-performance access and to avoid reflective access.

  4. Lightning uses PropertyAccessor classes.
    This is an extension of the mechanisms described in points 2 and 3. During the build phase every property is wrapped in such a PropertyAccessor to optimize access and to prevent later requests to the Java security system. In addition, different kinds of access, like reflection (on environments without Unsafe support) or bytecode access, can be hidden and normalized.
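
Points 3 and 4 can be sketched together. This is a minimal illustration and not Lightning's real PropertyAccessor: the names PropertyAccessorSketch, IntPropertyAccessor, ReflectiveAccessor, UnsafeAccessor and accessorFor are assumptions, and only int fields are handled. The sun.misc.Unsafe calls used here exist, but newer JDKs mark them as deprecated, which is why the sketch falls back to reflection if Unsafe cannot be used.

```java
import java.lang.reflect.Field;

// Hypothetical sketch, not Lightning's real API.
final class PropertyAccessorSketch {

    /** Uniform view on one int property, hiding how the value is actually accessed. */
    interface IntPropertyAccessor {
        int read(Object instance);
        void write(Object instance, int value);
    }

    /** Fallback for environments without a usable sun.misc.Unsafe: plain reflection. */
    static final class ReflectiveAccessor implements IntPropertyAccessor {
        private final Field field;

        ReflectiveAccessor(Field field) {
            field.setAccessible(true); // request access once, at build time
            this.field = field;
        }

        public int read(Object instance) {
            try {
                return field.getInt(instance);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }

        public void write(Object instance, int value) {
            try {
                field.setInt(instance, value);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Direct access through a field offset that is resolved once at build time. */
    static final class UnsafeAccessor implements IntPropertyAccessor {
        private final sun.misc.Unsafe unsafe;
        private final long offset;

        UnsafeAccessor(sun.misc.Unsafe unsafe, Field field) {
            this.unsafe = unsafe;
            this.offset = unsafe.objectFieldOffset(field);
        }

        public int read(Object instance) {
            return unsafe.getInt(instance, offset);
        }

        public void write(Object instance, int value) {
            unsafe.putInt(instance, offset, value);
        }
    }

    /** Picks the Unsafe-based accessor if possible, the reflective one otherwise. */
    static IntPropertyAccessor accessorFor(Class<?> type, String name) {
        Field field;
        try {
            field = type.getDeclaredField(name);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
        try {
            Field holder = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            holder.setAccessible(true);
            return new UnsafeAccessor((sun.misc.Unsafe) holder.get(null), field);
        } catch (ReflectiveOperationException | RuntimeException | LinkageError unsupported) {
            return new ReflectiveAccessor(field);
        }
    }

    /** Tiny value object used to try the accessors. */
    static final class Point {
        private int x = 7;
    }
}
```

A caller only ever sees IntPropertyAccessor, so generated marshallers would not need to know which access strategy is behind it.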

I think this should be enough as a first introduction to Lightning. If there are any questions or remarks, feel free to tell me.

For people who have read this far and are interested in Lightning: the project is released under the Apache License 2.0 on GitHub.
It would be nice to find more contributors interested in implementing or extending Lightning.

Now my initial question:
As I mentioned above, I'm working on cleaning up the API to define which class members should be serialized. To define them, Lightning uses SerializerDefinitions, which can be compared to Modules in Guice, and a fluent API to define properties. It would be great if you could have a look at the new API to see if all fluent combinations are intuitive and easy to understand. If you have any remarks or improvement proposals on the fluent API, please let me know.

public class ExampleSerializerDefinition 
		extends AbstractSerializerDefinition {

	protected void configure() {
		// Define serializable class using custom
		// implementation of Marshaller
		serialize(Bar.class).using(new BarMarshaller());
		// Define serializable class using annotated members
		// or methods (by usage of Lightning's
		// @com.github.lightning.metadata.Attribute annotation)
		serialize(Foo.class).attributes().excludes("value1", "value2");

		// Define serializable class using annotated members / methods
		// (by usage of custom annotation)
		serialize(Foo.class).attributes(Attribute.class).excludes("value1", "value2");

		// Define serializable class using custom definition of properties
			property("value").using(new SomeSpecialIntegerMarshaller())

		// Define serializable class using a different implementation
		// of PropertyFinderStrategy
		serialize(Foo.class).using(new FooPropertyFinderStrategy());

		// Install child definition
		install(new SomeChildSerializerDefinition());