Announcing the Beta of DataSonnet – The Open Source Data Mapper

For some time ModusBox has been looking at options for an open source data transformation technology for use within our platform, PortX Integration Hub.

Integration Hub can be thought of as a virtual ESB thanks to its ability to do general-purpose integration processing at scale. “Virtual” because the user interacts with a browser-based application and works at a simple abstraction layer. In previous posts we’ve talked about Integration Hub’s features and how it can work with almost any full-featured ESB (Mule ESB, Camel, WSO2, etc.).

Mapping is at the heart of integration

Transformation is a fundamental capability of an ESB. And for good reason.

Transformation of data is crucial in the movement of data and transactions between one system and another. Just as translation lets two people who speak different languages share their meaning, transforming data lets different systems share information.

There are data transformation tools out there. Unfortunately, most of them are proprietary and limit the flexibility of a developer or an organization. Today’s proprietary transformation tools either lock a company into a specific solution or are cost-prohibitive for some developers.

Many integration developers resort to building these translations in code, which in some ways goes against the very goals of modern integration strategy: abstraction and reusability. Largely, this is because they don’t know there is a better way, or simply don’t have access to the right tools.

As we continued to evolve Integration Hub, we saw the need for a data transformation tool that is fit for integration scenarios and portable across any ESB. In fact, our goal reaches beyond the typical ESB: to support a broad range of data transformation scenarios.

The result? We’re introducing our open source data transformation tool, DataSonnet.

Functional goals

While the development of the right tool wasn’t simple, our goals were straightforward:

  • The tool needed to handle the full complexity of transformation of hierarchical message formats (such as XML/EDI/etc.).
  • It needed to support visual mapping tools for a more user-friendly experience for non-developers.
  • It needed to provide long-term ownership and portability of transformations, specifically to eliminate vendor lock-in.

The point about avoiding vendor lock-in is a big one for us. From what we’ve seen, data transformation is the grunt work of integration, often more than 50% of the actual effort. When you have all of that time and effort invested in encoding the transformation rules for your organization using a proprietary transformation tool, it becomes a ball and chain tying you to that platform. This makes it very costly to consider switching to another integration platform, even if the features or price point of your current platform no longer serve your needs.

Technical design goals

Those functional goals led to the technical design goals used in the development of DataSonnet:

  • Open source – eliminate the vendor lock-in that proprietary mapping tools create
  • Scriptable mapping code, i.e. can be saved and loaded as a text file and run without compiling
  • Able to support very complex mappings by advanced developers, yet simple enough for occasional developers to learn and use for common use cases
  • Must run efficiently in the JVM – since many ESB products are Java-based
  • Include support for other non-JVM languages (especially JavaScript)
  • High performance
  • Functional programming style
  • Rich and extensible standard library
  • Not overly XML-centric
  • Support for building and re-using libraries

The creation of an open source data transformation tool

So how did we arrive at creating DataSonnet? As we looked around, we did not find an existing open source mapping technology that met all of our purposes.

We didn’t want to take on inventing our own language from scratch. It would take significant effort and rigor to get it right, and language development is not the core strength of our integration-focused engineering team. The other challenge with creating a new language is justifying to the developer community why they should adopt yet another language.

As we searched we came across an existing language, Jsonnet, that met almost all of our criteria.

Jsonnet is:

  • A data templating language
  • Under the permissive Apache 2.0 license
  • Created by Google
  • A solidly designed language
  • Adopted in the community and supported and used by companies like Databricks
  • Implemented in C++, Go, and Scala, with support for JavaScript possible

However, Jsonnet was originally designed for building dynamic configuration files for infrastructure automation. While the core language is powerful, a few things were missing to make it a good tool for the kinds of data transformations typically needed in integration.

DataSonnet builds on Sjsonnet, an open source implementation of the Jsonnet language in Scala, by adding:

  • Conventions for calling Jsonnet when transforming data
  • Several additional function libraries useful for typical data transformation tasks and scenarios
  • Tooling to support working on data transformation problems

Let’s look at an example that builds on the Quick Start Tutorial: You’ve got a list of items, each item having an id, a remaining quantity count, and some having secret information. You want to produce a list of all the items that are in stock, with the secret data hidden.


Payload
  {
   "from": "Rafael Gómez",
   "to": "李娜",
   "items": [
    {
     "id": "A-42",
     "remainingQuantity": 5,
     "secretInfo": "DO NOT REVEAL THIS"
    },
    {
     "id": "C-1",
     "remainingQuantity": 0
    },
    {
     "id": "C-2",
     "remainingQuantity": 3,
     "secretInfo": "Top Secret information"
    }
   ]
  }

Mapping
  local overlay = {
   inStock: super.remainingQuantity > 0,
   secretInfo:: null
  };
  local items = [item + overlay for item in payload.items];
  local inStockItems = [item for item in items if item.inStock];
  {
   shipper: payload.from,
   items: inStockItems,
   totalItems: std.length(inStockItems),
   totalRemainingQuantity: std.foldl(function(total, item) total + item.remainingQuantity, inStockItems, 0)
  }


Output
  {
   "items": [
    {
     "id": "A-42",
     "inStock": true,
     "remainingQuantity": 5
    },
    {
     "id": "C-2",
     "inStock": true,
     "remainingQuantity": 3
    }
   ],
   "shipper": "Rafael Gómez",
   "totalItems": 2,
   "totalRemainingQuantity": 8
  }

In this short example, several powerful aspects of the language are demonstrated:

  • Using local variables to reuse transformation logic
  • Hiding data
  • Combining objects
  • Transforming objects in an array
  • Filtering an array
  • Using Jsonnet Standard Library functions
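
To make the less familiar of these features easier to see, here is a minimal stand-alone sketch in plain Jsonnet. It is an illustration rather than part of the tutorial: it does not use the payload variable, and the item value and the field names item and total in the result are made up for this example. It isolates object combination with super, hiding a field with a double colon, and folding an array with std.foldl.

```jsonnet
// Hypothetical stand-alone data, used here only for illustration.
local item = { id: "A-42", remainingQuantity: 5, secretInfo: "DO NOT REVEAL THIS" };

// `+` combines two objects; fields on the right-hand side win,
// and `super` reads fields from the left-hand side.
// A field declared with `::` is hidden, so it is left out of the rendered output.
local visible = item + {
  inStock: super.remainingQuantity > 0,
  secretInfo:: null,
};

{
  item: visible,
  // std.foldl(func, arr, init) accumulates over the array from left to right.
  total: std.foldl(function(sum, n) sum + n, [5, 3], 0),
}
```

Evaluated with a standard Jsonnet interpreter, this should render an object whose item contains only id, inStock, and remainingQuantity, alongside a total of 8. The secretInfo field still exists on the object but is hidden, which is why it does not appear in the output.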

Be sure to go through the Quick Start Tutorial to understand these features better and start to build your own transformations with DataSonnet.

We believe we’ve come up with a solution that is powerful enough to handle even the most complex transformation tasks that come up in integration solutions, portable enough that you can keep your transformations as you evolve your infrastructure choices, and easy enough to learn and use, even for occasional developers. It is built on an existing, established language that makes it reasonable for other developers to use, adapt, and contribute to its development. We are building our browser-based drag-and-drop transformation tooling for PortX on this solid foundation.

This is an early Beta, and there is a lot more coming. Please give DataSonnet a try, give us feedback, join us in contributing to DataSonnet, and participate in the community.

Learn more about DataSonnet:
DataSonnet website
Quick Start Tutorial
Cookbook