RapidMiner Extensions

The Custom Operators extension allows us to create RapidMiner operators and bundle them into extensions inside RapidMiner, without programming. I use this technology to publish some of the solutions I have developed in RapidMiner over the years.

The following extensions are available on the RapidMiner Marketplace and can be installed from within RapidMiner Studio: open the Marketplace in RapidMiner Studio and search for the extension’s name. Further extensions are coming soon.

My tutorial for creating custom operators in the RapidMiner Community

Database Envy

This extension contains operators for things that can be done with SQL in databases, but not easily in RapidMiner.

See the examples in the Community Samples repository in the folder /Community Data Science:

  • Extension Example – Database Envy – Custom Join
  • Extension Example – Database Envy – Window Functions

Apply Window Function

This operator calculates groupwise aggregations and ranks on an example set, based on the RapidMiner-WindowFunctions project.
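
To illustrate what a groupwise aggregation and rank compute, here is a minimal Python sketch. It is not the operator's implementation; the function name `apply_window` and the column names `region` and `sales` are hypothetical and chosen only for this example. The key point is that, unlike a plain aggregation, every input row is kept and gains the group-level results.

```python
from collections import defaultdict

def apply_window(rows, group_key, value_key):
    """Add a per-group sum and a per-group rank to every row.

    The number of rows is unchanged: each row keeps its own values
    and gains the results computed over its group (its "window").
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = []
    for row in rows:
        members = groups[row[group_key]]
        group_sum = sum(m[value_key] for m in members)
        # Rank 1 = largest value; tied values share a rank, as with SQL RANK().
        rank = 1 + sum(1 for m in members if m[value_key] > row[value_key])
        result.append({**row, "group_sum": group_sum, "rank": rank})
    return result

# Hypothetical example data
rows = [
    {"region": "north", "sales": 10},
    {"region": "north", "sales": 30},
    {"region": "south", "sales": 20},
    {"region": "north", "sales": 30},
]
windowed = apply_window(rows, "region", "sales")
```

In this sketch the two "north" rows with sales 30 both get rank 1, and the row with sales 10 gets rank 3.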

Expression-based Join

I have blogged about this technique before. With this operator you can join on inequalities, on multiple criteria, on mathematical functions of the attributes, etc.
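
The idea of joining on an arbitrary expression can be sketched in a few lines of Python. This is an assumption-laden illustration, not how the operator works internally: `expression_join`, the `orders` and `tiers` data, and all attribute names are hypothetical. The sketch uses a brute-force nested loop, which is only suitable for small inputs.

```python
def expression_join(left, right, predicate):
    """Inner join: keep every (left, right) pair for which predicate is true."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]

# Hypothetical example data
orders = [
    {"order_id": 1, "amount": 40},
    {"order_id": 2, "amount": 250},
]
tiers = [
    {"tier": "small", "min_amount": 0, "max_amount": 100},
    {"tier": "large", "min_amount": 100, "max_amount": 1000},
]

# Join on a range condition (two inequalities) instead of key equality.
joined = expression_join(
    orders,
    tiers,
    lambda o, t: t["min_amount"] <= o["amount"] < t["max_amount"],
)
```

Each order is matched to the tier whose range contains its amount, which is something an equality-only join cannot express.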

JSON processing with jq

This is another technique developed earlier and now available in an easy-to-use extension. Until now, an additional library had to be installed. Now it is bundled with the extension, so no separate installation is necessary.

The operators can apply jq expressions on an attribute in an example set, or on document objects (these are typically the output of operators like Get Page in web API processes).
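
For readers unfamiliar with jq, the following Python sketch shows roughly what a simple jq filter such as `.items[] | .name` does to a JSON document. It uses only Python's standard `json` module, not jq itself; the JSON document and the function name `names_like_jq` are made up for this example.

```python
import json

# Hypothetical JSON document, e.g. the body of a web API response
document = '{"items": [{"name": "a", "price": 1.5}, {"name": "b", "price": 2.0}]}'

def names_like_jq(text):
    """Rough Python equivalent of the jq filter:  .items[] | .name

    .items[]  iterates over the elements of the "items" array,
    .name     picks the "name" field of each element (null/None if missing).
    """
    data = json.loads(text)
    return [item.get("name") for item in data["items"]]

names = names_like_jq(document)
```

The jq filter expresses the same extraction in a single short expression, which is what makes it convenient for nested JSON.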

GeoProcessing

RapidMiner lacked capabilities for processing geographic data for a very long time. My blog series about GIS processing was a start, but the technique again required the installation of a large number of modules. These are now bundled in the extension. The following operators are available in the GeoProcessing extension:

  • Read Shapefile: If GIS data are not in a database, they are most likely in a Shapefile. This operator reads the geometries and the attributes from the Shapefile.
  • Coordinates to Geometry: If you have two attributes, X and Y, that characterize points in a coordinate system, you can convert these to a geometry attribute. This attribute can be used with the rest of the operators for things like measuring distances, joining etc.
  • Geometry to Coordinates: Given a geometry attribute, X and Y coordinates are extracted. For points this is straightforward; for other geometries (lines, polygons) the centroid (central point) is extracted. The newly created coordinates can be used for a basic visualization in RapidMiner.
  • Reproject Geometry: Systems storing global coordinates usually return them in the WGS84 (latitude, longitude) format. This is helpful for many tasks, but for working with absolute measures (meter- or yard-based), we need to recalculate the geometry to an appropriate local coordinate system. This allows functions like length, distance or area to work and return their result in meters or square meters.
  • Calculate measures on geometry: Applies functions like length, area or GeometryType on a geometry attribute and returns the result.
  • Transform Geometry: Creates a new geometry using transformation functions, or determines properties of the geometry attribute.
  • Calculate Geometry Relation: This operator takes two geometry attributes and applies functions like intersects, contains, overlaps, union etc. on them.
  • Geographic Join: Joins two example sets with geometry attributes using functions like contains, covers, intersects, overlaps etc. This is reasonably fast even with large example sets.
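
To make the idea behind a relation like contains and a join based on it more concrete, here is a small pure-Python sketch. It is only an illustration under stated assumptions: the functions `polygon_contains` and `geographic_join`, the data layout (rings as lists of X/Y tuples) and all names are hypothetical, the point-in-polygon test is the textbook ray-casting method, and points lying exactly on a boundary are not handled specially. The brute-force loop says nothing about how the operator achieves its speed.

```python
def polygon_contains(ring, point):
    """Ray-casting test: True if the point lies inside the polygon ring.

    ring is a list of (x, y) vertices; the last vertex is implicitly
    connected back to the first.
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def geographic_join(points, areas):
    """Pair every point with every area whose ring contains it (brute force)."""
    return [
        {"point": p["name"], "area": a["name"]}
        for p in points
        for a in areas
        if polygon_contains(a["ring"], p["xy"])
    ]

# Hypothetical example data: two adjacent squares and three points
areas = [
    {"name": "west", "ring": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    {"name": "east", "ring": [(4, 0), (8, 0), (8, 4), (4, 4)]},
]
points = [
    {"name": "p1", "xy": (1, 1)},
    {"name": "p2", "xy": (6, 2)},
    {"name": "p3", "xy": (9, 9)},
]
matches = geographic_join(points, areas)
```

Here `p1` falls into the west square, `p2` into the east square, and `p3` matches no area, so it does not appear in the join result.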

See this tutorial in the RapidMiner Community for a detailed example using many of the operators. An example process is available in the Community Samples repository in the folder Community Data Science.

Technical notes for GeoProcessing

The attributes are nominal (strings), in the Well Known Text (WKT) representation. If you read geometries from a database, use the appropriate function like ST_AsText to get them in this format.
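
As a small illustration of the WKT representation, the sketch below converts an X/Y pair into a WKT point string and back. It covers only simple 2D points; the helper names `point_to_wkt` and `wkt_to_point` and the sample coordinates are assumptions made for this example and are not part of the extension.

```python
import re

def point_to_wkt(x, y):
    """Format an X/Y pair as a WKT point string (X first, then Y)."""
    return f"POINT ({x} {y})"

# Matches a 2D point such as "POINT (16.37 48.21)"
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE
)

def wkt_to_point(wkt):
    """Parse a 2D WKT point string back into an (x, y) tuple of floats."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))

wkt = point_to_wkt(16.37, 48.21)   # hypothetical coordinates
xy = wkt_to_point(wkt)
```

Because the geometry is just a string, it can be stored in a nominal attribute and passed between operators like any other text value.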

License

Extensions created with the Custom Operators extension are automatically AGPL licensed. Their "source code" is available: you can right-click on an operator from these extensions in RapidMiner Studio and open the process behind it, so you can easily check how it works and improve it.