Paxata Raises Bar on Self-Service Data Preparation

08.06.2015

Paxata, provider of the only Adaptive Data Preparation™ platform for the enterprise, today announced the general availability of its Summer ’15 release. Responding to enterprise customer needs, the latest version of its award-winning solution includes significant platform and application enhancements. The Paxata Summer ’15 release is the first in the data prep industry to address critical enterprise-grade security, manageability and performance functionality.

“From the outset, our product vision has been to leverage in-memory distributed computing, artificial intelligence, cloud architecture and modern UX design technologies to deliver the most comprehensive solution that unifies IT and business user requirements for data quality, data integration, enrichment, governance and collaboration into a single platform,” said Nenshad Bardoliwalla, Co-Founder and VP of Products. “With the significant innovation we have delivered with this release, we take another major step in our journey to arm every organization with the critical enterprise information fabric which transforms data into valuable information.”

“Paxata shines when it comes to data integration, quality, enrichment and governance, providing a spreadsheet-like application front-end and machine-learning back-end,” said Krishna Roy, Senior Analyst, Data Platforms and Analytics at 451 Research. “Paxata’s Summer ’15 release builds on their success with business analysts and delivers enhancements suited for enterprise needs, including massive performance power achieved by optimized Spark integration, dynamic provisioning of elastic clusters for Cloud tenants, Kerberos support to add enterprise-grade security and authentication, and richer governance of data prep projects, including deeper integration with Splunk for user and project monitoring.”

“While the market is quickly becoming cluttered with many players who have followed us by trying to offer some elements of self-service data preparation or rebranded their existing legacy solutions, none have had Paxata’s singular focus from the outset on delivering a comprehensive data preparation platform built to scale,” said Prakash Nanduri, Co-Founder and CEO of Paxata. “I am thrilled our vision and commitment have been met with tremendous market response, as we cross the 50 customer mark with 400 percent year-over-year revenue growth to prove it.”

Business analysts and IT teams continue to be Paxata’s focus and this release features new functionality for data integration, data quality and governance, including:

• Two-factor governance capabilities for data prep functions and projects – often, IT teams are perceived as bottlenecks rather than enablers because they stand between the data and the analysts who need it. Paxata has designed a unique approach to give both teams what they need without compromise: data administrators have control of all functional permissions (who can perform what types of functions), while resource permissions (who has access to what datasets and projects) can be set by the analysts.

• IntelliFusion™ custom match options – provides users with the ability to dynamically select the IntelliFusion matching method of their choice, either:
o automatic/fuzzy join, which allows users to join data, regardless of word order changes or different types of punctuation
o exact match, where the data values need to be exactly the same in order to join, very much like a traditional relational database or
o the newly added custom match, which allows users to choose whether to ignore or respect word order, white space or other punctuation
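
The difference between the three match options can be illustrated with a minimal sketch. This is not Paxata's IntelliFusion algorithm, which the release does not describe; the function names `match_key` and `values_match` and their parameters are hypothetical, and the sketch only shows how ignoring or respecting word order, white space and punctuation changes whether two values join.

```python
import re


def match_key(value, ignore_word_order=False, ignore_whitespace=False,
              ignore_punctuation=False):
    """Build a comparison key for a cell value (hypothetical helper)."""
    text = value
    if ignore_punctuation:
        # Drop every character that is neither a word character nor white space.
        text = re.sub(r"[^\w\s]", "", text)
    if ignore_whitespace:
        # Trim the ends and collapse runs of white space to a single space.
        text = " ".join(text.split())
    if ignore_word_order:
        # Sort the tokens; splitting and re-joining also normalizes white space.
        text = " ".join(sorted(text.split()))
    return text


def values_match(left, right, **options):
    """Two values join when their keys are identical under the chosen options."""
    return match_key(left, **options) == match_key(right, **options)


# With no options set, this behaves like an exact match.
exact = values_match("Smith, John", "John Smith")
# A custom match that ignores word order and punctuation.
custom = values_match("Smith, John", "John Smith",
                      ignore_word_order=True, ignore_punctuation=True)
```

In this sketch `exact` is `False` and `custom` is `True`, since both values reduce to the same key once punctuation is dropped and tokens are sorted.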

• The introduction of multiple publish points within a single project – this unique capability lets users publish multiple Paxata AnswerSets from a single project. Rather than creating multiple projects that are slightly varied based on the different analytic requirements, analysts can start with a single project, then produce different AnswerSets as they work, and each is saved and remembered.

• Enhanced find and replace – in addition to replacing the “matched” value in a cell, Paxata customers can now also replace the entire contents of the cell. For example, when a formal name appears within a string of text, users can replace the full cell with an abbreviation, or vice versa, which expedites the data normalization required in data integration projects.
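
As a rough illustration of the two replace modes, the sketch below contrasts replacing only the matched value with replacing the whole cell. The function names are hypothetical and the code is not taken from the Paxata product.

```python
def replace_match(cell, find, replacement):
    """Replace only the matched text and keep the rest of the cell."""
    return cell.replace(find, replacement)


def replace_cell(cell, find, replacement):
    """Replace the entire cell contents when the search text occurs in it."""
    return replacement if find in cell else cell


name = "International Business Machines Corporation"
# Matched-value replace: only "Corporation" changes.
partial = replace_match(name, "Corporation", "Corp.")
# Full-cell replace: the whole cell becomes the abbreviation.
full = replace_cell(name, "International Business Machines", "IBM")
```

Here `partial` is "International Business Machines Corp." while `full` is simply "IBM".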

• Cell-level histograms – provides the ability to visualize histogram information directly inside of individual cells in order to improve data quality. Paxata histograms, which are graphical representations of the distribution of numerical data, can be viewed at the individual cell level so that users can see the relative size of the histograms as they scroll up and down within a grid.

As always, all Paxata platform releases provide back-end advancements across five key areas: performance, efficiency, elasticity, connectivity and scalability. The Summer ’15 release includes:

Performance
• On-line aggregation – the ability to compute all aggregates (average, count, first, last, max, min, median, sum, variance, standard deviation) in an on-line fashion. This increases responsiveness and performance while dramatically reducing the amount of memory required by Spark and the network.
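
To show what computing aggregates "in an on-line fashion" can look like, here is a minimal single-pass sketch. It uses Welford's algorithm for the running mean and variance, which is a standard technique and an assumption on our part: the release does not say how Paxata implements this. The class name `OnlineStats` is hypothetical, the variance is the population variance, and median is left out because an exact median cannot be maintained in constant memory this way.

```python
import math


class OnlineStats:
    """Running aggregates updated one value at a time (illustrative only)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf
        self.first = None
        self.last = None

    def update(self, x):
        if self.count == 0:
            self.first = x
        self.last = x
        self.count += 1
        self.total += x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        # Welford's update: adjust the mean, then accumulate the deviation.
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; NaN when no values have been seen.
        return self._m2 / self.count if self.count else float("nan")

    @property
    def stddev(self):
        return math.sqrt(self.variance)


stats = OnlineStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)  # no value is retained after it is processed
```

Each value is folded into a fixed set of counters and then discarded, so memory use stays constant however many rows are streamed through.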

Elasticity
• Tenant-level elasticity – Paxata now provides for dynamic provisioning of elastic clusters per tenant, allowing shared services organizations to deploy computational resources and associated chargebacks to each tenant dynamically and independently while still managing the entire infrastructure as a single system.

Efficiency
• User and project monitoring – through extensive integration with Splunk, enterprise-class log monitoring is now available for Paxata administrators who want real-time operational intelligence around top or unique users and behaviors, the number of data prep projects created, the average length of user sessions, tracking exceptions within the cluster, and even cluster utilization by tenant within a shared services environment.

Connectivity – with emphasis on enterprise-grade security and controls
• Support for Cloudera Distribution of Hadoop 5.4.0 and Apache Spark 1.3.0 – Paxata now takes advantage of the latest innovations in Apache Spark and CDH.
• Kerberos support for Hadoop and MongoDB – this provides important security and authentication enhancements to meet the requirements of enterprise customers. In addition to the existing Kerberos support for secured access to HDFS, Paxata now supports Kerberos-authenticated connections to MongoDB, which stores all of the application metadata.

Scalability
• Persistent columnar caching – while sampling data is sometimes a desired approach, it is frequently dangerous for mission-critical enterprise applications. Paxata’s latest improvements in its columnar architecture on Apache Spark allow a customer’s entire data set to be worked on interactively, regardless of data set size. Paxata’s adaptive persistent columnar caching:
o persists data across applications, so systems can be taken down and brought back up while still retaining the data
o reduces roll-back or re-compute efforts by closely tracking data lineage as it moves through the cache
o supports random access anywhere in the data, unlike Spark’s cache, which is designed for saving the state of an entire partition of data

This release has been deployed to Paxata Cloud customers. Customers with Paxata on site can get release details by logging into PaxWorld for support.

paxata.com

Aug 6, 2015 Cassie Balentine
