By Marcin Grabinski
DevOps is a culture entailing close collaboration between software developers and IT operations teams, with the goal of rolling out higher quality software and frequent modifications to this software, faster. DevOps has helped many companies radically transform their development operations. For example, with DevOps in place, Amazon engineers are now deploying code every 11.7 seconds on average; Etsy executes 50 deployments per day; and Adobe has been able to meet a recent 60 percent spike in app development demand.
The software development industry is also beginning to understand the importance of addressing security requirements as part of the DevOps process. Gartner recently coined this “DevOpsSec,” or “the three legs of the agile triangle.” DevOps shops must balance the drive for speed and agility with the enterprise need to protect critical assets, applications, and services.
Testing Comes Under the Security Microscope
There’s one area of software development particularly in need of greater attention to security—and that is testing. According to a recent survey, 83 percent of U.S. companies report using live customer data in testing processes, because they believe this delivers the most accurate representation of application performance and behavior in “real life.” Any time live customer data is extracted from a production environment, for any purpose, there is a chance for mishandling or loss—accidental or otherwise. Another 83 percent noted they give real customer data to outsourcers for testing processes, which increases the risk for loss or mishandling exponentially. These ingrained habits create a perfect storm that can expose testers to major risk.
The reputational damage suffered by companies who fail to protect personal data can translate directly to revenue losses. New privacy regulations are now upping this ante. The European Union’s (EU’s) recently passed General Data Protection Regulation (GDPR) is often perceived as primarily impacting EU-based companies, but these new laws actually apply to any organization that possesses data on EU customers. This includes more than half of large U.S.-based businesses—according to the same survey mentioned above. Come May 2018, businesses that fail to mask their customer data in testing processes will find themselves in direct violation of GDPR.
Meeting the Test Data Privacy Mandate
Like many other information security initiatives, test data privacy projects can be complex and difficult, but they can be aligned with DevOps initiatives. Suggestions for getting started include inventorying sensitive data, determining the ideal disguise rule, and creating lookup tables.
Inventory sensitive data and know what can be kept. The first step is to take inventory of all sensitive data by identifying and listing the columns of information that need to be disguised. Contrary to popular belief, this does not include names, and in fact eliminating names can make it unnecessarily difficult to identify customer records as data moves across transaction paths and platforms in the testing process. For example, in the basic encryption model, an input of “Marcin Grabinski” may deliver an output of “KL/BCrWkXniHAdoN0zhLEw.” At a brief glance, this is unrecognizable to testers and therefore unacceptable, unless the process is automated, and most testing continues to be manual.
The goal of test data privacy is not to disguise the data itself, but to make it reasonably difficult to identify individuals, a concept known as “pseudonymisation.” It’s ok to use real customer names from the production database, as long as these names are not linked to home addresses, dates of birth, passport or license numbers, or any other identifying information. Keeping real, easily recognizable names (for example, Jane Doe) makes testing processes more rapid, efficient, and accurate for manual testers as they track application execution in the testing environment.
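The inventory step can be sketched in a few lines of Python. This is a minimal illustration, not a complete inventory tool: the column names and the patterns in `SENSITIVE_PATTERNS` are hypothetical, and a real inventory would be reviewed by someone who knows the data. Names are deliberately left off the list, in line with the approach described above.

```python
import re

# Hypothetical patterns for columns that can identify a person when
# linked to a name. Names themselves are intentionally not listed.
SENSITIVE_PATTERNS = [
    r"address", r"street", r"zip", r"postcode",
    r"birth", r"dob", r"passport", r"licen[sc]e", r"phone",
]

def inventory(columns):
    """Split column names into those to disguise and those to keep."""
    disguise, keep = [], []
    for col in columns:
        if any(re.search(p, col, re.IGNORECASE) for p in SENSITIVE_PATTERNS):
            disguise.append(col)
        else:
            keep.append(col)
    return disguise, keep

# Example with made-up column names.
columns = ["customer_name", "home_address", "date_of_birth",
           "passport_no", "order_total"]
to_disguise, to_keep = inventory(columns)
print("disguise:", to_disguise)
print("keep:", to_keep)
```

Matching on column names alone will miss sensitive values stored in free-text or oddly named columns, so a pattern list like this is only a starting point.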
Determine the ideal disguise rule. Once companies determine what information needs to be masked, the next step is to create a disguise rule for each type of sensitive data. There are various techniques, the best known being encryption: the process of encoding messages or information so only authorized parties can read them.
As discussed above, the challenge with standard encryption is that it can make it very difficult for testers to identify what type of information they are viewing. Take the example noted above, “KL/BCrWkXniHAdoN0zhLEw”: not only is it hard to read and recall as a name, but one can’t even tell whether it is a name, an address, or a phone number. In the context of test data privacy, format-preserving encryption tends to work better. This flavor of encryption keeps the original format of the input data while masking it, making it more useful for testing purposes. Take U.S. phone numbers, which are typically written as three digits, three digits, and four digits: the tester can see that the value is a phone number, but the content is encrypted, giving a different set of digits.
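The phone-number example can be illustrated with a short Python sketch. Note the assumption: this is not a standards-based format-preserving encryption scheme such as NIST's FF1. It is a one-way, keyed digit substitution (the function name `mask_digits` and the key are hypothetical) that only shows the idea of keeping the format while changing the digits. A production project would rely on a vetted FPE implementation.

```python
import hashlib
import hmac

def mask_digits(value, key):
    """Replace each digit with a key-derived digit; keep all other characters.

    Illustrative only: one-way masking, not reversible encryption.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    i = 0
    for ch in value:
        if ch in "0123456789":
            # Derive a replacement digit from the keyed hash of the input.
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            # Separators such as "-" pass through, preserving the format.
            out.append(ch)
    return "".join(out)

masked = mask_digits("555-867-5309", b"hypothetical-test-key")
print(masked)  # same 3-3-4 layout, different digits
```

Because the output depends only on the key and the input, the same phone number is masked to the same value every time, which keeps records consistent across test runs.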
While format-preserving encryption can work well for data like phone numbers, it doesn’t always work well for data that doesn’t follow a standard format, like addresses. This is especially true for organizations dealing with GDPR, since addresses are formatted differently across countries. Data translation, which takes existing data records stored within an organization’s files and scrambles and reassigns them to mask sensitive data values, is a good option for organizations looking to mask address information that follows a uniform format.
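One simple reading of data translation is to shuffle the values of a sensitive column among the existing records, so every address stays valid but no longer belongs to its original customer. The sketch below assumes records held as Python dictionaries; the function name, field names, and sample rows are made up for illustration.

```python
import random

def shuffle_column(rows, column, seed=0):
    """Return copies of rows with one column's values reassigned at random."""
    values = [row[column] for row in rows]
    random.Random(seed).shuffle(values)
    # Build new dictionaries so the original rows are left untouched.
    return [{**row, column: value} for row, value in zip(rows, values)]

customers = [
    {"name": "Jane Doe", "address": "1 Example Street"},
    {"name": "John Smith", "address": "2 Sample Road"},
    {"name": "Alex Jones", "address": "3 Test Avenue"},
]
for row in shuffle_column(customers, "address", seed=42):
    print(row)
```

A plain shuffle can leave some values in their original position, and with very few records it offers little protection, so this is a sketch of the idea rather than a sufficient masking rule.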
Create lookup tables. If realistic names or addresses must be used, there is no magic to ease the process—data lookup tables must be set up. However, this doesn’t need to be a time-intensive or painful process. First, there doesn’t need to be a huge volume of data records. Some organizations think that in order for testing to be comprehensive, they need to test as many rows as a production contact table contains. This is not true—it is perfectly acceptable to test only one to five percent of data records.
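A lookup table and a small test subset can be sketched as follows. The names in `LOOKUP_NAMES`, the helper functions, and the use of a hash to pick an entry are all assumptions made for illustration; the five percent figure is the upper end of the range mentioned above.

```python
import hashlib
import random

# Hypothetical lookup table of realistic substitute names.
LOOKUP_NAMES = ["Jane Doe", "John Smith", "Alex Jones", "Maria Garcia"]

def lookup_substitute(value, table):
    """Map a value to a stable entry in the lookup table."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(table)
    return table[index]

def sample_rows(rows, fraction, seed=0):
    """Pick a reproducible random subset of rows for testing."""
    count = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, count)

# Made-up production-like records.
records = [{"id": i, "name": f"Customer {i}"} for i in range(200)]
subset = sample_rows(records, 0.05)
masked = [{**r, "name": lookup_substitute(r["name"], LOOKUP_NAMES)}
          for r in subset]
print(len(masked), "rows selected for testing")
```

With a small table, many records will share the same substitute name. That is usually acceptable for testing, but the table can be enlarged if unique names matter.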
The same ratio is good for addresses. A huge volume is not required. As we discussed above, masking addresses can be challenging due to the wide variety of formats and to inconsistencies between items like zip codes, cities, and streets, which can render records invalid for testing. There are ways to get around this. Some organizations opt to keep true address information, as long as other personally identifiable information attached to it is properly masked. Others remove private address information altogether, replacing it with substitute addresses. For example, a large organization with multiple locations in the same country can swap in the addresses of those locations as “dummies” to avoid having any real private addresses in its testing environment.
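The “dummy” address approach might look like this in Python. The office addresses, the `address` field name, and the function name are hypothetical; the point is only that every private address is replaced by one drawn from a short list of valid, non-private addresses.

```python
from itertools import cycle

# Hypothetical list of the organization's own office addresses.
OFFICE_ADDRESSES = [
    "100 Example Plaza, Springfield",
    "200 Sample Square, Riverton",
    "300 Demo Drive, Lakeside",
]

def replace_addresses(rows, substitutes):
    """Return copies of rows with each address swapped for a substitute."""
    pool = cycle(substitutes)  # reuse the short list as often as needed
    return [{**row, "address": next(pool)} for row in rows]

customers = [
    {"name": "Jane Doe", "address": "1 Private Lane"},
    {"name": "John Smith", "address": "2 Hidden Court"},
]
for row in replace_addresses(customers, OFFICE_ADDRESSES):
    print(row)
```

Since each substitute is a complete, real-world address, the zip code, city, and street stay consistent with one another, which avoids the invalid-record problem described above.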
Conclusion
In a DevOps team, everyone is moving quickly—especially testers—and organizations need to insulate them from unnecessary risk. Substantial efforts aimed at producing excellent software on time and on budget will be severely undercut if the software products are not secure, or if customers’ data privacy is compromised anywhere in the creation process. Security solutions are evolving and becoming more adaptable to the pace of DevOps. As this happens, one important way to address security in software development is through properly masking all production-level customer data used in the testing process. The techniques described above are a great way to get started.
Marcin Grabinski, EMEA Technical Solution Specialist, Compuware, has almost 20 years of experience in the IT sector, including over 15 in mainframe. Grabinski is Compuware’s resident expert in its Test Data Privacy solution and he has been involved in the execution of numerous data privacy projects with major companies in the financial services sector across Europe.
SoftwareMag, Nov2016