As the hype over cloud computing continues, many CFOs wonder about worst-case scenarios. For the purposes of auditing or in case of litigation, how will the company retrieve crucial e-mails, HR records, transaction data and other financial documents in a cloud environment?
“The actual processes and procedures are completely dependent upon the cloud provider, so the process will vary in each case,” says Jim Shook, director of EMC Corporation’s e-discovery and compliance legal team. “Sometimes the cloud provider will make tools available for self-help, but most frequently the discovery process is completely ad-hoc.”
Shook spoke to Carol Ko of Asia Cloud Forum (a sister publication owned by Questex Media) about electronic discovery (e-discovery) issues in a cloud environment and how companies can make sure they are able to access the information they need for auditing, litigation and other purposes.
What is e-discovery?
It can actually mean a lot of different things. Most frequently, e-discovery relates to identifying, preserving, collecting and processing electronically stored information for the purposes of litigation.
For example, if a company is involved in a lawsuit in the United States, one of its obligations is to find all e-mail messages that relate to the issues that are disputed, and then provide them to the other side in the lawsuit, regardless of whether they are helpful or damaging to the case.
E-discovery is also a term that is being used more frequently outside of the litigation context. For example, many companies that conduct internal investigations or audits, and many regulators overseeing companies, require access to information that is stored electronically. The process is very similar to that in litigation, but has broader implications, especially outside of the U.S.
Are you seeing growing demand for e-discovery in Asia?
There is absolutely a growing demand for e-discovery in Asia, and I can see three very strong reasons:
- Any company that transacts business with U.S.- or UK-based companies, or does business within those countries, may be subject to a lawsuit that will require e-discovery.
- Regulators throughout the world require access to information, and more frequently, that information is only available electronically. More data is “born digital” -- some studies suggest that 80 to 90% of information is never available on paper.
- Companies conducting internal audits or investigations also have a need for e-discovery processes, for the same reasons as the regulators.
How similar or different is conducting e-discovery in traditional self-managed data centers versus third-party managed cloud computing facilities?
There is an enormous difference.
In a self-managed data center, an entity has access to all of the data. It controls the personnel, and it can build tools or hire additional help to access the necessary information. In a third-party managed facility, the ability to conduct e-discovery is controlled by the relationship between the parties [frequently through a written contract] and the party needing the discovery is almost entirely dependent upon the cloud provider.
Sometimes the cloud provider will make tools available for self-help, but most frequently the discovery process is completely ad-hoc.
How is e-discovery actually being conducted in a cloud computing facility?
The actual processes and procedures are completely dependent upon the cloud provider, so the process will vary in each case. But from a high level, most of the time the steps would follow this general path:
1) The customer notifies the cloud provider that it has a need for e-discovery, and provides the cloud provider with some information to trigger the cloud provider’s responsibility to preserve data.
2) The customer provides the cloud provider with a scope of data that is needed. For example, if email has been outsourced to the cloud, the customer would have to specify the email that it requires by providing a date range, recipients, and keywords in the text of the message or some other filtering mechanism. This “filter” would need to fit the customer’s needs, on the one hand, and yet match the cloud provider’s capability to actually search for and preserve the data.
3) The cloud provider then must use its tools and/or processes to conduct the search and collect the data according to the customer’s specifications. Note that the cloud provider should create an audit path of who conducted the work and how it was performed, in case this is necessary later in the litigation process to establish authenticity and admissibility of the collected data [and to verify that discovery was correctly performed]. Unfortunately, most cloud providers today are unlikely to understand what is required in this step.
4) Once collected, the data is provided to the customer. If the amount of data collected is small, it might be sent on CDs, DVDs, a USB drive or even over FTP [file transfer protocol]. But many collections can be hundreds of gigabytes or even terabytes of data, which would require some alternative method of providing the data. Note also that the data should be preserved in its native format, not in a printed/image format.
5) Once the customer has its data, it can conduct further processing as needed.
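As an illustration only, steps 2 and 3 above can be sketched in Python: a scope filter built from a date range, recipients and keywords, plus an audit trail recording who collected each item and a hash of its native bytes. Everything here (the `Message` and `Scope` classes, the `collect` function, the sample addresses and the operator name) is hypothetical and does not describe any particular cloud provider's tooling.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class Message:
    # Hypothetical, simplified e-mail record; real mail stores hold far more metadata.
    msg_id: str
    sent: date
    recipients: tuple
    raw: bytes  # the message in its native format


@dataclass(frozen=True)
class Scope:
    # The customer's filter from step 2: date range, recipients, keywords.
    start: date
    end: date
    recipients: frozenset
    keywords: tuple

    def matches(self, msg):
        in_range = self.start <= msg.sent <= self.end
        to_custodian = bool(self.recipients & set(msg.recipients))
        text = msg.raw.decode("utf-8", errors="replace").lower()
        has_keyword = any(k.lower() in text for k in self.keywords)
        return in_range and to_custodian and has_keyword


def collect(messages, scope, operator):
    """Return matching messages plus an audit trail of the collection (step 3)."""
    collected, audit = [], []
    for msg in messages:
        if scope.matches(msg):
            collected.append(msg)
            audit.append({
                "msg_id": msg.msg_id,
                # Hash of the native bytes, so later alteration can be detected.
                "sha256": hashlib.sha256(msg.raw).hexdigest(),
                "collected_by": operator,
                "collected_at": datetime.now(timezone.utc).isoformat(),
            })
    return collected, audit


# Assumed sample data, for illustration only.
mailbox = [
    Message("m1", date(2010, 3, 1), ("cfo@example.com",), b"Q1 revenue forecast attached"),
    Message("m2", date(2010, 3, 5), ("hr@example.com",), b"Holiday schedule"),
    Message("m3", date(2009, 1, 9), ("cfo@example.com",), b"Old revenue numbers"),
]
scope = Scope(date(2010, 1, 1), date(2010, 12, 31),
              frozenset({"cfo@example.com"}), ("revenue",))
collected, audit = collect(mailbox, scope, operator="provider-admin-01")
print([m.msg_id for m in collected])  # prints ['m1']
```

Only `m1` falls inside the date range, goes to the named recipient and contains the keyword. The audit entries are the kind of record that could later help show who performed the collection and that the data was not changed afterwards; what a court actually requires would depend on the jurisdiction and the case.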
Data can reside and move around anywhere on the globe, as in the case of public clouds. How can enterprises achieve forensically sound e-discovery?
There is some misunderstanding about the requirement for “forensically sound” processes in the discovery process. In most civil litigation and civil-based e-discovery, “forensically sound” processes are not needed, at least not in the sense that the term is generally used.
There are issues with cloud data in e-discovery that can make the data more difficult to authenticate and to have admitted at trial. For the most part, the actual movement of data within a cloud infrastructure is not an issue in this process [although the data movement is a potential problem for privacy issues].
Being able to track how the data is collected and establishing where it was collected from are of more importance in the process for purposes of e-discovery.
What are the privacy issues that you alluded to?
This is a huge issue in the cloud world. In jurisdictions with strict privacy requirements, such as the EU, data probably should never leave the jurisdiction -- and of course, this can be more difficult in a cloud environment.
Even if the storage location of the data is still compliant with the applicable privacy law requirement -- such as through a Safe Harbor or Model Contract exception -- moving and processing the data can subject it to additional conflicting laws that make compliance very difficult.
A good example is EU-based data that is transmitted to the U.S. Even if the requirements of the EU’s Data Protection Directive have been initially met, the fact that the data is now in the U.S. can create conflicting requirements between further processing and the U.S.-based e-discovery requirements and subpoena powers.
Given the requirement of co-location and business continuity planning, which copy of a document will be taken by the court as the true and authentic copy -- the original one, the imaged copy, or the copy in the backup facility?
Fortunately, in an electronic world, there should be no difference between these choices as properly maintained electronic versions are precise copies of one another.
The only differences between these copies should relate to establishing how they wound up in the final location, and providing information about that business process so that the data can be authenticated.
There is a rule of evidence in the U.S. called the “Best Evidence” rule that might establish a preference for the original physical copy of a document in certain cases.
There might also be issues with authenticity and admissibility of the different “e-versions” depending on who held those copies and how they arrived at that location.
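The idea that properly maintained electronic versions are precise copies of one another can be checked mechanically by comparing cryptographic hashes. The following is a minimal sketch under assumed sample data; the helper names `sha256_of` and `copies_identical` are hypothetical, and a matching hash only shows that the bytes are identical, not how a copy arrived at its location.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a document's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def copies_identical(*copies: bytes) -> bool:
    """True when every supplied copy has the same SHA-256 digest."""
    return len({sha256_of(c) for c in copies}) == 1


# Assumed sample data: an original, a faithful backup and an altered version.
original = b"Invoice 1001: total due 4,200.00"
backup = bytes(original)
altered = original.replace(b"4,200", b"4,800")

print(copies_identical(original, backup))           # prints True
print(copies_identical(original, backup, altered))  # prints False
```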
In a multi-tenanted data storage facility, how can users ensure their data is not tainted or altered as a result of e-discovery being conducted on their own or on their virtual neighbors' data?
This is another significant concern in the cloud environment. Cloud customers must be careful in conducting their up-front due diligence and in creating contractual provisions that give them protection from these types of issues.
Even so, putting data in the cloud necessarily creates new reliance on a third party, which can cause issues. That’s one trade-off for the other benefits provided by the cloud.
What is the average cost for conducting e-discovery in a cloud computing facility? Can the cost of e-discovery actually be built into a cloud computing service contract?
Unfortunately, I have not seen any data for typical or average costs.
The costs for handling e-discovery -- and the processes or even an SLA (service level agreement) -- should be referenced in a cloud computing service contract. However, the best model is probably not to try to build in the cost of responding to requests. These costs are likely to vary widely among customers -- and trying to burden each customer with a charge might not result in efficient pricing models.
Instead, an agreement should clearly outline for the customer what can [and cannot] be done, as well as the cost of such additional services.
Ideally, self-help tools should eventually be common in cloud computing environments, but those may take time to properly develop. Once self-help tools are available, it makes sense for the cost of e-discovery -- i.e., the tool -- to be built into the agreement.
In the case of a state-prosecuted crime, how can enterprises protect their corporate data from being seized by the law-enforcement agencies?
This is another trade-off with the cloud, since the data is in the control of a third party. Companies should build requirements into their cloud contracts that specify how and when the cloud provider must notify them when data is requested by subpoena or by a law enforcement agency. There should be penalties for failing to follow the process.
However, even if such provisions are in a contract, there are laws and situations where the cloud provider might be prohibited from notifying the customer that data has been requested.