ABSTRACT
As critical records are increasingly stored in electronic form, which tends to make for easy destruction and clandestine modification, it is imperative that they be properly managed to preserve their trustworthiness, i.e., their ability to provide irrefutable proof and accurate details of events that have occurred. The need for proper record keeping is further underscored by the recent corporate misconduct and ensuing attempts to destroy incriminating records. Currently, the industry practice and regulatory requirements (e.g., SEC Rule 17a-4) rely on storing records in WORM storage to immutably preserve the records. In this paper, we contend that simply storing records in WORM storage is increasingly inadequate to ensure that they are trustworthy. Specifically, with the large volume of records that are typical today, meeting the ever more stringent query response time requires the use of direct access mechanisms such as indexes. Relying on indexes for accessing records could, however, provide a means for effectively altering or deleting records, even those stored in WORM storage.In this paper, we establish the key requirements for a fossilized index that protects the records from such logical modification. We also analyze current indexing methods to determine how they fall short of these requirements. Based on our insights, we propose the Generalized Hash Tree (GHT). Using both theoretical analysis and simulations with real system data, we demonstrate that the GHT can satisfy the requirements of a fossilized index with performance and cost that are comparable to regular indexing techniques such as the B-tree. We further note that as records are indexed on multiple fields to facilitate search and retrieval, the records can be reconstructed from the corresponding index entries even after the records expire and are disposed of, Therefore, we also present a novel method to eliminate this disclosure risk by allowing an index entry to be effectively disposed of when its record expires.
- P. Bagwell. Ideal Hash Trees. Technical report, Programming Methods Laboratory, Institute of Core Computing Science, School of Computer and Communication Sciences, Swiss Institute of Technology Lausanne, 2001.Google Scholar
- B. Becker, S. Gschwind, T. Ohler, B. Seeger, and P. Widmayer. An Asymptotically Optimal Multiversion B-tree. The VLDB Journal: The International Journal on Very Large Data Bases, 5:264--275, 1996. Google ScholarDigital Library
- A. Z. Broder and A. R. Karlin. Multilevel Adaptive Hashing. In 1st ACM-SIAM Symposium on Discrete Algorithms, 1990. Google ScholarDigital Library
- Cohasset Associates, Inc. The role of optical storage technology. White Paper, Apr. 2003.Google Scholar
- Congress of the United States of America. Sarbanes-Oxley Act of 2002, 2002. Available at http://thomas.loc.gov.Google Scholar
- M. Dietzfelbinger, A. Karlin, K. Mehlhorn, F. M. A. D. Heide, H. Rohnert, and R. E. Tarjan. Dynamic Perfect Hashing: Upper and Lower Bounds. SIAM Journal on Computing, 23(4):738--761, 1994. Google ScholarDigital Library
- M. C. Easton. Key-Sequence Data Sets on Indelible Storage. IBM Journal of Research and Development, May 1986. Google ScholarDigital Library
- EMC Corp. EMC Centera Content Addressed Storage System, 2003. Available at http://www.emc.com/products/systems/centera_ce.jsp.Google Scholar
- R. J. Enbody and H. C. Du. Dynamic Hashing Schemes. ACM Computing Survey, 20(2), June 1988. Google ScholarDigital Library
- P. Gutmann. Secure Deletion of Data from Magnetic and Solid-State Memory. In 6th USENIX Security Symposium, July 1996. Google ScholarDigital Library
- W. W. Hsu and S. Ong. Fossilization: A process for establishing truly trustworthy records. Research Report RJ 10331, IBM Almaden Research Center, San Jose, CA, 2004. Available at http://www.research.ibm.com/resources/paper_search.shtml.Google Scholar
- IBM Corp. IBM TotalStorage DR550, 2004. Available at http://www-1.ibm.com/servers/storage/disk/dr.Google Scholar
- E. W. Myers. Efficient Applicative Data Types. In 11th ACM Symposium on Principles of Programming Languages, 1984. Google ScholarDigital Library
- National Institute of Standards and Technology. FIPS 180-1. Secure Hash Standard. US Department of Commerce, 1995.Google Scholar
- National Institute of Standards and Technology. FIPS PUB 197, Advanced Encryption Standard (AES), 2001.Google ScholarCross Ref
- Network Appliance, Inc. SnapLock#8482; Compliance and SnapLock Enterprise Software, 2003. Available at http://www.netapp.com/products/filer/snaplock.html.Google Scholar
- P. Rathmann. Dynamic Data Structures on Optical Disks. In 1st International Conference on Data Engineering, 1984. Google ScholarDigital Library
- T. Reps, T. Teitelbaum, and A. Demers. Incremental Context-Dependent Analysis for Language-based Editors, ACM Transactions on Programming Language Systems, 5:449--477, 1983. Google ScholarDigital Library
- N. Sarnak and R. E. Tarjan. Planar Point Location Using Persistent Search Tree. Communications of the ACM, 29(7), July 1986. Google ScholarDigital Library
- F. Scholer, H. Williams, J. Yiannis, and J. Zobel. Compression of inverted indexes for fast query evaluation. In 25th ACM Conference on Research and Development in Information Retrieval, 2002. Google ScholarDigital Library
- Securities and Exchange Commission. SEC Interpretation: Commission Guidance to Broker-Dealers on the Use of Electronic Storage Media under the Electronic Signatures in Global and National Commerce Act of 2000 with Respect to Rule 17a-4(f), 2001. Available at http://www.sec.gov/rules/interp/34--44238.htm.Google Scholar
- Socha Consulting LLC. The 2004 Socha-Gelbmann Electronic Discovery Survey, 2004.Google Scholar
- Sony Corp. AIT-2/AIT-3 WORM Drives & Libraries, 2003. Available at http://www.storagebysony.com/products/prod_hilite4.asp.Google Scholar
- M. Stonebraker. The Design of the POSTGRES Storage System. In 13th VLDB Conference, 1987. Google ScholarDigital Library
- T. Krijnen and L. G. L. T. Meertens. Making B-Trees Work for B.IW 219/83. The Mathematical Centre, Amsterdam, The Netherlands, 1983.Google Scholar
- The Enterprise Storage Group, Inc. Compliance: The effect on information management and the storage industry, May 2003.Google Scholar
- D. Wilton. How Many Words Are There In The English Language? Wilton's Word and Phrase Origins, 2001.Google Scholar
- Fossilized index: the linchpin of trustworthy non-alterable electronic records
Recommendations
Comparison between the zeroth-order Randić index and the sum-connectivity index
The zeroth-order Randić index and the sum-connectivity index are very popular topological indices in mathematical chemistry. These two indices are based on vertex degrees of graphs and attracted a lot of attention in recent years. Recently Li and Li (...
Szeged index, edge Szeged index, and semi-star trees
A semi-star tree is a star tree whose some edges may be replaced by paths of length more than one. In this paper we present some increasing and decreasing transformations for Szeged index of the semi-star trees. Then we introduce palm semi-star tree and ...
Inverted index maintenance strategy for flashSSDs
An inverted index is a core data structure of Information Retrieval systems, especially in search engines. Since the search environments have become more dynamic, many on-line index maintenance strategies have been proposed. Previous strategies were ...
Comments