ABSTRACT
Public-use sensor datasets are a useful scientific resource with the unfortunate feature that their provenance is easily disconnected from their content. To address this, we introduce a technique that directly associates provenance information with sensor datasets. Our technique is similar to traditional watermarking but is intended for application to unstructured datasets. Our approach is potentially imperceptible given sufficient margins of error in datasets, and is robust to a number of benign but likely transformations, including truncation, rounding, bit-flipping, sampling, and reordering. We provide algorithms for both one-bit and blind mark checking. Our algorithms are probabilistic in nature and are characterized by a combinatorial analysis.