Abstract
The usage of Cloud Services has increased rapidly in recent years. Data management systems, which lie behind every Cloud Service, are a major concern when it comes to scalability, flexibility and reliability, because they are implemented in a distributed way. A Distributed Data Aggregation Service (DDAS) relying on a storage system meets these demands and serves as a repository back-end for complex analysis and automatic mining of any type of data. In this paper we continue our previous work on data management in Cloud storage. We present a formal approach to expressing retrieval and aggregation rules with a compact yet powerful tool called Rule Markup Language. Our extended solution proposes a standard form for schemes and uses the tool to match the rules to the XML form of the structured data in order to obtain the unstructured entries from the BlobSeer data storage system. This allows the DDAS to bypass several steps when processing a retrieval request. Our new architecture is more loosely coupled, with a separate module, the new tool, used for transforming the XML entries into standard XML files which represent the final result. We model the dynamic behavior of the system using this new standard to ensure a simpler and more efficient representation of the operations performed by the client, while maintaining the constraints imposed by a distributed system running in the Cloud. Furthermore, we prove that this method correctly performs the translation between the storage model’s unstructured view of data and the client’s structured objects.
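
The retrieval flow summarized above (rules matched against an XML view of the structured data, then used to fetch unstructured entries from blob storage and assemble an XML result) can be illustrated with a minimal Python sketch. It is not the DDAS implementation, not Rule Markup Language syntax and not the BlobSeer API: `BlobStore`, `store` and `retrieve` are hypothetical names, the blob is assumed to be an in-memory byte array addressed by (offset, size), the layout of the XML index is an assumption, and each retrieval rule is a plain Python predicate standing in for an RML rule.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class BlobStore:
    """Hypothetical in-memory stand-in for an unstructured blob (NOT the BlobSeer API).

    The blob is a flat byte array; each entry is addressed by (offset, size).
    """
    data: bytearray = field(default_factory=bytearray)

    def write(self, payload: bytes) -> Tuple[int, int]:
        # Append the payload and return its (offset, size) address.
        offset = len(self.data)
        self.data.extend(payload)
        return offset, len(payload)

    def read(self, offset: int, size: int) -> bytes:
        return bytes(self.data[offset:offset + size])


# A "rule" here is a plain predicate over an index entry's attributes,
# used as a stand-in for an RML retrieval rule (assumption).
Rule = Callable[[Dict[str, str]], bool]


def store(blob: BlobStore, index: ET.Element, key: str, value: str) -> None:
    """Write the value to the blob and record its address in the XML index."""
    offset, size = blob.write(value.encode("utf-8"))
    ET.SubElement(index, "entry", key=key, offset=str(offset), size=str(size))


def retrieve(blob: BlobStore, index: ET.Element, rule: Rule) -> ET.Element:
    """Match the rule against the XML index, then fetch only the matching entries."""
    result = ET.Element("result")
    for entry in index.iter("entry"):
        if rule(entry.attrib):
            raw = blob.read(int(entry.get("offset")), int(entry.get("size")))
            item = ET.SubElement(result, "item", key=entry.get("key"))
            item.text = raw.decode("utf-8")
    return result


blob = BlobStore()
index = ET.Element("index")
store(blob, index, "sensor/1", "21.5")
store(blob, index, "sensor/2", "19.0")
store(blob, index, "camera/1", "frame-0001")

answer = retrieve(blob, index, lambda attrs: attrs["key"].startswith("sensor/"))
print(ET.tostring(answer, encoding="unicode"))
# <result><item key="sensor/1">21.5</item><item key="sensor/2">19.0</item></result>
```

In this sketch the rule is evaluated only against the structured XML index, and the unstructured blob is touched only for the entries that match, which keeps the two views separate in the way described above. A real deployment would replace the predicate with proper rule evaluation and the byte array with a distributed store.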

Acknowledgment
The research presented in this paper was supported by the following projects: “SideDOWN: Smart Internet Data Downloader and Aggregator,” ID: PN-II-IN-CI-2012-1-0324; CyberWater grant of the Romanian National Authority for Scientific Research, CNDI-UEFISCDI, project number 47/2012; MobiWay: Mobility Beyond Individualism: an Integrated Platform for Intelligent Transportation Systems of Tomorrow - PN-II-PT-PCCA-2013-4-0321; clueFarm: Information system based on cloud services accessible through mobile devices, to increase product quality and business development farms - PN-II-PT-PCCA-2013-4-0870.
The work was developed under the DataCloud@Work associated team between the KerData and Myriads teams of INRIA Rennes - Bretagne Atlantique and the Computer Science Department of Politehnica University of Bucharest.
This work was partly funded by the EU project FP7-610582 ENVISAGE: Engineering Virtualized Services (http://www.envisage-project.eu).
We would like to thank the reviewers for their time and expertise, constructive comments and valuable insight.
About this article
Cite this article
Serbanescu, V., Pop, F., Cristea, V. et al. A formal method for rule analysis and validation in distributed data aggregation service. World Wide Web 18, 1717–1736 (2015). https://doi.org/10.1007/s11280-015-0334-4
DOI: https://doi.org/10.1007/s11280-015-0334-4