DOI:10.1145/2076623.2076639
research-article

SciQL: bridging the gap between science and relational DBMS

Published: 21 September 2011

Abstract

Scientific discoveries increasingly rely on the ability to efficiently grind massive amounts of experimental data using database technologies. To bridge the gap between the needs of the Data-Intensive Research fields and the current DBMS technologies, we propose SciQL (pronounced as 'cycle'), the first SQL-based query language for scientific applications with both tables and arrays as first class citizens. It provides a seamless symbiosis of array-, set- and sequence-interpretations. A key innovation is the extension of value-based grouping of SQL:2003 with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between element positions. This leads to a generalisation of window-based query processing with wide applicability in science domains. This paper describes the main language features of SciQL and illustrates them using time-series concepts.
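The contrast between value-based and structural grouping can be sketched in plain Python. This is an illustration of the concept only, not SciQL syntax: the function names `value_groups` and `structural_groups`, the `offsets` parameter, and the sample `readings` are all assumptions introduced here, and the abstract does not specify how SciQL expresses either form.

```python
from collections import defaultdict


def value_groups(rows, key):
    """Value-based grouping: elements with the same key value share a group,
    regardless of where they sit in the input (the SQL GROUP BY idea)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)


def structural_groups(series, offsets):
    """Structural grouping (hypothetical helper): one group per anchor
    position i, containing the elements at positions i + d for each d in
    offsets. Positions that fall outside the series are skipped."""
    n = len(series)
    return [
        [series[i + d] for d in offsets if 0 <= i + d < n]
        for i in range(n)
    ]


# A small time series, indexed by position 0..4 (made-up sample values).
readings = [3, 5, 4, 8, 6]

# Value-based: group by a property of the value itself (here, parity).
by_parity = value_groups(readings, key=lambda v: v % 2)

# Structural, fixed-size: each element together with its predecessor,
# which yields a two-point moving average.
windows = structural_groups(readings, offsets=(-1, 0))
moving_avg = [sum(w) / len(w) for w in windows]

# Structural, unbounded: each element together with everything before it,
# which yields a running sum.
prefixes = [readings[: i + 1] for i in range(len(readings))]
running_sum = [sum(p) for p in prefixes]
```

In this sketch the groups of the structural variants are determined purely by positions in the series, which is why they may overlap, whereas value-based groups partition the input.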

Published In

ACM Other conferences
IDEAS '11: Proceedings of the 15th Symposium on International Database Engineering & Applications
September 2011
274 pages
ISBN:9781450306270
DOI:10.1145/2076623

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SciQL
  2. array database
  3. array query language
  4. scientific databases
  5. time series

Qualifiers

  • Research-article

Conference

IDEAS '11

Acceptance Rates

Overall Acceptance Rate 74 of 210 submissions, 35%
