Analyzing data streams in scientific applications
Modern scientific instruments such as satellites, on-ground antennas, and simulators collect large volumes of data. For example, instruments monitoring the environment emit streams of environmental sensor readings, particle colliders produce streams of particle collision data, and software telescopes such as LOFAR [33] produce very large volumes of digitized radio signals. The measurement data is normally produced as streams rather than as records stored in conventional database tables. A stream has the property that its data is ordered in time and its volume is potentially unlimited. Scientists perform a wide range of on-line analyses over such data streams. A conventional approach to data management using a relational database management system (DBMS) has the disadvantage that streaming data must be loaded into a database before it can be queried and analyzed. If the data rate of a stream is too high, the DBMS cannot load the streaming data fast enough. This creates backlogs of unanalyzed data, and the high data volume produced by scientific instruments can even be too large to store and process [2]. Furthermore, offline data processing prevents timely analysis of interesting natural events as they occur.
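To make the contrast with load-then-query processing concrete, the following is a minimal sketch (not taken from the chapter) of an on-line analysis over a stream: a moving average computed incrementally as each sensor reading arrives, keeping only a bounded window of state even though the stream itself may be unbounded. The function name and the sample readings are illustrative assumptions.

```python
from collections import deque

def moving_average(stream, window=3):
    """Yield the average of the most recent `window` readings
    each time a new reading arrives on the stream."""
    buf = deque(maxlen=window)   # bounded state: the stream itself is unbounded
    for reading in stream:       # readings arrive ordered in time
        buf.append(reading)
        yield sum(buf) / len(buf)

# A short finite list stands in for an unbounded sensor stream.
readings = [10.0, 12.0, 11.0, 30.0, 29.0]
for avg in moving_average(readings, window=3):
    print(avg)
```

Because each result is emitted as soon as its reading arrives, a sudden jump (here, the step from 11.0 to 30.0) is visible immediately rather than only after a later bulk load, which is the timeliness advantage the chapter attributes to stream processing.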
Risch, T., Madden, S., Balakrishnan, H., Girod, L., Newton, R., Ivanova, M. G., … Riedewald, M. (2009). Analyzing data streams in scientific applications. In Scientific Data Management: Challenges, Technology, and Deployment (pp. 399–429). doi:10.1201/9781420069815