Scrapple: Fast Analytical Query Evaluation via Advanced Query Recycling Techniques

The complex analytical queries characterizing decision support applications can be very expensive to compute, and the value of such applications is directly correlated with the speed at which answers can be returned to the user. Typically, once queries have been answered, database systems simply discard the results. However, this misses a huge optimization opportunity: discarded query results hold tremendous latent value, if only we knew how to recycle them to help answer subsequent related queries. The goal of this project is to develop Scrapple, a principled database management system that aggressively reuses old query results to speed up the answering of new queries, potentially yielding dramatic performance gains for a large class of decision support applications. Moreover, by using fully automated techniques rather than today's laborious, one-off approaches to optimizing data warehouses, Scrapple also promises to dramatically reduce the total cost of ownership of a typical data warehouse. Scrapple's basic strategy is to view cached query results (and their intermediate subresults) as materialized views, and then to employ advanced techniques for optimizing queries using materialized views to help answer subsequent queries. Executing this strategy requires solving several formidable technical challenges: existing techniques for optimizing queries using materialized views must be extended to wider classes of queries, reformulation strategies must incorporate novel differential techniques, and fundamental theoretical questions regarding query equivalence and reformulation in this setting must be resolved.
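As a rough illustration of treating cached results as materialized views, the sketch below recycles the results of single-column range queries. Everything in it (`RangeQuery`, `RecyclingCache`, the restriction to one half-open range predicate, the in-memory tables) is a hypothetical simplification chosen for brevity, not Scrapple's actual design. It shows two reuse cases: containment, where a new query's range lies inside a cached query's range so the answer is a filter over the cached result; and a simple differential reformulation, where a partially overlapping query reuses the cached portion and scans base data only for the uncovered remainder. Reading "differential" as this kind of remainder computation is our assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeQuery:
    """Hypothetical query shape: SELECT * FROM table WHERE lo <= col < hi."""
    table: str
    col: str
    lo: int
    hi: int


class RecyclingCache:
    """Toy query-recycling layer; a hypothetical sketch, not Scrapple's design."""

    def __init__(self, tables):
        self.tables = tables   # table name -> list of row dicts (the base data)
        self.cache = []        # (RangeQuery, result rows) pairs, i.e. "materialized views"
        self.base_scans = 0    # how many times base data had to be scanned

    def _scan(self, table, col, lo, hi):
        # Evaluate a range predicate directly against the base data.
        self.base_scans += 1
        return [r for r in self.tables[table] if lo <= r[col] < hi]

    def answer(self, q):
        same_attr = [(cq, rows) for cq, rows in self.cache
                     if cq.table == q.table and cq.col == q.col]

        # Case 1: containment. q's range lies inside a cached range, so q is
        # rewritten as a filter over that cached result, with no base-data access.
        for cq, rows in same_attr:
            if cq.lo <= q.lo and q.hi <= cq.hi:
                return [r for r in rows if q.lo <= r[q.col] < q.hi]

        # Case 2: partial overlap on the right. Reuse the cached part
        # [q.lo, cq.hi) and scan base data only for the remainder [cq.hi, q.hi).
        for cq, rows in same_attr:
            if cq.lo <= q.lo < cq.hi < q.hi:
                reused = [r for r in rows if r[q.col] >= q.lo]
                result = reused + self._scan(q.table, q.col, cq.hi, q.hi)
                self.cache.append((q, result))
                return result

        # Case 3: nothing reusable; compute from scratch and keep the result.
        result = self._scan(q.table, q.col, q.lo, q.hi)
        self.cache.append((q, result))
        return result


# Usage with a made-up table: ten rows with amounts 0, 10, ..., 90.
c = RecyclingCache({"sales": [{"id": i, "amount": 10 * i} for i in range(10)]})
c.answer(RangeQuery("sales", "amount", 20, 60))  # computed from base data
c.answer(RangeQuery("sales", "amount", 30, 50))  # answered entirely from the cache
c.answer(RangeQuery("sales", "amount", 40, 80))  # cache plus a scan of [60, 80)
print(c.base_scans)  # 2
```

The containment test here is trivial because the query class is tiny. For the richer queries found in decision support workloads (joins, aggregation, nested subqueries), deciding whether a cached result can answer a new query, and how to rewrite the query to use it, is exactly where the query equivalence and reformulation questions described above arise.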

Project Members

Principal Investigator:
PhD Students:
Post-docs:
Collaborators:

Funding

Scrapple is supported by NSF CAREER Award IIS-1055107.

Publications

Education and Outreach

The main educational outreach activity of the first year was the organization and hosting of the inaugural edition of Northern California Database Day. This was a single-day, informal workshop featuring student poster sessions and keynote talks by three invited speakers: Jennifer Widom (Stanford), Molham Aref (LogicBlox Inc.), and Sam Madden (MIT). It was free and open to the public, and was advertised particularly to undergraduate students as a way of introducing them to current trends in database research. We had an excellent turnout of around 100 participants, including many UC Davis undergraduates. The second edition is scheduled to be hosted in April 2012 at UC Berkeley.