Scrapple: Fast Analytical Query Evaluation via Advanced Query
Recycling Techniques
The complex analytical queries characterizing decision support
applications can be very expensive to compute, and the value of such
applications is directly correlated to the speed at which answers can
be returned to the user. Typically, once queries have been answered,
database systems simply discard the results. However, a huge
optimization opportunity is missed by doing this: there is tremendous
latent energy in the discarded query results, if we only knew how to
recycle them to help answer subsequent related queries. The goal of
the project is to develop Scrapple, a principled database management
system that aggressively reuses old query results to speed up the
answering of new queries, resulting in potentially dramatic
performance gains for a large class of decision support applications.
Moreover, by using fully automated techniques, rather than today's
laborious, one-off approaches to optimizing data warehouses, Scrapple
also promises to dramatically reduce the total cost of ownership of a
typical data warehouse. Scrapple's basic strategy is to view cached
query results (and their intermediate subresults) as materialized
views, and then employ advanced techniques for optimizing queries
using materialized views to help answer subsequent queries. Executing
this strategy requires solving several formidable technical
challenges: existing techniques for optimizing queries using
materialized views must be pushed to work with wider classes of
queries, reformulation strategies must incorporate novel differential
techniques, and fundamental theoretical questions regarding query
equivalence and reformulation in this setting must be resolved.
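As a rough illustration of the basic strategy described above, the following sketch treats a cached aggregate result as a materialized view and uses it to answer a later, coarser query without rescanning the base data. It is a toy under stated assumptions, not Scrapple's actual implementation: the SALES table, the RecyclingCache class, and its sum_by method are all hypothetical names invented for this example, and only SUM with GROUP BY is handled.

```python
from collections import defaultdict

# Hypothetical base table of (region, product, amount) rows.
SALES = [
    ("west", "apples", 10),
    ("west", "pears", 5),
    ("east", "apples", 7),
    ("east", "apples", 3),
]


class RecyclingCache:
    """Toy cache that keeps SUM(amount) GROUP BY results as materialized views."""

    def __init__(self, rows):
        self.rows = rows
        self.views = {}      # group-by column indexes -> cached result
        self.base_scans = 0  # how many times the base table was scanned

    def sum_by(self, cols):
        cols = tuple(cols)
        # Find any cached view whose grouping columns cover the requested ones.
        source = next((c for c in self.views if set(cols) <= set(c)), None)
        result = defaultdict(int)
        if source is None:
            # No usable view: fall back to scanning the base table.
            self.base_scans += 1
            for row in self.rows:
                result[tuple(row[c] for c in cols)] += row[2]
        else:
            # Roll up the finer-grained cached view. This is valid because
            # SUM is distributive: partial sums can be summed again.
            positions = [source.index(c) for c in cols]
            for key, total in self.views[source].items():
                result[tuple(key[p] for p in positions)] += total
        self.views[cols] = dict(result)
        return self.views[cols]


cache = RecyclingCache(SALES)
by_region_product = cache.sum_by((0, 1))  # computed from the base table
by_region = cache.sum_by((0,))            # recycled from the cached view
```

In this sketch the second query is answered entirely from the first query's cached result, so base_scans stays at 1. Deciding when such a rewriting is correct for wider classes of queries is exactly the kind of equivalence and reformulation question the project description raises.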
Project Members
Principal Investigator:
PhD Students:
Post-docs:
Collaborators:
Funding
Scrapple is supported by NSF CAREER Award IIS-1055107.
Publications
- Mingmin Chen and Todd J. Green. Bag Equivalence of
Bounded-Symmetry Degree Conjunctive Queries with Inequalities.
AMW, 2011.
- Shan Shan Huang, Todd J. Green, and Boon Thau Loo. Datalog and Recursive Query Processing: an Interactive Tutorial. SIGMOD, 2011.
- Daniel Zinn, Bertram Ludäscher, and Todd J. Green. Win-Move is
Coordination-Free (Sometimes). ICDT, 2012.
- Todd J. Green and Zachary G. Ives.
Recomputing materialized instances after changes to mappings and
data. ICDE, 2012. Best Paper Award, runner-up.
Education and Outreach
The main educational outreach activity of the first year was organizing and hosting
the inaugural edition of Northern California Database Day.
This was a single-day, informal workshop featuring student poster sessions and keynote
talks delivered by three invited speakers: Jennifer Widom (Stanford), Molham Aref
(LogicBlox Inc), and Sam Madden (MIT). It was free and open to the public, and was
advertised in particular to undergraduate students as a way of introducing them to current
trends in database research. We had an excellent turnout of around 100 participants,
including many UC Davis undergraduates. The second edition is scheduled to be hosted in
April 2012 at UC Berkeley.