SAPIEN provides tools for adding optimization-oriented self-awareness to systems that monitor their own performance. It builds models of that performance, suggests new configurations likely to improve it, and learns continually as it does so. SAPIEN combines fast, industrial-strength implementations of machine learning and optimization algorithms.
SAPIEN’s model building component is a proprietary general-purpose decision tree implementation that combines the Hoeffding Trees (Domingos and Hulten) and Random Forests (Breiman) algorithms with Multi-Task Learning (Caruana) and Automatic Model Calibration, yielding a unique and powerful combination of speed, scalability, accuracy, and robustness. SAPIEN’s optimization component is a proprietary model-driven implementation of Extremal Optimization (Boettcher) that supports arbitrary user-defined utility functions as well as hard and soft constraints.
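To give a feel for why Hoeffding Trees suit streaming data, the following is a minimal sketch of the split test described by Domingos and Hulten. It is not SAPIEN's proprietary code: the class name, method names, and parameter values are hypothetical and chosen only for illustration. The idea is that a leaf is split once the observed gap between the two best candidate attributes exceeds the Hoeffding bound, which shrinks as more rows are seen.

```java
final class HoeffdingBoundSketch {

    /**
     * Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).
     * range = R, the range of the split metric;
     * delta = allowed probability of choosing the wrong attribute;
     * n     = number of rows observed at the leaf.
     */
    static double hoeffdingBound(double range, double delta, long n) {
        return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
    }

    /** Split only when the best attribute leads the runner-up by more than epsilon. */
    static boolean shouldSplit(double bestGain, double secondBestGain,
                               double range, double delta, long n) {
        return bestGain - secondBestGain > hoeffdingBound(range, delta, n);
    }

    public static void main(String[] args) {
        double range = 1.0;   // assumed: information gain on a two-class problem
        double delta = 1e-7;  // assumed confidence parameter
        for (long n : new long[] {100, 1_000, 10_000}) {
            System.out.printf("n=%d epsilon=%.4f split=%b%n",
                    n,
                    hoeffdingBound(range, delta, n),
                    shouldSplit(0.30, 0.25, range, delta, n));
        }
    }
}
```

Because the decision depends only on running statistics kept at each leaf, a tree built this way never needs to revisit earlier rows.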
The algorithms used provide a number of qualities desirable to system designers:
- Speed: All SAPIEN algorithms run in linear time and are written in highly optimized Java code for maximum performance.
- Scalability: Data streams with millions of rows and tens of thousands of columns are handled with ease, and the data does not need to fit in main memory.
- Multi-task Learning and Optimization: SAPIEN can model and optimize multiple aspects of system performance simultaneously, even when those aspects are non-linearly correlated.
- Anytime Learning and Optimization: The model building and optimization algorithms can be stopped at any time and will return the best results found so far.
- Heterogeneous data support: Floating-point, integer, and string data can be freely intermixed, with no normalization or preprocessing required.
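The anytime property and the role of a utility function with a hard constraint can be sketched as follows. This is an illustration only, not SAPIEN's optimizer or API: the class, the toy utility, the `TARGET` values, and the `BUDGET` constraint are all hypothetical. It uses a simple greedy variant in the spirit of Extremal Optimization (always re-randomize the worst-scoring component, accept the move unconditionally) and remembers the best feasible configuration seen, so it can be stopped after any number of steps.

```java
import java.util.Arrays;
import java.util.Random;

final class AnytimeSearchSketch {
    // Hypothetical toy problem: each setting has a preferred value,
    // but the settings together must not exceed a shared budget.
    static final int[] TARGET = {3, 7, 1, 5};
    static final int LEVELS = 10;  // each setting takes a value in 0..9
    static final int BUDGET = 14;  // hard constraint: sum of settings <= BUDGET

    /** Local fitness of one component; higher is better. */
    static double componentFitness(int[] config, int i) {
        double d = config[i] - TARGET[i];
        return -d * d;
    }

    /** User-defined utility: here simply the sum of the local fitness values. */
    static double utility(int[] config) {
        double u = 0;
        for (int i = 0; i < config.length; i++) u += componentFitness(config, i);
        return u;
    }

    /** Hard constraint check. */
    static boolean feasible(int[] config) {
        return Arrays.stream(config).sum() <= BUDGET;
    }

    /** Runs for at most maxSteps and returns the best feasible configuration so far. */
    static int[] search(int maxSteps, long seed) {
        Random rng = new Random(seed);
        int[] current = new int[TARGET.length]; // all zeros, which is feasible
        int[] best = current.clone();
        double bestUtility = utility(best);
        for (int step = 0; step < maxSteps; step++) {
            // Find the component with the lowest local fitness.
            int worst = 0;
            for (int i = 1; i < current.length; i++) {
                if (componentFitness(current, i) < componentFitness(current, worst)) worst = i;
            }
            // Replace it with a random value; the move is always accepted.
            current[worst] = rng.nextInt(LEVELS);
            double u = utility(current);
            if (feasible(current) && u > bestUtility) {
                best = current.clone();
                bestUtility = u;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (int steps : new int[] {10, 100, 1_000}) {
            int[] best = search(steps, 42L);
            System.out.println(steps + " steps: " + Arrays.toString(best)
                    + " utility=" + utility(best));
        }
    }
}
```

With a fixed seed, a longer run follows the same trajectory as a shorter one and then continues, so its best-so-far result can only match or improve on the earlier one. A soft constraint could be handled in the same sketch by subtracting a penalty inside `utility` rather than rejecting the configuration.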