Beyond AutoML : Data Science Automation
While the rise of AutoML platforms has provided for faster execution of “test and learn” ML development, it has also brought about additional challenges. In most ML and data science projects, ML development is only one part of the process. The earlier stages of the process that require handling multiple raw tables and manipulating them based on in-depth domain knowledge to create flat, aggregated feature tables is a far more complicated and time-consuming challenge. The data and feature engineering process in enterprise data science has to deal with such different data as relational, transactional, temporal, geo-locational, and text data, which never starts from a single, flat, aggregated and cleansed table.
Data science automation provides for a full-cycle automation process that includes data and feature engineering, in addition to standard AutoML. The ability to automatically generate features from massive and complex tables further accelerates data scientist productivity and can deliver new business insights that augment knowledge, by exploring millions of new feature hypotheses.
dotDataPy: Data Science Automation for Data Scientists
dotDataPy is an enterprise-grade data science automation platform designed to make the life of the data scientist easier, while also working within the framework preferred by data scientists. dotDataPy allows data scientists to leverage data science automation within Python and execute the full-cycle process from raw business data through data and feature engineering through machine learning with only a few lines of Python code. Data scientists can quickly explore and validate their use cases with minimal upfront efforts.
dotDataPy provides the power of automation, but is also flexible enough to handle advanced use cases. dotDataPy can interface with a standard Python dataframe (like Pandas or Spark dataframe), ensuring that your preferred Python tools can easily consume any output generated by dotDataPy. dotDataPy is also easily connected with any data sources through dataframes. For example, data scientists can leverage dotDataPy features in their preferred ML libraries to fine-tune or adjust models, based on advanced model requirements. Inversely, data scientists can combine domain-specific features they may have created manually with dotDataPy’s AI-derived features and create a unified model that leverages both domain expertise as well as AI-derived knowledge.
The world of data science is changing at a rapid pace. AutoML platforms have made it easier and faster for data scientists to develop advanced machine learning models without the traditional manual hassles and complications associated with the process. The challenge, however, is that much of the manual work done by data scientists has, until now, still been 100% manual. Platforms like dotDataPy are providing data scientists with the opportunity to accelerate the feature engineering to provide data scientists with broader insights and giving them the ability to deliver ML and AI models faster while still working within the Python ecosystem that is the “go-to” standard for the data science community.