Demystifying Feature Engineering for Machine Learning

  • Sachin Andhare
  • Blog
  • June 4, 2020

What is Feature Engineering

Let’s say you are using applied machine learning to address a complex business problem, such as predicting customer churn or forecasting product demand. Assuming a team is in place and the business case has been identified, where do you start? The first step is to collect the relevant data to train the machine learning (ML) algorithms. This is usually followed by selecting an appropriate algorithm or ensemble of algorithms. The right choice depends on the business goals (accuracy vs. interpretability), the category of the problem (regression or classification), the nature of the data (categorical or numerical), the desired outcome, and constraints (computational resources, training time, latency).

Irrespective of the choice of algorithm, whether it is logistic regression, decision trees, boosting, or neural networks, there is a fundamental requirement: high-quality input data that captures relevant business hypotheses and historical patterns. Producing that input is the job of Feature Engineering (FE). Algorithms often get all the limelight, and many people believe they are the secret weapons in the AI battle. But it is FE that performs the magic behind machine learning.

FE is the process of applying domain knowledge to extract analytical representations from raw data, making it ready for machine learning. It involves the application of business knowledge, mathematics, and statistics to transform data into a format that can be directly consumed by machine learning models. It starts from many tables spread across disparate databases that are then joined, aggregated, and combined into a single flat table using statistical transformations and/or relational operations.
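As a rough illustration of the "join, aggregate, and combine into a single flat table" step, here is a minimal Python sketch. The two tables, their column names, and the `build_flat_table` helper are hypothetical, invented only for this example; a real project would pull far more tables from its own databases.

```python
from collections import defaultdict

# Hypothetical raw tables, as they might arrive from two separate databases.
customers = [
    {"customer_id": 1, "plan": "basic"},
    {"customer_id": 2, "plan": "premium"},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 120.0},
]


def build_flat_table(customers, transactions):
    """Join transactions to customers and aggregate to one row per customer."""
    # Relational step: group transaction amounts by the join key.
    amounts = defaultdict(list)
    for tx in transactions:
        amounts[tx["customer_id"]].append(tx["amount"])

    # Statistical step: summarize each group into fixed-width columns.
    flat = []
    for c in customers:
        spent = amounts.get(c["customer_id"], [])
        flat.append({
            "customer_id": c["customer_id"],
            "plan": c["plan"],
            "tx_count": len(spent),
            "tx_total": sum(spent),
            "tx_mean": sum(spent) / len(spent) if spent else 0.0,
        })
    return flat


flat_table = build_flat_table(customers, transactions)
```

Each row of `flat_table` now describes one customer with a fixed set of columns, which is the shape most ML algorithms expect as input.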

[Figure: Enterprise data to ML-ready data using AI-powered Feature Engineering]

Practical FE is far more complicated than simple transformation exercises such as one-hot encoding. Implementing it typically means writing hundreds or even thousands of SQL-like queries that combine extensive data manipulation with a multitude of statistical transformations.
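To make that contrast concrete, the sketch below runs a single SQL feature query using Python's built-in `sqlite3` module. The schema (`customers`, `support_calls`), the sample rows, and the hypothesis itself are assumptions made up for illustration. The `plan_is_premium` column is a simple one-hot style transformation, while `calls_last_30d` needs a join, a time window, and an aggregation, and it is still only one of the many queries a real project would require.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, plan TEXT);
    CREATE TABLE support_calls (customer_id INTEGER, call_date TEXT);
    INSERT INTO customers VALUES (1, 'basic'), (2, 'premium');
    INSERT INTO support_calls VALUES
        (1, '2020-05-20'), (1, '2020-05-28'), (1, '2020-01-03'),
        (2, '2020-02-11');
""")

# One feature hypothesis: "customers who called support often in the
# last 30 days are more likely to churn". The as-of date is a parameter
# so the same query can be replayed for different points in time.
FEATURE_QUERY = """
    SELECT c.customer_id,
           CASE WHEN c.plan = 'premium' THEN 1 ELSE 0 END AS plan_is_premium,
           COUNT(s.call_date) AS calls_last_30d
    FROM customers AS c
    LEFT JOIN support_calls AS s
           ON s.customer_id = c.customer_id
          AND s.call_date > date(:as_of, '-30 days')
          AND s.call_date <= :as_of
    GROUP BY c.customer_id, c.plan
    ORDER BY c.customer_id
"""

rows = conn.execute(FEATURE_QUERY, {"as_of": "2020-06-01"}).fetchall()
conn.close()
```

Changing the window length, the aggregation, or the source table yields a different hypothesis each time, which is how the number of queries grows so quickly.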

The Significance of Feature Engineering

ML is driven by algorithms, and the algorithms are dependent on data. If you understand the historical data, you can detect patterns in it. Once you uncover a pattern, you can build a hypothesis. Based on the hypothesis, you can predict a likely outcome, such as which customers are likely to churn in a given time period. FE is all about finding the optimal combination of hypotheses.

FE is critical because if you provide the wrong hypotheses as input, ML cannot make accurate predictions. The quality of the hypotheses you provide is vital to the success of an ML model, and feature quality is critically important for both accuracy and interpretability. FE is the most iterative, time-consuming, and resource-intensive part of the process, and it involves interdisciplinary expertise. It requires technical knowledge but, more importantly, domain knowledge. The data science team builds features by working with domain experts, testing hypotheses, building and evaluating ML models, and repeating the process until the results are acceptable to the business.

Feature Engineering Automation

FE automation has vast potential to change the traditional data science process. It lowers skill barriers significantly further than ML automation alone, eliminates hundreds or even thousands of manually crafted SQL queries, and speeds up data science projects even when the team lacks full domain knowledge. It also augments our data insights and surfaces “unknown unknowns” by exploring millions of feature hypotheses in just hours.
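The idea of exploring many feature hypotheses automatically can be sketched on a toy scale. The example below enumerates every combination of a few aggregations and time windows over a hypothetical spend history, then ranks the resulting candidate features by the absolute Pearson correlation with a churn label. The data, the aggregation set, and the scoring rule are all assumptions chosen for brevity; this is not a description of how any particular AutoML product searches or scores features.

```python
from itertools import product
from math import sqrt

# Hypothetical per-customer records: monthly spend history plus a churn label.
records = [
    {"spend": [50, 48, 20, 5], "churned": 1},
    {"spend": [30, 31, 29, 32], "churned": 0},
    {"spend": [80, 60, 30, 10], "churned": 1},
    {"spend": [15, 18, 17, 20], "churned": 0},
]

# Candidate building blocks; each (aggregation, window) pair is one hypothesis.
AGGREGATIONS = {
    "mean": lambda xs: sum(xs) / len(xs),
    "min": min,
    "max": max,
    "trend": lambda xs: xs[-1] - xs[0],
}
WINDOWS = [2, 4]  # number of most recent months to look at


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_feature_hypotheses(records):
    """Generate every candidate feature and sort by |correlation| with the label."""
    labels = [r["churned"] for r in records]
    scored = []
    for (name, agg), window in product(AGGREGATIONS.items(), WINDOWS):
        values = [agg(r["spend"][-window:]) for r in records]
        scored.append((f"{name}_last_{window}m", abs(pearson(values, labels))))
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_feature_hypotheses(records)
```

With four aggregations and two windows this produces only eight candidates, but the search space multiplies with every additional table, column, filter, and window, which is why manual exploration does not scale.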

These days, automated machine learning (AutoML) is attracting a lot of attention. AutoML tackles critical challenges that organizations struggle with: the sheer length of AI and ML projects, which usually take months to complete, and the severe shortage of qualified talent available to handle them. While current AutoML products have undoubtedly made significant inroads in accelerating the AI and machine learning process, they fail to address the most significant step: preparing the input for machine learning from raw business data, in other words, feature engineering.

To create a genuine shift in how modern organizations leverage AI and machine learning, the full cycle of data science development must involve automation. If the problems at the heart of data science automation are due to lack of data scientists, poor understanding of ML from business users, and difficulties in migrating to production environments, then these are the challenges that AutoML must also resolve.

AutoML 2.0, which automates data and feature engineering as well as machine learning, combines FE automation and ML automation into a single pipeline and one-stop shop. With AutoML 2.0, the full cycle from raw data through data and feature engineering to ML model development takes days, not months, and a team can deliver 10x more projects.

Summary

Contrary to popular belief, algorithms are not the most distinguishing element of applied machine learning. FE influences the performance and accuracy of ML models more than anything else. It helps reveal the hidden patterns in the data and increases the predictive power of machine learning. For ML algorithms to work properly, you need to provide the right input data in a form the algorithms can understand. Oftentimes this involves complex mathematical transformations of raw data. FE delivers that input data in a single aggregated format optimized for ML. It is the secret sauce that enables AI/ML to do the magic. Whether it is preventing fraud in financial services, detecting anomalies in manufacturing, or predicting customer churn for insurance companies, feature engineering is the most decisive factor in AI/ML success or failure.

  • Ryohei Fujimaki, PhD.
  • Blog
  • April 30, 2020

The Insurance Brain: AI-Driven Policy Recommendations

This article was originally posted on February 18, 2020 on Forbes Cognitive World (AI Contributor Group). dotData’s Founder and CEO, Ryohei Fujimaki, PhD, was interviewed for the article.

In today’s competitive global insurance market, insurers are striving to create new ways to successfully overcome two important and opposing forces: creating short-term revenue growth for the company, while also meeting customers’ needs for product offerings and services that are personalized, relevant, and provide long-term value.

In meeting these challenges, insurers realize the strategic importance of their data, and how AI and machine learning (ML) can help them better achieve their business goals. But while investments in AI are growing, challenges in resources, technology infrastructure, and the ability to operationalize models quickly and efficiently can prevent insurers from fully leveraging AI and data science to drive business impact.

These were some of the challenges faced by leading global insurance company MS&AD Insurance Group Holdings (MS&AD), and the impetus behind its development of an innovative solution that leverages AutoML 2.0 to optimize its data science investments. This fully automated data science platform enables MS&AD agents to create proposals that reflect the current and potential insurance needs of their customers, leading to increased revenue as well as greater customer loyalty and satisfaction.

MS&AD, Innovation Through Technology

MS&AD Insurance Group Holdings is the world’s fifth-largest property and casualty insurance company with $50 billion in revenue. MS&AD has been a leading innovator in leveraging digital transformation to advance the insurance business. One of the critical goals in their digital transformation journey is to optimize customer value and utilization of its products and services.

The idea of using digital transformation to enhance customer experiences led to the development of a strategic digital platform, called MS1 Brain. The MS1 Brain platform utilizes AI and machine learning to analyze available customer data, such as contract details and history, accident information, and lifestyle changes, to predict customer needs and recommend the best products and services to meet those needs and drive long-term value. The platform also helps generate targeted customer communications, including personalized videos on products and services created to meet the specific needs of each customer.

MS1 Brain was created for MSI, an affiliate of MS&AD, to enable MSI’s agents to create personalized, data-driven proposals tailored to consumer needs. The platform also needed to be easy to use so that agents could generate proposals and leverage data without prior data expertise.

Challenges Along the Way to Innovation

MS1 Brain utilizes many AI models for its intelligent predictions and decisions. MS&AD faced challenges in scaling its data science practice and building MS1 Brain, both in creating effective machine learning models whose AI decisions are transparent and in finding talent with the right level of skill. Adding to this dual challenge was a very aggressive timeline for the development of MS1 Brain.

As a solution to this challenge, the business and innovation team at MS&AD identified automated machine learning (AutoML) as a critical accelerator for meeting the development timeline of MS1 Brain. MS&AD found it particularly important to automate the feature engineering process, which is often the most manual and time-consuming part of data science projects, in addition to automating machine learning itself (a combination known as AutoML 2.0).

AutoML 2.0: The Foundation of MS1 Brain

The primary motivation for choosing an AutoML 2.0 platform centered on three core areas: acceleration, augmentation, and democratization.

Acceleration was essential for developing many AI models for MS1 Brain within a short timeframe, and it allowed MS&AD to explore 10x more use cases and to build accurate models for production quickly. AutoML 2.0 explores millions of features and hundreds of ML models in just hours, working from raw relational and transactional business data with billions of records.

Augmentation was also critical, as a direct output of automated feature engineering in AutoML 2.0. Through the features automatically designed by the platform, MS&AD discovered many deep business insights that make AI recommendations explainable and that also help improve its services to meet customers’ needs.

Democratization was the third critical component. Beyond MS1 Brain, MS&AD needed to establish scalable and sustainable AI and ML capabilities. With the AutoML 2.0 platform, even business analysts could perform the end-to-end data science process without SQL or Python coding and without knowledge of sophisticated statistical and mathematical formulas.

Additionally, by augmenting the AutoML 2.0 solution in MS1 Brain with advanced automated video generation technologies, MS&AD was able to create a system that automatically analyzes customer data, and provides personalized video-based recommendations for products and services to its customers. This new capability has enabled MS&AD to optimize customer value, increase utilization of its products and services, and drive additional revenue growth.

Accelerating Business Innovation

While data science is becoming a valuable tool in the insurance industry, deriving value from AI and machine learning initiatives can be challenging. As with MS&AD, organizations that embrace new data science automation technologies will benefit from streamlined processes, greater transparency, and deeper insights to help drive short-term revenue growth while exceeding customer demands for long-term value. As a result, insurance organizations can rapidly scale their AI/ML initiatives to drive transformative business changes.

  • Carl Bowen
  • Media
  • February 19, 2020

The Insurance Brain: AI-Driven Policy Recommendations

Our CEO, Ryohei Fujimaki, PhD, recently discussed an insurance use case in an article now published on Forbes Cognitive World, titled “The Insurance Brain: AI-Driven Policy Recommendations.”

Excerpt: “When MS&AD Insurance Group Holdings needed to create an AI-driven recommendation system for their MSI subsidiary, they turned to AutoML 2.0 to help drive results and achieve their goals…”
