[AI Story] To the AI-seeking manufacturers: how much data do I need?

May 16, 2022

AI technology performs repetitive tasks with high precision and efficiency, maximizing convenience for users. In manufacturing, AI is commonly used in smart factories to upgrade and automate processes. As a result, interest in deep learning-based artificial intelligence has also been growing.

However, even with this growing interest in industrial AI, many unknowns remain about implementing AI models to boost productivity. RTM will help define some of the challenges and provide guidance on how you can achieve performance.

The first question you need to ask, as a company or team looking to implement an AI solution, is: how much data, and of what kind, is needed to develop and implement a successful AI solution?

Data consultants and researchers alike will answer, “It depends.” This is true, but not helpful.

Therefore, we will explore examples from our experience in the high-tech manufacturing sector to provide a more understandable answer to the question.

What type of data do I need?

Typically, data in the manufacturing industry can be categorized into image data and time-series data.

Case 1

AI time-series model to detect faulty plasma process equipment from ‘A’ semiconductor and display equipment manufacturer

Case 2

AI image model to increase accuracy and efficiency of x-ray visual inspection from ‘B’ semiconductor manufacturer

Case 3

AI model to replace visual inspection to detect product anomalies from ‘C’ semiconductor manufacturer

Conclusion

Through these cases, we were able to determine some key insights regarding the relationship between the performance of a model and the acquired dataset volume.

1. Data volume is directly correlated with the quality of the AI model

High-volume manufacturing, such as semiconductor production, requires extensive amounts of data in order to account for all data parameters and process uncertainties. Our experience suggests that datasets on the order of tens of thousands of samples (in 10K increments) yield 97-99% model performance, as demonstrated in Case 1 and Case 3.

On the other hand, visual inspections with image data tend to have more clearly defined parameters and dimensions. For such process data with less uncertainty, 2-3K samples can be sufficient to yield model performance over 90%, as demonstrated in Case 2.
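One way to check whether more data is still paying off is to measure performance at several training-set sizes and see where the curve flattens. The following is a minimal sketch of that idea, not the method used in the cases above: the synthetic one-dimensional "sensor readings", the nearest-centroid classifier, and every function name are assumptions made purely for illustration.

```python
import random


def make_samples(n, rng):
    """Synthetic 1-D readings (an assumption): normal units near 0.0, faulty near 3.0."""
    samples = []
    for i in range(n):
        label = i % 2  # alternate labels so every prefix contains both classes
        value = rng.gauss(3.0 if label else 0.0, 1.0)
        samples.append((value, label))
    return samples


def fit_centroids(train):
    """Return the mean reading per class (a nearest-centroid classifier)."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for value, label in train:
        sums[label] += value
        counts[label] += 1
    return {c: sums[c] / counts[c] for c in (0, 1)}


def accuracy(centroids, test):
    """Fraction of test samples assigned to the class with the closest centroid."""
    correct = 0
    for value, label in test:
        predicted = min(centroids, key=lambda c: abs(value - centroids[c]))
        if predicted == label:
            correct += 1
    return correct / len(test)


def learning_curve(sizes, seed=0):
    """Train on growing prefixes of one data pool and score each on a fixed test set."""
    rng = random.Random(seed)
    pool = make_samples(max(sizes), rng)
    test = make_samples(2000, rng)
    return [(n, accuracy(fit_centroids(pool[:n]), test)) for n in sizes]


if __name__ == "__main__":
    for n, acc in learning_curve([20, 200, 2000, 10000]):
        print(f"{n:>6} training samples -> accuracy {acc:.3f}")
```

On real process data, the same loop would wrap your own model and a held-out validation set; the size at which the curve stops improving is a practical estimate of how much data is enough.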

2. Data labeling is a difficult and sensitive, yet crucial, task

To obtain accurate, reliable, and high-quality labeled data, data scientists must collaborate with process experts. At RTM, we ensure properly labeled data through continued partnerships with industry-leading domain experts.

3. Starting with small data volume can improve model performance

Training a model with reliable and high-quality data is most important. Therefore, using small datasets of quality data for the initial model can not only increase performance, but also save time.

4. Model performance depends on how you process and train your data

High-quality, high-dimensional time-series data is often difficult to obtain. The general practice for overcoming this data imbalance challenge is to train a deep learning model directly on the raw data. However, RTM’s data processing techniques demonstrate model performance at least 10% higher than traditional approaches.
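To make the data imbalance challenge concrete, here is a minimal sketch of one common, generic mitigation: weighting each class inversely to its frequency so that rare fault samples count for more during training. This is not RTM's data processing technique, which the article does not describe; the function name and the example labels are assumptions for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive weights above 1 and common classes below 1,
    so each class contributes equally to a weighted training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical label set: 90 normal runs and 10 faulty runs.
    labels = ["normal"] * 90 + ["fault"] * 10
    for cls, weight in inverse_frequency_weights(labels).items():
        print(f"{cls}: weight {weight:.3f}")
```

Most training libraries accept per-class or per-sample weights of this kind; whether weighting, resampling, or feature-level processing works best depends on the process and has to be validated on held-out data.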

Request consultation

For more information, request a professional consultation session with an RTM specialist.

Contact

Research Lab

10F Kwangsung Bldg, 11, Yeoksam-ro 3-gil, Gangnam-gu, Seoul, Republic of Korea

Headquarters

7F Kwangsung Bldg, 11, Yeoksam-ro 3-gil, Gangnam-gu, Seoul, Republic of Korea

Email

admin@rtm.ai

Tel

02.2088.6780

Fax

070.7543.6780

© 2022 RTM. All Rights Reserved.
