R tutorial: Introducing out-of-sample error measures

Learn more about machine learning with R: https://www.datacamp.com/courses/machine-learning-toolbox

Hi! I'm Zach Deane-Mayer, and I'm one of the co-authors of the caret package. I have a passion for data science and spend most of my time working on and thinking about problems in machine learning.

This course focuses on predictive, rather than explanatory, modeling. We want models that do not overfit the training data and that generalize well. In other words, our primary concern when modeling is "do the models perform well on new data?"

The best way to answer this question is to test the models on new data. This simulates real-world experience, in which you fit on one dataset and then predict on new data, where you do not actually know the outcome.

Simulating this experience with a train/test split helps you make an honest assessment of yourself as a modeler.

This is one of the key insights of machine learning: error metrics should be computed on new data, because in-sample validation (or predicting on your training data) essentially guarantees overfitting.

Out-of-sample validation helps you choose models that will continue to perform well in the future.

This is the primary goal of the caret package in general and this course specifically: don’t overfit. Pick models that perform well on new data.

Let's walk through a simple example of out-of-sample validation: We start with a linear regression model, fit on the first 20 rows of the mtcars dataset.
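
A minimal sketch of this first step, assuming an illustrative formula (mpg predicted from horsepower; the exact formula used in the course isn't shown here):

```r
# Fit a linear regression on the first 20 rows of mtcars only.
# The formula mpg ~ hp is an assumption for illustration; any formula works.
data(mtcars)
train <- mtcars[1:20, ]
model <- lm(mpg ~ hp, data = train)
summary(model)
```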

Next, we make predictions with this model on a NEW dataset: the last 12 observations of the mtcars dataset. The 12 cars in this test set were not used to determine the coefficients of the linear regression model, and are therefore a good test of how well we can predict on new data.
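
Continuing the sketch above (reusing the `model` object from the previous block), the held-out rows are passed to predict() via the newdata argument:

```r
# The last 12 rows of mtcars were never seen during fitting
test <- mtcars[21:32, ]

# Generate out-of-sample predictions for the test set
predicted <- predict(model, newdata = test)
```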

In practice, rather than manually splitting the dataset, we'd use the createResample or createFolds function in caret, but the manual split keeps this example simple.
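
For reference, a sketch of how caret's resampling helpers generate splits automatically (the argument values here are illustrative, not prescribed by the course):

```r
library(caret)
set.seed(42)

# createFolds returns a list of held-out row indices, one element per fold
folds <- createFolds(mtcars$mpg, k = 5)

# createResample returns bootstrap resamples of the row indices instead
boots <- createResample(mtcars$mpg, times = 5)
str(folds)
```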

Finally, we calculate the root-mean-squared error (RMSE) on the test set by comparing the predictions from our model to the actual mpg values for the test set.
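
Continuing the same sketch, the test-set RMSE reduces to a few lines of base R:

```r
# Compare predictions to the actual mpg values in the test set
actual <- test$mpg
error  <- predicted - actual

# RMSE: square the errors, average them, then take the square root
rmse_out_of_sample <- sqrt(mean(error^2))
rmse_out_of_sample
```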

RMSE is a measure of the model's average prediction error. It has the same units as the outcome we're predicting, so in this case our model is off by 5 to 6 miles per gallon, on average.

Compared to the in-sample RMSE from a model fit on the full dataset, our out-of-sample error is significantly worse.
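
For comparison, a sketch of that in-sample RMSE from a model fit on, and evaluated against, the full dataset, again assuming the illustrative mpg ~ hp formula:

```r
# Fit on all 32 rows and evaluate on those same rows (in-sample)
full_model <- lm(mpg ~ hp, data = mtcars)

# In-sample RMSE is computed from the model's own residuals,
# so it tends to be optimistic compared to the out-of-sample RMSE above
rmse_in_sample <- sqrt(mean(residuals(full_model)^2))
rmse_in_sample
```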

If we had used in-sample error, we would have fooled ourselves into thinking our model was much better than it actually is.

It's hard to make predictions on new data, as this example shows. Out-of-sample error helps account for this fact, so we can focus on models that predict things we don't already know.

Let's practice this concept on some example data.
