Data School

Note: The channel and video information shown on this site is retrieved and displayed using the official YouTube API.

Videos

Video list

Number of videos: 141

My top 50 scikit-learn tips

If you already know the basics of scikit-learn, but you want to be more efficient and get up-to-date with the latest features, then THIS is the video for you. My name is Kevin Markham, and I've been teaching Machine Learning in Python with scikit-learn for more than 8 years. Over the next 3 hours, I'm going to share with you my top 50 scikit-learn tips. Each tip ranges from 2 to 8 minutes, and you can use the timestamp links below to skip along if you're already familiar with a particular tip.
👩‍💻 Code: https://github.com/justmarkham/scikit-learn-tips
🤖 Learn ML from me: https://courses.dataschool.io/ml-courses
💌 Weekly Data Science tips: https://tuesday.tips/
50 TIPS:
0:00 - Introduction
1:03 - 1. Transform data with ColumnTransformer
4:19 - 2. Seven ways to select columns
8:18 - 3. "fit" vs "transform"
10:53 - 4. Don't use "fit" on new data!
15:05 - 5. Don't use pandas for preprocessing!
19:00 - 6. Encode categorical features
24:07 - 7. Handle new categories in testing data
27:16 - 8. Chain steps with Pipeline
30:19 - 9. Encode "missingness" as a feature
33:12 - 10. Why set a random state?
35:40 - 11. Better ways to impute missing values
41:22 - 12. Pipeline vs make_pipeline
44:08 - 13. Inspect a Pipeline
47:03 - 14. Handle missing values automatically
49:47 - 15. Don't drop the first categorical level
54:15 - 16. Tune a Pipeline
1:01:09 - 17. Randomized search vs grid search
1:05:42 - 18. Examine grid search results
1:08:10 - 19. Logistic regression tuning parameters
1:12:41 - 20. Plot a confusion matrix
1:15:37 - 21. Plot multiple ROC curves
1:17:21 - 22. Use the correct Pipeline methods
1:18:59 - 23. Access model coefficients
1:20:11 - 24. Visualize a decision tree
1:23:57 - 25. Improve a decision tree by pruning it
1:25:23 - 26. Use stratified sampling when splitting data
1:29:40 - 27. Impute missing values for categoricals
1:32:10 - 28. Save a model or Pipeline
1:33:47 - 29. Add multiple text columns to a model
1:35:35 - 30. More ways to inspect a Pipeline
1:37:28 - 31. Know when shuffling is required
1:42:32 - 32. Use AUC with multiclass problems
1:46:04 - 33. Create custom features with scikit-learn
1:50:03 - 34. Automate feature selection
1:52:24 - 35. Use pandas objects with scikit-learn
1:53:37 - 36. Pass parameters as keyword arguments
1:55:23 - 37. Create an interactive Pipeline diagram
1:57:22 - 38. Get the names of transformed features
1:59:32 - 39. Load a toy dataset into pandas
2:01:33 - 40. View all model parameters
2:03:00 - 41. Encode binary features
2:06:59 - 42. Column selection tricks
2:10:02 - 43. Save time when encoding categoricals
2:16:53 - 44. Speed up a grid search
2:19:01 - 45. Create feature interactions
2:23:00 - 46. Ensemble multiple models
2:27:23 - 47. Tune an ensemble
2:31:22 - 48. Run part of a Pipeline
2:34:52 - 49. Tune multiple models at once
2:39:50 - 50. Solve many ML problems with one solution
#python #data science #machine learning #scikit-learn
2023-04-20
00:00:00 - 02:47:31
21 more pandas tricks

You're about to learn 21 tricks that will help you to work faster, write better pandas code, and impress your friends. These are the BEST tricks that I couldn't fit into my FIRST tricks video!
📔 JUPYTER NOTEBOOK: https://nbviewer.org/github/justmarkham/pandas-videos/blob/master/21_more_pandas_tricks.ipynb
🔥 MY TOP 25 PANDAS TRICKS: https://www.youtube.com/watch?v=RlIiVeig3hc&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=35
🐼 MORE PANDAS VIDEOS: https://www.youtube.com/playlist?list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y
TRICKS:
0:00 Introduction
0:36 1. Check for equality
1:27 2. Check for equality (alternative)
2:38 3. Use NumPy without importing NumPy
3:42 4. Calculate memory usage
4:10 5. Count the number of words in a column
4:45 6. Convert one set of values to another
6:59 7. Convert continuous data into categorical data (alternative)
8:05 8. Create a cross-tabulation
8:55 9. Create a datetime column from multiple columns
9:34 10. Resample a datetime column
11:07 11. Read and write from compressed files
12:10 12. Fill missing values using interpolation
12:45 13. Check for duplicate merge keys
13:50 14. Transpose a wide DataFrame
14:47 15. Create an example DataFrame (alternative)
16:06 16. Identify rows that are missing from a DataFrame
17:09 17. Use query to avoid intermediate variables
19:06 18. Reshape a DataFrame from wide format to long format
21:19 19. Reverse row order (alternative)
22:25 20. Reverse column order (alternative)
23:21 21. Split a string into multiple columns (alternative)
NOTE: Tricks 3 and 15 were deprecated in pandas 1.0
LET'S CONNECT!
- Newsletter: https://www.dataschool.io/subscribe/
- Twitter: https://twitter.com/justmarkham
- Facebook: https://www.facebook.com/DataScienceSchool/
- LinkedIn: https://www.linkedin.com/in/justmarkham/
#python #pandas #data analysis #data science
2022-05-13
00:00:00 - 00:24:40
Adapt this pattern to solve many Machine Learning problems

Here's a simple pattern that can be adapted to solve many ML problems. It has plenty of shortcomings, but can work surprisingly well as-is!
Shortcomings include:
- Assumes all columns have proper data types
- May include irrelevant or improper features
- Does not handle text or date columns well
- Does not include feature engineering
- Ordinal encoding may be better
- Other imputation strategies may be better
- Numeric features may not need scaling
- A different model may be better
- And so on...
Want to watch all 50 scikit-learn tips? Enroll in my FREE online course: 👉 https://courses.dataschool.io/scikit-learn-tips 👈
Tips mentioned in this video:
Tip 1: https://www.youtube.com/watch?v=NGq8wnH5VSo&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=1
Tip 2: https://www.youtube.com/watch?v=sCt4LVD5hPc&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=2
Tip 6: https://www.youtube.com/watch?v=0w78CHM_ubM&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=6
Tip 7: https://www.youtube.com/watch?v=bA6mYC1a_Eg&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=7
Tip 9: https://www.youtube.com/watch?v=DKmDJJzayZw&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=9
Tip 11: https://www.youtube.com/watch?v=m_qKhnaYZlc&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=11
Tip 16: https://www.youtube.com/watch?v=f_xB7kbZR_g&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=16
Tip 27: https://www.youtube.com/watch?v=k3KrhjvaCq0&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=27
Tip 43: https://www.youtube.com/watch?v=n_x40CdPZss&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=43
=== WANT TO GET BETTER AT MACHINE LEARNING? ===
1) LEARN THE FUNDAMENTALS in my intro course (free!): https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn
2) BUILD YOUR ML CONFIDENCE in my intermediate course: https://courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn
3) LET'S CONNECT!
- Newsletter: https://www.dataschool.io/subscribe/
- Twitter: https://twitter.com/justmarkham
- Facebook: https://www.facebook.com/DataScienceSchool/
- LinkedIn: https://www.linkedin.com/in/justmarkham/
#python #machine learning #scikit-learn #data science
2021-10-29
00:00:00 - 00:07:49
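
The description of the "Adapt this pattern" video lists the pattern's shortcomings but not the pattern itself. The sketch below is only a guess at its general shape, inferred from those shortcomings (imputation, one-hot encoding, scaling of numeric features, and a single model). The toy DataFrame, its column names, and the specific imputation strategies are assumptions made for illustration and are not taken from the video.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with missing values in both column types.
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "embarked": ["S", "C", "S", np.nan, "S", "Q"],
    "survived": [0, 1, 1, 1, 0, 0],
})
X = df.drop(columns="survived")
y = df["survived"]

# Select columns by data type (this assumes the dtypes are already correct).
num_cols = make_column_selector(dtype_include="number")
cat_cols = make_column_selector(dtype_exclude="number")

# Numeric columns: impute, then scale.
num_steps = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
# Categorical columns: impute a constant, then one-hot encode.
cat_steps = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)

preprocessor = make_column_transformer((num_steps, num_cols), (cat_steps, cat_cols))
pipe = make_pipeline(preprocessor, LogisticRegression())

pipe.fit(X, y)
predictions = pipe.predict(X)
```

Each shortcoming in the list corresponds to a step that could be swapped out, for example a different imputer, encoder, or model.
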
Tune multiple models simultaneously with GridSearchCV

You can tune 2+ models using the same grid search! Here's how:
1. Create multiple parameter dictionaries
2. Specify the model within each dictionary
3. Put the dictionaries in a list
👉 New tips every TUESDAY and THURSDAY! 👈
🎥 Watch all tips: https://www.youtube.com/playlist?list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6
🗒️ Code for all tips: https://github.com/justmarkham/scikit-learn-tips
💌 Get tips via email: https://scikit-learn.tips
#python #machine learning #scikit-learn #data science
2021-10-26
00:00:00 - 00:05:07
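
The three steps from the GridSearchCV tip above might look roughly like this. It is a minimal sketch, not the code from the video: the iris dataset, the step names ("scaler", "model"), the choice of models, and the parameter values are all assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The "model" step is a placeholder that the grid search replaces.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Steps 1 and 2: one parameter dictionary per model,
# with the model itself listed under the "model" key.
params_lr = {
    "model": [LogisticRegression(max_iter=1000)],
    "model__C": [0.1, 1, 10],
}
params_rf = {
    "model": [RandomForestClassifier(random_state=1)],
    "model__n_estimators": [10, 50],
}

# Step 3: put the dictionaries in a list.
grid = GridSearchCV(pipe, [params_lr, params_rf], cv=5)
grid.fit(X, y)

best_model_name = type(grid.best_params_["model"]).__name__
```

Because each dictionary is searched separately, the search evaluates 3 + 2 = 5 candidates here rather than every cross-combination of the two models' parameters.
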
Access part of a Pipeline using slicing

Want to operate on part of a Pipeline (instead of the whole thing)? Slice it using Python's slicing notation! #python #machine learning #scikit-learn #data science
2021-10-21
00:00:00 - 00:03:38
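
As a rough illustration of the slicing tip above, the sketch below fits a three-step Pipeline and then slices off the final step so that only the transformers run. The tiny array and the step names are made up for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data with one missing value.
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)

# A slice returns a new Pipeline containing only the selected steps.
preprocessing = pipe[:-1]
X_transformed = preprocessing.transform(X)

# An integer index returns the step itself, not a Pipeline.
imputer = pipe[0]
```

Slicing this way can be handy for checking what the model actually receives after preprocessing.
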
Tune the parameters of a VotingClassifier or VotingRegressor

Want to improve the accuracy of your VotingClassifier? Try tuning the 'voting' and 'weights' parameters to change how predictions are combined! P.S. If you're using VotingRegressor, just tune the 'weights' parameter. #python #machine learning #scikit-learn #data science
2021-10-19
00:00:00 - 00:04:07
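
One way to tune 'voting' and 'weights' as the tip above suggests is an ordinary grid search over the VotingClassifier. This is a sketch under assumed choices: the iris dataset, the two base models, and the candidate weights are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

vc = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
])

# "voting" controls whether class labels (hard) or predicted
# probabilities (soft) are combined; "weights" sets each model's influence.
params = {
    "voting": ["hard", "soft"],
    "weights": [(1, 1), (2, 1), (1, 2)],
}

grid = GridSearchCV(vc, params, cv=5)
grid.fit(X, y)
best_params = grid.best_params_
```

For a VotingRegressor, the same approach would apply with only the "weights" entry in the parameter dictionary.
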
Ensemble multiple models using VotingClassifier or VotingRegressor

Want to improve your classifier's accuracy? Create multiple models and ensemble them using VotingClassifier! P.S. VotingRegressor is also available. #python #machine learning #scikit-learn #data science
2021-10-14
00:00:00 - 00:04:32
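
A minimal sketch of the ensembling tip above, assuming the iris dataset and three arbitrarily chosen base models. Whether the ensemble actually beats its individual members depends on the data, so the example only computes the score without claiming an improvement.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Each entry is a (name, estimator) pair; the names are arbitrary labels.
vc = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",  # average the predicted probabilities
)

scores = cross_val_score(vc, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
```

Soft voting requires every base model to support predict_proba, which all three of these do.
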
Create feature interactions using PolynomialFeatures

Want to include "feature interactions" in your model? Use PolynomialFeatures! P.S. This is impractical if you have lots of features, and unnecessary if you're using a tree-based model. #python #machine learning #scikit-learn #data science
2021-10-13
00:00:00 - 00:04:08
Speed up GridSearchCV using parallel processing

Want your grid search to run faster? Set n_jobs=-1 to use parallel processing with all CPUs! #python #machine learning #scikit-learn #data science
2021-10-07
00:00:00 - 00:02:16
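
The n_jobs tip above amounts to a single extra argument. The sketch below assumes the iris dataset and a small grid, so any speedup would be negligible here; the benefit is more likely to show up with larger grids or slower models.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

params = {"C": [0.1, 1, 10]}

# n_jobs=-1 asks for the cross-validation fits to be
# spread across all available CPUs.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    params,
    cv=5,
    n_jobs=-1,
)
grid.fit(X, y)
best_params = grid.best_params_
```
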
Use OrdinalEncoder instead of OneHotEncoder with tree-based models

With a tree-based model, try OrdinalEncoder instead of OneHotEncoder even for nominal (unordered) features. Accuracy will often be similar, but OrdinalEncoder will be much faster! #python #machine learning #scikit-learn #data science
2021-10-06
00:00:00 - 00:06:59
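
The encoder comparison above can be sketched by building the same tree-based Pipeline twice and swapping only the encoder. The DataFrame and its column names are hypothetical, and the data is far too small to show any timing difference; the example only shows how many columns each encoder hands to the model.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical nominal features, each with three categories.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M", "S", "L", "L", "S"],
    "label": [0, 1, 1, 1, 0, 1, 0, 0],
})
X = df[["color", "size"]]
y = df["label"]

ordinal_pipe = make_pipeline(
    OrdinalEncoder(),
    RandomForestClassifier(n_estimators=20, random_state=1),
)
onehot_pipe = make_pipeline(
    OneHotEncoder(),
    RandomForestClassifier(n_estimators=20, random_state=1),
)

ordinal_pipe.fit(X, y)
onehot_pipe.fit(X, y)

# OrdinalEncoder keeps one column per feature;
# OneHotEncoder creates one column per category.
n_cols_ordinal = ordinal_pipe[0].transform(X).shape[1]
n_cols_onehot = onehot_pipe[0].transform(X).shape[1]
```

Here the ordinal version passes 2 columns to the forest and the one-hot version passes 6. With high-cardinality features that gap widens, which is the likely source of the speed difference the tip describes.
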