Video list - Data School - Machine Learning Roundup
A list of Data School videos.
https://ml.streamdb.net/videos-rss/c/UCnVzApLJE2ljPZSeQylSEyg
Tue, 23 Apr 24 03:06:31 +0900

How to read the scikit-learn documentation
https://ml.streamdb.net/timelines/v/vUSFLs8w_dg
Tue, 23 Apr 24 03:06:31 +0900

In order to become truly proficient with scikit-learn, you need to be able to read the documentation. In this video, I'll walk you through the five main pages and page types that you need to be familiar with:
- API reference: List of classes and functions in each module
- Class documentation: Detailed view of a class
- User Guide: Advice for proper usage of a class or function
- Examples: More complex usage examples
- Glossary: Definitions of important terms

This is one of 149 video lessons from my upcoming course, "Master Machine Learning with scikit-learn." Join the waitlist to get the best possible price when it launches: https://mldreamjob.com

For all paid courses, I offer location-based discounts (up to 85%) to people in 160+ countries. Check your discount here: https://courses.dataschool.io/discounts

Enroll in a free Data Science course here: https://courses.dataschool.io/free-courses

0:00 Introduction
0:16 API reference
1:11 Class documentation
3:37 User Guide
4:24 Examples
5:22 Glossary
5:47 Summary

#python #machine learning #scikit-learn

My top 50 scikit-learn tips
https://ml.streamdb.net/timelines/v/WkqM0ndr42c
Thu, 20 Apr 23 23:56:43 +0900

If you already know the basics of scikit-learn, but you want to be more efficient and get up-to-date with the latest features, then THIS is the video for you. My name is Kevin Markham, and I've been teaching Machine Learning in Python with scikit-learn for more than 8 years.
Over the next 3 hours, I'm going to share with you my top 50 scikit-learn tips. Each tip ranges from 2 to 8 minutes, and you can use the timestamp links below to skip along if you're already familiar with a particular tip.

👩‍💻 Code: https://github.com/justmarkham/scikit-learn-tips
🤖 Learn ML from me: https://courses.dataschool.io/ml-courses
💌 Weekly Data Science tips: https://tuesday.tips/

50 TIPS:
0:00 - Introduction
1:03 - 1. Transform data with ColumnTransformer
4:19 - 2. Seven ways to select columns
8:18 - 3. "fit" vs "transform"
10:53 - 4. Don't use "fit" on new data!
15:05 - 5. Don't use pandas for preprocessing!
19:00 - 6. Encode categorical features
24:07 - 7. Handle new categories in testing data
27:16 - 8. Chain steps with Pipeline
30:19 - 9. Encode "missingness" as a feature
33:12 - 10. Why set a random state?
35:40 - 11. Better ways to impute missing values
41:22 - 12. Pipeline vs make_pipeline
44:08 - 13. Inspect a Pipeline
47:03 - 14. Handle missing values automatically
49:47 - 15. Don't drop the first categorical level
54:15 - 16. Tune a Pipeline
1:01:09 - 17. Randomized search vs grid search
1:05:42 - 18. Examine grid search results
1:08:10 - 19. Logistic regression tuning parameters
1:12:41 - 20. Plot a confusion matrix
1:15:37 - 21. Plot multiple ROC curves
1:17:21 - 22. Use the correct Pipeline methods
1:18:59 - 23. Access model coefficients
1:20:11 - 24. Visualize a decision tree
1:23:57 - 25. Improve a decision tree by pruning it
1:25:23 - 26. Use stratified sampling when splitting data
1:29:40 - 27. Impute missing values for categoricals
1:32:10 - 28. Save a model or Pipeline
1:33:47 - 29. Add multiple text columns to a model
1:35:35 - 30. More ways to inspect a Pipeline
1:37:28 - 31. Know when shuffling is required
1:42:32 - 32. Use AUC with multiclass problems
1:46:04 - 33. Create custom features with scikit-learn
1:50:03 - 34. Automate feature selection
1:52:24 - 35. Use pandas objects with scikit-learn
1:53:37 - 36. Pass parameters as keyword arguments
1:55:23 - 37. Create an interactive Pipeline diagram
1:57:22 - 38. Get the names of transformed features
1:59:32 - 39. Load a toy dataset into pandas
2:01:33 - 40. View all model parameters
2:03:00 - 41. Encode binary features
2:06:59 - 42. Column selection tricks
2:10:02 - 43. Save time when encoding categoricals
2:16:53 - 44. Speed up a grid search
2:19:01 - 45. Create feature interactions
2:23:00 - 46. Ensemble multiple models
2:27:23 - 47. Tune an ensemble
2:31:22 - 48. Run part of a Pipeline
2:34:52 - 49. Tune multiple models at once
2:39:50 - 50. Solve many ML problems with one solution

#python #data science #machine learning #scikit-learn

21 more pandas tricks
https://ml.streamdb.net/timelines/v/tWFQqaRtSQA
Fri, 13 May 22 01:20:41 +0900

You're about to learn 21 tricks that will help you to work faster, write better pandas code, and impress your friends. These are the BEST tricks that I couldn't fit into my FIRST tricks video!

📔 JUPYTER NOTEBOOK: https://nbviewer.org/github/justmarkham/pandas-videos/blob/master/21_more_pandas_tricks.ipynb
🔥 MY TOP 25 PANDAS TRICKS: https://www.youtube.com/watch?v=RlIiVeig3hc&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=35
🐼 MORE PANDAS VIDEOS: https://www.youtube.com/playlist?list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y

TRICKS:
0:00 Introduction
0:36 1. Check for equality
1:27 2. Check for equality (alternative)
2:38 3. Use NumPy without importing NumPy
3:42 4. Calculate memory usage
4:10 5. Count the number of words in a column
4:45 6. Convert one set of values to another
6:59 7. Convert continuous data into categorical data (alternative)
8:05 8. Create a cross-tabulation
8:55 9. Create a datetime column from multiple columns
9:34 10. Resample a datetime column
11:07 11. Read and write from compressed files
12:10 12. Fill missing values using interpolation
12:45 13. Check for duplicate merge keys
13:50 14. Transpose a wide DataFrame
14:47 15. Create an example DataFrame (alternative)
16:06 16. Identify rows that are missing from a DataFrame
17:09 17. Use query to avoid intermediate variables
19:06 18. Reshape a DataFrame from wide format to long format
21:19 19. Reverse row order (alternative)
22:25 20. Reverse column order (alternative)
23:21 21. Split a string into multiple columns (alternative)

NOTE: Tricks 3 and 15 were deprecated in pandas 1.0

LET'S CONNECT!
- Newsletter: https://www.dataschool.io/subscribe/
- Twitter: https://twitter.com/justmarkham
- Facebook: https://www.facebook.com/DataScienceSchool/
- LinkedIn: https://www.linkedin.com/in/justmarkham/

#python #pandas #data analysis #data science

Adapt this pattern to solve many Machine Learning problems
https://ml.streamdb.net/timelines/v/gd-TZut-oto
Fri, 29 Oct 21 01:16:39 +0900

Here's a simple pattern that can be adapted to solve many ML problems. It has plenty of shortcomings, but can work surprisingly well as-is!

Shortcomings include:
- Assumes all columns have proper data types
- May include irrelevant or improper features
- Does not handle text or date columns well
- Does not include feature engineering
- Ordinal encoding may be better
- Other imputation strategies may be better
- Numeric features may not need scaling
- A different model may be better
- And so on...

Want to watch all 50 scikit-learn tips?
Enroll in my FREE online course: 👉 https://courses.dataschool.io/scikit-learn-tips 👈

Tips mentioned in this video:
Tip 1: https://www.youtube.com/watch?v=NGq8wnH5VSo&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=1
Tip 2: https://www.youtube.com/watch?v=sCt4LVD5hPc&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=2
Tip 6: https://www.youtube.com/watch?v=0w78CHM_ubM&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=6
Tip 7: https://www.youtube.com/watch?v=bA6mYC1a_Eg&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=7
Tip 9: https://www.youtube.com/watch?v=DKmDJJzayZw&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=9
Tip 11: https://www.youtube.com/watch?v=m_qKhnaYZlc&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=11
Tip 16: https://www.youtube.com/watch?v=f_xB7kbZR_g&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=16
Tip 27: https://www.youtube.com/watch?v=k3KrhjvaCq0&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=27
Tip 43: https://www.youtube.com/watch?v=n_x40CdPZss&list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6&index=43

=== WANT TO GET BETTER AT MACHINE LEARNING? ===
1) LEARN THE FUNDAMENTALS in my intro course (free!): https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn
2) BUILD YOUR ML CONFIDENCE in my intermediate course: https://courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn
3) LET'S CONNECT!
- Newsletter: https://www.dataschool.io/subscribe/
- Twitter: https://twitter.com/justmarkham
- Facebook: https://www.facebook.com/DataScienceSchool/
- LinkedIn: https://www.linkedin.com/in/justmarkham/

#python #machine learning #scikit-learn #data science

Tune multiple models simultaneously with GridSearchCV
https://ml.streamdb.net/timelines/v/v2QpvCJ1ar8
Tue, 26 Oct 21 22:53:24 +0900

You can tune 2+ models using the same grid search! Here's how:
1. Create multiple parameter dictionaries
2. Specify the model within each dictionary
3. Put the dictionaries in a list

👉 New tips every TUESDAY and THURSDAY! 👈
🎥 Watch all tips: https://www.youtube.com/playlist?list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6
🗒️ Code for all tips: https://github.com/justmarkham/scikit-learn-tips
💌 Get tips via email: https://scikit-learn.tips

#python #machine learning #scikit-learn #data science

Access part of a Pipeline using slicing
https://ml.streamdb.net/timelines/v/sMlsd2CnIf4
Thu, 21 Oct 21 23:49:56 +0900

Want to operate on part of a Pipeline (instead of the whole thing)? Slice it using Python's slicing notation!

#python #machine learning #scikit-learn #data science

Tune the parameters of a VotingClassifier or VotingRegressor
https://ml.streamdb.net/timelines/v/fvY3InlnOh8
Tue, 19 Oct 21 22:49:42 +0900

Want to improve the accuracy of your VotingClassifier? Try tuning the 'voting' and 'weights' parameters to change how predictions are combined!

P.S. If you're using VotingRegressor, just tune the 'weights' parameter.

#python #machine learning #scikit-learn #data science

Ensemble multiple models using VotingClassifier or VotingRegressor
https://ml.streamdb.net/timelines/v/2lq2k6J3GW4
Thu, 14 Oct 21 20:40:31 +0900

Want to improve your classifier's accuracy? Create multiple models and ensemble them using VotingClassifier!

P.S. VotingRegressor is also available.

#python #machine learning #scikit-learn #data science

Create feature interactions using PolynomialFeatures
https://ml.streamdb.net/timelines/v/unP3rCfzROk
Wed, 13 Oct 21 00:26:05 +0900

Want to include "feature interactions" in your model? Use PolynomialFeatures!

P.S. This is impractical if you have lots of features, and unnecessary if you're using a tree-based model.

#python #machine learning #scikit-learn #data science

Speed up GridSearchCV using parallel processing
https://ml.streamdb.net/timelines/v/QqFGKVieywY
Thu, 07 Oct 21 22:19:28 +0900

Want your grid search to run faster? Set n_jobs=-1 to use parallel processing with all CPUs!

#python #machine learning #scikit-learn #data science

Use OrdinalEncoder instead of OneHotEncoder with tree-based models
https://ml.streamdb.net/timelines/v/n_x40CdPZss
Wed, 06 Oct 21 01:25:13 +0900

With a tree-based model, try OrdinalEncoder instead of OneHotEncoder even for nominal (unordered) features. Accuracy will often be similar, but OrdinalEncoder will be much faster!
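As a rough illustration of the OrdinalEncoder tip above, the sketch below compares the two encoders inside otherwise identical pipelines. The toy data, the column names (`color`, `city`), and the choice of RandomForestClassifier are assumptions made for this example and are not taken from the video; scores and timings on real data will differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data: two nominal (unordered) features and a binary target
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue", "black"], size=200),
    "city": rng.choice(["Paris", "Lima", "Oslo"], size=200),
})
y = (X["color"].isin(["red", "blue"]) ^ (X["city"] == "Oslo")).astype(int)

cols = ["color", "city"]

# One-hot encoding creates one column per category (7 columns here)
ohe = make_column_transformer((OneHotEncoder(handle_unknown="ignore"), cols))
# Ordinal encoding keeps one integer column per feature (2 columns here)
oe = make_column_transformer((OrdinalEncoder(), cols))

# Same tree-based model in both pipelines; cross_val_score clones it per fold
rf = RandomForestClassifier(n_estimators=50, random_state=1)
ohe_pipe = make_pipeline(ohe, rf)
oe_pipe = make_pipeline(oe, rf)

ohe_scores = cross_val_score(ohe_pipe, X, y, cv=5, scoring="accuracy")
oe_scores = cross_val_score(oe_pipe, X, y, cv=5, scoring="accuracy")
print("OneHotEncoder accuracy:", ohe_scores.mean())
print("OrdinalEncoder accuracy:", oe_scores.mean())
```

The speed difference mostly shows up with many high-cardinality features, where one-hot encoding produces far more columns for the trees to search.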
#python #machine learning #scikit-learn #data science

Passthrough some columns and drop others in a ColumnTransformer
https://ml.streamdb.net/timelines/v/vHGRXuOtFnE
Fri, 01 Oct 21 00:29:47 +0900

In a ColumnTransformer, you can use the strings 'passthrough' and 'drop' in place of a transformer. Useful if you need to passthrough some columns and drop others!

#python #machine learning #scikit-learn #data science

Drop the first category from binary features (only) with OneHotEncoder
https://ml.streamdb.net/timelines/v/6EtfLjKhIec
Wed, 29 Sep 21 03:36:41 +0900

New in version 0.23: Use drop='if_binary' with OneHotEncoder to drop the first category ONLY if it's a binary feature (meaning it has exactly two categories).

Note: Beginning in scikit-learn 1.0, drop='first' and drop='if_binary' can both be used with handle_unknown='ignore'. However, the dropped category and an unknown category will both be encoded as all zeros.

#python #machine learning #scikit-learn #data science

Estimators only print parameters that have been changed
https://ml.streamdb.net/timelines/v/9MW6Vpzbock
Fri, 24 Sep 21 00:38:09 +0900

New in version 0.23: Estimators only print the parameters that are *not* set to their default values. You can still see all parameters with get_params(), or restore the previous behavior with set_config().

#python #machine learning #scikit-learn #data science

Load a toy dataset into a DataFrame
https://ml.streamdb.net/timelines/v/aMLLY9T3HPQ
Wed, 22 Sep 21 01:13:53 +0900

New in version 0.23: Need to load a toy dataset into a DataFrame, including column names? Set as_frame=True. Want features and target as separate objects? Also set return_X_y=True.
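A minimal sketch of the two options just described, using the iris dataset as an arbitrary choice of toy dataset (the video may use a different one). It assumes scikit-learn 0.23 or later with pandas installed.

```python
from sklearn.datasets import load_iris

# as_frame=True returns a Bunch whose "frame" attribute is a DataFrame
# containing the named feature columns plus a "target" column
iris = load_iris(as_frame=True)
df = iris.frame
print(df.head())

# Adding return_X_y=True returns the features (DataFrame) and the
# target (Series) as two separate objects instead of a Bunch
X, y = load_iris(as_frame=True, return_X_y=True)
print(X.shape, y.shape)
```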
#python #machine learning #scikit-learn #data science #pandas

Get the feature names output by a ColumnTransformer
https://ml.streamdb.net/timelines/v/NxLfpcfGzns
Fri, 17 Sep 21 00:28:08 +0900

Need to get the feature names output by a ColumnTransformer? Use get_feature_names(), which now works with "passthrough" columns (new in version 0.23)!

#python #machine learning #scikit-learn #data science

Create an interactive diagram of a Pipeline in Jupyter
https://ml.streamdb.net/timelines/v/_UKYxucD1Io
Wed, 15 Sep 21 00:26:39 +0900

New in version 0.23: Create interactive diagrams of Pipelines (and other estimators) in Jupyter! Click on any element to see more details. You can even export the diagram to an HTML file!

#python #machine learning #scikit-learn #data science

Most parameters should be passed as keyword arguments
https://ml.streamdb.net/timelines/v/oIcS_pvNtpo
Fri, 10 Sep 21 00:49:27 +0900

New in version 0.23: Most parameters are now expected to be passed as keyword arguments. They will raise a warning ⚠️ if passed positionally, and will error 🛑 starting in 0.25.
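To make the difference concrete, here is a small sketch using SVC, chosen only as a convenient example estimator. The try/except is there so the snippet runs on both the warning-era releases and the later releases that raise an error; the variable name `positional_ok` is just for this illustration.

```python
import warnings

from sklearn.svm import SVC

# Recommended: pass parameters by name
clf = SVC(C=0.5, kernel="linear")

# Passing the same parameters positionally warns in 0.23/0.24
# and raises a TypeError in later releases
try:
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # turn the FutureWarning into an exception
        SVC(0.5, "linear")
    positional_ok = True
except (TypeError, FutureWarning) as exc:
    positional_ok = False
    print(type(exc).__name__, exc)
```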
Note: scikit-learn 0.25 has been renamed to scikit-learn 1.0. 👉 New tips every TUESDAY and THURSDAY! 👈 🎥 Watch all tips: https://www.youtube.com/playlist?list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6 🗒️ Code for all tips: https://github.com/justmarkham/scikit-learn-tips 💌 Get tips via email: https://scikit-learn.tips === WANT TO GET BETTER AT MACHINE LEARNING? === 1) LEARN THE FUNDAMENTALS in my intro course (free!): https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn 2) BUILD YOUR ML CONFIDENCE in my intermediate course: https://courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn 3) LET'S CONNECT! - Newsletter: https://www.dataschool.io/subscribe/ - Twitter: https://twitter.com/justmarkham - Facebook: https://www.facebook.com/DataScienceSchool/ - LinkedIn: https://www.linkedin.com/in/justmarkham/ #python #machine learning #scikit-learn #data science #python #machine learning #scikit-learn #data science Don't use .values when passing a pandas object to scikit-learn https://ml.streamdb.net/timelines/v/f3HIw8o21Ao Tue, 07 Sep 21 23:37:10 +0900 Don't use .values when passing a pandas object to scikit-learn There's no need to use ".values" when passing a DataFrame or Series to scikit-learn... it knows how to access the underlying NumPy array! 👉 New tips every TUESDAY and THURSDAY! 👈 🎥 Watch all tips: https://www.youtube.com/playlist?list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6 🗒️ Code for all tips: https://github.com/justmarkham/scikit-learn-tips 💌 Get tips via email: https://scikit-learn.tips === WANT TO GET BETTER AT MACHINE LEARNING? === 1) LEARN THE FUNDAMENTALS in my intro course (free!): https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn 2) BUILD YOUR ML CONFIDENCE in my intermediate course: https://courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn 3) LET'S CONNECT! 
- Newsletter: https://www.dataschool.io/subscribe/ - Twitter: https://twitter.com/justmarkham - Facebook: https://www.facebook.com/DataScienceSchool/ - LinkedIn: https://www.linkedin.com/in/justmarkham/ #python #machine learning #scikit-learn #data science #python #machine learning #scikit-learn #data science Add feature selection to a Pipeline https://ml.streamdb.net/timelines/v/BMBVwV8iarc Thu, 02 Sep 21 19:28:15 +0900 Add feature selection to a Pipeline It's simple to add feature selection to a Pipeline: 1. Use SelectPercentile to keep the highest scoring features 2. Add feature selection after preprocessing but before model building P.S. Make sure to tune the percentile value! 👉 New tips every TUESDAY and THURSDAY! 👈 🎥 Watch all tips: https://www.youtube.com/playlist?list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6 🗒️ Code for all tips: https://github.com/justmarkham/scikit-learn-tips 💌 Get tips via email: https://scikit-learn.tips === WANT TO GET BETTER AT MACHINE LEARNING? === 1) LEARN THE FUNDAMENTALS in my intro course (free!): https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn 2) BUILD YOUR ML CONFIDENCE in my intermediate course: https://courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn 3) LET'S CONNECT! - Newsletter: https://www.dataschool.io/subscribe/ - Twitter: https://twitter.com/justmarkham - Facebook: https://www.facebook.com/DataScienceSchool/ - LinkedIn: https://www.linkedin.com/in/justmarkham/ #python #machine learning #scikit-learn #data science #python #machine learning #scikit-learn #data science Use FunctionTransformer to convert functions into transformers https://ml.streamdb.net/timelines/v/s1gL82BxKos Tue, 31 Aug 21 20:48:09 +0900 Use FunctionTransformer to convert functions into transformers Want to do feature engineering within a ColumnTransformer or Pipeline? 1. Select an existing function (or write your own) 2. Convert it into a transformer using FunctionTransformer 3. 
🥳 👉 New tips every TUESDAY and THURSDAY! 👈 🎥 Watch all tips: https://www.youtube.com/playlist?list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6 🗒️ Code for all tips: https://github.com/justmarkham/scikit-learn-tips 💌 Get tips via email: https://scikit-learn.tips === WANT TO GET BETTER AT MACHINE LEARNING? === 1) LEARN THE FUNDAMENTALS in my intro course (free!): https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn 2) BUILD YOUR ML CONFIDENCE in my intermediate course: https://courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn 3) LET'S CONNECT! - Newsletter: https://www.dataschool.io/subscribe/ - Twitter: https://twitter.com/justmarkham - Facebook: https://www.facebook.com/DataScienceSchool/ - LinkedIn: https://www.linkedin.com/in/justmarkham/ #python #machine learning #scikit-learn #data science #python #machine learning #scikit-learn #data science Use AUC to evaluate multiclass problems https://ml.streamdb.net/timelines/v/-s-KdkYmCaA Thu, 26 Aug 21 19:57:09 +0900 Use AUC to evaluate multiclass problems AUC is an excellent evaluation metric for binary classification, especially if you have class imbalance. New in scikit-learn 0.22: AUC can be used with multiclass problems! Supports "one-vs-one" and "one-vs-rest" strategies. 👉 New tips every TUESDAY and THURSDAY! 👈 🎥 Watch all tips: https://www.youtube.com/playlist?list=PL5-da3qGB5ID7YYAqireYEew2mWVvgmj6 🗒️ Code for all tips: https://github.com/justmarkham/scikit-learn-tips 💌 Get tips via email: https://scikit-learn.tips === WANT TO GET BETTER AT MACHINE LEARNING? === 1) LEARN THE FUNDAMENTALS in my intro course (free!): https://courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn 2) BUILD YOUR ML CONFIDENCE in my intermediate course: https://courses.dataschool.io/building-an-effective-machine-learning-workflow-with-scikit-learn 3) LET'S CONNECT! 
Shuffle your dataset when using cross_val_score
https://ml.streamdb.net/timelines/v/Ld8-_WP0G90
Tue, 24 Aug 21 23:40:47 +0900
If you use cross-validation and your samples are NOT in an arbitrary order, shuffling may be required to get meaningful results. Use KFold or StratifiedKFold in order to shuffle!

Four ways to examine the steps of a Pipeline
https://ml.streamdb.net/timelines/v/IhUID_sD3hE
Thu, 19 Aug 21 18:14:48 +0900
There are FOUR ways to examine the steps of a Pipeline! (I prefer method 1 since you can autocomplete the step & parameter names... but method 4 is SO short!)
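The description of the Pipeline tip above does not list the four methods, so the sketch below shows four access patterns that scikit-learn supports; their order here is an assumption and may not match the video's numbering. The toy data and the step names "imputer" and "clf" are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny made-up dataset with missing values
X = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan], [5.0, 40.0]])
y = np.array([0, 1, 0, 1])

pipe = Pipeline([("imputer", SimpleImputer()), ("clf", LogisticRegression())])
pipe.fit(X, y)

# Attribute access on named_steps (works with tab completion)
way1 = pipe.named_steps.imputer.statistics_
# Dictionary-style access on named_steps
way2 = pipe.named_steps["imputer"].statistics_
# Index the Pipeline by step name
way3 = pipe["imputer"].statistics_
# Index the Pipeline by step position
way4 = pipe[0].statistics_

print(way1)  # column means learned by the imputer
```

All four expressions return the same fitted SimpleImputer attribute, so the choice is mostly a matter of typing convenience.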
Vectorize two text columns in a ColumnTransformer
https://ml.streamdb.net/timelines/v/HyP5MvlmbRc
Tue, 17 Aug 21 19:24:42 +0900
Want to vectorize two text columns in a ColumnTransformer? You can't pass them in a list, but you can pass the vectorizer twice! (They'll learn separate vocabularies.)
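A minimal sketch of the two-text-column tip above, assuming a small made-up DataFrame. The column names "title" and "body" and the transformer names "title_vec" and "body_vec" are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer

# Made-up data with two free-text columns
df = pd.DataFrame({
    "title": ["red apple", "green apple", "red car"],
    "body": ["fruit is sweet", "fruit is sour", "car is fast"],
})

# Each column is passed as a string (not a list) because CountVectorizer
# expects 1-D input, and each column gets its own vectorizer instance
ct = ColumnTransformer([
    ("title_vec", CountVectorizer(), "title"),
    ("body_vec", CountVectorizer(), "body"),
])
X = ct.fit_transform(df)

# The two vectorizers learn separate vocabularies
title_vocab = ct.named_transformers_["title_vec"].vocabulary_
body_vocab = ct.named_transformers_["body_vec"].vocabulary_
print(X.shape, len(title_vocab), len(body_vocab))
```

The word "car" appears in both columns but is counted in two different output columns, one per vocabulary.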
Save a model or Pipeline using joblib
https://ml.streamdb.net/timelines/v/L5OVCoAemAk
Thu, 12 Aug 21 19:00:17 +0900
Want to save a model (or pipeline) for later use? Use joblib! Warning: You must load it into an identical environment, and only load objects you trust 😇

Two ways to impute missing values for a categorical feature
https://ml.streamdb.net/timelines/v/k3KrhjvaCq0
Tue, 10 Aug 21 23:46:39 +0900
Need to impute missing values for a categorical feature? Two options:
1. Impute the most frequent value
2. Impute the value "missing", which treats it as a separate category
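The two categorical imputation options above can be sketched with SimpleImputer. The single-column toy array is an assumption for illustration, not data from the video.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up categorical column with one missing value
X = np.array([["S"], ["C"], [np.nan], ["S"], ["Q"]], dtype=object)

# Option 1: fill with the most frequent category
imp_frequent = SimpleImputer(strategy="most_frequent")
filled_frequent = imp_frequent.fit_transform(X)

# Option 2: fill with the constant "missing", which becomes its own category
imp_constant = SimpleImputer(strategy="constant", fill_value="missing")
filled_constant = imp_constant.fit_transform(X)

print(filled_frequent.ravel())
print(filled_constant.ravel())
```

With option 2, a downstream encoder such as OneHotEncoder would see "missing" as one more category level.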
Use stratified sampling with train_test_split
https://ml.streamdb.net/timelines/v/Zcjl8xPLmPw
Thu, 05 Aug 21 20:41:54 +0900
Are you using train_test_split with a classification problem? Be sure to set "stratify=y" so that class proportions are preserved when splitting. Especially important if you have class imbalance!
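To illustrate the stratification tip above, here is a small sketch on a synthetic imbalanced target. The 90/10 class split and the 20% test size are assumptions chosen so the proportions are easy to check.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced target: 90 negatives and 10 positives
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the 10% positive rate in both the training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print("train positive rate:", y_train.mean())
print("test positive rate: ", y_test.mean())
```

Without stratify=y, the number of positives in the 20-row testing set would depend on the random seed and could even be zero.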
Prune a decision tree to avoid overfitting
https://ml.streamdb.net/timelines/v/ioQ2Ahi-I_M
Tue, 03 Aug 21 21:08:14 +0900
New in scikit-learn 0.22: Pruning of decision trees to avoid overfitting!
- Uses cost-complexity pruning
- Increase "ccp_alpha" to increase pruning (default value is 0)

Visualize a decision tree two different ways
https://ml.streamdb.net/timelines/v/EMcNjJ6Gj8w
Thu, 29 Jul 21 20:40:48 +0900
Two new functions in scikit-learn 0.21 for visualizing decision trees:
1. plot_tree: uses Matplotlib (not Graphviz!)
2. export_text: doesn't require any external libraries
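The two decision tree tips above (pruning with "ccp_alpha" and visualizing with export_text) can be tried together. This is a minimal sketch: the iris dataset and the value ccp_alpha=0.02 are assumptions for illustration, and in practice ccp_alpha would normally be tuned.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# Default tree: ccp_alpha=0, so no pruning is applied
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# A larger ccp_alpha applies more cost-complexity pruning
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

print("leaves without pruning:", full_tree.get_n_leaves())
print("leaves with pruning:   ", pruned_tree.get_n_leaves())

# export_text renders the tree as plain text, with no external libraries
report = export_text(pruned_tree, feature_names=list(iris.feature_names))
print(report)
```

If Matplotlib is installed, sklearn.tree.plot_tree(pruned_tree) draws the same tree as a figure instead of text.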