Why do some pandas commands end with parentheses (and others don't)?

To access most of the functionality in pandas, you have to call the methods and attributes of DataFrame and Series objects. In this video, I'll discuss some common methods and attributes, and show you how to tell the difference between them. (Hint: It's all about the parentheses!)
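
As a rough illustration of the parentheses rule described above, here is a minimal sketch. The DataFrame contents and column names are made up for this example and are not taken from the video.

```python
import pandas as pd

# Hypothetical example data (not from the video)
df = pd.DataFrame({"title": ["A", "B", "C"], "duration": [90, 120, 105]})

# Attributes describe the object and take no parentheses
shape = df.shape      # a tuple of (rows, columns)
dtypes = df.dtypes    # a Series of column data types

# Methods perform an action and must be called with parentheses
top = df.head(2)          # first two rows
summary = df.describe()   # summary statistics for numeric columns

# Leaving the parentheses off a method returns the method itself,
# not its result
not_called = df.head
print(type(not_called).__name__)   # prints "method"
print(shape)                       # prints (3, 2)
```

One way to check which kind of name you are dealing with is `callable(df.head)` versus `callable(df.shape)`: the first is callable, the second is not.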



== RESOURCES ==
GitHub repository for the series: https://github.com/justmarkham/pandas-videos
"describe" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html
What is a method in Python? http://stackoverflow.com/a/35812834/1636598


Data School


Timetable

Number of videos: 141

My top 50 scikit-learn tips (2023-04-20)

- Introduction (00:00:00 - 00:01:03)
- 1. Transform data with ColumnTransformer (00:01:03 - 00:04:19)
- 2. Seven ways to select columns (00:04:19 - 00:08:18)
- 3. "fit" vs "transform" (00:08:18 - 00:10:53)
- 4. Don't use "fit" on new data! (00:10:53 - 00:15:05)
- 5. Don't use pandas for preprocessing! (00:15:05 - 00:19:00)
- 6. Encode categorical features (00:19:00 - 00:24:07)
- 7. Handle new categories in testing data (00:24:07 - 00:27:16)

Comment by Phil Webb (2023-04-20, 00:24:08 - 02:47:31):
handle_unknown='ignore'. A most useful tip! If only I'd read the docs. But I don't understand when you say to go back and include the previously unknown categories. How can you train on unknown data? Even if you include the unknown "labels" in your encoder, they will all be zero during training because, obviously, they weren't in your training data. I think it's best to just leave it alone. If it wasn't in your training data, then it's probably a rare occurrence and you can just ignore it. Zeros in all known categories simplify what happens downstream? If you want to train on unknown data, you would need to use "dummy data" and set min_frequency or max_categories, then handle_unknown='infrequent_if_exist' to give downstream modules something to work with.
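
The behavior discussed in this comment can be sketched as follows. This is only an illustration under assumed data: the category values ("C", "Q", "S", "Z") are invented for the example and are not taken from the video.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data with three known categories
X_train = np.array([["C"], ["Q"], ["S"]], dtype=object)
# New data containing one category ("Z") never seen during fit
X_new = np.array([["Q"], ["Z"]], dtype=object)

# With handle_unknown="ignore", unseen categories do not raise an error
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(X_train)

# The unknown category is encoded as a row of all zeros
encoded = ohe.transform(X_new).toarray()
print(ohe.categories_[0].tolist())  # prints ['C', 'Q', 'S']
print(encoded.tolist())             # prints [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
```

The all-zero row is what the comment refers to as "zeros in all known categories": the downstream model sees a valid input shape but no active category.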

- 8. Chain steps with Pipeline (00:27:16 - 00:30:19)
- 9. Encode "missingness" as a feature (00:30:19 - 00:33:12)

Comment by Phil Webb (2023-04-20, 00:30:20 - 02:47:31):
Missingness. So, what happens when a feature is fully populated in your training data but has missing values in your validation data? Just bringing that up in case you don't get to it.

- 10. Why set a random state? (00:33:12 - 00:35:40)
- 11. Better ways to impute missing values (00:35:40 - 00:41:22)
- 12. Pipeline vs make_pipeline (00:41:22 - 00:44:08)
- 13. Inspect a Pipeline (00:44:08 - 00:47:03)
- 14. Handle missing values automatically (00:47:03 - 00:49:47)
- 15. Don't drop the first categorical level (00:49:47 - 00:54:15)
- 16. Tune a Pipeline (00:54:15 - 01:01:09)
- 17. Randomized search vs grid search (01:01:09 - 01:05:42)
- 18. Examine grid search results (01:05:42 - 01:08:10)
- 19. Logistic regression tuning parameters (01:08:10 - 01:12:41)
- 20. Plot a confusion matrix (01:12:41 - 01:15:37)
- 21. Plot multiple ROC curves (01:15:37 - 01:17:21)
- 22. Use the correct Pipeline methods (01:17:21 - 01:18:59)
- 23. Access model coefficients (01:18:59 - 01:20:11)
- 24. Visualize a decision tree (01:20:11 - 01:23:57)
- 25. Improve a decision tree by pruning it (01:23:57 - 01:25:23)
- 26. Use stratified sampling when splitting data (01:25:23 - 01:29:40)
- 27. Impute missing values for categoricals (01:29:40 - 01:32:10)
- 28. Save a model or Pipeline (01:32:10 - 01:33:47)
- 29. Add multiple text columns to a model (01:33:47 - 01:35:35)
- 30. More ways to inspect a Pipeline (01:35:35 - 01:37:28)
- 31. Know when shuffling is required (01:37:28 - 01:42:32)
- 32. Use AUC with multiclass problems (01:42:32 - 01:46:04)
- 33. Create custom features with scikit-learn (01:46:04 - 01:50:03)
- 34. Automate feature selection (01:50:03 - 01:52:24)
- 35. Use pandas objects with scikit-learn (01:52:24 - 01:53:37)
- 36. Pass parameters as keyword arguments (01:53:37 - 01:55:23)
- 37. Create an interactive Pipeline diagram (01:55:23 - 01:57:22)
- 38. Get the names of transformed features (01:57:22 - 01:59:32)
- 39. Load a toy dataset into pandas (01:59:32 - 02:01:33)
- 40. View all model parameters (02:01:33 - 02:03:00)
- 41. Encode binary features (02:03:00 - 02:06:59)

Comment by Phil Webb (2023-04-20, 02:03:00 - 02:47:31):
drop='if_binary' makes sense; otherwise you have two columns that are perfectly redundant, not just implied. At least, it's a happy compromise. My only hesitation, without playing with it, is that the order is probably alphabetic. If it assigned 0 to the most frequent category, then handle_unknown='ignore' would make sense. Otherwise, you're lumping unknowns in with the "least" alphabetic category. That's kinda silly.

- 42. Column selection tricks (02:06:59 - 02:10:02)

Comment by Phil Webb (2023-04-20, 02:09:40 - 02:47:31):
Hopefully, you'll never have 200 columns to pass through, but I think specifying which columns to pass through makes your intent clearer. The default is remainder='drop', so the author thought that as well.
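
A minimal sketch of the explicit approach this comment prefers, assuming a made-up DataFrame (the column names "Embarked", "Fare", and "Name" are illustrative only):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data for illustration
df = pd.DataFrame({
    "Embarked": ["S", "C", "S"],
    "Fare": [7.25, 71.28, 8.05],
    "Name": ["a", "b", "c"],
})

# Name the passthrough columns explicitly instead of relying on
# remainder="passthrough"; the default remainder="drop" then
# discards every column that is not listed ("Name" here)
ct = ColumnTransformer([
    ("ohe", OneHotEncoder(), ["Embarked"]),
    ("keep", "passthrough", ["Fare"]),
])

out = ct.fit_transform(df)
print(out.shape)  # prints (3, 3): two one-hot columns plus "Fare"
```

Listing the columns makes the intent visible in the code, at the cost of more typing when many columns are passed through.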

Comment by Phil Webb (2023-04-20, 02:10:00 - 02:47:31):
Yeah, if you have the time and the determination, you could run DecisionTreeClassifier, then plot_tree, and look through the tree for conditions like name != value. Then you could use the order in which the decision tree "discovers" categories as the ordinal value for that feature, with 0 being first. You just need to write a custom transformer to preprocess your validation data and assign -1 to all unknowns. Another trick I've had success with is ordering by frequency, with 0 being the most frequent. In that case, your custom transformer should assign 0 to all unknowns. Easy-peasy.
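
The frequency-ordering trick from this comment might look roughly like the sketch below. `FrequencyOrdinalEncoder` is a hypothetical class written for this illustration, not part of scikit-learn, and the data is invented.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class FrequencyOrdinalEncoder(BaseEstimator, TransformerMixin):
    """Hypothetical helper: rank categories by training frequency.

    Rank 0 is the most frequent category; categories unseen during
    fit are also mapped to 0, as suggested in the comment.
    """

    def fit(self, X, y=None):
        X = pd.DataFrame(X)
        self.mappings_ = {}
        for col in X.columns:
            # value_counts() sorts by descending frequency
            ordered = X[col].value_counts().index
            self.mappings_[col] = {cat: rank for rank, cat in enumerate(ordered)}
        return self

    def transform(self, X):
        X = pd.DataFrame(X).copy()
        for col, mapping in self.mappings_.items():
            # Unknown categories become NaN after map(), then 0
            X[col] = X[col].map(mapping).fillna(0).astype(int)
        return X


# Hypothetical data: "S" is most frequent, then "C", then "Q"
train = pd.DataFrame({"Embarked": ["S", "S", "S", "C", "C", "Q"]})
new = pd.DataFrame({"Embarked": ["Q", "Z", "S"]})

encoder = FrequencyOrdinalEncoder().fit(train)
encoded = encoder.transform(new)
print(encoded["Embarked"].tolist())  # prints [2, 0, 0]
```

Note that ties in frequency are not handled specially in this sketch, so the example data avoids them.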

- 43. Save time when encoding categoricals (02:10:02 - 02:16:53)
- 44. Speed up a grid search (02:16:53 - 02:19:01)
- 45. Create feature interactions (02:19:01 - 02:23:00)
- 46. Ensemble multiple models (02:23:00 - 02:27:23)
- 47. Tune an ensemble (02:27:23 - 02:31:22)
- 48. Run part of a Pipeline (02:31:22 - 02:34:52)
- 49. Tune multiple models at once (02:34:52 - 02:39:50)
- 50. Solve many ML problems with one solution (02:39:50 - 02:47:31)

21 more pandas tricks (2022-05-13)

- Introduction (00:00:00 - 00:00:36)
- 1. Check for equality (00:00:36 - 00:01:27)
- 2. Check for equality (alternative) (00:01:27 - 00:02:38)
- 3. Use NumPy without importing NumPy (00:02:38 - 00:03:42)
- 4. Calculate memory usage (00:03:42 - 00:04:10)
- 5. Count the number of words in a column (00:04:10 - 00:04:45)
- 6. Convert one set of values to another (00:04:45 - 00:06:59)
- 7. Convert continuous data into categorical data (alternative) (00:06:59 - 00:08:05)
- 8. Create a cross-tabulation (00:08:05 - 00:08:55)
- 9. Create a datetime column from multiple columns (00:08:55 - 00:09:34)
- 10. Resample a datetime column (00:09:34 - 00:11:07)
- 11. Read and write from compressed files (00:11:07 - 00:12:10)
- 12. Fill missing values using interpolation (00:12:10 - 00:12:45)
- 13. Check for duplicate merge keys (00:12:45 - 00:13:50)
- 14. Transpose a wide DataFrame (00:13:50 - 00:14:47)
- 15. Create an example DataFrame (alternative) (00:14:47 - 00:16:06)
- 16. Identify rows that are missing from a DataFrame (00:16:06 - 00:17:09)
- 17. Use query to avoid intermediate variables (00:17:09 - 00:19:06)
- 18. Reshape a DataFrame from wide format to long format (00:19:06 - 00:21:19)
- 19. Reverse row order (alternative) (00:21:19 - 00:22:25)
- 20. Reverse column order (alternative) (00:22:25 - 00:23:21)
- 21. Split a string into multiple columns (alternative) (00:23:21 - 00:24:40)

Tune multiple models simultaneously with GridSearchCV (2021-10-26)

Comment by gaurav malik (00:02:15 - 00:05:07):
I had one doubt: I didn't understand the placeholder part at .