Pandas DataFrames

Pandas is a widely used Python library for data analysis. Its most important data structure is the DataFrame: a two-dimensional table with labeled rows and columns, similar to a spreadsheet or a SQL table. Each column has a name and can hold a different type of data.

A DataFrame can be created from many data sources, including CSV files, Excel files, SQL databases, and Python dictionaries. Once created, it can be filtered, modified, and summarized with the functions and methods Pandas provides.

Creating a DataFrame

A DataFrame can be created from a Python dictionary. The keys of the dictionary become the column names, and the values (lists of equal length) become the data in the columns. Here is an example:

<per>
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'gender': ['F', 'M', 'M', 'M']}

df = pd.DataFrame(data)

print(df)
</per>

The output of the above code will be:

      name  age gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   35      M
3    David   40      M
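
When the data arrives one record at a time rather than one column at a time, a list of dictionaries can be passed instead. The following is a minimal sketch of that alternative, reusing two of the rows above; the `rows` and `df_rows` names are made up for illustration.

```python
import pandas as pd

# Each dictionary is one row; its keys become the column names
rows = [
    {'name': 'Alice', 'age': 25, 'gender': 'F'},
    {'name': 'Bob', 'age': 30, 'gender': 'M'},
]

df_rows = pd.DataFrame(rows)

print(df_rows.columns.tolist())  # ['name', 'age', 'gender']
print(df_rows.shape)             # (2, 3)
```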

A DataFrame can also be created from a CSV file using the `read_csv()` function. Here is an example:

<per>
df = pd.read_csv('data.csv')

print(df)
</per>

The `read_csv()` function parses the CSV file and returns a DataFrame. By default, the first line of the file supplies the column names and the rows are numbered starting from 0. The example assumes that a file named `data.csv` exists in the current working directory.
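
Since `data.csv` itself is not shown here, the sketch below reads comparable data from an in-memory string so that it can be run without any file. The CSV text and the names `csv_text` and `df_csv` are invented for illustration; `io.StringIO` only makes the string behave like an open file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = """name,age,gender
Alice,25,F
Bob,30,M
"""

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.columns.tolist())  # column names come from the header line
print(df_csv.shape)             # (2, 3): two rows, three columns
```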

Manipulating a DataFrame

The examples below use the `df` built from the dictionary in the previous section (not the one read from `data.csv`) and show common ways to select and modify its data:

Selecting columns:

<per>
# Selecting a single column
print(df['name'])

# Selecting multiple columns
print(df[['name', 'age']])
</per>

Selecting with a single column name returns that column as a Series. Selecting with a list of names returns a new DataFrame containing only those columns.
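
The result types of the two selection forms can be checked directly. This is a minimal sketch that rebuilds the example DataFrame so it runs on its own; the `single` and `multiple` names are made up.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'age': [25, 30, 35, 40],
                   'gender': ['F', 'M', 'M', 'M']})

single = df['name']             # one label -> Series
multiple = df[['name', 'age']]  # list of labels -> DataFrame

print(type(single).__name__)    # Series
print(type(multiple).__name__)  # DataFrame
print(multiple.shape)           # (4, 2)
```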

Selecting rows:

<per>
# Selecting a single row by its index label
print(df.loc[0])

# Selecting multiple rows by their index labels
print(df.loc[[0, 2]])

# Selecting rows based on a condition
print(df[df['age'] > 30])
</per>

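`loc` selects rows by index label; with the default index used here, the labels are the integers 0 to 3. `df.loc[0]` returns one row as a Series, `df.loc[[0, 2]]` returns a DataFrame with two rows, and the condition `df['age'] > 30` keeps only the rows for Charlie and David.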
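
Label-based selection is easiest to tell apart from position-based selection when the index is not the default 0, 1, 2, and so on. The sketch below assumes a made-up index of letters to show the difference between `loc` and `iloc`, and then combines two conditions in one filter. The index labels and variable names are illustrative only.

```python
import pandas as pd

# Same data as before, but with a hypothetical letter index
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'age': [25, 30, 35, 40],
                   'gender': ['F', 'M', 'M', 'M']},
                  index=['a', 'b', 'c', 'd'])

by_label = df.loc['b']     # row whose index label is 'b'
by_position = df.iloc[1]   # second row, whatever its label

print(by_label['name'])     # Bob
print(by_position['name'])  # Bob

# Combine conditions with & (and) or | (or); wrap each one in parentheses
older_men = df[(df['age'] > 30) & (df['gender'] == 'M')]
print(older_men['name'].tolist())  # ['Charlie', 'David']
```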

Adding a column:

<per>
df['salary'] = [50000, 60000, 70000, 80000]

print(df)
</per>

The DataFrame now has an additional column named `salary`. The list must contain exactly one value per row (four here); otherwise Pandas raises a `ValueError`.
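
A new column can also be computed from an existing one instead of being typed in by hand. The following sketch assumes a made-up column name, `age_in_months`, and also shows what happens when a list of the wrong length is assigned.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'age': [25, 30, 35, 40]})

# Derive a column from an existing one (element-wise arithmetic)
df['age_in_months'] = df['age'] * 12  # hypothetical column name

# A list with the wrong number of values is rejected
try:
    df['salary'] = [50000, 60000]  # only 2 values for 4 rows
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(df['age_in_months'].tolist())  # [300, 360, 420, 480]
print(mismatch_raised)               # True
```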

Deleting a column:

<per>
df = df.drop('gender', axis=1)

print(df)
</per>

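`drop()` returns a new DataFrame without the `gender` column and leaves the original unchanged, which is why the result is assigned back to `df`. The argument `axis=1` tells Pandas that `'gender'` is a column label rather than a row label.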
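
To see that `drop()` does not modify the DataFrame it is called on, the result can be stored under a different name. This minimal sketch uses the `columns=` keyword, which is equivalent to passing `axis=1`; the `trimmed` name is made up.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'age': [25, 30, 35, 40],
                   'gender': ['F', 'M', 'M', 'M']})

# Same effect as df.drop('gender', axis=1)
trimmed = df.drop(columns='gender')

print(df.columns.tolist())       # ['name', 'age', 'gender']
print(trimmed.columns.tolist())  # ['name', 'age']
```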

Conclusion

Pandas DataFrames are a convenient way to hold and work with tabular data in Python. Creating them from dictionaries or CSV files, selecting rows and columns, and adding or dropping columns are the building blocks of most data analysis tasks.
