Pandas is a popular Python library for data analysis. It provides data structures and functions for manipulating and analyzing data, and the most important of these structures is the DataFrame: a two-dimensional, labeled table of rows and columns, similar to a spreadsheet or a SQL table.
A DataFrame can be created from many data sources, including CSV files, Excel files, SQL databases, and Python dictionaries. Once created, it can be filtered, extended, and reshaped with the functions and methods Pandas provides.
A DataFrame can be created from a Python dictionary. The keys of the dictionary become the column names, and the values become the data in the columns. Here is an example:
<per>
import pandas as pd

# Each key becomes a column name; each list holds that column's values.
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M'],
}
df = pd.DataFrame(data)
print(df)
</per>
The output of the above code will be:
<per>
      name  age  gender
0    Alice   25       F
1      Bob   30       M
2  Charlie   35       M
3    David   40       M
</per>
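To confirm that the dictionary keys really became the column names, it can help to inspect the resulting object. The following is a minimal sketch that rebuilds the same DataFrame and checks its shape, column labels, and the type inferred for `age`; the helper variable names (`n_rows`, `column_names`, and so on) are illustrative only.

```python
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M'],
}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape

# Column labels follow the insertion order of the dictionary keys.
column_names = list(df.columns)

# Pandas infers an integer dtype for a list of Python ints.
age_is_integer = pd.api.types.is_integer_dtype(df['age'])

print(n_rows, n_cols)
print(column_names)
print(age_is_integer)
```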
A DataFrame can also be created from a CSV file using the `read_csv()` function. Here is an example:
<per>
# Stored under a separate name so the df built above stays available.
csv_df = pd.read_csv('data.csv')
print(csv_df)
</per>
The `read_csv()` function reads the CSV file, uses its first row as the column names by default, and returns a DataFrame. Printing that DataFrame shows the contents of the file as a table.
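Because `read_csv()` also accepts a file-like object, the same call can be tried without a file on disk. The sketch below assumes a small, made-up CSV string in place of `data.csv`; the column names simply mirror the earlier dictionary example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file such as 'data.csv'.
csv_text = "name,age,gender\nAlice,25,F\nBob,30,M\n"

# read_csv accepts a path or any file-like object, such as StringIO.
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df)
```

The first line of the text supplies the column names, and each following line becomes one row.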
A DataFrame can be manipulated in many ways. The following examples assume a DataFrame `df` with the `name`, `age`, and `gender` columns shown earlier:
Selecting columns:
<per>
# Selecting a single column
print(df['name'])
# Selecting multiple columns
print(df[['name', 'age']])
</per>
Selecting with a single label returns that column as a Series, while selecting with a list of labels returns a new DataFrame containing only those columns.
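The difference between the two forms of column selection shows up in the type of the result. As a small sketch, rebuilding the example DataFrame so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M'],
})

single = df['name']             # one label -> Series
subset = df[['name', 'age']]    # list of labels -> DataFrame
one_col_frame = df[['name']]    # one-element list -> one-column DataFrame

print(type(single).__name__)
print(type(subset).__name__)
print(type(one_col_frame).__name__)
```

Wrapping a single label in a list is a common way to keep a DataFrame when only one column is needed.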
Selecting rows:
<per>
# Selecting a single row by index label
print(df.loc[0])
# Selecting multiple rows by index label
print(df.loc[[0, 2]])
# Selecting rows where a condition is true (Boolean mask)
print(df[df['age'] > 30])
</per>
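`loc` selects by index label, so `df.loc[0]` returns the row labeled 0 as a Series and `df.loc[[0, 2]]` returns a DataFrame with the rows labeled 0 and 2. The expression `df['age'] > 30` builds a Boolean mask, and indexing with it keeps only the rows where the mask is true; for the dictionary-built DataFrame above, those are the rows for Charlie and David.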
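With the default integer index, labels and positions coincide, which can hide the fact that `loc` works on labels. The sketch below assumes a hypothetical string index (`'a'` to `'d'`) to make the distinction visible, uses `iloc` for position-based selection, and combines two conditions with `&`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'gender': ['F', 'M', 'M', 'M'],
    },
    index=['a', 'b', 'c', 'd'],  # hypothetical labels, for illustration
)

by_label = df.loc['b']       # row whose index label is 'b'
by_position = df.iloc[1]     # second row, regardless of its label

# Each condition needs its own parentheses when combined with & or |.
older_men = df[(df['age'] > 30) & (df['gender'] == 'M')]

print(by_label['name'], by_position['name'])
print(older_men)
```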
Adding a column:
<per>
# The list must contain one value per row.
df['salary'] = [50000, 60000, 70000, 80000]
print(df)
</per>
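Assigning to a column name that does not yet exist appends it as the last column of the DataFrame. The list must contain exactly one value per row; otherwise Pandas raises a `ValueError`.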
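A new column can also be computed from existing ones instead of being typed in by hand. In the sketch below, the `monthly_salary` and `bonus` columns and the 10% rate are invented for illustration; `assign()` is shown as an alternative that returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
})
df['salary'] = [50000, 60000, 70000, 80000]

# Derived column: arithmetic is applied element-wise to the whole column.
df['monthly_salary'] = df['salary'] / 12

# assign() returns a copy with the extra column; df itself is not modified.
with_bonus = df.assign(bonus=df['salary'] * 0.1)

print(df.columns.tolist())
print(with_bonus.columns.tolist())
```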
Deleting a column:
<per>
# drop() returns a new DataFrame, so the result is assigned back to df.
df = df.drop(columns='gender')
print(df)
</per>
`drop()` returns a new DataFrame without the `gender` column rather than modifying `df` in place, which is why the result is assigned back to `df`.
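The non-mutating behavior of `drop()` is easy to check by keeping the result under a different name. The sketch below also drops several columns at once and uses `errors='ignore'` to skip a column that does not exist; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M'],
})

# The result is a new DataFrame; df still has all three columns.
trimmed = df.drop(columns=['gender'])

# Several columns can be dropped in one call.
names_only = df.drop(columns=['age', 'gender'])

# 'salary' is not a column here; errors='ignore' avoids a KeyError.
unchanged = df.drop(columns=['salary'], errors='ignore')

print(df.columns.tolist())
print(trimmed.columns.tolist())
print(names_only.columns.tolist())
```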
Pandas DataFrames are a powerful tool for data analysis in Python and a convenient way to work with tabular data. The operations shown here (creating a DataFrame, selecting rows and columns, and adding or deleting columns) are the building blocks for more complex analysis tasks.