


Pandas Read CSV

Pandas is a popular data manipulation library in Python. It provides various functions to read, write, and manipulate data in different formats. One of the most commonly used functions in Pandas is read_csv(). This function is used to read data from a CSV file and create a DataFrame object.

A CSV (Comma Separated Values) file is a text file that contains data in a tabular format. Each row in the file represents a record, and each column represents a field in the record. The fields are separated by a delimiter, which is usually a comma. CSV files are widely used to store and exchange data between different applications.
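To see how rows map to records and columns map to fields, the sketch below parses a small CSV document held in memory. The names, ages, and cities are made-up sample data, and io.StringIO stands in for a file on disk because read_csv() accepts any file-like object.

```python
import io

import pandas as pd

# Hypothetical CSV text: the first line names the fields,
# and every following line is one record.
csv_text = """name,age,city
Alice,30,Oslo
Bob,25,Lima
Chen,41,Pune
"""

# io.StringIO wraps the string so read_csv can treat it like an open file.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)  # (number of records, number of fields)
```

Each of the three data lines becomes a row, and each comma-separated field becomes a column.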

Brief Explanation of Pandas Read CSV

The first argument to read_csv() is the data source: a file path, a URL, or any file-like object. The function also accepts many optional parameters, such as sep (or its alias delimiter), header, and index_col. The default separator is a comma, but you can pass a different one with sep if your file uses another character, such as a semicolon or a tab.
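A quick way to see why the separator matters is to read the same tab-separated text twice, once with the default and once with sep='\t'. The product data below is hypothetical.

```python
import io

import pandas as pd

# Hypothetical tab-separated data.
tsv_text = "id\tproduct\tprice\n1\tpen\t1.50\n2\tbook\t12.00\n"

# With the default sep=',' there are no commas to split on,
# so each whole line ends up in a single column.
wrong = pd.read_csv(io.StringIO(tsv_text))

# With sep='\t' each line is split on tabs into three fields.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(wrong.shape)              # one column only
print(right.columns.tolist())   # ['id', 'product', 'price']
```

If a DataFrame unexpectedly comes back with a single, oddly named column, a mismatched separator is a likely cause.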

The header parameter specifies which row of the CSV file contains the column names. By default (header='infer'), Pandas uses the first row as the column names. If header is set to None, Pandas treats every row as data and uses default column names (0, 1, 2, …). To supply your own column names, pass them as a list to the separate names parameter; if the file already has a header row that you want to replace, also pass header=0 so that row is not read as data.
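The following sketch contrasts header=None with the names parameter on a small, hypothetical file that has no header row, and then shows header=0 combined with names to replace an existing header.

```python
import io

import pandas as pd

# Hypothetical data without a header row: every line is a record.
raw = "1,Alice,30\n2,Bob,25\n"

# header=None: Pandas labels the columns 0, 1, 2.
numbered = pd.read_csv(io.StringIO(raw), header=None)
print(numbered.columns.tolist())  # [0, 1, 2]

# names=[...]: supply your own column labels instead.
named = pd.read_csv(io.StringIO(raw), header=None, names=["id", "name", "age"])
print(named.columns.tolist())  # ['id', 'name', 'age']

# A file that already has a header row: header=0 plus names replaces it.
with_header = "a,b,c\n1,Alice,30\n"
renamed = pd.read_csv(io.StringIO(with_header), header=0, names=["id", "name", "age"])
print(renamed.columns.tolist())  # ['id', 'name', 'age']
```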

The index_col parameter specifies which column to use as the row index of the DataFrame. By default, Pandas assigns a RangeIndex (0, 1, 2, …). To use one of the file's columns as the index instead, pass its name or its position (for example, index_col=0 for the first column).
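The sketch below selects the same index column once by name and once by position. The ID, name, and score values are hypothetical sample data.

```python
import io

import pandas as pd

csv_text = "ID,name,score\n101,Alice,88\n102,Bob,92\n"

# index_col by column name...
by_name = pd.read_csv(io.StringIO(csv_text), index_col="ID")

# ...or by column position (0 is the first column).
by_pos = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Rows can now be looked up by their ID value.
print(by_name.loc[102, "name"])  # Bob
print(by_name.equals(by_pos))    # True
```

Once ID is the index, it is no longer one of the regular columns, and label-based lookups with .loc use the ID values directly.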

Code Examples

Here are some examples of how to use the read_csv() function in Pandas:

Example 1: Reading a CSV file with default parameters


import pandas as pd

# Read a CSV file with default parameters
df = pd.read_csv('data.csv')

# Print the DataFrame
print(df)

In this example, we read a CSV file named data.csv with read_csv() and no extra arguments, so Pandas uses its defaults: a comma as the separator, the first row of the file as the column names (header='infer'), and a RangeIndex (0, 1, 2, …) as the index.
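Example 1 assumes data.csv already exists. To try it anywhere, the sketch below first writes a small, hypothetical data.csv into a temporary directory and then reads it with default parameters, confirming that the header comes from the first row and the index is a RangeIndex.

```python
import os
import tempfile

import pandas as pd

# Hypothetical contents for data.csv.
sample = "ID,name,score\n101,Alice,88\n102,Bob,92\n103,Chen,75\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Default parameters: comma separator, first row as header,
    # and a RangeIndex for the rows.
    df = pd.read_csv(path)

# The DataFrame lives in memory, so it is still usable after cleanup.
print(df.columns.tolist())      # ['ID', 'name', 'score']
print(type(df.index).__name__)  # RangeIndex
```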

Example 2: Reading a CSV file with custom parameters


import pandas as pd

# Read a CSV file with custom parameters
df = pd.read_csv('data.csv', delimiter=';', header=0, index_col='ID')

# Print the DataFrame
print(df)

In this example, we read data.csv with several custom parameters. The delimiter parameter (an alias for sep) is set to a semicolon (;) because our file uses semicolons as separators. The header parameter is set to 0 because the first row of the file contains the column names; this matches what the default would infer, but stating it makes the intent explicit. The index_col parameter is set to 'ID' because we want the ID column to serve as the index of the DataFrame.
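Here is a self-contained version of Example 2 that reads hypothetical semicolon-separated text from memory instead of data.csv. Semicolon-separated files often come from locales that write decimals with a comma, so this sketch also passes decimal=','; that parameter is an addition for illustration and is not part of Example 2.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data with comma decimals.
csv_text = "ID;product;price\n1;pen;1,50\n2;book;12,50\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    delimiter=";",   # split fields on semicolons
    decimal=",",     # read "12,50" as the number 12.5
    header=0,        # first row holds the column names
    index_col="ID",  # use the ID column as the index
)

print(df)
print(df.loc[2, "price"])  # 12.5
```

Without decimal=',', the price column would be read as text rather than as floating-point numbers.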

Example 3: Reading a CSV file with missing values


import pandas as pd

# Read a CSV file with missing values
df = pd.read_csv('data.csv', na_values=['NA', 'N/A'])

# Print the DataFrame
print(df)

In this example, we read data.csv and pass the na_values parameter, a list of strings that should be treated as missing values (NaN). Here, NA and N/A are treated as missing. Note that Pandas already recognizes NA, N/A, and several other common markers (such as empty fields and NULL) by default, so na_values is most useful for additional, file-specific markers.
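The sketch below uses hypothetical data that mixes a default missing-value marker (NA), an empty field, and a custom marker ("missing") to show what na_values adds on top of the defaults.

```python
import io

import pandas as pd

# Hypothetical scores: NA and the empty field are default markers,
# while "missing" is a custom one.
csv_text = "name,score\nAlice,88\nBob,NA\nChen,missing\nDana,\n"

# Defaults only: NA and the empty field become NaN,
# but "missing" is kept as text.
plain = pd.read_csv(io.StringIO(csv_text))
print(plain["score"].isna().sum())  # 2

# Adding "missing" to na_values turns it into NaN too,
# so the column can be parsed as numbers.
cleaned = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])
print(cleaned["score"].isna().sum())  # 3
print(cleaned["score"].dtype)         # float64
```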
