The BankMarketCapETL project is designed to create a robust ETL (Extract, Transform, Load) pipeline focusing on the market capitalization of the world's largest banks. This pipeline extracts data from a specified Wikipedia page, transforms the market capitalization values according to exchange rates, and loads the data into both a SQLite database and a CSV file for further analysis or visualization. This project aims to provide financial analysts and enthusiasts with up-to-date information on bank valuations in various currencies, facilitating global financial comparisons and analyses.
- Data Source URL: List of largest banks on Wikipedia
- Exchange Rate CSV Path: Path to a CSV file containing the latest exchange rates.
- SQLite Database: `Banks.db` for storing transformed data.
- Output CSV File: `Largest_banks_data.csv` for easy access and sharing of the transformed data.
- Objective: Extract the list of the world's largest banks and their market capitalization in USD from the specified Wikipedia page.
- Method: Use `BeautifulSoup` to parse the HTML content of the page and extract relevant data into a pandas DataFrame.
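
The extraction step might look roughly like the sketch below. It assumes the table has three columns (rank, bank name, market cap in USD billions) and uses a small made-up HTML snippet in place of the downloaded page; the function name `extract`, the `Name` column, and the sample rows are illustrative, not taken from the project.

```python
from bs4 import BeautifulSoup
import pandas as pd


def extract(html: str) -> pd.DataFrame:
    """Parse the first table in `html` into bank names and USD market caps."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 3:
            continue  # header rows use <th>, so they contain no <td> cells
        name = cells[1].get_text(strip=True)
        # Drop thousands separators before converting to a number.
        market_cap = float(cells[2].get_text(strip=True).replace(",", ""))
        rows.append({"Name": name, "MC_USD_Billion": market_cap})
    return pd.DataFrame(rows, columns=["Name", "MC_USD_Billion"])


# Made-up stand-in for the page content (assumed layout, placeholder values).
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Bank name</th><th>Market cap (US$ billion)</th></tr>
  <tr><td>1</td><td>Example Bank A</td><td>400.50</td></tr>
  <tr><td>2</td><td>Example Bank B</td><td>1,250.25</td></tr>
</table>
"""

df = extract(SAMPLE_HTML)
print(df)
```

In a real run the HTML would first be downloaded from the data source URL; the column positions should be checked against the live page, since its layout can change.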
- Objective: Convert the market capitalization values from USD to GBP, EUR, and INR using the exchange rates provided in the `exchange_rate.csv` file.
- Method: Read the exchange rates from the CSV file into a dictionary and apply these rates to the `MC_USD_Billion` column in the DataFrame. The transformed data will include new columns: `MC_GBP_Billion`, `MC_EUR_Billion`, and `MC_INR_Billion`, with values rounded to the nearest billion.
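
A minimal sketch of the transformation follows. It assumes the exchange rate file has `Currency` and `Rate` columns, which the description above does not specify, and it uses placeholder rates and bank values rather than real figures. An in-memory buffer stands in for `exchange_rate.csv`.

```python
import io

import pandas as pd

# Stand-in for exchange_rate.csv; column names and rates are assumptions.
RATES_CSV = io.StringIO("Currency,Rate\nGBP,0.8\nEUR,0.9\nINR,83.0\n")


def transform(df: pd.DataFrame, rates_csv) -> pd.DataFrame:
    """Add GBP, EUR, and INR market cap columns derived from MC_USD_Billion."""
    # Build a {currency code: rate} dictionary from the CSV.
    rates = pd.read_csv(rates_csv).set_index("Currency")["Rate"].to_dict()
    out = df.copy()
    for currency in ("GBP", "EUR", "INR"):
        converted = out["MC_USD_Billion"] * rates[currency]
        # Round to the nearest whole billion, as described above.
        out[f"MC_{currency}_Billion"] = converted.round(0)
    return out


banks = pd.DataFrame(
    {"Name": ["Example Bank A", "Example Bank B"],
     "MC_USD_Billion": [100.0, 123.4]}
)
transformed = transform(banks, RATES_CSV)
print(transformed)
```

Working on a copy keeps the extracted DataFrame unchanged, which makes it easier to re-run the transformation with a different rate file.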
- Objective: Save the transformed DataFrame to a CSV file for easy access and distribution.
- Method: Use the `to_csv` method of pandas DataFrame to write the data to `Largest_banks_data.csv`, ensuring the data is accessible outside the Python environment.
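
As a rough illustration, the CSV load could be as small as the sketch below. The helper name `load_to_csv` and the choice of `index=False` (to leave out the DataFrame's row index) are assumptions, and the file is written to a temporary directory here rather than the project folder.

```python
import os
import tempfile

import pandas as pd


def load_to_csv(df: pd.DataFrame, output_path: str) -> None:
    """Write the DataFrame to a CSV file without the row index."""
    df.to_csv(output_path, index=False)


# Placeholder data standing in for the transformed DataFrame.
banks = pd.DataFrame(
    {"Name": ["Example Bank A", "Example Bank B"],
     "MC_USD_Billion": [100.0, 123.4]}
)

output_path = os.path.join(tempfile.mkdtemp(), "Largest_banks_data.csv")
load_to_csv(banks, output_path)

# Read the file back to confirm it is usable outside this DataFrame.
round_trip = pd.read_csv(output_path)
print(round_trip)
```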
- Objective: Load the transformed data into a SQLite database for persistent storage and query capabilities.
- Method: Utilize `sqlite3` and pandas' `to_sql` function to insert the DataFrame into the `Largest_banks` table in the `Banks.db` database.
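
The database load can be sketched as follows. This example connects to an in-memory database instead of `Banks.db` so that it leaves no file behind; the helper name `load_to_db` and the `if_exists="replace"` behaviour (overwrite the table on each run) are assumptions about how the project handles reruns.

```python
import sqlite3

import pandas as pd


def load_to_db(df: pd.DataFrame, conn: sqlite3.Connection, table_name: str) -> None:
    """Insert the DataFrame into `table_name`, replacing any existing table."""
    df.to_sql(table_name, conn, if_exists="replace", index=False)


# Placeholder data standing in for the transformed DataFrame.
banks = pd.DataFrame(
    {"Name": ["Example Bank A", "Example Bank B"],
     "MC_USD_Billion": [100.0, 123.4]}
)

# The project targets Banks.db; ":memory:" is used here for illustration only.
conn = sqlite3.connect(":memory:")
load_to_db(banks, conn, "Largest_banks")

row_count = conn.execute("SELECT COUNT(*) FROM Largest_banks").fetchone()[0]
print(row_count)
conn.close()
```

Replacing the table keeps repeated runs idempotent; `if_exists="append"` would instead accumulate rows across runs.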
- Objective: Demonstrate the ability to run queries against the loaded data in the SQLite database.
- Method: Implement a `run_queries` function to execute SQL queries, printing both the query and its results. Sample queries include selecting the entire table, calculating the average market capitalization, and listing the top 5 banks by market capitalization.
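
One possible shape for `run_queries` is sketched below using only the standard `sqlite3` module. The function signature, the returned row list, the choice of `MC_USD_Billion` for the average and ranking, and the two sample rows are all assumptions made for illustration.

```python
import sqlite3


def run_queries(query: str, conn: sqlite3.Connection) -> list:
    """Print the query and each result row, then return the rows."""
    print(query)
    rows = conn.execute(query).fetchall()
    for row in rows:
        print(row)
    return rows


# Small in-memory table with placeholder rows standing in for Banks.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Largest_banks (Name TEXT, MC_USD_Billion REAL)")
conn.executemany(
    "INSERT INTO Largest_banks VALUES (?, ?)",
    [("Example Bank A", 100.0), ("Example Bank B", 300.0)],
)

# The three sample queries described above.
all_rows = run_queries("SELECT * FROM Largest_banks", conn)
average = run_queries("SELECT AVG(MC_USD_Billion) FROM Largest_banks", conn)
top_five = run_queries(
    "SELECT Name FROM Largest_banks ORDER BY MC_USD_Billion DESC LIMIT 5", conn
)
conn.close()
```

Returning the rows in addition to printing them is a convenience for testing; the description above only requires printing.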
The BankMarketCapETL project streamlines the process of gathering, converting, and storing critical financial data regarding the world's largest banks. By automating the extraction of up-to-date market capitalization data and accommodating currency conversions, this pipeline serves as a valuable tool for financial analysis and reporting.