
In this code, we analyze a dataset using pandas, numpy, seaborn, and matplotlib libraries in Python. The dataset is loaded from a CSV file named "dataset.csv". The data is first analyzed in its original form and then normalized to analyze the normalized data.


nishkarsh25/Dataset


Difficulty viewing large code files?

If you're having trouble viewing large code files on GitHub, you may find it helpful to download a ZIP file containing the entire repository. To do so, follow these steps:

  1. Click on the green "Code" button on the repository page.
  2. Select "Download ZIP" from the dropdown menu.
  3. Save the ZIP file to your computer.

This can be particularly useful if you're experiencing issues with GitHub's web interface or if you need to access the repository without an internet connection. If you have any questions or concerns, please don't hesitate to contact us.

Introduction

This project analyzes a dataset with the pandas, NumPy, seaborn, and matplotlib libraries in Python. The dataset is loaded from a CSV file named "dataset.csv". It is first analyzed in its original form, then min-max normalized and analyzed again.

Data Analysis

The code first loads the dataset into a pandas DataFrame and drops the index column. It then prints summary statistics of the original data with describe(), and draws a correlation heatmap and a distribution plot with seaborn and matplotlib.

Data Normalization

The data is then normalized with min-max normalization: for each column, the column minimum is subtracted from every value and the result is divided by the difference between the column maximum and minimum, so each column is rescaled to the range [0, 1]. Summary statistics of the normalized data are then printed with describe(), and a correlation heatmap and a distribution plot are created for the normalized data.
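The min-max formula described above could be written in pandas roughly like this. It is a sketch under the assumption that every column is numeric; the function name min_max_normalize is hypothetical and not taken from the repository.

```python
import pandas as pd


def min_max_normalize(df):
    """Rescale each column to [0, 1] via (x - min) / (max - min)."""
    col_min = df.min()
    col_range = df.max() - col_min
    # A constant column has a range of 0, so its values become NaN here.
    return (df - col_min) / col_range
```

Calling min_max_normalize(df).describe() should then report a min of 0 and a max of 1 for every non-constant column.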


Comparison

The code then compares the correlation heatmaps of the original and normalized data in a single figure with two subplots side by side, one heatmap for the original data and one for the normalized data. The figure is titled "Comparison of Correlation Heatmaps".
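One way to lay out such a side-by-side comparison is sketched below. The function name compare_heatmaps, the figure size, and the color map are assumptions for illustration, and both DataFrames are assumed to contain only numeric columns.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def compare_heatmaps(original, normalized):
    """Plot the two correlation heatmaps next to each other in one figure."""
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))
    sns.heatmap(original.corr(), annot=True, cmap="coolwarm", ax=axes[0])
    axes[0].set_title("Original data")
    sns.heatmap(normalized.corr(), annot=True, cmap="coolwarm", ax=axes[1])
    axes[1].set_title("Normalized data")
    fig.suptitle("Comparison of Correlation Heatmaps")
    return fig
```

Because min-max normalization is a positive linear rescaling of each column, the default (Pearson) correlations are the same before and after, so the two heatmaps are expected to match.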

Box Plots

Finally, the code creates box plots of the original and normalized data with seaborn. Outlier points are hidden from the plots with the "showfliers=False" parameter; the underlying data is unchanged.


Conclusion

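In this code, we have analyzed a dataset using summary statistics, correlation heatmaps, distribution plots, and box plots. We have also min-max normalized the data so that all columns share a common [0, 1] scale. Because this normalization is a positive linear rescaling of each column, the Pearson correlations themselves do not change; the benefit is that distributions and box plots of different columns become directly comparable. The box plots give insight into the distribution of the data, and hiding the outlier points makes the plots more readable. The code is a compact example of how to analyze and visualize a dataset with Python libraries.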
