Difficulty viewing large code files?

If you're having trouble viewing large code files on GitHub, you may find it helpful to download a ZIP file containing the entire repository. To do so, follow these steps:

1. Click on the green "Code" button on the repository page.
2. Select "Download ZIP" from the dropdown menu.
3. Save the ZIP file to your computer.

This can be particularly useful if you're experiencing issues with GitHub's web interface or if you need to access the repository without an internet connection. If you have any questions or concerns, please don't hesitate to contact us.

Introduction

The code analyzes a dataset in Python using the pandas, NumPy, seaborn, and matplotlib libraries. The dataset is loaded from a CSV file named "dataset.csv". The data is analyzed first in its original form and then again after min-max normalization.

Data Analysis

The code first loads the dataset into a pandas DataFrame and drops the index column. Summary statistics of the original data are then printed with the describe() method. Next, a correlation heatmap and a distribution plot are created with seaborn and matplotlib.

Data Normalization

The data is then normalized with min-max normalization. For each column, the column minimum is subtracted from every value and the result is divided by the column's range (maximum minus minimum), so every column is rescaled to the interval [0, 1]. Summary statistics of the normalized data are printed with the describe() method, and a correlation heatmap and a distribution plot are created for the normalized data.
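The normalization described above can be sketched in a few lines of pandas. The function name min_max_normalize and the sample values are illustrative and not taken from the repository; the sketch also assumes every column is numeric.

```python
import pandas as pd


def min_max_normalize(df):
    """Rescale every numeric column to the range [0, 1].

    Applies x' = (x - min) / (max - min) column by column.
    """
    col_min = df.min()
    col_range = df.max() - col_min
    # A constant column has max == min, so the division yields NaN for it.
    return (df - col_min) / col_range


# Made-up values for illustration only.
df = pd.DataFrame({"height": [150, 160, 170, 180], "weight": [50, 58, 69, 77]})
normalized = min_max_normalize(df)
print(normalized.describe())  # every column now has min 0.0 and max 1.0
```

If the dataset may contain constant columns, they need separate handling (for example dropping them first), since their range is zero.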

Comparison

The code then compares the correlation heatmaps of the original and normalized data in a single figure with two subplots. The heatmaps appear side by side, one for the original data and the other for the normalized data. The figure title is "Comparison of Correlation Heatmaps".
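A side-by-side comparison like the one described might be built as follows. This is a sketch under assumptions: the function name compare_heatmaps, the colormap, the figure size, and the sample values are illustrative, and the correlation is taken to be the pandas default (Pearson).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def compare_heatmaps(original, normalized):
    """Plot the two correlation matrices side by side in one figure."""
    fig, (ax_orig, ax_norm) = plt.subplots(1, 2, figsize=(14, 6))
    sns.heatmap(original.corr(), annot=True, cmap="coolwarm", ax=ax_orig)
    ax_orig.set_title("Original data")
    sns.heatmap(normalized.corr(), annot=True, cmap="coolwarm", ax=ax_norm)
    ax_norm.set_title("Normalized data")
    fig.suptitle("Comparison of Correlation Heatmaps")
    return fig


# Made-up values for illustration only.
original = pd.DataFrame(
    {"height": [150, 160, 170, 180], "weight": [50, 58, 69, 77]}
)
normalized = (original - original.min()) / (original.max() - original.min())
fig = compare_heatmaps(original, normalized)

# Min-max scaling is a positive linear transform of each column, so the
# Pearson correlation matrices match up to floating-point error.
same = np.allclose(original.corr(), normalized.corr())
```

With Pearson correlation the two heatmaps should show the same values, so the comparison mainly serves as a check that normalization did not alter the relationships between columns.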

Box Plots

Finally, the code creates box plots of the original and normalized data with seaborn. Passing the "showfliers=False" parameter hides the outlier markers in the plots; the outliers themselves remain in the data.

Conclusion

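In this code, we have analyzed a dataset using summary statistics, correlation heatmaps, distribution plots, and box plots. We have also normalized the data so that all columns share a common [0, 1] scale. Because min-max normalization is a linear rescaling of each column, it leaves Pearson correlations unchanged, so the normalized heatmap is expected to match the original rather than improve on it. The box plots give insight into the distribution of the data, and hiding the outlier markers makes the plots more readable. The code is an example of how to analyze and visualize a dataset with Python libraries.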


nishkarsh25/Dataset
