Pandas Medicine Analysis

Exploratory data analysis of the Bangladesh medicine dataset using Pandas, Matplotlib, Plotly, and WordCloud.

Python, Pandas, Plotly, Matplotlib, WordCloud, BeautifulSoup, Jupyter Notebook

March 2026

Overview

A data analysis project exploring the Assorted Medicine Dataset of Bangladesh, containing 21,000+ medicines. The goal is to demonstrate core data analyst skills — data cleaning, transformation, descriptive statistics, and visualization — using real-world pharmaceutical data.

Features

Automatic dataset download from Kaggle with skip-if-exists logic
Data cleaning: dropping unnecessary columns, handling missing values, parsing HTML descriptions with BeautifulSoup
Price extraction using regex to calculate unit prices from complex package size strings
Interactive visualizations: pie charts, bar charts, and histograms with Plotly
Word cloud generation for drug classes and medical indications
Missing value analysis with percentage breakdowns and bar chart visualization
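
The price extraction step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample strings, the "strips x units: ৳ price" layout, and the names packages, pattern, and unit_price are all assumptions made for the example, and the real dataset's format may differ.

```python
import pandas as pd

# Hypothetical package strings; the real column's format may differ.
packages = pd.Series([
    "(10 x 10: ৳ 250.00)",
    "Unit Price: ৳ 2.50 (3 x 10: ৳ 1,200.00)",
    "100 ml bottle: ৳ 85.00",  # no "N x M" pack layout, so no match
])

# Assumed layout: "<strips> x <units per strip>: ৳ <pack price>".
pattern = (
    r"(?P<strips>\d+)\s*x\s*(?P<per_strip>\d+)\s*:\s*৳\s*"
    r"(?P<pack_price>[\d,]+(?:\.\d+)?)"
)

# str.extract returns one column per named group; rows without a match become NaN.
parts = packages.str.extract(pattern)

strips = pd.to_numeric(parts["strips"], errors="coerce")
per_strip = pd.to_numeric(parts["per_strip"], errors="coerce")
# Drop thousands separators before converting the price to a number.
pack_price = pd.to_numeric(
    parts["pack_price"].str.replace(",", "", regex=False), errors="coerce"
)

# Unit price = pack price divided by the total number of units in the pack.
unit_price = pack_price / (strips * per_strip)
print(unit_price)
```

Rows that do not fit the assumed layout stay as NaN rather than raising, which keeps them visible for a later missing-value check.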

Architecture

Single Jupyter Notebook — all analysis in main.ipynb, structured into sections: Medicine, Generic, Manufacturer, Dosage Form, Drug Class, and Indication
Pandas for all data loading, cleaning, and transformation — value counts, explode, groupby, string extraction
Plotly for interactive charts (pie, bar, histogram)
Matplotlib + WordCloud for word cloud visualizations
BeautifulSoup for parsing HTML content in generic description columns
Dataset downloaded automatically through the Kaggle API on first run
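
As a rough sketch of the Pandas operations listed above, the snippet below counts values in a multi-value column with split and explode, then counts medicines per manufacturer with groupby. The DataFrame, its column names (brand_name, manufacturer, indication), and the comma separator are assumptions for illustration; the actual notebook may use different names and delimiters.

```python
import pandas as pd

# Hypothetical rows; column names and values are illustrative only.
medicines = pd.DataFrame({
    "brand_name": ["Brand 1", "Brand 2", "Brand 3"],
    "manufacturer": ["Maker A", "Maker B", "Maker A"],
    "indication": ["Fever, Pain", "Pain", "Allergy, Cold"],
})

# Split the multi-value column, give each value its own row, then count.
indication_counts = (
    medicines["indication"]
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)

# Number of medicines per manufacturer, largest first.
per_manufacturer = (
    medicines.groupby("manufacturer")["brand_name"]
    .count()
    .sort_values(ascending=False)
)

print(indication_counts)
print(per_manufacturer)
```

Counts produced this way are the kind of input a bar chart or word cloud would typically take, for example as a mapping from term to frequency.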

Learnings

Practiced handling messy real-world data: multi-value columns, embedded HTML, currency symbols (৳ Bangladeshi Taka), and inconsistent price formats
Deepened understanding of Pandas — value counts, groupby, explode, string methods, and chaining operations for data transformation
Gained experience with Matplotlib for static visualizations and word cloud rendering
Gained experience building interactive Plotly charts for data exploration
Understood the importance of missing value analysis before drawing conclusions from incomplete columns
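
A missing value breakdown of the kind described above might look like the sketch below. The sample DataFrame and the names df and missing are hypothetical; only the general approach (isna combined with sum and mean) is being illustrated.

```python
import pandas as pd

# Hypothetical sample with gaps; not taken from the real dataset.
df = pd.DataFrame({
    "brand_name": ["A", "B", "C", "D"],
    "strength": ["500 mg", None, "250 mg", None],
    "price": [2.5, None, None, None],
})

# isna() marks missing cells; sum() counts them and mean() gives the fraction.
missing = pd.DataFrame({
    "missing_count": df.isna().sum(),
    "missing_pct": df.isna().mean() * 100,
}).sort_values("missing_pct", ascending=False)

print(missing)
```

Sorting by percentage puts the least complete columns first, which makes it easier to decide which ones are too sparse to support conclusions.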