{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreieccbgoyyzg337df3t7josxexjg6ydcm7mhu2rly7g3qmw3wiv3za",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moqmgqhr4cp2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreid44dmsq7n3vq527yy7so2karbckiehxraw4itd5nijxv5kq2urm4"
    },
    "mimeType": "image/webp",
    "size": 285104
  },
  "path": "/josengash/pandas-and-data-visualization-using-matplotlib-and-seaborn-1ek7",
  "publishedAt": "2026-06-20T19:08:07.000Z",
  "site": "https://dev.to",
  "tags": [
    "datascience",
    "dataengineering",
    "analytics",
    "deeplearning"
  ],
  "textContent": "New chapter in Learning data analytics and data science. The focus now is on **Pandas** as a **Python library** alongside **Matplotlib** and **Seaborn** for data visualization.\nAm writing this article to guide beginners who are already or beginning the **data analytics** and **data science** profession.\n\n##  Introduction to Python Data Analysis\n\nIn modern world Data has become most valuable asset. Business, healthcare institutions, financial and even social media platforms rely heavily on data to make informed decisions.\nRaw data is of no use and that where **Data Analysis** and visualization becomes important.\n\nOne of most used programming language is **Python** for data analysis because of its simplicity and powerful libraries.\nCommonly used libraries are **Pandas** , **Matplotlib** and **Seaborn.**\n**Pandas:** Helps in cleaning, organizing and analyzing data.\n**Matplotlib** and **Seaborn:** are used to create visual representations of data.\n\nThis article introduces Pandas and explains how Matplotlib and Seaborn can be used for effective data visualization in a begginer friendly way.\n\n###  What is Pandas and Why it Matters\n\nPandas is open source Python Library used for data manipulation and analysis. 
Pandas provides simple and efficient tools for working with structured data, e.g. CSV files, spreadsheets and databases.\n\n###  Main Data Structures in Pandas\n\n  * **Series:** a one-dimensional, array-like structure that holds a single column of values with an index.\n\n\n\n\n    import pandas as pd\n\n    age = pd.Series([18, 20, 23, 40, 50, 24], name=\"Age\")\n    age\n\n    #Output:\n    0    18\n    1    20\n    2    23\n    3    40\n    4    50\n    5    24\n    Name: Age, dtype: int64\n\n\n  * **DataFrame:** a two-dimensional table, similar to an Excel spreadsheet or an SQL table.\n\n\n\n\n    students = {\"Name\":[\"Mark\", \"John\", \"Nancy\"],\n                \"Grade\":[\"A\", \"B\", \"C\"],\n                \"Course\":[\"Data Science\", \"Data Engineering\", \"Data Analytics\"]}\n\n    student_grades = pd.DataFrame(students)\n    student_grades\n\n    #Output:\n        Name    Grade   Course\n    0   Mark    A   Data Science\n    1   John    B   Data Engineering\n    2   Nancy   C   Data Analytics\n\n\nPandas matters because it simplifies complex tasks such as:\n\n  * Performing calculations and statistical analysis\n  * Filtering and sorting information\n  * Cleaning missing or incorrect data\n  * Reading datasets from files\n\n\n\nBefore using Pandas, you need to install it by running the following command:\n\n\n\n    pip install pandas\n\n\nThen import it into your Python script:\n\n\n\n    import pandas as pd\n\n\n  * **Reading Data:** Pandas can easily read files such as CSV and Excel. _Example:_\n\n\n\n\n    import pandas as pd\n\n    data = pd.read_csv(\"students.csv\")\n    print(data.head())\n\n\nThe _head()_ function displays the first five rows of the dataset by default.\n\n  * **Checking Data Information:** When you want to understand the structure of your data:\n\n\n\n\n    data.info()  # prints the summary itself, so no print() is needed\n    print(data.describe())\n\n\n_info()_ shows column names, data types and the number of non-null values in each column (which reveals missing values).\n_describe()_ provides summary statistics for numeric columns, such as the count, mean, standard deviation, minimum and maximum.\n\n  * **Handling Missing Values:** Missing values can affect the 
analysis results. Checking for missing data (the number of missing values in each column):\n\n\n\n\n    print(data.isnull().sum())\n\n\nRemoving rows that contain missing values:\n\n\n\n    data = data.dropna()\n\n\nAlternatively, filling missing values instead of removing them:\n\n\n\n    data[\"Age\"] = data[\"Age\"].fillna(data[\"Age\"].mean())\n\n\nThis replaces missing age values with the average age.\n\n**Filtering** allows users to select only the rows that meet a condition.\n_Example:_\n\n\n\n    high_scores = data[data[\"Score\"] > 70]\n    print(high_scores)\n\n\n**Sorting** data, here by score from highest to lowest:\n\n\n\n    sorted_data = data.sort_values(by=\"Score\", ascending=False)\n\n    print(sorted_data)\n\n\nThese operations help organize data for better understanding and reporting.\n\n##  Data Visualization Fundamentals\n\n**Data visualization** is the process of representing data graphically using charts, graphs and plots. Visualization makes it easier to identify patterns, trends and relationships in data.\n\n**For Example:**\n\n  * **Scatter plots** show relationships between two numeric variables\n  * **Bar charts** compare categories\n  * **Pie charts** display proportions of a whole\n  * **Line charts** show trends over time\n\n\n\nVisualizations help people understand large datasets because humans interpret visuals faster than raw numbers.\n\nPython provides powerful visualization libraries, with _Matplotlib_ and _Seaborn_ being among the most widely used.\n\n###  Using Matplotlib for Charts\n\nMatplotlib is one of the oldest and most flexible visualization libraries in Python. 
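To give a sense of that flexibility, here is a minimal sketch using Matplotlib's object-oriented interface (fig, ax = plt.subplots()), a style the examples in this section do not use but which is convenient when customizing individual charts. It assumes Matplotlib is installed as described below; the monthly sales figures are hypothetical, and the non-interactive Agg backend is selected so the script saves a PNG file instead of opening a window.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration only
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
sales = [120, 135, 128, 150, 162]

# Object-oriented interface: one Figure holding one Axes
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, sales, color='tab:green', marker='o', linestyle='--', label='Sales')

# Each element of the chart is customized through the Axes object
ax.set_title('Monthly Sales (hypothetical data)')
ax.set_xlabel('Month')
ax.set_ylabel('Units sold')
ax.grid(True, alpha=0.3)
ax.legend()

fig.tight_layout()
fig.savefig('monthly_sales.png', dpi=100)
plt.close(fig)
```

Because each method call targets one specific Axes object, Matplotlib lets you change one chart without affecting any other.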
Matplotlib provides full control over chart customization.\n\nTo install Matplotlib:\n\n\n\n    pip install matplotlib\n\n\nImport it:\n\n\n\n    import matplotlib.pyplot as plt\n\n\n**Creating a Line Chart**\nA line chart is used to show trends.\n_Example:_\n\n\n\n    import pandas as pd\n    import matplotlib.pyplot as plt\n\n    # housing_df is assumed to be a housing DataFrame that is already loaded, e.g. with pd.read_csv()\n    # Average number of bedrooms for each number of bathrooms\n    avg_bedrooms = housing_df.groupby(\"bathrooms\")[\"bedrooms\"].mean()\n\n    plt.figure(figsize=(6, 3))\n    plt.plot(avg_bedrooms.index, avg_bedrooms.values, marker=\"o\")\n    plt.title(\"Bathrooms vs Bedrooms\")\n    plt.xlabel(\"Bathrooms\")\n    plt.ylabel(\"Average Bedrooms\")\n    plt.show()\n\n\nThe chart shows how the average number of bedrooms changes as the number of bathrooms increases.\n\n\n**Creating a Bar Chart**\nBar charts compare categories.\n_Example:_\n\n\n\n    # Average satisfaction score by property type, highest first\n    avg_satisfaction_by_prop = housing_df.groupby(\"property_type\")[\"satisfaction_score\"].mean().sort_values(ascending=False).reset_index()\n\n    # Plot\n    plt.figure(figsize=(6, 3))\n    plt.bar(avg_satisfaction_by_prop[\"property_type\"], avg_satisfaction_by_prop[\"satisfaction_score\"])\n    plt.title(\"Average Satisfaction Score by Property Type\")\n    plt.xlabel(\"Property Type\")\n    plt.ylabel(\"Average Satisfaction Score\")\n    plt.show()\n\n\nThe chart compares the average satisfaction score for each property type.\n\n**Creating a Pie Chart**\nPie charts show each category's share of the whole as a percentage.\n_Example:_\n\n\n\n    furnishing_counts = housing_df[\"furnishing\"].value_counts()\n\n    # Pull every slice out slightly (explode needs one value per category)\n    explode = [0.05] * len(furnishing_counts)\n\n    plt.figure(figsize=(6, 6))\n\n    plt.pie(furnishing_counts, explode=explode, labels=furnishing_counts.index, autopct=\"%1.1f%%\")\n    plt.title(\"Distribution of Furnishing Status\")\n    plt.show()\n\n\nMatplotlib is highly customizable and allows users to change colors, labels, chart sizes and grid styles.\n\n###  Using Seaborn for Statistical Visualizations\n\n**Seaborn** is a Python library built on top of **Matplotlib**. It provides more attractive and advanced statistical visualizations with less 
code.\n\nTo install Seaborn:\n\n\n\n    pip install seaborn\n\n\nImport it:\n\n\n\n    import seaborn as sns\n\n\nSeaborn works smoothly with Pandas DataFrames: you pass a DataFrame and refer to its columns by name.\n\n_Example dataset:_\n\n\n\n    import pandas as pd\n    data = {\"Student\": [\"John\", \"Mary\", \"Peter\", \"James\"], \"Score\": [85, 90, 78, 88]}\n    df = pd.DataFrame(data)\n\n\n**Bar Plot**\n\n\n\n    import matplotlib.pyplot as plt\n    import seaborn as sns\n\n    # Average monthly rent by property type (housing_df is the housing DataFrame used above)\n    plt.figure(figsize=(6, 3))\n\n    sns.barplot(data=housing_df, x=\"property_type\", y=\"monthly_rent_kes\", hue=\"property_type\", estimator=\"mean\", palette=\"bright\", legend=False)\n    plt.title(\"Average Monthly Rent by Property Type\")\n    plt.xlabel(\"Property Type\")\n    plt.ylabel(\"Average Monthly Rent\")\n    plt.xticks(rotation=45)\n    plt.show()\n\n\nSeaborn automatically applies more polished default styling than plain Matplotlib, and _barplot_ computes the mean for each category and adds error bars without any manual grouping.\n\n**Histogram**\nHistograms show the distribution of a numeric variable.\n_Example:_\n\n\n\n    # What is the distribution of monthly rent?\n    plt.figure(figsize=(6, 3))\n\n    sns.histplot(data=housing_df, x=\"monthly_rent_kes\")\n    plt.title(\"Distribution of Monthly Rent\")\n    plt.xlabel(\"Monthly rent\")\n    plt.ylabel(\"Number of properties\")\n    plt.show()\n\n\nThis shows how monthly rent values are distributed, for example the range where most rents fall.\n\n**Scatter Plot**\nScatter plots reveal relationships between two numeric variables.\n_Example:_\n\n\n\n    # Plotting the relationship between bathrooms and bedrooms\n\n    plt.figure(figsize=(6, 3))\n\n    sns.scatterplot(data=housing_df, x=\"bathrooms\", y=\"bedrooms\")\n    plt.title(\"Bathrooms vs Bedrooms\")\n    plt.xlabel(\"Bathrooms\")\n    plt.ylabel(\"Bedrooms\")\n    plt.show()\n\n\nThis chart shows the relationship between bathrooms and bedrooms.\n\n**Heatmap**\nA correlation heatmap shows how strongly pairs of numerical variables are related.\n_Example:_\n\n\n\n    # Correlation analysis: use only the numeric columns\n    numerical_columns = housing_df.select_dtypes(include=\"number\").columns\n    correlation = housing_df[numerical_columns].corr()\n\n    plt.figure(figsize=(6, 6))\n\n    sns.heatmap(correlation, annot=True, fmt=\".2f\")\n    
plt.title(\"Correlation Heatmap for Numerical Variables\")\n    plt.show()\n\n\nHeatmaps help identify strong or weak relationships in a dataset: values near 1 or -1 indicate a strong positive or negative correlation, and values near 0 indicate little linear relationship.\n\nMatplotlib is suitable when detailed customization is required, while Seaborn is ideal for creating visually appealing statistical charts quickly.\nIn practice, many analysts use both libraries together: Seaborn draws the chart, and Matplotlib functions such as _plt.title()_ and _plt.xlabel()_ finish it, which works because Seaborn is built on top of Matplotlib.\n\n###  Best Practices and Common Mistakes\n\n####  Best Practices:\n\n  * Always clean data before analysis\n  * Choose a chart type that suits the data\n  * Add labels and titles to charts\n  * Keep visualizations simple and easy to read\n  * Check for missing or duplicate data\n\n\n\n####  Common Mistakes:\n\n  * Forgetting labels and/or legends\n  * Ignoring missing values\n  * Using the wrong chart type\n  * Overcrowding charts with too much information\n\n\n\nClear visualizations communicate a dataset's information effectively without confusion.\n\n###  Conclusion\n\n**Pandas**, **Matplotlib** and **Seaborn** together give analysts what they need to turn raw data into clear, understandable visualizations.\n\nPandas simplifies data cleaning, manipulation and analysis.\nMatplotlib and Seaborn transform raw numbers into meaningful visual insights.\n\nFor beginners in data analytics or data science, learning these libraries is essential because they are widely used in industries such as finance, healthcare, marketing and business intelligence.\nMastering these tools is an important step toward becoming a skilled analyst or data scientist.",
  "title": "Pandas and Data Visualization Using Matplotlib and Seaborn"
}