Category: Python

df.drop_duplicates(subset=['name', 'age'], keep='first', inplace=True) vs df['name'] = df['name'].drop_duplicates(keep='first')

1. df['name'] = df['name'].drop_duplicates(keep='first')

Example: If your original DataFrame was:

name     age  gender
Alice    25   F
Bob      30   M
Alice    35   F
Charlie  28   M

The resulting DataFrame would be:

name     age  gender
Alice    25   F
Bob      30   M
NaN Read more…
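The difference between the two calls can be sketched with the same four-row table. This is a minimal illustration, not the post's own code; the variable names `by_column`, `by_name_and_age`, and `by_name` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Charlie"],
    "age": [25, 30, 35, 28],
    "gender": ["F", "M", "F", "M"],
})

# Column-level: the deduplicated Series has fewer rows, so assigning it
# back aligns on the index and leaves NaN where the duplicate was dropped.
by_column = df.copy()
by_column["name"] = by_column["name"].drop_duplicates(keep="first")

# Row-level: whole rows are removed when the subset columns repeat.
# No (name, age) pair repeats here, so all four rows survive.
by_name_and_age = df.drop_duplicates(subset=["name", "age"], keep="first")

# With only "name" in the subset, the second Alice row is dropped.
by_name = df.drop_duplicates(subset=["name"], keep="first")

print(by_column)
print(by_name_and_age)
print(by_name)
```

The column-level version keeps every row and blanks out the repeated name, while the row-level version never produces NaN because it removes complete rows.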


Basic transformations in Data Engineering (Python, SQL, PySpark)

1. Replace or Remove Special Characters in Text Fields (Pandas, PySpark, SQL)
2. Standardize Values in Columns (Pandas, PySpark, SQL): standardize department names (e.g., change Sales to SALES)
3. Fill Missing Numerical Data (Pandas)
4. Convert Date Strings into a Consistent Read more…
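As a rough Pandas-only sketch of the first three transformations: the sample data, the column names, the regular expression, and the choice to fill gaps with the column mean are all assumptions made for illustration, not taken from the post.

```python
import pandas as pd

# Hypothetical sample data for the illustration.
df = pd.DataFrame({
    "name": ["Al!ice", "B@ob", "Char#lie"],
    "department": ["Sales", "sales", "Hr"],
    "salary": [100.0, None, 300.0],
})

# 1. Remove special characters: keep only letters, digits, and spaces.
df["name"] = df["name"].str.replace(r"[^A-Za-z0-9 ]", "", regex=True)

# 2. Standardize values: upper-case the department names.
df["department"] = df["department"].str.upper()

# 3. Fill missing numerical data, here with the column mean (an assumption;
#    a median or a constant may suit other datasets better).
df["salary"] = df["salary"].fillna(df["salary"].mean())

print(df)
```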


if data is not None: vs if data:

In Python, both if data is not None: and if data: are common ways to check if a variable has a value, but they behave slightly differently in terms of what they check for. Here’s the difference: 1. if data Read more…
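A small sketch makes the difference concrete. The helper `describe` and the sample values are hypothetical, chosen only to show which values pass each check.

```python
def describe(data):
    """Report the result of both checks for a single value."""
    return {
        "is_not_none": data is not None,  # only None fails this check
        "truthy": bool(data),             # None, 0, "", [], {} all fail this one
    }


samples = [None, 0, "", [], {}, "text", [0]]
for value in samples:
    print(repr(value), describe(value))
```

Values such as `0`, `""`, and `[]` pass `is not None` but fail the plain truthiness test, so `if data:` would skip them even though a value was supplied.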


In a Pandas DataFrame, None, NaN, and NULL

In a Pandas DataFrame, None, NaN, and NULL are often used to represent missing values, but they are not exactly the same. Here’s a breakdown: Practical Differences in Pandas:
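One way to see how these values behave is the sketch below. It assumes NULL refers to the SQL concept, which Pandas has no separate object for; the column names are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "score": [1.5, None, np.nan],  # None is coerced to NaN in a float column
    "label": ["a", None, np.nan],  # text column holding both kinds of missing value
})

# isna() treats None and NaN alike as missing.
print(df.isna())

# Outside Pandas the two differ: NaN is a float that is not equal to itself,
# while None is a singleton compared with `is`.
print(np.nan == np.nan)
print(None is None)
```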


Reading Large Datasets in Chunks with Pandas and PySpark using CSV

S3, especially when dealing with 3 GB or more of data. Using Pandas to process data in chunks is a very effective method for handling large files without overwhelming your system’s memory. Here’s how to manage the process step-by-step: 1. Read more…
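A minimal sketch of chunked reading with `pd.read_csv(..., chunksize=...)` follows. The in-memory CSV, the column names, and the chunk size of 4 are stand-ins chosen for illustration; a real job would pass a file path or an object opened from S3 and a much larger chunk size.

```python
import io

import pandas as pd

# Stand-in for a large CSV file (hypothetical data: 10 rows).
csv_data = io.StringIO(
    "id,amount\n" + "\n".join(f"{i},{i * 10}" for i in range(10))
)

total = 0
rows_seen = 0

# With chunksize set, read_csv returns an iterator of DataFrames,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += chunk["amount"].sum()
    rows_seen += len(chunk)

print(rows_seen, total)
```

Aggregating per chunk and combining the partial results, as done here with a running sum, keeps memory use bounded by the chunk size rather than the file size.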


Understanding When to Make Copies of Data Before Modifications in Python

In Python, the scenario where modifying a dependent variable impacts the original variable’s value occurs with mutable data types, such as lists, dictionaries, sets, and user-defined objects. Here’s an explanation of how this behavior manifests: Mutable vs. Immutable Data Types Read more…
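The effect can be sketched with the standard `copy` module. The dictionary and variable names below are hypothetical; the point is how an alias, a shallow copy, and a deep copy each react to a modification.

```python
import copy

original = {"tags": ["a", "b"], "count": 1}

alias = original                # same object under a second name
shallow = copy.copy(original)   # new dict, but the inner list is shared
deep = copy.deepcopy(original)  # fully independent copy

alias["count"] = 2              # changes original too
shallow["tags"].append("c")     # changes original's list too
deep["tags"].append("d")        # leaves original untouched

print(original)
print(shallow)
print(deep)
```

A shallow copy is enough when only top-level keys are reassigned; nested mutable values need `copy.deepcopy` to be isolated from the original.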


Unsupervised learning

If you don’t know the categories or labels in your data, you’re dealing with unsupervised learning, where the goal is to find patterns, groupings, or structure within the data without labeled outcomes. Unlike supervised learning (which involves classification or regression), Read more…


In Python, object creation revolves around several important methods and mechanisms

In Python, object creation revolves around several important methods and mechanisms. The most commonly used method is __init__, but Python provides other special methods that offer more control over object instantiation and initialization. These methods allow customization for both class-level Read more…
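As a small illustration, the sketch below pairs `__init__` with `__new__`. Picking `__new__` as the companion method is an assumption, since the excerpt does not name the other methods, and the `Point` class and `created_by_new` attribute are made up for the example.

```python
class Point:
    def __new__(cls, *args, **kwargs):
        # __new__ runs first and is responsible for creating the instance.
        instance = super().__new__(cls)
        instance.created_by_new = True
        return instance

    def __init__(self, x, y):
        # __init__ then initializes the instance that __new__ returned.
        self.x = x
        self.y = y


p = Point(1, 2)
print(p.created_by_new, p.x, p.y)
```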


Mixins in Django

Mixins are a type of class in object-oriented programming (OOP) that provide reusable methods or behaviors to be shared across different classes without requiring full inheritance. They allow developers to add specific functionalities to different classes without cluttering the class Read more…
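The idea can be sketched in plain Python before bringing Django into it. The classes `JsonMixin`, `Entity`, `User`, and `Product` are hypothetical names for this example; Django's own mixins follow the same pattern of being listed ahead of the main base class.

```python
import json


class JsonMixin:
    """Adds JSON serialization to any class that stores data in __dict__."""

    def to_json(self):
        return json.dumps(self.__dict__, sort_keys=True)


class Entity:
    """Plain base class that stores keyword arguments as attributes."""

    def __init__(self, **fields):
        for key, value in fields.items():
            setattr(self, key, value)


# The mixin is listed first so its methods take priority in the lookup order.
class User(JsonMixin, Entity):
    pass


class Product(JsonMixin, Entity):
    pass


print(User(name="Ann", age=30).to_json())
print(Product(sku="X1", price=9.5).to_json())
```

Both classes gain `to_json` without either one defining it, which is the reuse the mixin is meant to provide.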


Can you combine views in Django REST Framework (DRF) using class-based views or function-based views?

Yes, you can combine views in Django REST Framework (DRF) using class-based views or function-based views. However, combining views comes with trade-offs regarding code organization, readability, and maintainability. Here’s how and when you might want to combine views: 1. Combining Read more…