Pandas 2.x Migration Guide for Energyworx
This guide helps you migrate your custom Flow rules and Market Adapters from Pandas 1.x to Pandas 2.x. It covers every breaking change relevant to Energyworx code, with before/after examples and search patterns to find affected code.
The Energyworx platform is transitioning from Pandas 1.x to 2.x. During this period, your code must work with both Pandas 1.x and 2.x. This guide marks each change with its compatibility:
- Safe now: The new syntax works in both Pandas 1.x and 2.x (e.g., "h" instead of "H", pd.concat() instead of .append()).
- 2.2+ only: The new syntax only works in Pandas 2.2 or later (e.g., "ME" instead of "M"). Do not use these yet — keep the old syntax until the platform fully migrates.
Table of Contents
- Quick Reference: What Changed
- Removed Methods and Parameters
- Frequency Alias Changes
- Timezone and Datetime Changes
- DataFrame Operation Changes
- Timedelta Handling
- Migrating Flow Rules
- Migrating Market Adapters
- Error Message Reference
- Search Patterns for Your Code
- Testing Your Migration
1. Quick Reference: What Changed
| Category | Severity | Compat | Summary |
|---|---|---|---|
| DataFrame.append() removed | Error | Safe now | Use pd.concat() instead |
| pd.date_range(closed=) removed | Error | Safe now | Use inclusive= parameter |
| Frequency aliases ("H", "T", "S") | Warning | Safe now | Lowercase required: "h", "min", "s" |
| Frequency aliases ("M", "Q", "Y") | Warning | 2.2+ only | New suffixes "ME", "QE", "YE" — keep old syntax for now |
| pd.Timedelta("100y") / pd.Timedelta("6M") | Error | Safe now | Use days or DateOffset |
| Index.get_loc(method=) removed | Error | Safe now | Use get_indexer() instead |
| .ix[] accessor removed | Error | Safe now | Use .loc[] or .iloc[] |
| ExcelWriter.save() removed | Error | Safe now | Use .close() |
| is_monotonic removed | Error | Safe now | Use is_monotonic_increasing |
| infer_datetime_format parameter removed | Error | Safe now | Remove the parameter |
| pd.np removed (e.g., pd.np.nan) | Error | Safe now | Use np.nan with import numpy as np |
| pd.to_datetime() stricter format inference | Silent | See notes | Use fallback chain for cross-version compat |
| value_counts().reset_index() columns renamed | Silent | Safe now | Use positional column access |
| groupby([col]) key type changed | Silent | Safe now | Remove list wrapper for single column |
| .columns & list deprecated | Warning | Safe now | Use .intersection() |
| Mixed-type DataFrame operations stricter | Error | Safe now | Select numeric columns first |
| Timezone-naive/aware mixing | Error | Safe now | Always localize timestamps |
| inplace=True deprecated | Warning | Safe now | Use assignment instead |
| Copy-on-Write behavior | Silent | Safe now | Avoid chained indexing |
2. Removed Methods and Parameters
2.1 DataFrame.append() and Series.append() Removed
The .append() method has been removed from DataFrame and Series objects (Index.append() still exists; see Section 8.3). Use pd.concat() instead.
# BEFORE (Pandas 1.x)
df = df.append(other_df)
df = df.append(other_df, ignore_index=True)
series = series.append(other_series)
# AFTER (Pandas 2.x)
df = pd.concat([df, other_df])
df = pd.concat([df, other_df], ignore_index=True)
series = pd.concat([series, other_series])
Notes:
- pd.concat() returns a new object — it does not modify in place.
- Always wrap the objects in a list: [df1, df2].
- For appending a single row as a dict, use pd.concat([df, pd.DataFrame([row_dict])]).
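A minimal row-append sketch based on the third note (the column names here are made up):
import pandas as pd

df = pd.DataFrame({"value": [1.0], "quality": ["ok"]})
row_dict = {"value": 2.0, "quality": "estimated"}

# Wrap the dict in a one-row DataFrame, then concat (replaces df.append(row_dict))
df = pd.concat([df, pd.DataFrame([row_dict])], ignore_index=True)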
Search pattern: \.append\( (then verify it's on a DataFrame/Series, not a Python list)
2.2 pd.date_range(closed=) Removed
The closed parameter in pd.date_range() has been replaced with inclusive.
This change only applies to pd.date_range(). Other methods like IntervalIndex.from_breaks(), IntervalIndex.from_arrays(), and pd.cut() still use the closed parameter in Pandas 2.x.
# BEFORE (Pandas 1.x)
pd.date_range(start, end, freq="h", closed="right")
pd.date_range(start, end, freq="h", closed="left")
pd.date_range(start, end, freq="h", closed=None)
# AFTER (Pandas 2.x)
pd.date_range(start, end, freq="h", inclusive="right")
pd.date_range(start, end, freq="h", inclusive="left")
pd.date_range(start, end, freq="h", inclusive="both")
Mapping:
| Old (closed=) | New (inclusive=) | Meaning |
|---|---|---|
| closed=None | inclusive="both" | Include both start and end |
| closed="left" | inclusive="left" | Include start, exclude end |
| closed="right" | inclusive="right" | Exclude start, include end |
Search pattern: date_range\([^)]*closed\s*=
2.3 Index.get_loc(method=) Removed
The method parameter has been removed from Index.get_loc(). Use Index.get_indexer() instead.
# BEFORE (Pandas 1.x)
idx = df.index.get_loc(date, method="nearest")
# AFTER (Pandas 2.x)
idx = df.index.get_indexer([date], method="nearest")[0]
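One behavioral difference to be aware of: get_loc() raised KeyError on a failed lookup, while get_indexer() returns -1 for unmatched labels (for example, when a tolerance is set). A defensive sketch, with the tolerance value purely illustrative:
# Guard against -1 before using the position
pos = df.index.get_indexer([date], method="nearest", tolerance=pd.Timedelta("1h"))[0]
if pos == -1:
    raise KeyError(f"no index entry within tolerance of {date}")
row = df.iloc[pos]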
Search pattern: \.get_loc\([^)]*method\s*=
2.4 .ix[] Accessor Removed
The .ix[] accessor was removed. Use .loc[] (label-based) or .iloc[] (position-based) instead.
# BEFORE (Pandas 1.x)
value = df.ix[row_label]
value = df.ix[0]
# AFTER (Pandas 2.x)
value = df.loc[row_label] # by label
value = df.iloc[0] # by position
Search pattern: \.ix\[
2.5 ExcelWriter.save() Removed
# BEFORE (Pandas 1.x)
writer = pd.ExcelWriter(output, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
# AFTER (Pandas 2.x)
writer = pd.ExcelWriter(output, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.close()
Search pattern: writer\.save\(\)
2.6 is_monotonic Removed
# BEFORE (Pandas 1.x)
df.index.is_monotonic
# AFTER (Pandas 2.x)
df.index.is_monotonic_increasing
Search pattern: \.is_monotonic(?!_)
2.7 infer_datetime_format Parameter Removed
The infer_datetime_format parameter has been removed from pd.to_datetime() and pd.read_csv(). Pandas 2.x infers a format from the first non-null value automatically, so simply remove the parameter (see Section 4.4 if your data mixes formats).
# BEFORE (Pandas 1.x)
pd.to_datetime(series, infer_datetime_format=True)
pd.read_csv(file, parse_dates=True, infer_datetime_format=True)
# AFTER (Pandas 2.x)
pd.to_datetime(series)
pd.read_csv(file, parse_dates=True)
Search pattern: infer_datetime_format
2.8 pd.np Removed
The pd.np alias for numpy has been removed. Use import numpy as np and reference np directly.
# BEFORE (Pandas 1.x)
import pandas as pd
df.replace({pd.np.nan: ''})
value = pd.np.nan
# AFTER (Pandas 2.x)
import pandas as pd
import numpy as np
df.replace({np.nan: ''})
value = np.nan
Notes:
- This is very common in Market Adapters that use pd.np.nan as a sentinel value.
- The fix is straightforward: add import numpy as np and replace pd.np with np.
Search pattern: pd\.np\.
3. Frequency Alias Changes
Pandas 2.2 deprecated many frequency alias strings. These currently raise FutureWarning but will become errors in a future version.
3.1 Safe to Change Now (works in both Pandas 1.x and 2.x)
These lowercase aliases are accepted by both Pandas 1.x and 2.x. Change these now:
| Old Alias | New Alias | Meaning | Affects |
|---|---|---|---|
"H" | "h" | Hour | resample(), date_range(), Grouper(), Timedelta() |
"T" | "min" | Minute | Same |
"S" | "s" | Second | Same |
"L" | "ms" | Millisecond | Same |
"U" | "us" | Microsecond | Same |
"N" | "ns" | Nanosecond | Same |
3.2 Do NOT Change Yet (only valid in Pandas 2.2+)
These new aliases ("ME", "QE", "YE", etc.) are not recognized by Pandas versions before 2.2 and will raise ValueError: Invalid frequency. Since the platform must support both Pandas 1.x and 2.x, keep the old aliases for now. They emit a FutureWarning in 2.2+ but still work.
| Old Alias | Future Alias | Meaning | Action |
|---|---|---|---|
"M" | "ME" | Month End | Keep "M" for now |
"Q" | "QE" | Quarter End | Keep "Q" for now |
"Y" or "A" | "YE" | Year End | Keep "Y" for now |
"BM" | "BME" | Business Month End | Keep "BM" for now |
"BQ" | "BQE" | Business Quarter End | Keep "BQ" for now |
"BA" | "BYE" | Business Year End | Keep "BA" for now |
"AS" | "YS" | Year Start | Keep "AS" for now |
"BAS" | "BYS" | Business Year Start | Keep "BAS" for now |
Aliases that are still valid (no change needed): "D" (day), "W" (week), "MS" (month start), "QS" (quarter start), "B" (business day).
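If the FutureWarning noise from the old aliases becomes a problem before the platform fully migrates, one option is to pick the alias at runtime. This is a sketch, not a platform convention; the MONTH_END name is ours:
import pandas as pd

# Pandas 2.2+ understands "ME"; older versions need "M"
_major, _minor = (int(part) for part in pd.__version__.split(".")[:2])
MONTH_END = "ME" if (_major, _minor) >= (2, 2) else "M"

monthly = df.resample(MONTH_END).sum()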
3.3 Common Energyworx Examples
# BEFORE (Pandas 1.x)
df.resample("H").sum()
df.resample("15T").mean()
pd.date_range(start, end, freq="1H")
pd.Grouper(freq="1H")
# AFTER (safe for both Pandas 1.x and 2.x)
df.resample("h").sum()
df.resample("15min").mean()
pd.date_range(start, end, freq="1h")
pd.Grouper(freq="1h")
# NOTE: Keep "M" for now — "ME" only works in Pandas 2.2+
pd.Grouper(freq="M") # keep as-is (will emit FutureWarning in 2.2+)
df.resample("M").sum() # keep as-is
Compound frequencies: When a number precedes the alias, update only the letter part:
"1H"→"1h""15T"→"15min""30S"→"30s""100L"→"100ms"
Search patterns:
- Hour: freq\s*=\s*["'][^"']*H["'] or resample\(\s*["'][^"']*H["']
- Minute: freq\s*=\s*["'][^"']*T["']
- Second: freq\s*=\s*["'][^"']*[0-9]S["'] (careful: "MS" is valid)
- Month end: freq\s*=\s*["']M["'] (exactly "M", not "MS" or "ME")
- Year end: freq\s*=\s*["'][^"']*[AY]["']
4. Timezone and Datetime Changes
4.1 Cannot Mix Timezone-Naive and Timezone-Aware
Pandas 2.x strictly rejects operations that mix timezone-naive and timezone-aware datetime objects. This is especially important in Energyworx because self.flow_timestamp is timezone-naive (even though it represents UTC).
# BEFORE (Pandas 1.x) — worked but was technically incorrect
start = pd.Timestamp("2024-01-01") # naive
df = self.dataframe # has UTC DatetimeIndex
result = df.loc[start:] # worked implicitly
# AFTER (Pandas 2.x) — must match timezone
start = pd.Timestamp("2024-01-01", tz="UTC")
df = self.dataframe
result = df.loc[start:]
Common fix for self.flow_timestamp:
# BEFORE
timestamp = pd.Timestamp(self.flow_timestamp)
# AFTER
timestamp = pd.Timestamp(self.flow_timestamp, tz="UTC")
Common fix for computed timestamps:
# BEFORE
edit_date = pd.Timestamp(start_date) + pd.Timedelta(hours=1)
# AFTER — localize AFTER arithmetic, or localize the input
edit_date = (pd.Timestamp(start_date) + pd.Timedelta(hours=1)).tz_localize("UTC")
# OR
edit_date = pd.Timestamp(start_date, tz="UTC") + pd.Timedelta(hours=1)
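A small helper that makes this pattern hard to get wrong (the name ensure_utc is ours, not a platform API): localize if naive, convert if aware:
import pandas as pd

def ensure_utc(value):
    """Return a tz-aware UTC Timestamp whether the input is naive or aware."""
    ts = pd.Timestamp(value)
    return ts.tz_localize("UTC") if ts.tz is None else ts.tz_convert("UTC")

ensure_utc("2024-01-01")                                    # naive input: localized
ensure_utc(pd.Timestamp("2024-01-01", tz="Europe/Paris"))   # aware input: converted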
Search pattern: pd\.Timestamp\((?![^)]*tz\s*=) — matches pd.Timestamp( calls with no tz= argument; then check whether the result is used with tz-aware data.
4.2 Using .date() on Timezone-Aware Index
Calling .date() on a timezone-aware Timestamp returns a timezone-naive datetime.date, which cannot be used for slicing a timezone-aware index. Use .normalize() instead.
# BEFORE (Pandas 1.x)
end = df.index[-1].date()
result = df.loc[:end, columns]
# AFTER (Pandas 2.x) — .normalize() gives midnight in the same timezone
end = df.index[-1].normalize()
result = df.loc[:end, columns]
Search pattern: \.index\[.*\]\.date\(\)
4.3 Timezone Comparisons
Pandas 2.x may represent UTC using different timezone objects internally. Direct comparison with pytz.UTC or datetime.timezone.utc can fail.
# BEFORE (Pandas 1.x)
import datetime as dt
assert df.index.tz == dt.timezone.utc
# AFTER (Pandas 2.x) — flexible check
assert str(df.index.tz) in ("UTC", "UTC+00:00") or df.index.tz == dt.timezone.utc
Search pattern: \.tz\s*==
4.4 pd.to_datetime() Format Inference
Pandas 2.x no longer guesses the format when a column contains mixed date formats. If your data has inconsistent formats, you must handle this explicitly.
The format="mixed" and format="ISO8601" parameters are only available in Pandas 2.0+. If your code must run on both Pandas 1.x and 2.x, use the fallback pattern below.
Cross-version fallback pattern (recommended):
# Safe for both Pandas 1.x and 2.x
def parse_dates(series, dateformat=None, utc=False):
"""Parse dates with graduated fallback for cross-version compatibility."""
try:
return pd.to_datetime(series, format=dateformat, utc=utc)
except ValueError:
# Pandas 2.x enforces strict format matching. Try fallback strategies
# to handle minor format variations (e.g. ISO 'T' separator vs space).
fallbacks = [None, "mixed"] if dateformat else ["mixed"]
for fmt in fallbacks:
try:
return pd.to_datetime(series, format=fmt, utc=utc)
except (ValueError, TypeError):
continue
raise
This pattern works because:
- On Pandas 1.x, the initial call with format=dateformat usually succeeds (lenient matching), and format=None also works as a fallback.
- On Pandas 2.x, if strict matching fails, format=None (auto-infer) is tried first, then format="mixed" as a last resort.
- The format="mixed" call is only reached on Pandas 2.x where it's available.
If you only need Pandas 2.x support:
# Pandas 2.x only
dates = pd.to_datetime(series, format="mixed")
# OR for ISO 8601 strings
dates = pd.to_datetime(series, format="ISO8601")
# OR specify exact format
dates = pd.to_datetime(series, format="%Y-%m-%d %H:%M:%S")
When to use which:
- format="mixed": Data contains multiple different formats (e.g., some rows "2024-01-01", others "01/01/2024")
- format="ISO8601": All dates are ISO 8601 but with varying precision (e.g., some with seconds, some without)
- Explicit format string: All dates follow the same format
Search pattern: pd\.to_datetime\((?![^)]*format\s*=) — matches calls with no format= argument; check if the input data could have mixed formats.
4.5 datetime64 Resolution Changes
Pandas 2.x supports multiple datetime resolutions (datetime64[s], [ms], [us], [ns]) instead of only nanoseconds. This can cause issues when combining data with different resolutions.
# If you encounter resolution mismatch errors:
df.index = df.index.as_unit("ns") # convert to nanoseconds
# OR
series = series.dt.as_unit("ns")
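A minimal illustration of how two indexes can end up with different units (Pandas 2.x only, since as_unit() does not exist in 1.x):
import pandas as pd

idx_us = pd.to_datetime(["2024-01-01"]).as_unit("us")   # microsecond resolution
idx_ns = pd.to_datetime(["2024-01-01"])                 # default nanosecond resolution

print(idx_us.dtype, idx_ns.dtype)    # datetime64[us] datetime64[ns]
idx_us = idx_us.as_unit("ns")        # align units before combining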
5. DataFrame Operation Changes
5.1 Year-String Indexing
In Pandas 2.x, df["2024"] looks for a column named "2024" rather than filtering a DatetimeIndex by year.
# BEFORE (Pandas 1.x)
result = df["2024"]
result = df["2024"]["column_name"]
# AFTER (Pandas 2.x) — use .loc[]
result = df.loc["2024"]
result = df.loc["2024", "column_name"]
Search pattern: df\["[0-9]{4}"\]
5.2 Mixed-Type DataFrame Operations
Operations like .sum(), comparisons (<, >), and .clip() now raise errors when the DataFrame contains non-numeric columns (e.g., datetime or string columns).
# BEFORE (Pandas 1.x) — silently skipped non-numeric columns
total = df.sum()
negative_mask = df < 0
# AFTER (Pandas 2.x) — select numeric columns first
total = df.sum(numeric_only=True)
# OR
total = df[column_name].sum()
numeric_cols = df.select_dtypes(include=["number"]).columns
negative_mask = df[numeric_cols] < 0
For .clip():
# BEFORE (Pandas 1.x)
df = df.clip(lower=0)
# AFTER (Pandas 2.x)
numeric_cols = df.select_dtypes(include=["number"]).columns
df[numeric_cols] = df[numeric_cols].clip(lower=0)
Search pattern: \.sum\(\), \.clip\(, df\s*[<>] — check if the DataFrame could contain non-numeric columns.
5.3 DataFrame.columns & list Deprecated
Using the & operator between an Index and a list is deprecated. Use .intersection() instead.
# BEFORE (Pandas 1.x)
columns = df.columns & ["col1", "col2", "col3"]
# AFTER (Pandas 2.x)
columns = df.columns.intersection(["col1", "col2", "col3"])
Search pattern: \.columns\s*&\s*\[
5.4 groupby([single_column]) Key Type Changed
When grouping by a single column wrapped in a list, Pandas 2.x returns tuple keys (e.g., ("A",)) instead of scalar keys (e.g., "A"). Remove the list wrapper for single-column groupby.
# BEFORE (Pandas 1.x) — key is "A" (scalar)
for key, group in df.groupby([column_name]):
print(key) # "A"
# AFTER (Pandas 2.x) — remove list wrapper to get scalar keys
for key, group in df.groupby(column_name):
print(key) # "A"
Important: Only change this for single column groupby. Multi-column groupby should keep the list:
# Multi-column — keep the list
for key, group in df.groupby([col1, col2]):
print(key) # ("A", "B") — tuple in both versions
Search pattern: \.groupby\(\[ — check if only one column is inside the brackets.
5.5 value_counts().reset_index() Column Names Changed
The output column names from .value_counts().reset_index() have changed.
# Pandas 1.x result columns: ["index", "column_name"]
# Pandas 2.x result columns: ["column_name", "count"]
# BEFORE (Pandas 1.x)
counts = df["status"].value_counts().reset_index()
value = counts["index"][0]
count = counts["status"][0]
# AFTER (Pandas 2.x) — use positional access for version-agnostic code
counts = df["status"].value_counts().reset_index()
value_col = counts.columns[0] # the original values
count_col = counts.columns[1] # the counts
value = counts[value_col][0]
count = counts[count_col][0]
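An alternative that also works on both versions is to overwrite the column names outright, so downstream code can rely on fixed names (the names "value" and "count" are our choice here):
counts = df["status"].value_counts().reset_index()
counts.columns = ["value", "count"]   # identical names under Pandas 1.x and 2.x
value = counts["value"][0]
count = counts["count"][0]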
Search pattern: \.value_counts\(\)\.reset_index\(\)
5.6 Series Assignment to Filtered DataFrames
Assigning a Series to filtered DataFrame rows can fail when the index has duplicate labels. Use .values to convert to a numpy array first.
# BEFORE (Pandas 1.x)
df.loc[mask, "column"] = some_series
# AFTER (Pandas 2.x) — use .values to bypass reindexing
df.loc[mask, "column"] = some_series[mask].values
Search pattern: \.loc\[[^\]]*,[^\]]*\]\s*= — then check for .loc[mask, col] = series assignments where the right-hand side lacks .values.
5.7 Copy-on-Write and Chained Indexing
Pandas 2.x introduces Copy-on-Write, which becomes the default in pandas 3.0 and can be enabled in 2.x via pd.options.mode.copy_on_write = True. Under Copy-on-Write, chained indexing (getting a value through two successive [] operations) no longer modifies the original DataFrame; even without it, chained assignment was never reliable.
# BEFORE (Pandas 1.x) — modified df in place
df["column"][mask] = new_value
# AFTER (Pandas 2.x) — use .loc[] for direct modification
df.loc[mask, "column"] = new_value
Search pattern: df\[["'][^"']+["']\]\[ — look for df["col"][...].
5.8 inplace=True Deprecated
The inplace parameter is deprecated on most DataFrame/Series methods. Use assignment instead.
# BEFORE (Pandas 1.x)
df.reset_index(inplace=True)
df.sort_values("col", inplace=True)
df.drop(columns=["col"], inplace=True)
df.fillna(0, inplace=True)
# AFTER (Pandas 2.x)
df = df.reset_index()
df = df.sort_values("col")
df = df.drop(columns=["col"])
df = df.fillna(0)
Search pattern: inplace\s*=\s*True
6. Timedelta Handling
Timedelta operations have several compatibility pitfalls. This section covers patterns that are safe across Pandas versions.
6.1 pd.Timedelta with Year/Month Units
Year ("Y", "y") and month ("M") units are no longer accepted because they're ambiguous (a year can be 365 or 366 days; a month can be 28–31 days).
# BEFORE (Pandas 1.x)
pd.Timedelta("100y")
pd.Timedelta("6M")
# AFTER (Pandas 2.x) — use explicit days
pd.Timedelta(days=36500) # ~100 years
pd.Timedelta(days=180) # ~6 months
# OR use DateOffset for calendar-aware offsets
pd.DateOffset(years=100)
pd.DateOffset(months=6)
Note: pd.DateOffset respects calendar months/years (e.g., adding 1 month to Jan 31 gives Feb 28), while pd.Timedelta(days=30) always adds exactly 30 days. Use whichever is correct for your business logic.
Search pattern: pd\.Timedelta\(\s*["'][0-9]+\s*[yYM]["']\s*\) (lowercase "m" means minutes and is still valid, so it is excluded)
6.2 Converting Timedelta to Seconds or Days
The .dt.total_seconds() accessor and division by pd.Timedelta() can behave differently across versions. The safest cross-version approach uses numpy:
import numpy as np
# BEFORE (Pandas 1.x) — may fail or give wrong results in 2.x
seconds = timedelta_series.dt.total_seconds()
seconds = timedelta_series / pd.Timedelta(seconds=1)
days = timedelta_series.dt.days
# AFTER (safe across versions) — use numpy timedelta64
seconds = pd.Series(
timedelta_series.values / np.timedelta64(1, 's'),
index=timedelta_series.index
)
days = pd.Series(
timedelta_series.values / np.timedelta64(1, 'D'),
index=timedelta_series.index
).astype(int)
Key rule: Always use .values to get the underlying numpy array before dividing by np.timedelta64().
Search pattern: \.dt\.total_seconds\(\), \.dt\.days, /\s*pd\.Timedelta\(
6.3 Timedelta .astype(int) Returns Nanoseconds
When you call .astype(int) on a timedelta column, it converts to the internal representation (nanoseconds), not seconds. This can silently produce values that are 1,000,000,000x larger than expected.
# BEFORE (Pandas 1.x) — often appeared to work because of implicit conversions
interval_seconds = timedelta_column.astype(int)
# Danger: returns nanoseconds, not seconds!
# AFTER — convert to seconds explicitly first
interval_seconds = (timedelta_column.values / np.timedelta64(1, 's')).astype(int)
Search pattern: timedelta.*\.astype\(\s*int\s*\), \.astype\(\s*int\s*\) on columns that might contain timedelta values.
6.4 Extracting Values from Timedelta Columns
When you extract a single value from a timedelta column (e.g., df[column][0]), the result type depends on the pandas version and operation history. It may be:
- A float (seconds) — use directly
- A timedelta64 object — divide by np.timedelta64(1, 's')
- An int64 containing nanoseconds — divide by 1,000,000,000
import numpy as np
# Robust extraction pattern
raw_value = df[column].iloc[0]
try:
numeric_value = int(raw_value)
except (TypeError, ValueError):
# It's a timedelta object — convert to seconds
value_in_seconds = int(raw_value / np.timedelta64(1, 's'))
else:
# Check if it's nanoseconds (> ~10 years in seconds)
if numeric_value > 315_360_000:
value_in_seconds = numeric_value // 1_000_000_000
else:
value_in_seconds = numeric_value
7. Migrating Flow Rules
This section walks through the most common patterns found in Energyworx Flow rules and how to update them.
7.1 Timezone Conversion Pattern
This is the most common pattern in rules — converting between UTC and local time:
class MyRule(AbstractRule):
def apply(self, **kwargs):
local_tz = self.datasource.timezone
df = self.dataframe[[self.source_column]].copy()
# Convert to local time for business logic
df = df.tz_convert(local_tz)
# MIGRATION CHECK: If you create timestamps for slicing,
# make sure they are timezone-aware
# BEFORE:
start = pd.Timestamp("2024-01-01")
# AFTER:
start = pd.Timestamp("2024-01-01", tz=local_tz)
# Process...
result = df.loc[start:]
# Convert back to UTC
result = result.tz_convert("UTC")
return RuleResult(result=result)
7.2 Resampling Pattern
Many rules aggregate data using .resample(). Update frequency aliases and check closed parameter usage:
class MyAggregationRule(AbstractRule):
def apply(self, interval="h", **kwargs):
df = self.dataframe[[self.source_column]].copy()
# BEFORE:
resampled = df.resample("H", closed="right", label="right").sum()
# AFTER:
resampled = df.resample("h", closed="right", label="right").sum()
# Note: 'closed' parameter still works on resample() — only
# pd.date_range() replaced it with 'inclusive'.
return RuleResult(result=resampled)
7.3 Date Range Generation Pattern
Rules that generate date ranges (e.g., for gap filling or profile creation):
class MyGapFillRule(AbstractRule):
def apply(self, heartbeat=3600, **kwargs):
start = self.dataframe.index[0]
end = self.dataframe.index[-1]
# BEFORE:
full_range = pd.date_range(start, end, freq="{}s".format(heartbeat), closed="right")
# AFTER:
full_range = pd.date_range(start, end, freq="{}s".format(heartbeat), inclusive="right")
return RuleResult()
7.4 Using self.flow_timestamp
The self.flow_timestamp is always timezone-naive UTC. In Pandas 2.x, you must localize it before using it with timezone-aware data:
class MyRule(AbstractRule):
def apply(self, **kwargs):
# BEFORE — worked with implicit conversion:
flow_ts = pd.Timestamp(self.flow_timestamp)
df = self.dataframe.loc[:flow_ts]
# AFTER — explicit timezone:
flow_ts = pd.Timestamp(self.flow_timestamp, tz="UTC")
df = self.dataframe.loc[:flow_ts]
return RuleResult()
7.5 Concatenating DataFrames in Rules
Rules that combine data from multiple sources:
class MyCombineRule(AbstractRule):
def prepare_context(self, other_datasource_id, **kwargs):
return {
"prepare_datasource_ids": [other_datasource_id],
"other_id": other_datasource_id,
}
def apply(self, **kwargs):
other_ds = self.prepared_datasources[self.context["other_id"]]
other_df = self.load_timeseries(other_ds.id, [self.source_column],
self.dataframe.index[0],
self.dataframe.index[-1])
# BEFORE:
combined = self.dataframe.append(other_df)
# AFTER:
combined = pd.concat([self.dataframe, other_df])
return RuleResult(result=combined)
7.6 Grouper with Frequency
Rules that group by time periods:
class MyMonthlyRule(AbstractRule):
def apply(self, **kwargs):
df = self.dataframe[[self.source_column]].copy()
# BEFORE:
monthly = df.groupby(pd.Grouper(freq="M")).sum()
hourly = df.groupby(pd.Grouper(freq="1H")).mean()
# AFTER:
monthly = df.groupby(pd.Grouper(freq="M")).sum() # keep "M" — "ME" is 2.2+ only
hourly = df.groupby(pd.Grouper(freq="1h")).mean() # "h" is safe to change now
return RuleResult(result=monthly)
7.7 Sum on DataFrames with Multiple Column Types
Rules that sum across all columns when some columns are non-numeric:
class MyValidationRule(AbstractRule):
def apply(self, **kwargs):
df = self.dataframe.copy()
# BEFORE — silently skipped datetime columns:
total = df.sum().values[0]
# AFTER — specify the column or use numeric_only:
total = df[self.source_column].sum()
# OR
total = df.sum(numeric_only=True).values[0]
return RuleResult()
8. Migrating Market Adapters
Market Adapters typically use pandas for parsing files and reshaping data. The most common migration issues are in the split() and adapt() methods.
8.1 CSV Parsing with Date Columns
class MyCSVAdapter(PluggableMarketAdapter):
def adapt(self, content, current_datetime, **kwargs):
df = pd.read_csv(
io.StringIO(content),
# BEFORE:
parse_dates=["date_col"],
infer_datetime_format=True,
# AFTER — remove infer_datetime_format:
parse_dates=["date_col"],
)
# If dates have mixed formats, parse separately:
# df["date_col"] = pd.to_datetime(df["date_col"], format="mixed")
return self.normalize_csv(df.to_csv(index=False))
8.2 Groupby for Splitting by Datasource
class MyCSVAdapter(PluggableMarketAdapter):
def split(self, content, **kwargs):
df = pd.read_csv(io.StringIO(content), dtype=str)
# BEFORE — single column in list:
for datasource_id, group in df.groupby(["meter_id"]):
# datasource_id was a scalar in 1.x, tuple in 2.x
yield group.to_csv(index=False)
# AFTER — remove list wrapper for single column:
for datasource_id, group in df.groupby("meter_id"):
# datasource_id is always a scalar
yield group.to_csv(index=False)
8.3 Horizontal-to-Vertical Format Conversion
Adapters that reshape horizontal (wide) data into vertical (long) format:
class MyHorizontalAdapter(PluggableMarketAdapter):
def adapt(self, content, current_datetime, **kwargs):
df = pd.read_csv(io.StringIO(content), dtype=str)
dates = pd.to_datetime(df["date"])
# Creating time intervals
intervals = pd.timedelta_range(start="0h", periods=24, freq="1h")
# TimedeltaIndex.append() still works in 2.x (only DataFrame and Series
# lost .append()), so no change is required here:
all_intervals = intervals.append(pd.TimedeltaIndex([pd.Timedelta(hours=25)]))
# To avoid .append() entirely, build the index directly instead:
# all_intervals = pd.TimedeltaIndex(
#     list(intervals) + [pd.Timedelta(hours=25)]
# )
return self.normalize_json(result)
8.4 Excel File Handling
class MyExcelAdapter(PluggableMarketAdapter):
def adapt(self, content, current_datetime, **kwargs):
excel_file = pd.ExcelFile(io.BytesIO(content.encode()))
df = excel_file.parse("Sheet1")
# BEFORE — if writing Excel output:
writer = pd.ExcelWriter(output, engine="xlsxwriter")
df.to_excel(writer, sheet_name="Output")
writer.save() # Removed in 2.x
# AFTER:
writer = pd.ExcelWriter(output, engine="xlsxwriter")
df.to_excel(writer, sheet_name="Output")
writer.close() # Use close() instead
return self.normalize_csv(df.to_csv(index=False))
8.5 Replacing pd.np.nan in Adapters
Many adapters use pd.np.nan to replace or detect missing values. This alias was removed in Pandas 2.x.
class MyCSVAdapter(PluggableMarketAdapter):
def split(self, element, **kwargs):
import pandas as pd
# BEFORE:
# group.replace({pd.np.nan: ''}, inplace=True)
# AFTER:
import numpy as np
group = group.replace({np.nan: ''})
yield group.values.tolist()
8.6 Date Range in Adapters
Adapters that create date ranges for timeseries output:
class MyDomainAdapter(PluggableMarketAdapter):
def adapt(self, content, current_datetime, **kwargs):
# BEFORE:
timestamps = pd.date_range(
start=dt.datetime(2024, 1, 1, tzinfo=pytz.UTC),
end=dt.datetime(2024, 1, 1, 23, 0, 0, tzinfo=pytz.UTC),
freq='1H' # deprecated alias
)
# AFTER:
timestamps = pd.date_range(
start=dt.datetime(2024, 1, 1, tzinfo=pytz.UTC),
end=dt.datetime(2024, 1, 1, 23, 0, 0, tzinfo=pytz.UTC),
freq='1h' # lowercase alias
)
df = pd.DataFrame({"channel_1": values}, index=timestamps)
timeseries = self.create_timeseries(df=df, datasource=ds, version=current_datetime)
self.output_timeseries(timeseries)
9. Error Message Reference
When you encounter one of these errors, use the table to find the fix:
| Error Message | Cause | Fix |
|---|---|---|
| 'DataFrame' object has no attribute 'append' | DataFrame.append() removed | Use pd.concat([df1, df2]) — Section 2.1 |
| got an unexpected keyword argument 'closed' | closed parameter removed from date_range() | Use inclusive= — Section 2.2 |
| got an unexpected keyword argument 'method' on get_loc | method removed from get_loc() | Use get_indexer() — Section 2.3 |
| 'DataFrame' object has no attribute 'ix' | .ix[] removed | Use .loc[] or .iloc[] — Section 2.4 |
| 'XlsxWriter' object has no attribute 'save' | save() removed | Use .close() — Section 2.5 |
| 'Index' object has no attribute 'is_monotonic' | is_monotonic removed | Use is_monotonic_increasing — Section 2.6 |
| got an unexpected keyword argument 'infer_datetime_format' | Parameter removed | Remove the parameter — Section 2.7 |
| module 'pandas' has no attribute 'np' | pd.np removed | Use import numpy as np and np.nan — Section 2.8 |
| FutureWarning: 'H' is deprecated and will be removed... | Old frequency alias | Use 'h' — Section 3 |
| FutureWarning: 'M' is deprecated...use 'ME'... | Old frequency alias | Keep 'M' for now — 'ME' is 2.2+ only. See Section 3 |
| Units 'M', 'Y' and 'y' do not represent unambiguous timedelta values | Ambiguous Timedelta unit | Use days: pd.Timedelta(days=N) — Section 6.1 |
| Cannot compare tz-naive and tz-aware datetime-like objects | Mixed timezone awareness | Add tz="UTC" or .tz_localize("UTC") — Section 4.1 |
| cannot reindex on an axis with duplicate labels | Series reindexing conflict | Use .values — Section 5.6 |
| 'DatetimeArray' with dtype datetime64[ns] does not support reduction 'sum' | Summing datetime columns | Use numeric_only=True — Section 5.2 |
| UFuncBinaryResolutionError | Timedelta division incompatibility | Use np.timedelta64() — Section 6.2 |
10. Search Patterns for Your Code
Use these patterns to scan your code for potential migration issues. Each can be used with your IDE's search (regex mode) or with grep -E.
Critical — Will Error
# DataFrame/Series.append()
\.append\(
# pd.date_range with closed=
date_range\([^)]*closed\s*=
# Deprecated frequency aliases — safe to change now
(?:resample|date_range|Grouper|Timedelta)\([^)]*["'][^"']*(?<![a-zA-Z])H(?!z)["']
# Deprecated frequency aliases — DO NOT change yet (2.2+ only)
# These will emit FutureWarning but still work. Keep as-is for cross-version compat.
# freq\s*=\s*["']M["']
# freq\s*=\s*["'][AY]["']
# pd.Timedelta with year/month units
pd\.Timedelta\(\s*["'][0-9]+\s*[yYM]["']
# get_loc with method=
\.get_loc\([^)]*method\s*=
# .ix[] accessor
\.ix\[
# ExcelWriter.save()
\.save\(\)
# is_monotonic (without _increasing/_decreasing)
\.is_monotonic(?!_)
# infer_datetime_format
infer_datetime_format
# pd.np (e.g., pd.np.nan)
pd\.np\.
High Priority — Silent Behavior Changes
# pd.to_datetime without format (check for mixed data)
pd\.to_datetime\((?![^)]*format\s*=)
# value_counts().reset_index()
\.value_counts\(\)\.reset_index\(\)
# groupby with single column in list
\.groupby\(\[[^\],]+\]\)
# .columns & list
\.columns\s*&\s*\[
# .sum() on DataFrames (check for non-numeric columns)
\.sum\(\s*\)
# Timezone-naive Timestamps used with tz-aware data
pd\.Timestamp\((?![^)]*tz\s*=)
# .date() on tz-aware timestamps
\.date\(\)
# Year-string indexing
\[["'][0-9]{4}["']\]
# Chained indexing
df\[["'][^"']+["']\]\[
Medium Priority — Deprecation Warnings
# inplace=True
inplace\s*=\s*True
# Timezone comparisons
\.tz\s*==
# .dt.total_seconds() on timedelta
\.dt\.total_seconds\(\)
# timedelta .astype(int)
\.astype\(\s*int\s*\)
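If you prefer scanning from Python rather than the IDE, a small sketch along these lines works with the patterns above (the rules/ directory and the pattern subset are assumptions; adjust both to your repository):
import pathlib
import re

# A subset of the critical patterns above, keyed by a short label
PATTERNS = {
    "append()": r"\.append\(",
    "closed=": r"date_range\([^)]*closed\s*=",
    ".ix[]": r"\.ix\[",
    "pd.np": r"pd\.np\.",
    "infer_datetime_format": r"infer_datetime_format",
}

for path in pathlib.Path("rules").rglob("*.py"):
    text = path.read_text(encoding="utf-8")
    for label, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            line_no = text.count("\n", 0, match.start()) + 1
            print(f"{path}:{line_no}: {label}")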
11. Testing Your Migration
Step-by-Step Testing Approach
1. Search your code using the patterns from Section 10 to identify all affected lines.
2. Apply the fixes from this guide, working through one category at a time.
3. Run your unit tests if you have them. Pay special attention to:
   - Tests that create DataFrames with timezone-aware indices
   - Tests that use timedelta operations
   - Tests that assert on specific column names after value_counts()
4. Test with real data on a non-production environment. Check:
   - Do resample operations produce the same number of output rows?
   - Are timezone conversions producing correct local times?
   - Are numeric aggregations (sum, mean) returning the same values?
   - Are date ranges generating the correct number of timestamps?
Common Verification Checks
# Verify resample output hasn't changed
# Run with both old and new code, compare:
assert old_result.shape == new_result.shape
assert (old_result.values == new_result.values).all()
# Verify timezone handling
assert df.index.tz is not None, "Index should be timezone-aware"
# Verify date_range output
old_range = pd.date_range(start, end, freq="1h", inclusive="both")
assert len(old_range) == expected_count
Warnings to Watch For
After migration, run your code and watch for these FutureWarning messages in the console output — each indicates something that will break in a future pandas version:
- FutureWarning: ... is deprecated and will be removed in a future version — frequency alias needs updating
- FutureWarning: The behavior of DataFrame.sum with axis=None is deprecated — add an explicit axis= argument
- FutureWarning: Downcasting object dtype arrays... — explicit dtype conversion needed
- FutureWarning: Setting an item of incompatible dtype... — check dtype compatibility
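To keep these warnings from hiding in noisy output, one option (a sketch; adapt to your test harness) is to escalate FutureWarning to an error while exercising migrated code paths:
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    df.resample("h").sum()   # raises immediately if a deprecated alias slips through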
This guide is based on the Pandas 1.x → 2.x migration of the Energyworx platform (March 2026). For the official Pandas migration documentation, see the Pandas 2.0 What's New and Pandas 2.2 What's New.