Skip to content

Conversation

@Dekermanjian
Copy link
Contributor

This is a draft proposal for #598

The idea is to handle each component separately using _set_{component} methods and all information are stored using data classes for easy mapping.

I believe this will simplify our tests of these components and will reduce redundancies where we have the same information spread across multiple sub-components like data_names and data_info.

@jessegrabowski let me know what you think I put a little notebook together to showcase the changes.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@Dekermanjian Dekermanjian marked this pull request as draft November 2, 2025 17:50
Copy link
Member

@jessegrabowski jessegrabowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great first pass, much cleaner than what we have now.



@dataclass
class ParameterProperty:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class ParameterProperty:
class Parameter(StatespaceProperty):

Let's make the names really dumb and simple



@dataclass
class ParameterProperties:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class ParameterProperties:
class ParameterInfo(Info[StatespaceParameter]):

Same here. I'm also not against just calling it StatespaceParameters (plural), but that's a bit less obvious.


from pymc_extras.statespace.models.structural.core import Component
from pymc_extras.statespace.utils.constants import TIME_DIM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level comment: There's a lot of code duplication here because of all the shared functionality between the different types of parameters. Use a superclass:

@dataclass(frozen=True)
class Property:
    def __str__(self) -> str:
        return "\n".join(f"{f.name}: {getattr(self, f.name)}" for f in fields(self))


T = TypeVar("T", bound=Property)


@dataclass(frozen=True)
class Info(Generic[T]):
    items: list[T]
    key_field: str = "name"

    def _key(self, item: T) -> str:
        return getattr(item, self.key_field)

    def get(self, key: str) -> T | None:
        return next((i for i in self.items if self._key(i) == key), None)

    def __getitem__(self, key: str) -> T:
        result = self.get(key)
        if result is None:
            raise KeyError(f"No {self.key_field} '{key}'")
        return result

    def __contains__(self, key: str) -> bool:
        return any(self._key(i) == key for i in self.items)

    def __str__(self) -> str:
        return f"{self.key_field}s: {[self._key(i) for i in self.items]}"

Comment on lines +61 to +66
needs_exogenous_data: bool = field(default=False, init=False)

def __post_init__(self):
for d in self.data:
if d.is_exogenous:
self.needs_exogenous_data = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
needs_exogenous_data: bool = field(default=False, init=False)
def __post_init__(self):
for d in self.data:
if d.is_exogenous:
self.needs_exogenous_data = True
@property
def needs_exogenous_data(self) -> bool:
return any(d.is_exogenous for d in self.items)

Just compute this on demand with a property instead of going through post init hoops. In this case you can also set frozen=True.

Comment on lines +46 to +55
@dataclass
class DataProperty:
name: str
shape: tuple[int, ...]
dims: tuple[str, ...]
is_exogenous: bool

def __str__(self):
base = f"name: {self.name}\nshape: {self.shape}\ndims: {self.dims}\nis_exogenous: {self.is_exogenous}"
return base
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example, after you have the base classes, this just becomes:

@dataclass(frozen=True)
class Data(Property):
    name: str
    shape: tuple[int, ...]
    dims: tuple[str, ...]
    is_exogenous: bool


self.coords = CoordProperties(coords=[regression_state_prop, endogenous_state_prop])

def populate_component_properties(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the Component superclass, and not touched at all by the subclasses.

You will also have to add each _set_foo() method to the superclass, along with some kind of default behavior.

@jessegrabowski
Copy link
Member

We can also keep all of the existing properties like state_names, shock_names, state_dims, etc, but move them to the base class and just extract the requested info from the relevant Info objects.

@jessegrabowski jessegrabowski changed the title proposal for updating propogate_component_properties using data classes Represent statespace metadata with dataclasses Nov 7, 2025
@jessegrabowski
Copy link
Member

Reflecting on it, I am convinced this is the way to go. It's 1000x more ergonomic. I made some changes to your initial code to make the API more "dictionary like", and to reduce code duplication. I moved everything to statespace/core/properties.py, because this is ultimately going to replace what we have in both the core models and the components.

@Dekermanjian
Copy link
Contributor Author

@jessegrabowski, this is looking really cool! What can I do to help push this forward?

@jessegrabowski
Copy link
Member

jessegrabowski commented Nov 7, 2025

Delete the new regression_dataclass.py and simply refactor regression.py to use the new stuff.

We should keep your notebook with the plan to add it as a new example for the docs. Or it can be merged into the custom statespace notebook. So that should also be updated to import from the new properties.py file

@Dekermanjian
Copy link
Contributor Author

Perfect! I'll work on that today!! It is really looking cool!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants