Release Notes: Version 23.03
Here's a sneak peek at what's new:
Platform Changes: Defects, Improvements, and New Features
Defects
- When new data is ingested for a datasource, the previous annotations are now correctly overwritten.
- Fixed an issue and improved the error message shown when the user returns after being idle/away from the console for a while.
- Fixed a bug where load-side-input would not always use cached data from ingestion or a previous flow.
- Channels blacklisted in the user settings no longer show in the "Channels" section on the DS details page.
- After all search result hits have been returned on the Search Datasources page, clicking 'View More' no longer resets the results page.
- Tag names and tag property keys are now validated. Invalid values are no longer stored; instead an error is raised and an Audit Event is created.
- Invalid datasource and channel classifiers in flow and transformation configurations now create Audit Events when their use is attempted.
- Invalid transformation configurations in market adapters now create Audit Events when their use is attempted.
Stories and Improvements
- We streamlined the experience of starting a flow, depending on the type of flow the user selects. When starting a new datasource flow, the source channel is hidden; when starting a new channel flow, the source channel can be seen and selected. This improves the consistency of the user experience across the platform.
- We streamlined the Flow Design experience: when it is opened in the Datasource Flow scope, the source and destination channels are no longer visible. This improves the consistency of the user experience across the platform.
- JSON files are now recognised and can be mapped in the Transformation Configuration.
- All generic market adapters now check the file extension before ingesting.
- Caching can be disabled for flow and decision tree resource configurations on staging environments.
- Reduced model inference time when a model is iterated in a rule by caching the fetch-model-by-name lookup.
- Caching can be disabled globally for all resource configurations on staging environments.
- Any non-source datasource now has a flow metadata property in the datasource view showing the source of that flow.
New Features
We introduce new ML model training functionality that lets users train ML models on data from the Energyworx platform. A query to prepare the data and a training script that follows our training framework are required.
Currently supported ML frameworks are scikit-learn, xgboost, and xgboost_multi_output; xgboost_multi_output is required when the prediction produces more than one output at a time. We also support non-distributed hyperparameter tuning with hyperopt, and we provide Docker images/Dockerfiles so users can test their code locally.
When training succeeds, the trained model and its metadata are uploaded to the file manager. When the training job fails, the traceback info is uploaded as well. The ML framework also allows a flexible workflow in which the user can train the same model on different data by configuring sqlqueryfile as the filename in the training flow parameters.
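As a rough illustration, a training script for the scikit-learn backend might take the shape below. This is a hedged sketch: the function name, file name, and in-memory data are assumptions for the example, and the exact entry-point interface required by the Energyworx training framework is not shown here.

```python
# Minimal, hypothetical training script for the scikit-learn backend.
# In the real setup the data would come from the configured SQL query,
# not be hard-coded as it is here.
import pickle
from sklearn.linear_model import LinearRegression

def train(X, y):
    """Fit a model on the prepared data and return it."""
    model = LinearRegression()
    model.fit(X, y)
    return model

if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0.0, 1.0, 2.0, 3.0]
    model = train(X, y)
    # On success, the persisted model and its metadata are what the
    # framework uploads to the file manager.
    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)
```

The same script can be exercised locally using the provided Docker images before submitting a training job.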
Argo Workflow
The workflow YAML file integrates the different steps of the training pipeline and manages internal artifacts and parameters between steps.
It includes an image with the necessary permissions and packages, and a retrieval Go script as the entry point, taking sqlquery as an input argument from the workflow.
Allows creating a placeholder file in the job submitter. This is needed for successful runs of the step that expects an output; the output is needed in case of failure to pass the metadata to the exit hook. Once an output artifact is configured, Argo expects it to be present even on successful runs.
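A minimal sketch of creating such a placeholder (the file name and location are assumptions for illustration, not the actual paths used by the job submitter):

```python
# Hypothetical sketch: touch an (initially empty) metadata file so the
# configured Argo output artifact exists even when the step succeeds.
# On failure, the file would be overwritten with real metadata for the
# exit hook to pick up.
from pathlib import Path

def ensure_output_artifact(path: str = "outputs/metadata.json") -> Path:
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.touch(exist_ok=True)  # creates an empty file if none exists
    return p
```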
Metadata Persistence
Adds the metadata writer WorkflowTemplate. It receives a metadata file and copies it to a destination GCS bucket. This template can be deployed to the Argo training installation.
Once the WorkflowTemplate is installed, its templates can be used in Workflows. You can also use this WorkflowTemplate on its own to test it without the other components.
BigQuery Usage Dashboard
This dashboard can be used to detect queries that use a lot of slot time/bytes.
Every Monday, a report covering the previous week will be sent to the customer.
New method to store timeseries data
We introduced new methods for AbstractRule:
- store_timeseries(timeseries, channel_id=None, *, datasource_id=None, store_in_flow_data=True)
- store_annotations(annotations, channel_id, annotation_name=None, *, datasource_id=None, store_in_flow_data=True)
On top of that, we also introduced the following functional changes:
- Storing timeseries or annotations can be done independently of each other.
- Both methods can take a Series or a DataFrame.
- Timeseries and annotations can have any timezone now.
- Channels are only created on the datasource if actual data is required to be stored on them.
- A DataFrame of timeseries will store each column independently on the channel named by that column. Any channel classifier that does not exist in the namespace will raise an error and abort the flow.
- A DataFrame of annotations will store each column independently as annotations for the specific channel_id channel.
- Annotations can now have any name you want (the series name or column names are used). The only requirement is that the name is unique.
- Annotations can be created on non-existing datapoints. This also means that gap annotations will now actually be stored on the gap itself instead of one annotation after the gap.
- Multiple annotations can be created on the same datapoint within the same rule.
- Multiple versions of annotations can be stored on the same timeseries version.
- Timeseries and annotations can be stored on ANY prepared datasource, bringing it in line with how you can add tags to any prepared datasource.
- Timeseries and annotations stored with store_XXX can optionally also be added to the flow data (which self.dataframe is created from), which is turned on by default. This only works for data stored on this datasource. Data stored on other (prepared) datasources will never be added to the flow data.
- Annotations stored with store_annotations are recognized by self.data_filter.
- Heartbeat channels are no longer supported.
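Putting the signatures and behaviours above together, a rule might use these methods as sketched below. The AbstractRule class defined here is only a stand-in that records calls so the sketch runs outside the platform (on the platform the real base class is provided), and the channel names and data are illustrative.

```python
import pandas as pd

class AbstractRule:
    """Minimal stand-in for the platform base class; it only records calls."""
    def __init__(self):
        self.stored = []

    def store_timeseries(self, timeseries, channel_id=None, *,
                         datasource_id=None, store_in_flow_data=True):
        self.stored.append(("timeseries", channel_id))

    def store_annotations(self, annotations, channel_id, annotation_name=None, *,
                          datasource_id=None, store_in_flow_data=True):
        self.stored.append(("annotations", channel_id))

class MyRule(AbstractRule):
    def apply(self):
        # Timezone-aware index: timeseries and annotations can have any timezone.
        idx = pd.date_range("2023-03-01", periods=4, freq="H",
                            tz="Europe/Amsterdam")
        # A DataFrame stores each column independently on the channel
        # named by that column.
        ts = pd.DataFrame({"consumption": [1.0, 2.0, 3.0, 4.0],
                           "production": [0.5, 0.6, 0.7, 0.8]}, index=idx)
        self.store_timeseries(ts)  # merged into the flow data by default
        # The series name is used as the (unique) annotation name; annotations
        # may sit on datapoints that do not exist yet, e.g. on a gap itself.
        gaps = pd.Series(["gap"] * 2, index=idx[:2], name="gap_detected")
        self.store_annotations(gaps, channel_id="consumption",
                               store_in_flow_data=False)

rule = MyRule()
rule.apply()
```

Storing timeseries and annotations independently, as shown, is what lets a single rule write to multiple channels and attach several annotation versions to the same timeseries version.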