Added `AssetsDefinition.from_op`. Dagster will infer asset inputs and outputs from the ins/outs defined on the `@op`, in the same way as `@graph`s.
… a `default_executor_def` argument. The default executor definition will be used for all op/asset jobs that don’t explicitly define their own executor.
`JobDefinition.run_request_for_partition` now accepts a `tags` argument (Thanks @jburnich!)
`@multi_asset` now accepts a `resource_defs` argument. The provided resources can either be used on the context, or satisfy the io manager requirements of the outs on the asset.
`define_asset_job` and `to_job` can now accept a `partitions_def` argument and a `config` argument at the same time, as long as the value for the `config` argument is a hardcoded config dictionary (not a `PartitionedConfig` or `ConfigMapping`).
Fixed a bug where `k8s_job_executor` and `docker_executor` would sometimes return the same event lines twice in the command-line output for the step.
… `@op` decorator (Thanks Milos Tomic!)
`UnresolvedAssetJobDefinition` now supports the `run_request_for_partition` method.
… `AssetOut` and `AssetIn` …
Software-defined assets are now marked fully stable and are ready for prime time - we recommend using them whenever your goal using Dagster is to build and maintain data assets.
You can now organize software defined assets into groups by providing a group_name on your asset definition. These assets will be grouped together in Dagit.
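For example, a minimal sketch of asset groups (the asset bodies and the "marketing" group name are illustrative):

```python
from dagster import asset

@asset(group_name="marketing")
def leads():
    ...

@asset(group_name="marketing")
def campaign_performance(leads):
    ...
```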
Software-defined assets now accept configuration, similar to ops. E.g.

```python
from dagster import asset

@asset(config_schema={"iterations": int})
def my_asset(context):
    for i in range(context.op_config["iterations"]):
        ...
```
Asset definitions can now be created from graphs via `AssetsDefinition.from_graph`:

```python
from dagster import AssetsDefinition, GraphOut, graph

@graph(out={"asset_one": GraphOut(), "asset_two": GraphOut()})
def my_graph(input_asset):
    ...

graph_asset = AssetsDefinition.from_graph(my_graph)
```
`execute_in_process` and `GraphDefinition.to_job` now both accept an `input_values` argument, so you can pass arbitrary Python objects to the root inputs of your graphs and jobs.
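A minimal sketch of `input_values`, assuming a graph with a root input (the op and value are illustrative):

```python
from dagster import graph, op

@op
def add_one(x):
    return x + 1

@graph
def add_graph(x):
    add_one(x)

# Pass an arbitrary Python object to the root input "x".
result = add_graph.execute_in_process(input_values={"x": 5})
```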
Ops that return Outputs and DynamicOutputs now work well with Python type annotations. You no longer need to sacrifice static type checking just because you want to include metadata on an output. E.g.
```python
from dagster import Output, op

@op
def my_op() -> Output[int]:
    return Output(5, metadata={"a": "b"})
```
You can now automatically re-execute runs from failure. This is analogous to op-level retries, except at the job level.
You can now supply arbitrary structured metadata on jobs, which will be displayed in Dagit.
The partitions and backfills pages in Dagit have been redesigned to be faster and show the status of all partitions, instead of just the last 30 or so.
The left navigation pane in Dagit is now grouped by repository, which makes it easier to work with when you have large numbers of jobs, especially when jobs in different repositories have the same name.
The Asset Details page for a software-defined asset now includes a Lineage tab, which makes it easy to see all the assets that are upstream or downstream of an asset.
This release marks the official transition of software-defined assets from experimental to stable. We made some final changes to incorporate feedback and make the APIs as consistent as possible:
Removed `fs_asset_io_manager` in favor of merging its functionality with `fs_io_manager`. `fs_io_manager` is now the default IO manager for asset jobs, and will store asset outputs in a directory named with the asset key. Similarly, removed `adls2_pickle_asset_io_manager`, `gcs_pickle_asset_io_manager`, and `s3_pickle_asset_io_manager`. Instead, `adls2_pickle_io_manager`, `gcs_pickle_io_manager`, and `s3_pickle_io_manager` now support software-defined assets.
The `namespace` argument on the `@asset` decorator and `AssetIn` has been deprecated. Users should use `key_prefix` instead.
Jobs that materialize a selection of assets can now be defined with `define_assets_job` (replacing `AssetGroup.build_job`), and arbitrary sets of assets can be materialized using the standalone function `materialize` (replacing `AssetGroup.materialize`).
The `outs` property of the previously-experimental `@multi_asset` decorator now prefers a dictionary whose values are `AssetOut` objects instead of a dictionary whose values are `Out` objects. The latter still works, but is deprecated.
The method on `OpExecutionContext` called `output_asset_partition_key` is now deprecated in favor of `asset_partition_key_for_output`.
The `get_event_records` method on DagsterInstance now requires a non-None argument `event_records_filter`. Passing a `None` value for the `event_records_filter` argument will now raise an exception where previously it generated a deprecation warning.
Removed `events_for_asset_key` and `get_asset_events`, which have been deprecated since 0.12.0.
… To revert to the old behavior: `dbt_assets = load_assets_from_dbt_project(..., node_info_to_asset_key=lambda node_info: AssetKey(node_info["name"]))`.
The `prior_attempts_count` parameter is now removed from step-launching APIs. This parameter was not being used, as the information it held was stored elsewhere in all cases. It can safely be removed from invocations without changing behavior.
The `FileCache` class has been removed.
A new `define_asset_job` function allows you to define a selection of assets that should be executed together. The selection can be a simple string, or an AssetSelection object. This selection will be resolved into a set of assets once placed on the repository.
```python
from dagster import repository, define_asset_job, AssetSelection

string_selection_job = define_asset_job(
    name="foo_job", selection="*foo"
)

object_selection_job = define_asset_job(
    name="bar_job", selection=AssetSelection.groups("some_group")
)

@repository
def my_repo():
    return [
        *my_list_of_assets,
        string_selection_job,
        object_selection_job,
    ]
```
[dagster-dbt] Assets loaded with `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` will now be sorted into groups based on the subdirectory of the project that each model resides in.
`@asset` and `@multi_asset` are no longer considered experimental.
Adds new utility methods `load_assets_from_modules`, `assets_from_current_module`, `assets_from_package_module`, and `assets_from_package_name` to fetch and return a list of assets from within the specified Python modules.
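A minimal sketch, assuming your assets live in importable modules (the module names here are hypothetical):

```python
from dagster import load_assets_from_modules

from my_package import marketing_assets, finance_assets  # hypothetical modules

all_assets = load_assets_from_modules([marketing_assets, finance_assets])
```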
Resources and io managers can now be provided directly on assets and source assets.
```python
from dagster import asset, SourceAsset, resource, io_manager

@resource
def foo_resource():
    pass

@asset(resource_defs={"foo": foo_resource})
def the_resource(context):
    foo = context.resources.foo

@io_manager
def the_manager():
    ...

@asset(io_manager_def=the_manager)
def the_asset():
    ...
```
Note that assets provided to a job must not have conflicting resources for the same key: for a given job, all resource definitions provided for a given key must match by reference equality.
Added `materialize_to_memory`, which will load the materializations of a provided list of assets into memory:

```python
from dagster import asset, materialize_to_memory

@asset
def the_asset():
    return 5

result = materialize_to_memory([the_asset])
output = result.output_for_node("the_asset")
```
Added `with_resources`, which allows resources to be added to multiple assets / source assets at once:

```python
from dagster import asset, with_resources, resource

@asset(required_resource_keys={"foo"})
def requires_foo(context):
    ...

@asset(required_resource_keys={"foo"})
def also_requires_foo(context):
    ...

@resource
def foo_resource():
    ...

requires_foo, also_requires_foo = with_resources(
    [requires_foo, also_requires_foo],
    {"foo": foo_resource},
)
```
You can now include asset definitions directly on repositories. A `default_executor_def` property has been added to the repository, which will be used on any materializations of assets provided directly to the repository.

```python
from dagster import asset, repository, multiprocess_executor

@asset
def my_asset():
    ...

@repository(default_executor_def=multiprocess_executor)
def repo():
    return [my_asset]
```
The `run_storage`, `event_log_storage`, and `schedule_storage` configuration sections of the `dagster.yaml` can now be replaced by a unified `storage` configuration section. This should avoid duplicate configuration blocks within your `dagster.yaml`. For example, instead of:

```yaml
# dagster.yaml
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_url: { PG_DB_CONN_STRING }
event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_url: { PG_DB_CONN_STRING }
schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_url: { PG_DB_CONN_STRING }
```

You can now write:

```yaml
storage:
  postgres:
    postgres_url: { PG_DB_CONN_STRING }
```
All assets where a `group_name` is not provided are now part of a group called `default`.
The `group_name` parameter value for `@asset` is now restricted to only allow letters, numbers, and underscores.
You can now set policies to automatically retry Job runs. This is analogous to op-level retries, except at the job level. By default the retries pick up from failure, meaning only failed ops and their dependents are executed.
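A sketch of what enabling this might look like in `dagster.yaml`, assuming the `run_retries` instance setting (check the deployment docs for the exact keys):

```yaml
run_retries:
  enabled: true
  max_retries: 3
```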
[dagit] The new repository-grouped left navigation is fully launched, and is no longer behind a feature flag.
[dagit] The left navigation can now be collapsed even when the viewport window is wide. Previously, the navigation was collapsible only for small viewports, but kept in a fixed, visible state for wide viewports. This visible/collapsed state for wide viewports is now tracked in localStorage, so your preference will persist across sessions.
[dagit] Queued runs can now be terminated from the Run page.
[dagit] The log filter on a Run page now shows counts for each filter type, and the filters have higher contrast and a switch to indicate when they are on or off.
[dagit] The partitions and backfill pages have been redesigned to focus on easily viewing the last run state by partition. These redesigned pages were previously gated behind a feature flag — they are now loaded by default.
[dagster-k8s] Overriding labels in the K8sRunLauncher will now apply to both the Kubernetes job and the Kubernetes pod created for each run, instead of just the Kubernetes pod.
Added `docker_executor`, which can be used to run each step of a job in a different Docker container.
Added an `env_vars` field to the EcsRunLauncher that allows you to configure environment variables in the ECS task for launched runs.
The `env_vars` field on `K8sRunLauncher` and `k8s_job_executor` can now accept input of the form ENV_VAR_NAME=ENV_VAR_VALUE, and will set the value of ENV_VAR_NAME to ENV_VAR_VALUE. Previously, it only accepted input of the form ENV_VAR_NAME, and the environment variable had to be available in the pod launching the job.
Fixed a bug where, if `IOManager.load_input` was called to load an asset that was not being materialized as part of the run, the provided context would not include the metadata for that asset. `context.upstream_output.metadata` now correctly returns the metadata on the upstream asset.
Previously, using certain type annotations (e.g. `list[str]`) as the type of an input would raise an exception; this has been fixed.
Jobs can now have metadata attached (via the `metadata` parameter) and viewed in dagit. You can use it to track code owners, link to docs, or add other useful information.
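A minimal sketch of job metadata (the labels and values are illustrative):

```python
from dagster import job, op

@op
def do_work():
    ...

@job(metadata={"owner": "data-team", "docs": "https://example.com/runbook"})
def nightly_job():
    do_work()
```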
Assets can now have a `config_schema`. If you attempt to materialize an asset with a config schema in Dagit, you'll be able to enter the required config via a modal.
Fixed a bug in the `k8s_job_executor` where ops with long names sometimes failed to create a pod due to a validation error with the label names automatically generated by Dagster.
`fs_asset_io_manager` has been removed in favor of merging its functionality with `fs_io_manager`. `fs_io_manager` is now the default IO manager for asset jobs, and will store asset outputs in a directory named with the asset key.
Fixed a typo in the `k8s_job_executor`’s `max_concurrent` configuration. Thanks @fahadkh!
Fixed a bug that caused `fs_io_manager` to incorrectly handle assets associated with upstream assets. Thanks @aroig!
Pinned `protobuf` to version 3 due to a backwards-incompatible change in the protobuf version 4 release.
… When building jobs with `AssetGroup.build_job()`, you can define selections which select subsets of the loaded dbt project.
[dagster-dbt] The `load_assets_from_dbt_manifest` function now supports an experimental `select` parameter. This allows you to use dbt selection syntax to select from an existing manifest.json file, rather than having Dagster re-compile the project on demand.
`OpExecutionContext` now exposes an `asset_key_for_output` method, which returns the asset key that one of the op’s outputs corresponds to.
… `python -m dagster.daemon`.
The `non_argument_deps` parameter for the `asset` and `multi_asset` decorators can now be a set of strings in addition to a set of `AssetKey`.
… would raise an `InvalidSubsetError`. This is now fixed.
… `AssetKey`s.
Fixed a bug where `snowflake_io_manager` would sometimes raise an error with `pandas` 1.4 or later installed.
… would raise a `DagsterExecutionStepNotFoundError`. This is now fixed.
`AssetIn` can now accept a string that will be coerced to an `AssetKey`. Thanks @aroig!
[dagster-gcp] … now have user-configurable timeout length. Thanks @3cham!
`AssetsDefinition.from_graph` now accepts a `partitions_def` argument.
`@asset`-decorated functions can now accept variable keyword arguments.
`dagster instance info` now prints the current schema migration state for the configured instance storage.
[dagster-dbt] Added a `docs_url` config on the `dbt_cli_resource`. If this value is set, AssetMaterializations associated with each dbt model will contain a link to the dbt docs for that model.
[dagster-dbt] Added a `dbt_cloud_host` config on the `dbt_cloud_resource`, in the case that your dbt cloud instance is under a custom domain.
Fixed a bug where `InputContext.upstream_output` was missing the `asset_key` when it referred to an asset outside the run.
When using the `selection` parameter in `AssetGroup.build_job()`, the generated job would include an incorrect set of assets in certain situations. This has been fixed.
… would raise a `DagsterInstanceSchemaOutdated` exception if the instance storage was not up to date with the latest schema. We no longer wrap these exceptions, allowing the underlying exceptions to bubble up.
Fixed a bug where `adls2_pickle_io_manager` would sometimes fail to recursively delete a folder when cleaning up an output.
Calling `instance.get_event_records` without an event type filter is now deprecated and will generate a warning. These calls will raise an exception starting in `0.15.0`.
`@multi_asset` now supports partitioning. Thanks @aroig!
Added a `max_concurrent` field to the `k8s_job_executor` that limits the number of concurrent ops that will execute per run. Since this executor launches a Kubernetes job per op, this also limits the number of concurrent Kubernetes jobs. Note that this limit is per run, not global. Thanks @kervel!
[helm] Added an `externalConfigmap` field as an alternative to `dagit.workspace.servers` when running the user deployments chart in a separate release. This allows the workspace to be managed outside of the main Helm chart. Thanks @peay!
… `markupsafe<=2.0.1`. Thanks @bollwyvl!
Sensor evaluation functions can now return `RunRequest` objects instead of yielding them.
`OpExecutionContext` (provided as the `context` argument to ops) now has fields for `run`, `job_def`, `job_name`, `op_def`, and `op_config`. These replace `pipeline_run`, `pipeline_def`, etc. (though they are still available).
`OpExecutionContext` now offers a `partition_time_window` attribute, which returns a tuple of datetime objects that mark the bounds of the partition’s time window.
`AssetsDefinition.from_graph` now accepts a `partitions_def` argument.
[helm] Removed the `dagster-test-connection` pod from the Dagster Helm chart.
[dagster-k8s] The `k8s_job_executor` now polls the event log on a ~1 second interval (previously 0.1). Performance testing showed that this reduced DB load while not significantly impacting run time.
… `Jinja2` and `nbconvert`.
[dagit] … moved the `<meta>` tag to a response header, and several more security- and privacy-related headers have been added as well.
… now display as `foo/bar` in dagit, rather than appearing as `foo > bar` in some contexts.
The `--log-level` flag is now available in the dagit cli for controlling the uvicorn log level.
[dagster-dbt] The `load_assets_from_dbt_project()` and `load_assets_from_dbt_manifest()` utilities now have a `use_build_command` parameter. If this flag is set, when materializing your dbt assets, Dagster will use the `dbt build` command instead of `dbt run`. Any tests run during this process will be represented with AssetObservation events attached to the relevant assets. For more information on `dbt build`, see the dbt docs.
[dagster-snowflake] The new `build_snowflake_io_manager` offers a way to store assets and op outputs in Snowflake. The `PandasSnowflakeTypeHandler` stores Pandas `DataFrame`s in Snowflake.
[helm] `dagit.logLevel` has been added to values.yaml to access the newly added dagit `--log-level` cli option.
… when `toposort<=1.6` was installed.
[dagster-aws] `PickledObjectS3IOManager` now uses `list_objects` to check the access permission. Thanks @trevenrawr!
[dagster-dbt] Breaking: The `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` functions now include the schemas of the dbt models in their asset keys. To revert to the old behavior: `dbt_assets = load_assets_from_dbt_project(..., node_info_to_asset_key=lambda node_info: AssetKey(node_info["name"]))`.
The `TableSchema` API is no longer experimental.
… To use mypy with Dagster packages, add `follow_imports = "skip"` under `[mypy-dagster]` (or e.g. `[mypy-dagster-dbt]`) in your mypy config.
`build_output_context` now accepts an `asset_key` argument.
`AssetGroup.from_package_module`, `from_modules`, `from_package_name`, and `from_current_module` now accept an `extra_source_assets` argument that includes a set of source assets into the group in addition to the source assets scraped from modules.
… a `to_source_assets` method that returns SourceAsset versions of their assets, which can be used as source assets for downstream AssetGroups.
`mapping_key` can now be provided as an argument to `build_op_context`/`build_solid_context`. Doing so will allow the use of `OpExecutionContext.get_mapping_key()`.
`AssetGroup.build_job` now uses `>` instead of `.` for delimiting the components within asset keys, which is consistent with how selection works in Dagit.
… `+`.
Fixed a bug where `celery_docker_executor` would sometimes fail to execute with a JSON deserialization error when using Dagster resources that write to stdout.
[dagster-graphql] Added `client.terminate_run(run_id)`. Thanks @Javier162380!
[helm] Resources can now be specified for the run launcher in the `values.yaml` file:

```yaml
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      resources:
        limits:
          cpu: 100m
          memory: 128Mi
        requests:
          cpu: 100m
          memory: 128Mi
```

[helm] Setting `includeConfigInLaunchedRuns: true` in a user code deployment will now launch runs using the same namespace and service account as the user code deployment.
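A sketch of how this might look in the user deployments section of `values.yaml` (the deployment name is hypothetical; consult the Helm chart schema):

```yaml
dagster-user-deployments:
  deployments:
    - name: my-user-code
      includeConfigInLaunchedRuns:
        enabled: true
```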
The `@asset` decorator now accepts an `op_tags` argument, which allows e.g. providing k8s resource requirements on the op that computes the asset.
… `dagster api grpc-health-check` (previously it just returned via exit codes).
[dagster-aws] The `emr_pyspark_step_launcher` now supports dynamic orchestration, `RetryPolicy`s defined on ops, and re-execution from failure. For failed steps, the stack trace of the root error will now be available in the event logs, as will logs generated with `context.log.info`.
… a `tags_fn_for_partition` function, instead of requiring that the dictionary have string keys and values.
[dagster-aws] The `EcsRunLauncher` now registers new task definitions if the task’s execution role or task role changes.
… `setuptools` as a runtime dependency.
`In` can now accept `asset_partitions` without crashing.
… `secretsmanager_resource`.
[helm] Added an `includeConfigInLaunchedRuns` flag to the Helm chart that can be used to automatically include configmaps, secrets, and volumes in any runs launched from code in a user code deployment. See https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#configure-your-user-deployment for more information.
[GraphQL] `BackfillParams` (used for launching backfills) now has an `allPartitions` boolean flag, which can be used instead of specifying all the individual partition names.
Removed the `gevent` and `gevent-websocket` dependencies from `dagster-graphql`.
Fixed `sqlite3.OperationalError` error when viewing schedules/sensors pages in Dagit. This was affecting dagit instances using the default SQLite schedule storage with a SQLite version < `3.25.0`.
Fixed an issue where schedules and sensors would sometimes fail to run when the daemon and dagit were running in different Python environments.
Fixed an exception when the telemetry file is empty
Fixed a bug with `@graph` composition which would cause the wrong input definition to be used for type checks.
[dagit] For users running Dagit with --path-prefix
, large DAGs failed to render due to a WebWorker error, and the user would see an endless spinner instead. This has been fixed.
[dagit] Fixed a rendering bug in partition set selector dropdown on Launchpad.
[dagit] Fixed the ‘View Assets’ link in Job headers
Fixed an issue where root input managers with resource dependencies would not work with software defined assets
`dagster-census` is a new library that includes a `census_resource` for interacting with the Census REST API, `census_trigger_sync_op` for triggering a sync and registering an asset once it has finished, and a `CensusOutput` type. Thanks @dehume!
… Added a config option to `dagster.yaml` that can be used to increase the time that Dagster waits when spinning up a gRPC server before timing out. For more information, see https://docs.dagster.io/deployment/dagster-instance#code-servers.
[GraphQL] Added a new field `assetMaterializations` that can be queried off of a `DagsterRun` field. You can use this field to fetch the set of asset materialization events generated in a given run within a GraphQL query.
Docstrings on functions decorated with the `@resource` decorator will now be used as resource descriptions, if no description is explicitly provided.
You can now point `dagit -m` or `dagit -f` at a module or file that has asset definitions but no jobs or asset groups, and all the asset definitions will be loaded into Dagit.
`AssetGroup` now has a `materialize` method which executes an in-process run to materialize all the assets in the group.
`AssetGroup`s can now contain assets with different `partitions_def`s.
… `fs_asset_io_manager`, now include the path of the file where the values were saved.
You can now disable the `max_concurrent_runs` limit on the `QueuedRunCoordinator` by setting it to `-1`. Use this if you only want to limit runs using `tag_concurrency_limits`.
[dagster-dbt] Added a `get_runs()` function to get a list of runs matching certain parameters from the dbt Cloud API (thanks @kstennettlull!)
[dagster-snowflake] Added an `authenticator` field to the connection arguments for the `snowflake_resource` (thanks @swotai!).
… Added a config option `container_kwargs` that allows you to specify additional arguments to pass to your docker containers when they are run.
… a `:` character would fail to parse correctly, and filtering would therefore fail. This has been fixed.
`run_id` can now be provided as an argument to `execute_in_process`.
[dagit] `dagit`’s empty state no longer mentions the legacy concept “Pipelines”.
Within the `IOManager.load_input` method, you can add input metadata via `InputContext.add_input_metadata`. These metadata entries will appear on the `LOADED_INPUT` event and, if the input is an asset, be attached to an `AssetObservation`. This metadata is viewable in `dagit`.
Fixed a set of bugs where schedules and sensors would get out of sync between the `dagit` and `dagster-daemon` processes. This would manifest in schedules / sensors getting marked as “Unloadable” in `dagit`, and ticks not being registered correctly. The fix involves changing how Dagster stores schedule/sensor state and requires a schema change using the CLI command `dagster instance migrate`. Users who are not running into this class of bugs may consider the migration optional.
`root_input_manager` can now be specified without a context argument.
Fixed a bug that prevented `root_input_manager` from being used with `VersionStrategy`.
… `dagit` writing to the same telemetry logs.
In `dagit`, using the “Open in Launchpad” feature for a run could cause server errors if the run configuration yaml was too long. Runs can now be opened from this feature regardless of config length.
In `dagit`, runs in the timeline view sometimes showed incorrect end times, especially batches that included in-progress runs. This has been fixed.
In the `dagit` launchpad, reloading a repository should present the user with an option to refresh config that may have become stale. This feature was broken for jobs without partition sets, and has now been fixed.
Fixed a bug where supplying a `typing` type as `dagster_type` to input and output definition was incorrectly being rejected.
[dagster-azure] … the ADLS2 IO manager and resource can be provided to an `AssetGroup`:

```python
from dagster import AssetGroup
from dagster_azure import adls2_pickle_asset_io_manager, adls2_resource

asset_group = AssetGroup(
    [upstream_asset, downstream_asset],
    resource_defs={"io_manager": adls2_pickle_asset_io_manager, "adls2": adls2_resource},
)
```
Added `@hourly_partitioned_config`, `@daily_partitioned_config`, `@weekly_partitioned_config`, and `@monthly_partitioned_config`.
Added the `PartitionedConfig.get_run_config_for_partition_key` function. This will allow the use of the `validate_run_config` function in unit tests.
`PartitionedConfig` now takes an argument `tags_for_partition_fn` which allows for custom run tags for a given partition.
[dagster-gcp] … You can now provide `resource_defs` in `AssetGroup`:

```python
from dagster import AssetGroup
from dagster_gcp import gcs_pickle_asset_io_manager, gcs_resource

asset_group = AssetGroup(
    [upstream_asset, downstream_asset],
    resource_defs={"io_manager": gcs_pickle_asset_io_manager, "gcs": gcs_resource},
)
```
… `RetryRequested` exceptions.
Fixed a bug with `funcsigs.partial` that would cause incorrect `InvalidInvocationErrors` to be thrown.
Cron schedules now accept `@hourly`, `@daily`, `@weekly`, and `@monthly` in addition to the standard 5-field cron strings (e.g. `* * * * *`).
`value` is now an alias argument of `entry_data` (deprecated) for the `MetadataEntry` constructor.
… `SourceAssets` and is rendered in `dagit`.
… `dagster` CLI.
`dagster-k8s`, `dagster-celery-k8s`, and `dagster-docker` now name step workers `dagster-step-...` rather than `dagster-job-...`.
… the `k8s_job_executor` for runs with many user logs.
When using the `dagster-k8s/config` tag to configure Dagster Kubernetes pods, the tags can now accept any valid Kubernetes config, and can be written in either snake case (`node_selector_terms`) or camel case (`nodeSelectorTerms`). See the docs for more information.
You can now set secrets in the `EcsRunLauncher` using the same syntax that you use to set secrets in the ECS API.
The `EcsRunLauncher` now attempts to reuse task definitions instead of registering a new task definition for every run.
The `EcsRunLauncher` now raises the underlying ECS API failure if it cannot successfully start a task.
When loading assets with `AssetGroup.from_package_name` and similar methods, lists of assets at module scope are now loaded.
Added `AssetGroup.from_modules` and `AssetGroup.from_current_module`, which automatically load assets at module scope from particular modules.
`AssetGraph.from_modules` now correctly raises an error if multiple assets with the same key are detected.
The `InputContext` object provided to `IOManager.load_input` previously did not include resource config. Now it does.
[dagster-fivetran] Assets created with `build_fivetran_assets` will now be properly tagged with a `fivetran` pill in Dagit.
… `++item`. This has been fixed.
… required the `SQLAlchemy` package to be `1.4` or greater to be installed. We are now using queries supported by `SQLAlchemy>=1.3`. Previously we would raise an error including the message: `'Select' object has no attribute 'filter'`.
… required `sqlite` to be `3.25.0` or greater to be installed. This has been relaxed to support older versions of sqlite. This was previously marked as fixed in our `0.14.0` notes, but a handful of cases that were still broken have now been fixed. Previously we would raise an error (`sqlite3.OperationalError`).
Previously, it was not possible for the `EcsRunLauncher` to use sidecars without you providing your own custom task definition. Now, you can continue to inherit sidecars from the launching task’s task definition by setting `include_sidecars: True` in your run launcher config.
[dagster-snowflake] `dagster-snowflake` has dropped support for python 3.6. The library it is currently built on, `snowflake-connector-python`, dropped 3.6 support in their recent `2.7.5` release.
`MetadataValue.path()` and `PathMetadataValue` now accept `os.PathLike` objects in addition to strings. Thanks @abkfenris!
[dagster-k8s] … `env_vars` on the `k8s_job_executor`. Thanks @kervel!
… is now built on an `AssetGroup` instead of `build_assets_job`, and it can now be run entirely from a local machine with no additional infrastructure (storing data inside DuckDB).
… an `AssetGroup` instead of `build_assets_job`.
… `run_request_for_partition`, which returns a `RunRequest` that can be returned in a sensor or schedule evaluation function to launch a run for a particular partition for that job. See our documentation for more information.
Renamed `PipelineRunsFilter` => `RunsFilter`.
[dagster-dbt] `load_assets_from_dbt_project` will now attach schema information to the generated assets if it is available in the dbt project (`schema.yml`).
The default IO manager used in the `AssetGroup` api is now the `fs_asset_io_manager`.
`SourceAsset`s can now be partitioned, by setting the `partitions_def` argument.
… `execute_in_process` was not updated properly.
… `SQLAlchemy<1.4.0`.
[dagster-dbt] Fixed an issue where `load_assets_from_dbt_project` would fail if models were organized into subdirectories.
[dagster-dbt] Fixed an issue where `load_assets_from_dbt_project` would fail if seeds or snapshots were present in the project.
[dagster-fivetran] A new `fivetran_resync_op` (along with a corresponding `resync_and_poll` method on the `fivetran_resource`) allows you to kick off Fivetran resyncs using Dagster (thanks @dwallace0723!)
[dagster-shell] Fixed an issue where large log output could cause operations to hang (thanks @kbd!)
[documentation] Fixed export message with dagster home path (thanks @proteusiq)!
[documentation] Remove duplicate entries under integrations (thanks @kahnwong)!
… `validate_run_config`.
… the `reexecute_pipeline` API.
`TableRecord`, `TableSchema` and its constituents are now documented in the API docs.
… `MetadataEntry` and `MetadataValue` instead of old ones.
Added `build_run_status_sensor_context` to help build context objects for run status sensors.
An issue with `default_value` on inputs has been resolved. Previously, a defensive error that did not take `default_value` into account was thrown.
[dagster-airflow] … the `local_dagster_job_package_path` config option (Thanks Iswariya Manivannan!)
… the `reconstructable` API docs.
Pinned `markupsafe` to function with existing `Jinja2` pin.
Software-defined assets, which offer a declarative approach to data orchestration on top of Dagster’s core job/op/graph APIs, have matured significantly. Improvements include partitioned assets, a revamped asset details page in Dagit, a cross-repository asset graph view in Dagit, Dagster types on assets, structured metadata on assets, and the ability to materialize ad-hoc selections of assets without defining jobs. Users can expect the APIs to only undergo minor changes before being declared fully stable in Dagster’s next major release. For more information, view the software-defined assets concepts page here.
We’ve made it easier to define a set of software-defined assets where each Dagster asset maps to a dbt model. All of the dependency information between the dbt models will be reflected in the Dagster asset graph, while still running your dbt project in a single step.
Dagit has a new homepage, dubbed the “factory floor” view, that provides an overview of recent runs of all the jobs. From it, you can monitor the status of each job’s latest run or quickly re-execute a job. The new timeline view reports the status of all recent runs in a convenient gantt chart.
You can now write schedules and sensors that default to running as soon as they are loaded in your workspace, without needing to be started manually in Dagit. For example, you can create a sensor like this:
```python
from dagster import sensor, DefaultSensorStatus

@sensor(job=my_job, default_status=DefaultSensorStatus.RUNNING)
def my_running_sensor():
    ...
```
or a schedule like this:
```python
from dagster import schedule, DefaultScheduleStatus, ScheduleEvaluationContext

@schedule(job=my_job, cron_schedule="0 0 * * *", default_status=DefaultScheduleStatus.RUNNING)
def my_running_schedule(context: ScheduleEvaluationContext):
    ...
```
As soon as schedules or sensors with the `default_status` field set to `RUNNING` are included in the workspace loaded by your Dagster Daemon, they will begin creating ticks and submitting runs.
Op selection now supports selecting ops inside subgraphs. For example, to select an op `my_op` inside a subgraph `my_graph`, you can now specify the query as `my_graph.my_op`. This is supported in both Dagit and Python APIs.
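For example, a sketch using the Python API (the names come from the example above; `my_job` is assumed to contain `my_graph`):

```python
# Execute only my_op, which lives inside the my_graph subgraph of my_job.
result = my_job.execute_in_process(op_selection=["my_graph.my_op"])
```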
Dagster Types can now have attached metadata. This allows TableSchema objects to be attached to Dagster Types via TableSchemaMetadata. A Dagster Type with a TableSchema will have the schema rendered in Dagit.
A new Pandera integration (dagster-pandera) allows you to use Pandera’s dataframe validation library to wrap dataframe schemas in Dagster types. This provides two main benefits: (1) Pandera’s rich schema validation can be used for runtime data validation of Pandas dataframes in Dagster ops/assets; (2) Pandera schema information is displayed in Dagit using a new TableSchema API for representing arbitrary table schemas.
The new AssetObservation event enables recording metadata about an asset without indicating that the asset has been updated.
`AssetMaterializations`, `ExpectationResults`, and `AssetObservations` can be logged via the context of an op using the OpExecutionContext.log_event method. Output metadata can also be logged using the OpExecutionContext.add_output_metadata method. Previously, Dagster expected these events to be yielded within the body of an op, which caused lint errors for many users, made it difficult to add mypy types to ops, and also forced usage of the verbose Output API. Here’s an example of the new invocations:

```python
from dagster import op, AssetMaterialization

@op
def the_op(context):
    context.log_event(AssetMaterialization(...))
    context.add_output_metadata({"foo": "bar"})
    ...
```
A new Airbyte integration (dagster-airbyte) allows you to kick off and monitor Airbyte syncs from within Dagster. The original contribution from @airbytehq’s own @marcosmarxm includes a resource implementation as well as a pre-built op for this purpose, and we’ve extended this library to support software-defined asset use cases as well. Regardless of which interface you use, Dagster will automatically capture the Airbyte log output (in the compute logs for the relevant steps) and track the created tables over time (via AssetMaterializations).
The ECSRunLauncher (introduced in Dagster 0.11.15) is no longer considered experimental. You can bootstrap your own Dagster deployment on ECS using our docker compose example or you can use it in conjunction with a managed Dagster Cloud deployment. Since its introduction, we’ve added the ability to customize Fargate container memory and CPU, mount secrets from AWS SecretsManager, and run with a variety of AWS networking configurations. Join us in #dagster-ecs in Slack!
[Helm] The default liveness and startup probes for Dagit and user deployments have been replaced with readiness probes. The liveness and startup probe for the Daemon has been removed. We observed and heard from users that under load, Dagit could fail the liveness probe which would result in the pod restarting. With the new readiness probe, the pod will not restart but will stop serving new traffic until it recovers. If you experience issues with any of the probe changes, you can revert to the old behavior by specifying liveness and startup probes in your Helm values (and reach out via an issue or Slack).
The Dagster Daemon now uses the same workspace.yaml file as Dagit to locate your Dagster code. You should ensure that if you make any changes to your workspace.yaml file, they are included in both Dagit’s copy and the Dagster Daemon’s copy. When you make changes to the workspace.yaml file, you don’t need to restart either Dagit or the Dagster Daemon - in Dagit, you can reload the workspace from the Workspace tab, and the Dagster Daemon will periodically check the workspace.yaml file for changes every 60 seconds. If you are using the Dagster Helm chart, no changes are required to include the workspace in the Dagster Daemon.
Dagster’s metadata API has undergone a significant overhaul. Changes include:
`EventMetadata` > `MetadataValue`
`EventMetadataEntry` > `MetadataEntry`
`XMetadataEntryData` > `XMetadataValue` (e.g. `TextMetadataEntryData` > `TextMetadataValue`)
The `metadata_entries` keyword argument to events and Dagster types is deprecated. Instead, users should use the metadata keyword argument, which takes a dictionary mapping string labels to `MetadataValue`s.
`EventMetadataEntry` is deprecated.
Construction of `EventMetadataEntry` via its static methods (e.g. `EventMetadataEntry.text`) is deprecated. In 0.15.0, users should avoid constructing `EventMetadataEntry` objects directly, instead utilizing the metadata dictionary keyword argument, which maps string labels to `MetadataValues`.
In previous releases, it was possible to supply either an AssetKey, or a function that produced an AssetKey from an OutputContext as the asset_key argument to an Out/OutputDefinition. The latter behavior makes it impossible to gain information about these relationships without running a job, and has been deprecated. However, we still support supplying a static AssetKey as an argument.
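A minimal sketch of the new dictionary style (the labels and values are illustrative):

```python
from dagster import AssetMaterialization, MetadataValue

AssetMaterialization(
    asset_key="my_table",
    metadata={
        "row_count": 500,
        "docs": MetadataValue.url("https://example.com/docs"),
    },
)
```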
We have renamed many of the core APIs that interact with ScheduleStorage, which keeps track of sensor/schedule state and ticks. The old term for the generic schedule/sensor “job” has been replaced by the term “instigator” in order to avoid confusion with the execution API introduced in 0.12.0. If you have implemented your own schedule storage, you may need to change your method signatures appropriately.
Dagit is now powered by Starlette instead of Flask. If you have implemented a custom run coordinator, you may need to make the following change:
```python
from flask import has_request_context, request

def submit_run(self, context: SubmitRunContext) -> PipelineRun:
    jwt_claims_header = (
        request.headers.get("X-Amzn-Oidc-Data", None) if has_request_context() else None
    )
```
Should be replaced by:
```python
def submit_run(self, context: SubmitRunContext) -> PipelineRun:
    jwt_claims_header = context.get_request_header("X-Amzn-Oidc-Data")
```
The Dagster Daemon now requires a workspace.yaml file, much like Dagit.
Ellipsis (“...”) is now an invalid substring of a partition key. This is because Dagit accepts an ellipsis to specify partition ranges.
[Helm] The Dagster Helm chart now only supports Kubernetes clusters above version 1.18.
Software Defined Assets:
In Dagit, the Asset Catalog now offers a third display mode - a global graph of your software-defined assets.
The Asset Catalog now allows you to filter by repository to see a subset of your assets, and offers a “View in Asset Graph” button for quickly seeing software-defined assets in context.
The Asset page in Dagit has been split into two tabs, “Activity” and “Definition”.
Dagit now displays a warning on the Asset page if the most recent run including the asset’s step key failed without yielding a materialization, making it easier to jump to error logs.
Dagit now gives you the option to view jobs with software-defined assets as an Asset Graph (default) or as an Op Graph, and displays asset <-> op relationships more prominently when a single op yields multiple assets.
You can now include your assets in a repository with the use of an AssetGroup. Each repository can only have one AssetGroup, and it can provide a jumping-off point for creating the jobs you plan on using from your assets.

```python
from dagster import AssetGroup, repository, asset

@asset(required_resource_keys={"foo"})
def asset1():
    ...

@asset
def asset2():
    ...

@repository
def the_repo():
    asset_group = AssetGroup(assets=[asset1, asset2], resource_defs={"foo": ...})
    return [asset_group, asset_group.build_job(selection="asset1-")]
```
`AssetGroup.build_job` supports a selection syntax similar to that found in op selection.
Asset Observations:
Added an `asset_observations_for_node` method to `ExecuteInProcessResult` for fetching the AssetObservations from an in-process execution.
Dagster Types with an attached TableSchemaMetadataValue now render the schema in Dagit UI.
[dagster-pandera] New integration library dagster-pandera provides runtime validation from the Pandera dataframe validation library and renders table schema information in Dagit.
`OpExecutionContext.log_event` provides a way to log AssetMaterializations, ExpectationResults, and AssetObservations from the body of an op without having to yield anything. Likewise, you can use `OpExecutionContext.add_output_metadata` to attach metadata to an output without having to explicitly use the Output object.
`OutputContext.log_event` provides a way to log AssetMaterializations from within the handle_output method of an IO manager without yielding. Likewise, output metadata can be added using `OutputContext.add_output_metadata`.
[dagster-dbt] The load_assets_from_dbt_project function now returns a set of assets that map to a single dbt run command (rather than compiling each dbt model into a separate step). It also supports a new node_info_to_asset_key argument which allows you to customize the asset key that will be used for each dbt node.
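A sketch of `node_info_to_asset_key` (the project path and the "dbt" key prefix are illustrative):

```python
from dagster import AssetKey
from dagster_dbt import load_assets_from_dbt_project

dbt_assets = load_assets_from_dbt_project(
    project_dir="my_dbt_project",  # hypothetical path
    node_info_to_asset_key=lambda node_info: AssetKey(["dbt", node_info["name"]]),
)
```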
[dagster-airbyte] The dagster-airbyte integration now collects the Airbyte log output for each run as compute logs, and generates AssetMaterializations for each table that Airbyte updates or creates.
[dagster-airbyte] The dagster-airbyte integration now supports the creation of software-defined assets, with the build_airbyte_assets function.
[dagster-fivetran] The dagster-fivetran integration now supports the creation of software-defined assets with the build_fivetran_assets function.
The multiprocess executor now supports choosing between spawn or forkserver for how its subprocesses are created. When using forkserver we attempt to intelligently preload modules to reduce the per-op overhead.
[Helm] Labels can now be set on the Dagit and daemon deployments.
[Helm] The Ingress v1 is now supported.
[dagster-aws] The `EcsRunLauncher` can now override the `secrets_tag` parameter to None, which will cause it to not look for any secrets to be included in the tasks for the run. This can be useful in situations where the run launcher does not have permissions to query AWS Secretsmanager.
… Runs created in `0.13.17` / `0.13.18` might display an incorrect timestamp for their start time on the Runs page. Running the `dagster instance migrate` CLI command should resolve the issue.
… To select an op `my_op` inside a subgraph `my_graph`, you can now specify the query as `"my_graph.my_op"`.
The `dagster asset wipe` CLI command now takes a `--noprompt` option.
Added the new `Map` config type, used to represent mappings between arbitrary scalar keys and typed values. For more information, see the Map ConfigType docs.
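A minimal sketch of the `Map` config type (the op and config values are illustrative):

```python
from dagster import Map, job, op

@op(config_schema={"counts": Map(str, int)})
def print_counts(context):
    # The config value arrives as a dict mapping str keys to int values.
    for name, count in context.op_config["counts"].items():
        context.log.info(f"{name}: {count}")

@job
def counts_job():
    print_counts()

# Example run config:
# counts_job.execute_in_process(
#     run_config={"ops": {"print_counts": {"config": {"counts": {"a": 1, "b": 2}}}}}
# )
```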
`build_resources` has been added to the top level API. It provides a way to initialize resources outside of execution. This provides a way to use resources within the body of a sensor or schedule: https://github.com/dagster-io/dagster/issues/3794
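A minimal sketch of `build_resources` (the resource body is illustrative):

```python
from dagster import build_resources, resource

@resource
def the_credentials():
    return "so-secret"

# Initialize resources outside of execution, e.g. inside a sensor body.
with build_resources({"credentials": the_credentials}) as resources:
    creds = resources.credentials
```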
The `dagster-daemon` process now creates fewer log entries when no actions are taken (for example, if the run queue is empty).
[helm] … the old `dagster-daemon` pod will now spin down completely before the new `dagster-daemon` pod is started.
[helm] Added a flag to control whether pods launched by the `K8sRunLauncher` should fail if the Dagster run fails. To enable this flag, set the `failPodOnRunFailure` key to true in the run launcher portion of the Helm chart.
[dagster-dbt] The `schema` and `data` arguments on the `DbtCliResource.test` function no longer need to be set to False to avoid errors, and the dbt output will no longer be displayed in json format in the event logs.
[dagster-graphql] The `DagsterGraphQLClient` now supports submitting runs with op/solid sub-selections.
… Runs created in `0.13.15` / `0.13.16` / `0.13.17` might display an incorrect timestamp for their start time on the Runs page. This would only happen if you had run a schema migration (using one of those versions) with the `dagster instance migrate` CLI command. Running the `dagster instance reindex` command should run a data migration that resolves this issue.
The `namespace` argument of the `@asset` decorator now accepts a list of strings in addition to a single string.
… `TableSchemaMetadataEntryData` and `TableMetadataEntryData` allow you to emit metadata representing the schema / contents of a table, to be displayed in Dagit.
… the `k8s_job_executor` due to a label validation error when creating the step pod.
… the `--path-prefix` option.
The `@asset` decorator has a `partitions_def` argument, which accepts a `PartitionsDefinition` value. The asset details page in Dagit now represents which partitions are filled in.
[dagster-fivetran] … the `sync_and_poll` method of the dagster-fivetran resource (thanks Marcos Marx).
[dagster-databricks] The `databricks_pyspark_step_launcher` now streams Dagster logs back from Databricks rather than waiting for the step to completely finish before exporting all events. Fixed an issue where all events from the external step would share the same timestamp. Immediately after execution, stdout and stderr logs captured from the Databricks worker will be automatically surfaced to the event log, removing the need to set the `wait_for_logs` option in most scenarios.
[dagster-databricks] The `databricks_pyspark_step_launcher` now supports dynamically mapped steps.
Fixed an issue where `execute_in_process` would not work for graphs with nothing inputs.
The `Ctrl+A` command did not correctly allow select-all behavior in the editor for non-Mac users; this has now been fixed.
Fixed an issue in the `k8s_job_executor` where the same step could start twice in rare cases.
… the `S3ComputeLogManager` would cause errors in Dagit. This is now fixed.
[dagster-pandas] … the `categorical_column` constraint.
Fixed an issue where returning a `SkipReason` from the schedule function did not display the skip reason in the tick timeline in Dagit, or output the skip message in the dagster-daemon log output.
Previously, you could construct a schedule with a cron string like `@daily` rather than `0 0 * * *`. However, these schedules would fail to actually run successfully in the daemon and would also cause errors when viewing certain pages in Dagit. We now raise a `DagsterInvalidDefinitionError` for schedules that do not have a cron expression consisting of 5 space-separated fields.
[dagster-gcp] … Replaced `get_bucket()` calls with `bucket()`, to avoid unnecessary bucket metadata fetches, thanks!
… `@asset_sensor`s.
[helm] … the `CeleryK8sRunLauncher` that will be included in all launched jobs.
[dagster-mlflow] The `end_mlflow_on_run_finished` hook is now a top-level export of the dagster mlflow library. The API reference also now includes an entry for it.
`RetryPolicy` is now respected when execution is interrupted.
Fixed an issue where `job_metadata` in tags did not correctly propagate to Kubernetes jobs created by Dagster. Thanks @ibelikov!
[dagster-dbt] … the `default_flags` property of `DbtCliResource`.
The `AssetIn` input object now accepts an asset key so upstream assets can be explicitly specified (e.g. `AssetIn(asset_key=AssetKey("asset1"))`).
The `@asset` decorator now has an optional `non_argument_deps` parameter that accepts AssetKeys of assets that do not pass data but are upstream dependencies.
`ForeignAsset` objects now have an optional `description` attribute.
`run_id`, `job_name`, and `op_exception` have been added as parameters to `build_hook_context`.
You can now define inputs on the top-level job / graph. For example:

```python
from dagster import job, op

@op
def add_one(x):
    return x + 1

@job
def my_job(x):
    add_one(x)
```

You can now add config for x at the top level of my run_config like so:

```python
run_config = {
    "inputs": {
        "x": {
            "value": 2
        }
    }
}
```
@op(config_schema={"partition_key": str})
def my_op(context):
print("partition_key: " + context.op_config["partition_key"])
@static_partitioned_config(partition_keys=["a", "b"])
def my_static_partitioned_config(partition_key: str):
return {"ops": {"my_op": {"config": {"partition_key": partition_key}}}}
@job(config=my_static_partitioned_config)
def my_partitioned_job():
my_op()
You can now write:
@op
def my_op(context):
print("partition_key: " + context.partition_key)
@job(partitions_def=StaticPartitionsDefinition(["a", "b"]))
def my_partitioned_job():
my_op()
Added `op_retry_policy` to `@job`. You can also specify `op_retry_policy` when invoking `to_job` on graphs.
[dagster-fivetran] `fivetran_sync_op` will now be rendered with a fivetran tag in Dagit.
[dagster-fivetran] `fivetran_sync_op` now supports producing `AssetMaterializations` for each table updated during the sync. To this end, it now outputs a structured `FivetranOutput` containing this schema information, instead of an unstructured dictionary.
[dagster-dbt] `AssetMaterializations` produced from the dbt_cloud_run_op now include a link to the dbt Cloud docs for each asset (if docs were generated for that run).
You can now use the `@schedule` decorator with `RunRequest`-based evaluation functions. For example, you can now write:

```python
@schedule(cron_schedule="* * * * *", job=my_job)
def my_schedule(context):
    yield RunRequest(run_key="a", ...)
    yield RunRequest(run_key="b", ...)
```
[helm] You can now configure `python_logs` settings using the Dagster Helm chart.
[dagster-slack] Improved `make_slack_on_run_failure_sensor` to use Slack layout blocks and include clickable link to Dagit. Previously, it sent a plain text message.
… if a job was defined with a `ForeignAsset`, the repository containing that job would fail to load.
… `EcsRunLauncher`.
… `EcsRunLauncher`.
The `EcsRunLauncher` now dynamically chooses between assigning a public IP address or not based on whether it’s running in a public or private subnet.
The `@asset` and `@multi_asset` decorators now return `AssetsDefinition` objects instead of `OpDefinitions`.
… `get_dagster_logger` instead of `context.log`.
… `a, b = my_op()` inside `@graph` or `@job`, but `my_op` only has a single `Out`.
[dagster-dbt] Added a new `dbt_cloud_run_op`, as well as a more flexible `dbt_cloud_resource` for more customized use cases. Check out the api docs to learn more!
Fixed a bug in the `pipeline launch` / `job launch` CLIs that would spin up an ephemeral dagster instance for the launch, then tear it down before the run actually executed. Now, the CLI will enforce that your instance is non-ephemeral.
The `pipeline` argument of the `InitExecutorContext` constructor has been changed to `job`.
The `@asset` decorator now accepts a `dagster_type` argument, which determines the DagsterType for the output of the asset op.
`build_assets_job` accepts an `executor_def` argument, which determines the executor for the job.
[dagster-gcp] `google-cloud-bigquery` is temporarily pinned to be prior to version 3 due to a breaking change in that version.
Previously, the `EcsRunLauncher` tagged each ECS task with its corresponding Dagster Run ID. ECS tagging isn't supported for AWS accounts that have not yet migrated to using the long ARN format. Now, the `EcsRunLauncher` only adds this tag if your AWS account has the long ARN format enabled.
Fixed an issue in the `k8s_job_executor` and `docker_executor` that could result in jobs exiting as `SUCCESS` before all ops have run.
Fixed an issue in the `k8s_job_executor` and `docker_executor` that could result in jobs failing when an op is skipped.
`graphene` is temporarily pinned to be prior to version 3 to unbreak Dagit dependencies.
[dagster-fivetran] Added a new `fivetran_sync_op`, as well as a more flexible `fivetran_resource` for more customized use cases. Check out the api docs to learn more!
A new `SourceHashVersionStrategy` class has been added, which versions `op` and `resource` code. It can be provided to a job like so:

```python
from dagster import job, SourceHashVersionStrategy

@job(version_strategy=SourceHashVersionStrategy())
def my_job():
    ...
```
[dagit] … the `Workspace` > `Graph` view.
Previously, the `dagster job` command selected both pipelines and jobs. This release changes the `dagster job` command to select only jobs and not pipelines.
[dagster-k8s] The `k8s_job_executor` is no longer experimental, and is recommended for production workloads. This executor runs each op in a separate Kubernetes job. We recommend this executor for Dagster jobs that require greater isolation than the `multiprocess` executor can provide within a single Kubernetes pod. The `celery_k8s_job_executor` will still be supported, but is recommended only for use cases where Celery is required (The most common example is to offer step concurrency limits using multiple Celery queues). Otherwise, the `k8s_job_executor` is the best way to get Kubernetes job isolation.
[dagster-airflow] Added the `make_dagster_job_from_airflow_dag` factory function. Deprecated the `pipeline_name` argument in favor of `job_name` in all the APIs.
… the `chardet` library that was required due to an incompatibility with an old version of the `aiohttp` library, which has since been fixed.
… the `ins` argument of the `op` decorator.
The `slack_on_run_failure_sensor` now says “Job” instead of “Pipeline” in its default message.
… a `DagsterTypeCheckDidNotPass` error when a Dagster Type contained a List inside a Tuple (thanks @jan-eat!)
… (`dagstermill`, `dagster-pandas`, `dagster-airflow`, etc)
… `execute_in_process`, when job does not have a top-level output.
… `k8s_job_executor`.
[dagstermill] … the `define_dagstermill_op` factory function. Also updated documentation and examples to reflect these changes.
… (`from dagster import get_dagster_logger`)
[dagster-dbt] The dbt resources (`dbt_cli_resource`, `dbt_rpc_resource`, and `dbt_rpc_sync_resource`) now support the `dbt ls` command: `context.resources.dbt.ls()`.
Added `ins` and `outs` properties to `OpDefinition`.
… the `resources_config` argument on `build_solid_context`. The config argument has been renamed to `solid_config`.
… named `run_worker` and `step_worker`, respectively.
… `execute_in_process`.
The first argument of `Executor.execute(...)` has changed from `pipeline_context` to `plan_context`.
… `dagster`, rather than repeating the job name. Thanks @skirino!
[dagster-docker] Added a new `docker_executor` which executes steps in separate Docker containers.
The dagster-daemon process can now detect hanging runs and restart crashed run workers. Currently only supported for jobs using the `docker_executor` and `k8s_job_executor`. Enable this feature in your dagster.yaml with:

```yaml
run_monitoring:
  enabled: true
```

Documentation coming soon. Reach out in the #dagster-support Slack channel if you are interested in using this feature.
… Updated `dagster-aws`, `dagster-github`, and `dagster-slack` to reference job/op/graph APIs.
[dagster-dbt] … support for the `ls` command.
… the `execution: celery-k8s:` config block.
You can now configure the `GCSComputeLogManager` using a string or environment variable instead of passing a path to a credentials file. Thanks @silentsokolov!
Fixed a bug where `dagster instance migrate` would run out of memory when migrating over long run histories.
Fixed a bug in `dagster_aws.s3.sensor.get_s3_keys` that would return no keys if an invalid s3 key was provided.
… `my_log.info("foo %s", "bar")` would cause errors in some scenarios.
… `build_assets_job`. The asset graph shows each node in the job’s graph with metadata about the asset it corresponds to - including asset materializations. It also contains links to upstream jobs that produce assets consumed by the job, as well as downstream jobs that consume assets produced by the job.
[dagster-dbt] Fixed a bug in `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest` that would cause runs to fail if no `runtime_metadata_fn` argument were supplied.
Fixed a bug that caused `@asset` not to infer the type of inputs and outputs from type annotations of the decorated function.
`@asset` now accepts a `compute_kind` argument. You can supply values like “spark”, “pandas”, or “dbt”, and see them represented as a badge on the asset in the Dagit asset graph.
Changed `VersionStrategy.get_solid_version` and `VersionStrategy.get_resource_version` to take in a `SolidVersionContext` and `ResourceVersionContext`, respectively. This gives VersionStrategy access to the config (in addition to the definition object) when determining the code version for memoization. (Thanks @RBrossard!)
Note: This is a breaking change for anyone using the experimental `VersionStrategy` API. Instead of directly being passed `solid_def` and `resource_def`, you should access them off of the context object using `context.solid_def` and `context.resource_def` respectively.
Fixed a bug that caused the `emr_pyspark_step_launcher` to fail when stderr included non-Log4J-formatted lines.
Fixed a bug that caused the `applyPerUniqueValue` config on the `QueuedRunCoordinator` to fail Helm schema validation.
Added the experimental `@asset` decorator and `build_assets_job` APIs to construct asset-based jobs, along with Dagit support.
[dagster-dbt] Added `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest`, which enable constructing asset-based jobs from DBT models.
[dagstermill] You can now have more precise IO control over the output notebooks by specifying `output_notebook_name` in `define_dagstermill_solid` and providing your own IO manager via the "output_notebook_io_manager" resource key.
We've deprecated the `output_notebook` argument in `define_dagstermill_solid` in favor of `output_notebook_name`.
Previously, the output notebook functionality required a “file_manager“ resource and resulted in a FileHandle output. Now, when specifying `output_notebook_name`, it requires an "output_notebook_io_manager" resource and results in a bytes output.
You can now customize your own "output_notebook_io_manager" by extending OutputNotebookIOManager. A built-in `local_output_notebook_io_manager` is provided for handling local output notebook materialization.
See detailed migration guide in https://github.com/dagster-io/dagster/pull/4490.
Dagit fonts have been updated.
context.log.info("foo %s", "bar")
would not get formatted as expected.QueuedRunCoordinator
’s tag_concurrency_limits
to not be respected in some casestags
argument of the @graph
decorator or GraphDefinition
constructor. These tags will be set on any runs of jobs are built from invoking to_job
on the graph.k8s_job_executor
or celery_k8s_job_executor
. Use the key image
inside the container_config
block of the k8s solid tag.jobs
argument. Each RunRequest
emitted from a multi-job sensor’s evaluation function must specify a job_name
.KubernetesRunLauncher
image pull policy is now configurable in a separate field (thanks @yamrzou!).dagster-github
package is now usable for GitHub Enterprise users (thanks @metinsenturk!) A hostname can now be provided via config to the dagster-github resource with the key github_hostname
:execute_pipeline(
github_pipeline, {'resources': {'github': {'config': {
"github_app_id": os.getenv('GITHUB_APP_ID'),
"github_app_private_rsa_key": os.getenv('GITHUB_PRIVATE_KEY'),
"github_installation_id": os.getenv('GITHUB_INSTALLATION_ID'),
"github_hostname": os.getenv('GITHUB_HOSTNAME'),
}}}}
)
pipeline_failure_sensor
and run_status_sensor
queries. To take advantage of these performance gains, run a schema migration with the CLI command: dagster instance migrate
.DockerRunLauncher
would raise an exception when no networks were specified in its configuration.dagster-slack
dagster-slack has migrated off of the deprecated slackclient and now uses [slack_sdk](https://slack.dev/python-slack-sdk/v3-migration/).
OpDefinition, the replacement for SolidDefinition and the type produced by the @op decorator, is now part of the public API.
daily_partitioned_config, hourly_partitioned_config, weekly_partitioned_config, and monthly_partitioned_config now accept an end_offset parameter, which allows extending the set of partitions so that the last partition ends after the current time.
Previously in Dagit, when a repository location had an error when it was reloaded, the user could end up on an empty page with no context about the error. Now, we immediately show a dialog with the error and stack trace, with a button to try reloading the location again once the error is fixed.
Dagster is now compatible with Python's logging module. In your config YAML file, you can configure log handlers and formatters that apply to the entire Dagster instance. Configuration instructions and examples are detailed in the docs: https://docs.dagster.io/concepts/logging/python-logging
[helm] The timeout of database statements sent to the Dagster instance can now be configured using .dagit.dbStatementTimeout.
The QueuedRunCoordinator now supports setting separate limits for each unique value with a certain key. In the below example, 5 runs with the tag (backfill: first) could run concurrently with 5 other runs with the tag (backfill: second).
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      - key: backfill
        value:
          applyLimitPerUniqueValue: True
        limit: 5
[helm] The Helm chart can now run database schema migrations by setting .migrate.enabled=True.
Added instance on RunStatusSensorContext for accessing the Dagster Instance from within run status sensors.
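For illustration, a rough sketch of reading from the instance inside a run status sensor (the sensor itself is hypothetical; @run_status_sensor is described later in these notes):

from dagster import PipelineRunStatus, run_status_sensor

@run_status_sensor(pipeline_run_status=PipelineRunStatus.SUCCESS)
def my_status_sensor(context):
    # context.instance is the Dagster instance, usable for instance-level queries
    recent_runs = context.instance.get_runs(limit=10)
    ...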
The inputs of a Dagstermill solid are now loaded the same way all other inputs are loaded in the framework. This allows rerunning output notebooks with properly loaded inputs outside the Dagster context. Previously, the IO handling depended on a temporary marshal directory.
Previously, the Dagit CLI could not target a bare graph in a file, like so:
from dagster import op, graph

@op
def my_op():
    pass

@graph
def my_graph():
    my_op()
This has been remedied. Now, a file foo.py containing just a graph can be targeted by the dagit CLI: dagit -f foo.py.
When a solid, pipeline, schedule, etc. description or event metadata entry contains a markdown-formatted table, that table is now rendered in Dagit with better spacing between elements.
The hacker-news example now includes instructions on how to deploy the repository in a Kubernetes cluster using the Dagster Helm chart.
[dagster-dbt] The dbt_cli_resource now supports the dbt source snapshot-freshness command (thanks @emilyhawkins-drizly!)
[helm] Labels are now configurable on user code deployments.
Bugfixes
Added support for EventMetadata.asset and EventMetadata.pipeline_run in AssetMaterialization metadata. (Thanks @ymrzkrrs and @drewsonne!)
Breaking Changes
[dagstermill] Dagstermill solids now require an IO manager, e.g. fs_io_manager, which allows data to be passed out of the Jupyter process boundary.
Community Contributions
Documentation
objects.inv is available at http://docs.dagster.io/objects.inv for other projects to link.
execute_solid has been removed from the testing section (https://docs.dagster.io/concepts/testing). Direct invocation is recommended for testing solids.
Removed gcsfs as a dependency.
The documentation for create_databricks_job_solid now includes an example of how to use it.
context.log.info() and other similar functions now fully respect the Python logging API. Concretely, log statements of the form context.log.error("something %s happened!", "bad") will now work as expected, and you are allowed to add things to the "extra" field to be consumed by downstream loggers: context.log.info("foo", extra={"some": "metadata"}).
config_from_files, config_from_pkg_resources, and config_from_yaml_strings have been added for constructing run config from yaml files and strings.
The DockerRunLauncher can now be configured to launch runs that are connected to more than one network, by configuring the networks key.
env_from and volume_mounts are now properly applied to the corresponding Kubernetes run worker and job pods.
The end_mlflow_run_on_pipeline_finished hook no longer errors when invoked.
Passing arbitrary keyword arguments to context.log calls is now not allowed. context.log.info("msg", foo="hi") should be rewritten as context.log.info("msg", extra={"foo": "hi"}).
[dagstermill] A failed dagstermill solid no longer yields an AssetMaterialization for its output notebook. Previously, it would still yield an AssetMaterialization where the path is a temp file path that won't exist after the notebook execution.
InputContext and OutputContext now each has an asset_key that returns the asset key that was provided to the corresponding InputDefinition or OutputDefinition.
[dagster-dbt] Added a synchronous RPC resource (dbt_rpc_sync_resource), which allows you to programmatically send dbt commands to an RPC server, returning only when the command completes (as opposed to returning as soon as the command has been sent).
The k8s_job_executor now adds to the secrets specified in K8sRunLauncher instead of overwriting them.
local_file_manager no longer uses the current directory as the default base_dir, instead defaulting to LOCAL_ARTIFACT_STORAGE/storage/file_manager. If you wish, you can configure LOCAL_ARTIFACT_STORAGE in your dagster.yaml file.
s3_pickle_io_manager
for intermediate storage.Recent
live tick timeline view for Sensors.dagit
will now write to a temp directory in the current working directory when launched with the env var DAGSTER_HOME
not set. This should resolve issues where the event log was not keeping up to date when observing runs progress live in dagit
with no DAGSTER_HOME
to_job
that would result in an error when using an enum config schema within a job.A-Za-z0-9_
.EventMetadata.python_artifact
.dagster sensor list
and dagster schedule list
CLI commands to include schedules and sensors that have never been turned on.execute_in_process
where providing default executor config to a job would cause config errors.ops
config entry in place of solids
would cause a config error.adls2_io_manager
ModeDefinition
now validates the keys of resource_defs
at definition time.Failure
exceptions no longer bypass the RetryPolicy
if one is set.serviceAccount.name
to the user deployment Helm subchart and schema, thanks @jrouly!EcsRunLauncher
will now exponentially backoff certain requests for up to a minute while waiting for ECS to reach a consistent state.launch
CLI, and other modes of external execution, whereas before, memoization was only available via execute_pipeline
and the execute
CLI.version
argument on the decorator:

from dagster import root_input_manager

@root_input_manager(version="foo")
def my_root_manager(_):
    pass
versioned_fs_io_manager
now defaults to using the storage directory of the instance as a base directory.GraphDefinition.to_job
now accepts a tags dictionary with non-string values - which will be serialized to JSON. This makes job tags work similarly to pipeline tags and solid tags.NoOpComputeLogManager
. It did not make sense to default to the LocalComputeLogManager
as pipeline runs are executed in ephemeral jobs, so logs could not be retrieved once these jobs were cleaned up. To have compute logs in a Kubernetes environment, users should configure a compute log manager that uses a cloud provider.dbt_pipeline
to the hacker news example repo, which demonstrates how to run a dbt project within a Dagster pipeline.k8s_job_executor
to match the configuration set in the K8sRunLauncher
.DAGSTER_GRPC_MAX_RX_BYTES
environment variable.dagster instance migrate
when the asset catalog contains wiped assets.--models
, --select
, or --exclude
flags while configuring the dbt_cli_resource
, it will no longer attempt to supply these flags to commands that don’t accept them.yield_result
wrote output value to the same file path if output names are the same for different solids.ops
can now be used as a config entry in place of solids
.EcsRunLauncher
more resilient to ECS’ eventual consistency model.[@run_status_sensor](https://docs.dagster.io/_apidocs/schedules-sensors#dagster.run_status_sensor)
which defines sensors that react to given PipelineRunStatus
.solid
on build_hook_context
. This allows you to access the hook_context.solid
parameter.dagster
’s dependency on docstring-parser
has been loosened.@pipeline
now pulls its description
from the doc string on the decorated function if it is provided.dagster new-project
now no longer targets a non-existent mode.@repository
functions.GraphDefinition.to_job
now supports the description
argument.AmazonECS_FullAccess
policy. Now, the attached roles have been more narrowly scoped to only allow the daemon and dagit tasks to interact with the ECS actions required by the EcsRunLauncher.
Previously, runs launched by the EcsRunLauncher could fail with Error: Got unexpected extra arguments
. Now, it ignores the entrypoint and launches succeed.dagster instance migrate
.ScheduleDefinition
constructor to instantiate a schedule definition, if a schedule name is not provided, the name of the schedule will now default to the pipeline name, plus “_schedule”, instead of raising an error.description
and solid_retry_policy
were getting dropped when using a solid_hook
decorator on a pipeline definition (#4355).@pipeline_failure_sensor
that prevented them from working.
With the new first-class Pipeline Failure sensors, you can now write sensors to perform arbitrary actions when pipelines in your repo fail, using @pipeline_failure_sensor
. Out-of-the-box sensors are provided to send emails using make_email_on_pipeline_failure_sensor
and slack messages using make_slack_on_pipeline_failure_sensor
.
See the Pipeline Failure Sensor docs to learn more.
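As a hedged sketch of wiring up the Slack variant (the channel, token env var, and repository contents are placeholders):

import os

from dagster import repository
from dagster_slack import make_slack_on_pipeline_failure_sensor

slack_on_failure = make_slack_on_pipeline_failure_sensor(
    channel="#alerts",
    slack_token=os.getenv("SLACK_TOKEN"),
)

@repository
def my_repo():
    # plus your pipelines, schedules, and other sensors
    return [slack_on_failure]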
New first-class Asset sensors help you define sensors that launch pipeline runs or notify appropriate stakeholders when specific asset keys are materialized. This pattern also enables Dagster to infer cross-pipeline dependency links. Check out the docs here!
Solid-level retries: A new retry_policy
argument to the @solid
decorator allows you to easily and flexibly control how specific solids in your pipelines will be retried if they fail by setting a RetryPolicy.
Writing tests in Dagster is now even easier, using the new suite of direct invocation apis. Solids, resources, hooks, loggers, sensors, and schedules can all be invoked directly to test their behavior. For example, if you have some solid my_solid
that you'd like to test on an input, you can now write assert my_solid(1, "foo") == "bar"
(rather than explicitly calling execute_solid()
).
[Experimental] A new set of experimental core APIs. Among many benefits, these changes unify concepts such as Presets and Partition sets, make it easier to reuse common resources within an environment, make it possible to construct test-specific resources outside of your pipeline definition, and more. These changes are significant and impactful, so we encourage you to try them out and let us know how they feel! You can learn more about the specifics here
[Experimental] There’s a new reference deployment for running Dagster on AWS ECS and a new EcsRunLauncher that launches each pipeline run in its own ECS Task.
[Experimental] There’s a new k8s_job_executor
(https://docs.dagster.io/_apidocs/libraries/dagster-k8s#dagster_k8s.k8s_job_executor), which executes each solid of your pipeline in a separate Kubernetes job. This addition means that you can now choose at runtime (https://docs.dagster.io/deployment/guides/kubernetes/deploying-with-helm#executor) between single-pod and multi-pod isolation for solids in your run. Previously this was only configurable for the entire deployment: you could either use the K8sRunLauncher
with the default executors (in process and multiprocess) for low isolation, or you could use the CeleryK8sRunLauncher
with the celery_k8s_job_executor
for pod-level isolation. Now, your instance can be configured with the K8sRunLauncher
and you can choose between the default executors or the k8s_job_executor.
Using the @schedule
, @resource
, or @sensor
decorator no longer requires a context parameter. If you are not using the context parameter in these, you can now do this:
@schedule(cron_schedule="\* \* \* \* \*", pipeline_name="my_pipeline")
def my_schedule():
return {}
@resource
def my_resource():
return "foo"
@sensor(pipeline_name="my_pipeline")
def my_sensor():
return RunRequest(run_config={})
Dynamic mapping and collect features are no longer marked “experimental”. DynamicOutputDefinition
and DynamicOutput
can now be imported directly from dagster
.
Added repository_name property on SensorEvaluationContext
, which is name of the repository that the sensor belongs to.
get_mapping_key
is now available on SolidExecutionContext
, allowing for discerning which downstream branch of a DynamicOutput
you are in.
When viewing a run in Dagit, you can now download its debug file directly from the run view. This can be loaded into dagit-debug.
[dagster-dbt] A new dbt_cli_resource
simplifies the process of working with dbt projects in your pipelines, and allows for a wide range of potential uses. Check out the integration guide for examples!
k8s_job_executor
that caused solid tag user defined Kubernetes config to not be applied to the Kubernetes jobs.The deprecated SystemCronScheduler
and K8sScheduler
schedulers have been removed. All schedules are now executed using the dagster-daemon proess. See the deployment docs for more information about how to use the dagster-daemon
process to run your schedules.
If you have written a custom run launcher, the arguments to the launch_run
function have changed in order to enable faster run launches. launch_run
now takes in a LaunchRunContext
object. Additionally, run launchers should now obtain the PipelinePythonOrigin
to pass as an argument to dagster api execute_run
. See the implementation of DockerRunLauncher for an example of the new way to write run launchers.
[helm] .Values.dagsterDaemon.queuedRunCoordinator
has had its schema altered. It is now referenced at .Values.dagsterDaemon.runCoordinator.
Previously, if you set up your run coordinator configuration in the following manner:
dagsterDaemon:
  queuedRunCoordinator:
    enabled: true
    module: dagster.core.run_coordinator
    class: QueuedRunCoordinator
    config:
      max_concurrent_runs: 25
      tag_concurrency_limits: []
      dequeue_interval_seconds: 30
It is now configured like:
dagsterDaemon:
  runCoordinator:
    enabled: true
    type: QueuedRunCoordinator
    config:
      queuedRunCoordinator:
        maxConcurrentRuns: 25
        tagConcurrencyLimits: []
        dequeueIntervalSeconds: 30
The method events_for_asset_key
on DagsterInstance
has been deprecated and will now issue a warning. This method was previously used in our asset sensor example code. This can be replaced by calls using the new DagsterInstance
API get_event_records
. The example code in our sensor documentation has been updated to use our new APIs as well.
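For instance, the most recent materialization of an asset can be fetched roughly as follows (a sketch; the asset key is illustrative):

from dagster import AssetKey, DagsterEventType, DagsterInstance
from dagster.core.storage.event_log.base import EventRecordsFilter

with DagsterInstance.get() as instance:
    records = instance.get_event_records(
        EventRecordsFilter(
            event_type=DagsterEventType.ASSET_MATERIALIZATION,
            asset_key=AssetKey("my_asset"),
        ),
        ascending=False,
        limit=1,
    )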
The Python GraphQL client now includes a shutdown_repository_location API call that shuts down a gRPC server. This is useful in situations where you want Kubernetes to restart your server and re-create your repository definitions, even though the underlying Python code hasn't changed (for example, if your pipelines are loaded programmatically from a database).
io_manager_key and root_manager_key are disallowed on composite solids' InputDefinitions and OutputDefinitions. Instead, custom IO managers on the solids inside composite solids will be respected:
@solid(input_defs=[InputDefinition("data", dagster_type=str, root_manager_key="my_root")])
def inner_solid(_, data):
return data
@composite_solid
def my_composite():
return inner_solid()
Schedules can now be directly invoked. This is intended to be used for testing. To learn more, see https://docs.dagster.io/master/concepts/partitions-schedules-sensors/schedules#testing-schedules
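A minimal sketch of direct schedule invocation (the schedule itself is hypothetical):

from dagster import build_schedule_context

context = build_schedule_context()
run_config = my_schedule(context)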
dagster-postgres
or dagster-graphql
) are now pinned to the same version as the core dagster
package. This should reduce instances of issues due to backwards compatibility problems between Dagster packages.
Invoking a generator solid now yields a generator, and output objects are not unpacked.
@solid
def my_solid():
    yield Output("hello")

assert isinstance(list(my_solid())[0], Output)
EcsRunLauncher
. This creates a new ECS Task Definition and launches a new ECS Task for each run. You can use the new ECS Reference Deployment to experiment with the EcsRunLauncher
. We’d love your feedback in our #dagster-ecs Slack channel!dagit
with flag --suppress-warnings
will now ignore all warnings, such as ExperimentalWarnings.dagster-dbt
library with some helpful tips and example code (thanks @makotonium!).dagster-pyspark
documentation for providing and accessing the pyspark resource (thanks @Andrew-Crosby!).build_input_context
and build_output_context
APIs (link).examples/hacker_news
!retry_number
is now available on SolidExecutionContext
, allowing you to determine within a solid function how many times the solid has been previously retried.--version
flag when installing the Helm chart, the tags for Dagster-provided images in the Helm chart will now default to the current Chart version. For --version
<0.11.13, the image tags will still need to be updated properly to use old chart version.PIPELINE_INIT_FAILURE
event type. A failure that occurs during pipeline initialization will now produce a PIPELINE_FAILURE
as with all other pipeline failures.get_run_status
method on the Python GraphQL client now returns a PipelineRunStatus
enum instead of the raw string value in order to align with the mypy type annotation. Thanks to Dylan Bienstock for surfacing this bug!k8s_job_executor
, which executes solids in separate kubernetes jobs. With the addition of this executor, you can now choose at runtime between single pod and multi-pod isolation for solids in your run. Previously this was only configurable for the entire deployment - you could either use the K8sRunLauncher with the default executors (in_process and multiprocess) for low isolation, or you could use the CeleryK8sRunLauncher with the celery_k8s_job_executor for pod-level isolation. Now, your instance can be configured with the K8sRunLauncher and you can choose between the default executors or the k8s_job_executor.DagsterGraphQLClient
now allows you to specify whether to use HTTP or HTTPS when connecting to the GraphQL server. In addition, error messages during query execution or connecting to dagit are now clearer. Thanks to @emily-hawkins for raising this issue!
Hooks can now be invoked directly for testing, using build_hook_context to provide a context:

from dagster import build_hook_context

my_hook(build_hook_context(resources={"foo_resource": "foo"}))

Resources can likewise be invoked directly, using build_init_resource_context:

from dagster import build_init_resource_context

@resource(config_schema=str)
def my_basic_resource(init_context):
    return init_context.resource_config

context = build_init_resource_context(config="foo")
assert my_basic_resource(context) == "foo"
ScheduleDefinition
and SensorDefinition
now carry over properties from functions decorated by @sensor
and @schedule
. Ie: docstrings.ResourceDefinition
was not being passed to the ResourceDefinition
created by the call to configured
.IOManager
handle_output
implementation that was a generator, it would not be wrapped DagsterExecutionHandleOutputError
. Now, it is wrapped.queuedRunCoordinator
. See the docs for more information on setup.RetryPolicy
now supports backoff and jitter settings, to allow for modulating the delay as a function of attempt number and randomness.
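For orientation, a hedged sketch combining these settings (values are illustrative), using the Backoff and Jitter enums:

from dagster import Backoff, Jitter, RetryPolicy

policy = RetryPolicy(
    max_retries=3,
    delay=1,  # seconds; base delay between attempts
    backoff=Backoff.EXPONENTIAL,  # grow the delay with each attempt
    jitter=Jitter.PLUS_MINUS,  # add randomness around the computed delay
)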
The build_schedule_context and validate_run_config functions are still in an experimental state: https://docs.dagster.io/master/concepts/partitions-schedules-sensors/schedules#testing-schedules
The validate_run_config
function is still in an experimental state: https://docs.dagster.io/master/concepts/partitions-schedules-sensors/partitions#experimental-testing-a-partition-set
[helm] Added dagit.enableReadOnly. When enabled, a separate Dagit instance is deployed in --read-only mode. You can use this feature to serve Dagit to users who you do not want to be able to kick off new runs or make other changes to application state.
EventMetadata.asset
.LOGS_CAPTURED
, which explicitly links to the captured stdout/stderr logs for a given step, as determined by the configured ComputeLogManager
on the Dagster instance. Previously, these links were available on the STEP_START
event.network
key on DockerRunLauncher
config can now be sourced from an environment variable.get_execution_data
method of SensorDefinition
and ScheduleDefinition
has been renamed to evaluate_tick
. We expect few to no users of the previous name, and are renaming to prepare for improved testing support for schedules and sensors.build_sensor_context
API. See Testing sensors.dagster.Int
) as type annotations on functions decorated with @solid
have been resolved.K8sRunLauncher
sometimes hung while launching a run due to holding a stale Kubernetes client.
Sensors can now set a cursor with context.update_cursor(str_value), which is persisted across evaluations to save unnecessary computation. This persisted string value is made available on the context as context.cursor. Previously, we encouraged cursor-like behavior by exposing last_run_key on the sensor context, to keep track of the last time the sensor successfully requested a run. This, however, was not useful for avoiding unnecessary computation when the sensor evaluation did not result in a run request.
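A rough sketch of the cursor pattern (the sensor, pipeline, and fetch helper are hypothetical):

from dagster import RunRequest, sensor

@sensor(pipeline_name="my_pipeline")
def my_cursor_sensor(context):
    last_id = int(context.cursor) if context.cursor else 0
    new_records = fetch_records_after(last_id)  # hypothetical helper
    for record in new_records:
        yield RunRequest(run_key=str(record.id), run_config={})
    if new_records:
        context.update_cursor(str(new_records[-1].id))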
Dagit can now be run in --read-only mode, which will disable mutations in the user interface and on the server. You can use this feature to run instances of Dagit that are visible to users who you do not want to be able to kick off new runs or make other changes to application state.
In dagster-pandas
, the event_metadata_fn
parameter to the function create_dagster_pandas_dataframe_type
may now return a dictionary of EventMetadata
values, keyed by their string labels. This should now be consistent with the parameters accepted by Dagster events, including the TypeCheck
event. For example:

# old
MyDataFrame = create_dagster_pandas_dataframe_type(
    "MyDataFrame",
    event_metadata_fn=lambda df: [
        EventMetadataEntry.int(len(df), "number of rows"),
        EventMetadataEntry.int(len(df.columns), "number of columns"),
    ],
)

# new
MyDataFrame = create_dagster_pandas_dataframe_type(
    "MyDataFrame",
    event_metadata_fn=lambda df: {
        "number of rows": len(df),
        "number of columns": len(df.columns),
    },
)
PandasColumn.datetime_column()
now has a new tz
parameter, allowing you to constrain the column to a specific timezone (thanks @mrdavidlaing
!)DagsterGraphQLClient
now takes in an optional transport
argument, which may be useful in cases where you need to authenticate your GQL requests:

authed_client = DagsterGraphQLClient(
    "my_dagit_url.com",
    transport=RequestsHTTPTransport(..., auth=<some auth>),
)
ecr_public_resource
to get login credentials for the AWS ECR Public Gallery. This is useful if any of your pipelines need to push images.max_completion_wait_time_seconds
configuration option, which controls how long it will wait for a Databricks job to complete before exiting.build_solid_context
function can be used to provide a context to the invocation.

from dagster import build_solid_context

@solid
def basic_solid():
    return "foo"

assert basic_solid() == "foo"

@solid
def add_one(x):
    return x + 1

assert add_one(5) == 6

@solid(required_resource_keys={"foo_resource"})
def solid_reqs_resources(context):
    return context.resources.foo_resource + "bar"

context = build_solid_context(resources={"foo_resource": "foo"})
assert solid_reqs_resources(context) == "foobar"
build_schedule_context
allows you to build a ScheduleExecutionContext
using a DagsterInstance
. This can be used to test schedules.

from dagster import build_schedule_context

with DagsterInstance.get() as instance:
    context = build_schedule_context(instance)
    my_schedule.get_execution_data(context)
build_sensor_context
allows you to build a SensorExecutionContext
using a DagsterInstance
. This can be used to test sensors.
from dagster import build_sensor_context
with DagsterInstance.get() as instance:
    context = build_sensor_context(instance)
    my_sensor.get_execution_data(context)
build_input_context
and build_output_context
allow you to construct InputContext
and OutputContext
respectively. This can be used to test IO managers.

from dagster import build_input_context, build_output_context

io_manager = MyIoManager()
io_manager.load_input(build_input_context())
io_manager.handle_output(build_output_context(), val)

If a required resource is a context manager, build_input_context/build_output_context must be used as a context manager.

with build_input_context(resources={"cm_resource": my_cm_resource}) as context:
    io_manager.load_input(context)
validate_run_config
can be used to validate a run config blob against a pipeline definition & mode. If the run config is invalid for the pipeline and mode, this function will throw an error; if correct, it will return a dictionary representing the validated run config that Dagster uses during execution.

# usage for a pipeline that requires config
validate_run_config(
    {"solids": {"a": {"config": {"foo": "bar"}}}},
    pipeline_contains_a,
)

# usage for a pipeline that has no required config
validate_run_config(pipeline_no_required_config)
RetryPolicy
has been added. This allows you to declare automatic retry behavior when exceptions occur during solid execution. You can set retry_policy
on a solid invocation, @solid
definition, or @pipeline
definition.

@solid(retry_policy=RetryPolicy(max_retries=3, delay=5))
def fickle_solid():
    ...

@pipeline(  # set a default policy for all solids
    solid_retry_policy=RetryPolicy()
)
def my_pipeline():  # will use the pipeline's policy by default
    some_solid()

    # solid definition takes precedence over pipeline default
    fickle_solid()

    # invocation setting takes precedence over definition
    fickle_solid.with_retry_policy(RetryPolicy(max_retries=2))
dagster/priority
tag directly on pipeline definitions would cause an error. This has been fixed.create_dagster_pandas_dataframe_type()
function would, in some scenarios, not use the specified materializer
argument when provided. This has been fixed (thanks @drewsonne
!)dagster-graphql --remote
now sends the query and variables as post body data, avoiding uri length limit issues.sqlalchemy.Engine
objects would be invalidated after 8 hours of idle time due to MySQL’s default configuration, resulting in an sqlalchemy.exc.OperationalError
when attempting to view pages in Dagit in long-running deployments.asOf
URL parameter, which shows a snapshot of the asset at the provided timestamp, including parent materializations as of that time.make dev_install
has been fixed.reload_repository_location
and submit_pipeline_execution
have been fixed - the underlying GraphQL queries had a missing inline fragment case.@solid
decorator can now wrap a function without a context
argument, if no context information is required. For example, you can now do:

@solid
def basic_solid():
    return 5

@solid
def solid_with_inputs(x, y):
    return x + y
However, if your solid requires config or resources, you will receive an error at definition time.
metadata_entries
argument may now instead accept a metadata
argument, which should allow for a more convenient API. The metadata
argument takes a dictionary with string labels as keys and EventMetadata
values. Some base types (str
, int
, float
, and JSON-serializable list
/dict
s) are also accepted as values and will be automatically coerced to the appropriate EventMetadata
value. For example:

@solid
def old_metadata_entries_solid(df):
    yield AssetMaterialization(
        "my_asset",
        metadata_entries=[
            EventMetadataEntry.text("users_table", "table name"),
            EventMetadataEntry.int(len(df), "row count"),
            EventMetadataEntry.url("http://mysite/users_table", "data url"),
        ],
    )

@solid
def new_metadata_solid(df):
    yield AssetMaterialization(
        "my_asset",
        metadata={
            "table name": "users_table",
            "row count": len(df),
            "data url": EventMetadata.url("http://mysite/users_table"),
        },
    )
--heartbeat-tolerance
argument that allows you to configure how long the process can run before shutting itself down due to a hanging thread. This parameter can be used to troubleshoot failures with the daemon process.PartitionSetDefinition.create_schedule_definition
, the partition_selector
function that determines which partition to use for a given schedule tick can now return a list of partitions or a single partition, allowing you to create schedules that create multiple runs for each schedule tick.KeyError
.Dict
and Set
types for solid inputs/outputs now works as expected. Previously a structure like Dict[str, Dict[str, Dict[str, SomeClass]]]
could result in confusing errors.solid_config
.map
and collect
steps downstream of other map
and collect
steps to mysteriously not execute when using multiprocess executors has been resolved.solid_exception
on HookContext
which returns the actual exception object thrown in a failed solid. See the example “Accessing failure information in a failure hook“ for more details.solid_output_values
on HookContext
which returns the computed output values.make_values_resource
helper for defining a resource that passes in user-defined values. This is useful when you want multiple solids to share values. See the example for more details.--path-prefix
, our color-coded favicons denoting the success or failure of a run were not loading properly. This has been fixed.DagsterInstance.get()
no longer falls back to an ephemeral instance if DAGSTER_HOME
is not set. We don’t expect this to break normal workflows. This change allows our tooling to be more consistent around it’s expectations. If you were relying on getting an ephemeral instance you can use DagsterInstance.ephemeral()
directly.HookContext
have been removed. step_key
and mode_def
have been documented as attributes.config_schema
for all configurable objects - solids, resources, IO managers, composite solids, executors, loggers - is now Any
. This means that you can now use configuration without explicitly providing a config_schema
. Refer to the docs for more details: https://docs.dagster.io/concepts/configuration/config-schema.input_defs
and output_defs
on @solid
will now flexibly combine data that can be inferred from the function signature that is not declared explicitly via InputDefinition
/ OutputDefinition
. This allows for more concise defining of solids with reduced repetition of information.dagster-daemon
process within the last 5 minutes. Previously, it would only display errors from the last 30 seconds.dagster-daemon
process.DockerRunLauncher
now accepts a container_kwargs
config parameter, allowing you to specify any argument to the run container that can be passed into the Docker containers.run method. See https://docker-py.readthedocs.io/en/stable/containers.html#docker.models.containers.ContainerCollection.run for the full list of available options.
The celery_k8s_job_executor now accepts a job_wait_timeout allowing you to override the default of 24 hours.
--prefix-path
argument.Requested
state.output_config_schema
or input_config_schema
arguments of @io_manager
, the config would still be required. Now, the config is not required.Assets
catalog, where the view switcher would flip back from the directory view to the flat view when navigating into subdirectories.dagster-daemon
process would crash if it experienced a transient connection error while connecting to the Dagster database.dagster-airflow scaffold
command would raise an exception if a preset was specified.ModeDefinition
that are not required by a pipeline no longer require runtime configuration. This should make it easier to share modes or resources among multiple pipelines.RetryRequested
is yielded from a notebook using dagstermill.yield_event
.--path-prefix
option, leading to failed GraphQL requests and broken pages. This bug was introduced in 0.11.4, and is now fixed.update_timestamp
column in the runs table is now updated with a UTC timezone, making it consistent with the create_timestamp
column.dagster-pandas
on pandas
. You can now include any version of pandas. (https://github.com/dagster-io/dagster/issues/3350)requests
in dagster
. Now only dagit
depends on requests.
pyrsistent
in dagster
.--config
help message (thanks @pawelad !)execute_pipeline
, the system would use the io manager that handled each output to perform the retrieval. Now, when using execute_pipeline
with the default in-process executor, the system directly captures the outputs of solids for use with the result object returned by execute_pipeline
. This may lead to slightly different behavior when retrieving outputs if switching between executors and using custom IO managers.K8sRunLauncher
and CeleryK8sRunLauncher
now add a dagster/image
tag to pipeline runs to document the image used. The DockerRunLauncher
has also been modified to use this tag (previously it used docker/image
)..
key shortcut to toggle visibility.@solid
can now decorate async def functions.PartitionGraphFragment
has been fixed.pipeline_name
that is not present in the current repository will now error out when the repository is created.generatePostgresqlPasswordSecret
toggle was added to allow the Helm chart to reference an external secret containing the Postgresql password (thanks @PenguinToast !)dagit.workspace
, which can be useful if you are managing your user deployments in a separate Helm release.dict
values in run_config
targeting Permissive
/ dict
config schemas, the ordering is now preserved.EventMetadataEntry.int
greater than 32 bits no longer cause dagit
errors.PresetDefinition.with_additional_config
no longer errors if the base config was empty (thanks @esztermarton !)StatusCode.RESOURCE_EXHAUSTED
for a large number of run requests, especially when the requested run configs were large.Community Contributions
dagster new project
now scaffolds setup.py
using your local dagster
pip version (thanks @taljaards!)New
description
is provided to the solid decorator, the docstring will now be used as the solid’s description.Bugfixes
dagster api execute_step
will mistakenly skip a step and output a non-DagsterEvent log. This affected the celery_k8s_job_executor
.Integrations
Community Contributions
dagster new-project
, which broke on the 0.11.0 release (Thank you @saulius!)New
Bugfixes
--path-prefix option
. Custom fonts and their CSS have now been removed, and system fonts are now used for both normal and monospace text.AssetKeys
to solid outputs through either the OutputDefinition
or IOManager
, which allows Dagster to automatically generate asset lineage information for assets referenced in this way. Direct parents of an asset will appear in the Dagit Asset Catalog. See the asset docs to learn more.DynamicOutput
and map
from the last release, this release includes the ability to collect
over dynamically mapped outputs. You can see an example here.partition_days_offset
argument to the @daily_schedule
decorator that allows you to customize which partition is used for each execution of your schedule. The default value of this parameter is 1
, which means that a schedule that runs on day N will fill in the partition for day N-1. To create a schedule that uses the partition for the current day, set this parameter to 0
, or increase it to make the schedule use an earlier day’s partition. Similar arguments have also been added for the other partitioned schedule decorators (@monthly_schedule
, @weekly_schedule
, and @hourly_schedule
).ardescription
parameter that takes in a human-readable string description and displays it on the corresponding landing page in Dagit.AssetMaterialization
now accepts a tags
argument. Tags can be used to filter assets in Dagit.QueuedRunCoordinator
daemon is now more resilient to errors while dequeuing runs. Previously runs which could not launch would block the queue. They will now be marked as failed and removed from the queue.dagster-daemon
process uses fewer resources and spins up fewer subprocesses to load pipeline information. Previously, the scheduler, sensor, and run queue daemon each spun up their own process for this–now they share a single process.dagster-daemon
process now runs each of its daemons in its own thread. This allows the scheduler, sensor loop, and daemon for launching queued runs to run in parallel, without slowing each other down.workspace.yaml
file to load your pipelines, you can now specify an environment variable for the server’s hostname and port.dagster run delete
CLI command to delete a run and its associated event log entries.fs_io_manager
now defaults the base directory to base_dir
via the Dagster instance’s local_artifact_storage
configuration. Previously, it defaulted to the directory where the pipeline was executed.handle_output
, load_input
, or a type check function, the log output now includes context about which input or output the error occurred during.BoolSource
config type (similar to the StringSource
type). The config value for this type can be a boolean literal or a pointer to an environment variable that is set to a boolean value.DagsterNoStepsToExecuteException
.OutputContext
passed to the has_output
method of MemoizableIOManager
now includes a working log
.workspace.yaml
file without restarting Dagit. To reload your workspace, navigate to the Status page and press the “Reload all” button in the Workspace section.step
and type
filtering now offers fuzzy search, all log event types are now searchable, and visual bugs within the input have been repaired. Additionally, the default setting for “Hide non-matches” has been flipped to true
.grpc_server
repository location, Dagit will automatically detect changes and prompt you to reload when the remote server updates.dagster asset wipe
.snowflake_resource
can now be configured to use the SQLAlchemy connector (thanks @basilvetas!)seed
and docs generate
are now available as solids in the library dagster-dbt
. (thanks @dehume-drizly!)dagster-spark
config schemas now support loading values for all fields via environment variables.gcs_pickle_io_manager
now also retries on 403 Forbidden errors, which previously would only retry on 429 TooManyRequests.K8sRunLauncher
and CeleryK8sRunLauncher
no longer reload the pipeline being executed just before launching it. The previous behavior ensured that the latest version of the pipeline was always being used, but was inconsistent with other run launchers. Instead, to ensure that you’re running the latest version of your pipeline, you can refresh your repository in Dagit by pressing the button next to the repository name.userDeployments.deployments
in the Helm chart, replicaCount
now defaults to 1 if not specified.dagster/dagster-k8s
and dagster/dagster-celery-k8s
can be used for all processes which don't require user code (Dagit, Daemon, and Celery workers when using the CeleryK8sExecutor). user-code-example
can be used for a sample user repository. The prior images (k8s-dagit
, k8s-celery-worker
, k8s-example
) are deprecated.dagster-k8s
, dagster-celery-k8s
, user-code-example
, and k8s-dagit-example
images to a public ECR registry in addition to DockerHub. If you are encountering rate limits when attempting to pull images from DockerHub, you should now be able to pull these images from public.ecr.aws/dagster..Values.dagsterHome
is now a global variable, available at .Values.global.dagsterHome
..Values.global.postgresqlSecretName
has been introduced, for subcharts to access the Dagster Helm chart’s generated Postgres secret properly..Values.userDeployments
has been renamed .Values.dagster-user-deployments
to reference the subchart’s values. When using Dagster User Deployments, enabling .Values.dagster-user-deployments.enabled
will create a workspace.yaml
for Dagit to locate gRPC servers with user code. To create the actual gRPC servers, .Values.dagster-user-deployments.enableSubchart
should be enabled. To manage the gRPC servers in a separate Helm release, .Values.dagster-user-deployments.enableSubchart
should be disabled, and the subchart should be deployed in its own helm release.Schedules now run in UTC (instead of the system timezone) if no timezone has been set on the schedule. If you’re using a deprecated scheduler like SystemCronScheduler
or K8sScheduler
, we recommend that you switch to the native Dagster scheduler. The deprecated schedulers will be removed in the next Dagster release.
Names provided to alias
on solids now enforce the same naming rules as solids. You may have to update provided names to meet these requirements.
The retries
method on Executor
should now return a RetryMode
instead of a Retries
. This will only affect custom Executor
classes.
Submitting partition backfills in Dagit now requires dagster-daemon
to be running. The instance setting in dagster.yaml
to optionally enable daemon-based backfills has been removed, because all backfills are now daemon-based backfills.
# removed, no longer a valid setting in dagster.yaml
backfill:
  daemon_enabled: true
The corresponding value flag dagsterDaemon.backfill.enabled
has also been removed from the Dagster helm chart.
dagster.yaml
has been removed. The sensor daemon now runs in a continuous loop, so this customization is no longer useful.

# removed, no longer a valid setting in dagster.yaml
sensor_settings:
  interval_seconds: 10
instance
argument to RunLauncher.launch_run
has been removed. If you have written a custom RunLauncher, you’ll need to update the signature of that method. You can still access the DagsterInstance
on the RunLauncher
via the _instance
parameter.has_config_entry
, has_configurable_inputs
, and has_configurable_outputs
properties of solid
and composite_solid
have been removed.name
argument to PipelineDefinition
has been removed, and the argument is now required.execute_run_with_structured_logs
and execute_step_with_structured_logs
internal CLI entry points have been removed. Use execute_run
or execute_step
instead.python_environment
key has been removed from workspace.yaml
. Instead, to specify that a repository location should use a custom python environment, set the executable_path
key within a python_file
, python_module
, or python_package
key. See the docs for more information on configuring your workspace.yaml
file.read
or to
keys accordingly.Bugfixes
Community Contributions
seed
and docs generate
are now available as solids in the
library dagster-dbt
. (thanks @dehume-drizly!)New
Dagit now has a global search feature in the left navigation, allowing you to jump quickly to pipelines, schedules, and sensors across your workspace. You can trigger search by clicking the search input or with the / keyboard shortcut.
Timestamps in Dagit have been updated to be more consistent throughout the app, and are now localized based on your browser’s settings.
Adding SQLPollingEventWatcher
for alternatives to filesystem or DB-specific listen/notify
functionality
We have added the BoolSource
config type (similar to the StringSource
type). The config value for
this type can be a boolean literal or a pointer to an environment variable that is set to a boolean
value.
The QueuedRunCoordinator
daemon is now more resilient to errors while dequeuing runs. Previously
runs which could not launch would block the queue. They will now be marked as failed and removed
from the queue.
When deploying your own gRPC server for your pipelines, you can now specify that connecting to that
server should use a secure SSL connection. For example, the following workspace.yaml
file specifies
that a secure connection should be used:
load_from:
  - grpc_server:
      host: localhost
      port: 4266
      location_name: "my_grpc_server"
      ssl: true
The dagster-daemon
process uses fewer resources and spins up fewer subprocesses to load pipeline
information. Previously, the scheduler, sensor, and run queue daemon each spun up their own process
for this–now they share a single process.
Integrations
dagster-k8s
, dagster-celery-k8s
, user-code-example
, and
k8s-dagit-example
images to a public ECR registry in addition to DockerHub. If you are
encountering rate limits when attempting to pull images from DockerHub, you should now be able to
pull these images from public.ecr.aws/dagster.dagster-spark
config schemas now support loading values for all fields via
environment variables.Bugfixes
redis.internal
is set to True
in helm chart.dagster-daemon
process sometimes left dangling subprocesses running
during sensor execution, causing excess resource usage.tag
method on solid invocations (as opposed to solid
definitions) are now correctly propagated during execution. They were previously being ignored.Experimental
MySQL (via dagster-mysql) is now supported as a backend for event log, run, & schedule storages. Add the following to your dagster.yaml to use MySQL for storage:
run_storage:
  module: dagster_mysql.run_storage
  class: MySQLRunStorage
  config:
    mysql_db:
      username: { username }
      password: { password }
      hostname: { hostname }
      db_name: { database }
      port: { port }

event_log_storage:
  module: dagster_mysql.event_log
  class: MySQLEventLogStorage
  config:
    mysql_db:
      username: { username }
      password: { password }
      hostname: { hostname }
      db_name: { db_name }
      port: { port }

schedule_storage:
  module: dagster_mysql.schedule_storage
  class: MySQLScheduleStorage
  config:
    mysql_db:
      username: { username }
      password: { password }
      hostname: { hostname }
      db_name: { db_name }
      port: { port }
New
dagster instance migrate
to upgrade.step
and type
filtering now offer fuzzy search, all log event types are now searchable, and visual bugs within the input have been repaired. Additionally, the default setting for “Hide non-matches” has been flipped to true
.dagster-daemon
process now runs faster when running multiple schedulers or sensors from the same repository.fs_io_manager
now defaults the base directory to base_dir
via the Dagster instance’s local_artifact_storage
configuration. Previously, it defaults to the directory where the pipeline is executed.versioned_filesystem_io_manager
and custom_path_fs_io_manager
now require base_dir
as part of the resource configs. Previously, the base_dir
defaulted to the directory where the pipeline was executed.dagster instance migrate
and configuring your instance with the following settings in dagster.yaml
:There is a corresponding flag in the Dagster helm chart to enable this instance configuration. See the Helm chart’s values.yaml
file for more information.
description
parameter that takes in a human-readable string description and displays it on the corresponding landing page in Dagit.Integrations
gcs_pickle_io_manager
now also retries on 403 Forbidden errors, which previously would only retry on 429 TooManyRequests.Bug Fixes
Tuple
with nested inner types in solid definitions no longer causes GraphQL errorsdagster new-repo
should now properly generate subdirectories and files, without needing to install dagster
from source (e.g. with pip install --editable
).Dependencies
pendulum
datetime/timezone library.Documentation
New
dagster run delete
CLI command to delete a run and its associated event log entries.partition_days_offset
argument to the @daily_schedule
decorator that allows you to customize which partition is used for each execution of your schedule. The default value of this parameter is 1
, which means that a schedule that runs on day N will fill in the partition for day N-1. To create a schedule that uses the partition for the current day, set this parameter to 0
, or increase it to make the schedule use an earlier day’s partition. Similar arguments have also been added for the other partitioned schedule decorators (@monthly_schedule
, @weekly_schedule
, and @hourly_schedule
).dagster new-repo
command now includes a workspace.yaml file for your new repository.workspace.yaml
file to load your pipelines, you can now specify an environment variable for the server's hostname and port. For example, this is now a valid workspace:

load_from:
  - grpc_server:
      host:
        env: FOO_HOST
      port:
        env: FOO_PORT
Integrations
K8sRunLauncher
and CeleryK8sRunLauncher
no longer reload the pipeline being executed just before launching it. The previous behavior ensured that the latest version of the pipeline was always being used, but was inconsistent with other run launchers. Instead, to ensure that you’re running the latest version of your pipeline, you can refresh your repository in Dagit by pressing the button next to the repository name.Bug Fixes
--path-prefix
option. This has been fixed.ModeDefinition
that contains a single executor, that executor is now selected by default.reconstructable
on pipelines with that were also decorated with hooks no longer raises an error.dagster-daemon liveness-check
command previously returned false when daemons surfaced non-fatal errors to be displayed in Dagit, leading to crash loops in Kubernetes. The command has been fixed to return false only when the daemon has stopped running.OutputDefinition
s with io_manager_key
s, or InputDefinition
s with root_manager_key
s, but any of the modes provided for the pipeline definition do not include a resource definition for the required key, Dagster now raises an error immediately instead of when the pipeline is executed.dagster-dbt
has been updated to handle the new run_results.json
schema for dbt 0.19.0.Dependencies
Documentation
Community Contributions
/License
for packages that claim distribution under Apache-2.0 (thanks @bollwyvl!)New
dagster/dagster-k8s
and dagster/dagster-celery-k8s
can be
used for all processes which don't require user code (Dagit, Daemon, and Celery workers when using the CeleryK8sExecutor). user-code-example
can
be used for a sample user repository. The prior images (k8s-dagit
, k8s-celery-worker
, k8s-example
)
are deprecated.configured
api on solids now enforces name argument as positional. The name
argument remains a keyword argument on executors. name
argument has been removed from resources, and loggers to reflect that they are anonymous. Previously, you would receive an error message if the name
argument was provided to configured
on resources or loggers.minimum_interval_seconds
field, the overall sensor daemon interval can now be configured in the dagster.yaml
instance settings with:

sensor_settings:
  interval_seconds: 30 # (default)
This changes the interval at which the daemon checks for sensors which haven't run within their minimum_interval_seconds
.
TypeCheck
dagster-daemon
process now runs each of its daemons in its own thread. This allows the scheduler, sensor loop, and daemon for launching queued runs to run in parallel, without slowing each other down. The dagster-daemon
process will shut down if any of the daemon threads crash or hang, so that the execution environment knows that it needs to be restarted.dagster new-repo
is a new CLI command that generates a Dagster repository with skeleton code in your filesystem. This CLI command is experimental and it may generate different files in future versions, even between dot releases. As of 0.10.5, dagster new-repo
does not support Windows. See here for official API docs.grpc_server
repository location, Dagit will automatically detect changes and prompt you to reload when the remote server updates.Integrations
Bugfixes
Bugfixes
New
minimum_interval_seconds
argument, which determines the minimum amount of time between sensor evaluations.Bugfixes
-n
/--max_workers
default value for the dagster api grpc
command to be None
. When set to None
, the gRPC server will use the default number of workers which is based on the CPU count. If you were previously setting this value to 1
, we recommend removing the argument or increasing the number.Community Contributions
New
Bugfixes
Fixed an issue where run start times and end times were displayed in the wrong timezone in Dagit when using Postgres storage.
Schedules with partitions that weren’t able to execute due to not being able to find a partition will now display the name of the partition they were unable to find on the “Last tick” entry for that schedule.
Improved timing information display for queued and canceled runs within the Runs table view and on individual Run pages in Dagit.
Improvements to the tick history view for schedules and sensors.
Fixed formatting issues on the Dagit instance configuration page.
Miscellaneous Dagit bugfixes and improvements.
The dagster pipeline launch command will now respect run concurrency limits if they are applied on your instance.
Fixed an issue where re-executing a run created by a sensor would cause the daemon to stop executing any additional runs from that sensor.
Sensor runs with invalid run configuration will no longer create a failed run - instead, an error will appear on the page for the sensor, allowing you to fix the configuration issue.
General dagstermill housekeeping: test refactoring & type annotations, as well as repinning ipykernel to solve #3401
Documentation
Community Contributions
k8s-example
by 25% (104 MB) (thanks @alex-treebeard and @mrdavidlaing!)snowflake_resource
can now be configured to use the SQLAlchemy connector (thanks @basilvetas!)New
userDeployments.deployments
in the Helm chart, replicaCount
now defaults to 1 if not specified.Bugfixes
env
, envConfigMaps
, and envSecrets
.Documentation
QueuedRunCoordinator
to limit run concurrency.SystemCronScheduler
or K8sScheduler
to the new scheduler.IOManager
abstraction provides a new, streamlined primitive for granular control over where
and how solid outputs are stored and loaded. This is intended to replace the (deprecated)
intermediate/system storage abstractions, See the
IO Manager Overview for more
information.required_resource_keys
parameter on @resource
.DynamicOutputDefinition
API.
Dagster can now map the downstream dependencies over a dynamic output at runtime.Dropping Python 2 support
Removal of deprecated APIs
These APIs were marked for deprecation with warnings in the 0.9.0 release, and have been removed in the 0.10.0 release.
input_hydration_config
has been removed. Use the dagster_type_loader
decorator
instead.output_materialization_config
has been removed. Use dagster_type_materializer
instead.SystemStorageDefinition
,
@system_storage
, and default_system_storage_defs
. Use the new IOManagers
API instead. See
the IO Manager Overview for more
information.config_field
argument on decorators and definitions classes has been removed and replaced
with config_schema
. This is a drop-in rename.step_keys_to_execute
to the functions reexecute_pipeline
and
reexecute_pipeline_iterator
has been removed. Use the step_selection
argument to select
subsets for execution instead.repository
key in your workspace.yaml
;
use load_from
instead. See the
Workspaces Overview for
documentation about how to define a workspace.Breaking API Changes
SolidExecutionResult.compute_output_event_dict
has been renamed to
SolidExecutionResult.compute_output_events_dict
. A solid execution result is returned from
methods such as result_for_solid
. Any call sites will need to be updated..compute
suffix is no longer applied to step keys. Step keys that were previously named
my_solid.compute
will now be named my_solid
. If you are using any API method that takes a
step_selection argument, you will need to update the step keys accordingly.pipeline_def
property has been removed from the InitResourceContext
passed to functions
decorated with @resource
.Dagstermill
If you are using define_dagstermill_solid
with the output_notebook
parameter set to True
,
you will now need to provide a file manager resource (subclass of
dagster.core.storage.FileManager
) on your pipeline mode under the resource key "file_manager"
,
e.g.:
from dagster import ModeDefinition, local_file_manager, pipeline
from dagstermill import define_dagstermill_solid

my_dagstermill_solid = define_dagstermill_solid("my_dagstermill_solid", output_notebook=True, ...)

@pipeline(mode_defs=[ModeDefinition(resource_defs={"file_manager": local_file_manager})])
def my_dagstermill_pipeline():
    my_dagstermill_solid(...)
Helm Chart
scheduler
values in the helm chart has changed. Instead of a simple toggle
on/off, we now require an explicit scheduler.type
to specify usage of the
DagsterDaemonScheduler
, K8sScheduler
, or otherwise. If your specified scheduler.type
has
required config, these fields must be specified under scheduler.config
.snake_case
fields have been changed to camelCase
. Please update your values.yaml
as follows:pipeline_run
→ pipelineRun
dagster_home
→ dagsterHome
env_secrets
→ envSecrets
env_config_maps
→ envConfigMaps
celery
and k8sRunLauncher
have now been consolidated under the Helm value
runLauncher
for simplicity. Use the field runLauncher.type
to specify usage of the
K8sRunLauncher
, CeleryK8sRunLauncher
, or otherwise. By default, the K8sRunLauncher
is
enabled.CeleryK8sRunLauncher
, you should explicitly enable your message broker of choice.userDeployments
are now enabled by default.Event log messages streamed to stdout
and stderr
have been streamlined to be a single line
per event.
Experimental support for memoization and versioning lets you execute pipelines incrementally, selecting which solids need to be rerun based on runtime criteria and versioning their outputs with configurable identifiers that capture their upstream dependencies.
To set up memoized step selection, users can provide a MemoizableIOManager
, whose has_output
function decides whether a given solid output needs to be computed or already exists. To execute
a pipeline with memoized step selection, users can supply the dagster/is_memoized_run
run tag
to execute_pipeline
.
To set the version on a solid or resource, users can supply the version
field on the definition.
To access the derived version for a step output, users can access the version
field on the
OutputContext
passed to the handle_output
and load_input
methods of IOManager
and the
has_output
method of MemoizableIOManager
.
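For illustration, here is a minimal sketch of a memoizable IO manager, assuming the experimental import path below; the on-disk layout keyed by the output's version is illustrative, not prescribed:
import os
import pickle

from dagster import io_manager
from dagster.core.storage.memoizable_io_manager import MemoizableIOManager

class PickleMemoizingIOManager(MemoizableIOManager):
    def _path(self, context):
        # Key the file path by the step output's derived version.
        return os.path.join("/tmp/memoized", context.step_key, context.name, str(context.version))

    def handle_output(self, context, obj):
        os.makedirs(os.path.dirname(self._path(context)), exist_ok=True)
        with open(self._path(context), "wb") as f:
            pickle.dump(obj, f)

    def load_input(self, context):
        with open(self._path(context.upstream_output), "rb") as f:
            return pickle.load(f)

    def has_output(self, context):
        # Returning True tells memoized step selection that this output already
        # exists, so the step that produces it can be skipped.
        return os.path.exists(self._path(context))

@io_manager
def pickle_memoizing_io_manager(_):
    return PickleMemoizingIOManager()
To execute with memoized step selection, supply the run tag described above, e.g. execute_pipeline(my_pipeline, tags={"dagster/is_memoized_run": "true"}).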
Schedules that are executed using the new DagsterDaemonScheduler
can now execute in any
timezone by adding an execution_timezone
parameter to the schedule. Daylight Savings Time
transitions are also supported. See the
Schedules Overview for
more information and examples.
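For example, a minimal sketch (the pipeline and solid names are hypothetical):
import datetime

from dagster import daily_schedule

@daily_schedule(
    pipeline_name="my_pipeline",
    start_date=datetime.datetime(2021, 1, 1),
    execution_time=datetime.time(hour=9, minute=0),
    # Ticks fire at 9:00 AM in US Central time, across DST transitions.
    execution_timezone="US/Central",
)
def my_daily_schedule(date):
    return {"solids": {"my_solid": {"config": {"date": date.strftime("%Y-%m-%d")}}}}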
Helm
We've added schema validation to our Helm chart. You can now check that your values YAML file is correct by running:
helm lint helm/dagster -f helm/dagster/values.yaml
Added support for resource annotations throughout our Helm chart.
Added Helm deployment of the dagster daemon & daemon scheduler.
Added Helm support for configuring a compute log manager in your dagster instance.
User code deployments now include a user ConfigMap
by default.
Changed the default liveness probe for Dagit to use httpGet "/dagit_info"
instead of
tcpSocket:80
Dagster-K8s [Kubernetes]
Dagster-Celery-K8s
dagster-docker
library with a DockerRunLauncher
that launches each run in its own
Docker container. (See Deploying with Docker docs for an example.)
create_databricks_job_solid for creating solids that launch Databricks jobs.
Bugfixes
New
Bugfixes
Community Contributions
Bugfixes
mher/flower:0.9.5 for the Flower pod.
New
You no longer have to run dagster schedule up or press the Reconcile button before turning on a new schedule for the first time
Community Contributions
Bugfixes
Experimental
New
Bugfixes
@pipeline decorated functions with -> None typing no longer cause unexpected problems.
Breaking Changes
CliApiRunLauncher
and GrpcRunLauncher
have been combined into DefaultRunLauncher
.
If you had one of these run launchers in your dagster.yaml, replace it with DefaultRunLauncher or remove the run_launcher: section entirely.
New
Bugfixes
dagster-k8s/config
) will now be
passed to the k8s jobs when using the dagster-k8s
and dagster-celery-k8s
run launchers.
Previously, only user-defined k8s config in the pipeline definition’s tag was passed down.
Experimental
QueuedRunCoordinator
enables limiting the number of concurrent runs.
The DefaultRunCoordinator
launches jobs directly from Dagit, preserving existing behavior.
New
Community contributions
Bug fixes
PipelineDefinitions that do not meet resource requirements for their types will now fail at definition time
Deprecated
Returning a value from a function decorated with @pipeline is deprecated. This return value actually had no impact at all and was ignored, but we are making changes that will use that value in the future. By changing your code to not return anything now you will avoid any breaking changes with zero user-visible impact.
Breaking Changes
Removed the DagsterKubernetesPodOperator in dagster-airflow.
Removed the execute_plan mutation from dagster-graphql.
ModeDefinition, PartitionSetDefinition, PresetDefinition, @repository, @pipeline, and ScheduleDefinition names must pass the regular expression r"^[A-Za-z0-9_]+$" and not be python keywords or disallowed names. See DISALLOWED_NAMES in dagster.core.definitions.utils for an exhaustive list of illegal names.
dagster-slack is now upgraded to use slackclient 2.x - this means that this resource will only support Python 3.6 and above.
Requires the dagster api grpc-health-check cli command present in Dagster 0.9.16 and later.
New
K8sRunLauncher, in place of the CeleryK8sRunLauncher.
Community Contributions
Added a --limit flag on the dagster run list command (Thanks @haydarai!)
Bugfixes
Breaking Changes
executeRunInProcess
was removed from dagster-graphql.
New
Community Contributions
Bugfixes
Experimental
Documentation
New
Community Contributions
Bugfixes
Experimental
Bugfixes
build_reconstructable_pipeline
.
Documentation
Breaking Changes
Community Contributions
EventMetadataEntry
(Thanks @ChocoletMousse!)
Added build_composite_solid_definition method to Lakehouse (Thanks @sd2k!)
New
Bugfixes
DagsterInvalidAssetKey
error so that it no longer fails upon being thrown
Documentation
Breaking Changes
Removed the compute option from Dask DataFrame materialization configs for all output types. Setting this option to False (default True) would result in a future that is never computed, leading to missing materializations
Community Contributions
New
$DAGSTER_HOME if it is not set or improperly setup when starting up a Dagster instance
dagster debug export - a new CLI entry added for exporting a run by id to a file
dagit-debug - a new CLI added for loading dagit with a run to debug
dagit now has a button to download the debug file for a run via the action menu on the runs page
The dagster api grpc command now defaults to the current working directory if none is specified
Bugfixes
dagit --empty-workspace
None when an iterable is expected, causing errors in the celery execution loop.
Experimental
New
Bugfixes
Experimental
Documentation
New
reexecute_pipeline now takes step_selection, which accepts queries like *solid_a.compute++ (i.e., solid_a.compute, all of its ancestors, its immediate descendants, and their immediate descendants). steps_to_execute is deprecated and will be removed in 0.10.0. An example call is sketched below.
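A minimal sketch (my_pipeline and parent_run_id are hypothetical, and an instance is assumed):
from dagster import DagsterInstance, reexecute_pipeline

result = reexecute_pipeline(
    my_pipeline,
    parent_run_id=parent_run_id,
    # Re-execute solid_a.compute, its ancestors, and two generations of descendants.
    step_selection=["*solid_a.compute++"],
    instance=DagsterInstance.get(),
)
Community contributions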
Bugfixes
Dagit
Schedules
clickable and link to View All page in the schedule section.
Experimental
house update --module package.module --assets my_asset*
Documentation
Bugfixes
New
Community contributions
Bugfixes
Experimental
New
Bugfixes
Docs
Breaking Changes
configured
API on a solid or composite solid, a new solid name must be provided.
New
dagster.yaml:
opt_in:
  local_servers: true
PipelineDefinition
or @pipeline
, e.g. @pipeline(hook_defs={hook_a})
. It will apply the hooks on every single solid instance within the pipeline.
Breaking Changes
--env
flag from CLI
The --host CLI param has been renamed to --grpc_host to avoid conflict with the dagit --host param.
New
dagster_ge
library. Example usage is under a new example, called ge_example
, and documentation for the library can be found under the libraries section of the api docs.
PythonObjectDagsterType
can now take a tuple of types as well as a single type, more closely mirroring isinstance
and allowing Union types to be represented in Dagster.
The configured
API can now be used on all definition types (including CompositeDefinition
). Example usage has been updated in the configuration documentation.
Bugfixes
Experimental
dagster-k8s/config
tag that lets users pass in custom configuration to the Kubernetes Job
, Job
metadata, JobSpec
, PodSpec
, and PodTemplateSpec
metadata.
@solid(
tags = {
'dagster-k8s/config': {
'container_config': {
'resources': {
'requests': { 'cpu': '250m', 'memory': '64Mi' },
'limits': { 'cpu': '500m', 'memory': '2560Mi' },
}
},
'pod_template_spec_metadata': {
'annotations': { "cluster-autoscaler.kubernetes.io/safe-to-evict": "true"}
},
'pod_spec_config': {
'affinity': {
'nodeAffinity': {
'requiredDuringSchedulingIgnoredDuringExecution': {
'nodeSelectorTerms': [{
'matchExpressions': [{
'key': 'beta.kubernetes.io/os', 'operator': 'In', 'values': ['windows', 'linux'],
}]
}]
}
}
}
},
},
},
)
def my_solid(context):
context.log.info('running')
Breaking Changes
--env
flag no longer works for the pipeline launch
or pipeline execute
commands. Use --config
instead.pipeline execute
command no longer accepts the --workspace
argument.
To execute pipelines in a workspace, use pipeline launch
instead.New
ResourceDefinition.mock_resource helper for magic mocking resources. Example usage can be found here
Remove the row_count metadata entry from the Dask DataFrame type check (thanks @kinghuang!)
Add orient to the config options when materializing a Dask DataFrame to json (thanks @kinghuang!)
Bugfixes
Applying configured to a solid definition would overwrite inputs from run config.
Bugfixes
dagster-k8s-celery executor when executing solid subsets
Breaking Changes
Removed the IntermediateStore API. IntermediateStorage now wraps an ObjectStore, and TypeStoragePlugin now accepts an IntermediateStorage instance instead of an IntermediateStore instance. (Note that IntermediateStore and IntermediateStorage are both internal APIs that are used in some non-core libraries).
Breaking Changes
The dagit key is no longer part of the instance configuration schema and must be removed from dagster.yaml files before they can be used.
-d can no longer be used as a command-line argument to specify a mode. Use --mode instead.
Use --preset instead of --preset-name to specify a preset to the pipeline launch command.
Removed the config argument to the ConfigMapping, @composite_solid, @solid, SolidDefinition, @executor, ExecutorDefinition, @logger, LoggerDefinition, @resource, and ResourceDefinition APIs, which we deprecated in 0.8.0. Use config_schema instead.
New
-d or --working-directory can be used to specify a working directory in any command that takes in a -f or --python_file argument.
Added create_dagster_pandas_dataframe_type. This is the currently supported API for custom pandas data frame type creation.
Added the configured API for predefining configuration for various definitions: https://docs.dagster.io/overview/configuration/#configured
Breaking Changes
AssetMaterializations
no longer accepts a dagster_type
argument. This reverts the change
billed as "AssetMaterializations
can now have type information attached as metadata." in the
previous release.
New
AssetMaterializations
can now have type information attached as metadata. See the materializations tutorial for more
Bugfixes
context['ts']
was not passed properly
Fixed a bug with task_acks_late: true that resulted in a 409 Conflict error from Kubernetes. The creation of a Kubernetes Job will now be aborted if another Job with the same name exists
Docs
New
CeleryK8sRunLauncher
supports termination of pipeline runs. This can be accessed via the
“Terminate” button in Dagit’s Pipeline Run view or via “Cancel” in Dagit’s All Runs page. This
will terminate the run master K8s Job along with all running step job K8s Jobs; steps that are
still in the Celery queue will not create K8s Jobs. The pipeline and all impacted steps will
be marked as failed. We recommend implementing resources as context managers and we will execute
the finally block upon termination.
The K8sRunLauncher supports termination of pipeline runs.
AssetMaterialization events display the asset key in the Runs view.
Bugfixes
DagsterInstance
was leaving database connections open due to not being
garbage collected.
Using an Enum in resource config schemas resulted in an error.
New
configured
API makes it easy to create configured versions of resources.
Deprecated the Materialization
event type in favor of the new AssetMaterialization
event type,
which requires the asset_key
parameter. Solids yielding Materialization
events will continue
to work as before, though the Materialization
event will be removed in a future release.
Added the intermediate_storage_defs
argument to ModeDefinition
, which accepts a
list of IntermediateStorageDefinition
s, e.g. s3_plus_default_intermediate_storage_defs
.
As before, the default includes an in-memory intermediate and a local filesystem intermediate
storage.
Deprecated the system_storage_defs
argument to ModeDefinition
in favor of
intermediate_storage_defs
. system_storage_defs
will be removed in 0.10.0 at the earliest.
Added the @intermediate_storage
decorator, which makes it easy to define intermediate
storages.
Added the s3_file_manager
and local_file_manager
resources to replace the file managers
that previously lived inside system storages. The airline demo has been updated to include
an example of how to do this:
https://github.com/dagster-io/dagster/blob/0.8.8/examples/airline_demo/airline_demo/solids.py#L171.
Bugfixes
The default_value config on a field now works as expected. #2725
Breaking Changes
dagster
and dagit
CLI commands no longer add the working directory to the
PYTHONPATH when resolving modules, which may break some imports. Explicitly installed python
packages can be specified in workspaces using the python_package
workspace yaml config option.
The python_module
config option is deprecated and will be removed in a future release.
New
Added --path-prefix to the dagit CLI. #2073
The date_partition_range util function now accepts an optional inclusive boolean argument. By default, the function does not include the partition for which the end time of the date range is greater than the current time. If inclusive=True, then the list of partitions returned will include the extra partition.
MultiDependency or fan-in inputs will now only cause the solid step to skip if all of the fanned-in inputs' upstream outputs were skipped
Bugfixes
input_hydration_config arguments
Using alias on a solid output will produce a useful error message (thanks @iKintosh!)
Fixed a bug with partitioned schedules (e.g. daily_schedule) for certain workspace.yaml formats
Breaking Changes
dagster-celery
module has been broken apart to manage dependencies more coherently. There
are now three modules: dagster-celery
, dagster-celery-k8s
, and dagster-celery-docker
.dagster-celery worker start
command now takes a required -A
parameter
which must point to the app.py
file within the appropriate module. E.g if you are using the
celery_k8s_job_executor
then you must use the -A dagster_celery_k8s.app
option when using the
celery
or dagster-celery
cli tools. Similar for the celery_docker_executor
:
-A dagster_celery_docker.app
must be used.
Renamed the input_hydration_config and output_materialization_config decorators to dagster_type_loader and dagster_type_materializer respectively. Renamed DagsterType's input_hydration_config and output_materialization_config arguments to loader and materializer respectively.
New
New pipeline scoped runs tab in Dagit
Add the following Dask Job Queue clusters: moab, sge, lsf, slurm, oar (thanks @DavidKatz-il!)
K8s resource-requirements for run coordinator pods can be specified using the dagster-k8s/ resource_requirements
tag on pipeline definitions:
@pipeline(
tags={
'dagster-k8s/resource_requirements': {
'requests': {'cpu': '250m', 'memory': '64Mi'},
'limits': {'cpu': '500m', 'memory': '2560Mi'},
}
},
)
def foo_bar_pipeline():
    ...
Added better error messaging in dagit for partition set and schedule configuration errors
An initial version of the CeleryDockerExecutor was added (thanks @mrdrprofuroboros!). The celery workers will launch tasks in docker containers.
Experimental: Great Expectations integration is currently under development in the new library dagster-ge. Example usage can be found here
Breaking Changes
Engine
and ExecutorConfig
have been deleted in favor of Executor
. Instead of the @executor
decorator decorating a function that returns an ExecutorConfig
it should now decorate a function that returns an Executor
.New
dict
can be used as an alias for Permissive()
within a config schema declaration.
Use StringSource in the S3ComputeLogManager configuration schema to support using environment variables in the configuration (Thanks @mrdrprofuroboros!)
Bugfixes
$DAGSTER_HOME
environment variable is not an absolute path (Thanks @AndersonReyes!)
Allow the staging_prefix in the DatabricksPySparkStepLauncher configuration to be an absolute path (Thanks @sd2k!)
input_hydration_config (Thanks @joeyfreund!)
Bugfix
New
dagster asset wipe <asset_key>
Breaking Changes
Previously, the gcs_resource
returned a GCSResource
wrapper which had a single client
property that returned a google.cloud.storage.client.Client
. Now, the gcs_resource
returns the client directly.
To update solids that use the gcs_resource
, change:
context.resources.gcs.client
To:
context.resources.gcs
New
Added reexecute_pipeline to reexecute an existing pipeline run.
Added a project field to the gcs_resource in dagster_gcp.
Added dagster asset wipe to remove all existing asset keys.
Bugfix
executeRunInProcess
.
Fixed dagster schedule up output to be repository location scoped
Bugfix
dagster instance migrate
.launch_scheduled_execution
that would mask configuration errors.
Fixed an issue in dagster-k8s when specifying per-step resources.
New
label
optional parameter for materializations with asset_key
specified.
Updated the Assets page to have a typeahead selector and hierarchical views based on asset_key path.
Added SSHResource, replacing sftp_solid.
Docs
Bugfix
OSError: [Errno 24] Too many open files
when enough
temporary files were created.
New
Major Changes
Please see the 080_MIGRATION.md
migration guide for details on updating existing code to be
compatible with 0.8.0
Workspace, host and user process separation, and repository definition Dagit and other tools no longer load a single repository containing user definitions such as pipelines into the same process as the framework code. Instead, they load a "workspace" that can contain multiple repositories sourced from a variety of different external locations (e.g., Python modules and Python virtualenvs, with containers and source control repositories soon to come).
The repositories in a workspace are loaded into their own "user" processes distinct from the "host" framework process. Dagit and other tools now communicate with user code over an IPC mechanism. This architectural change has a couple of advantages:
We have introduced a new file format, workspace.yaml
, in order to support this new architecture.
The workspace yaml encodes what repositories to load and their location, and supersedes the
repository.yaml
file and associated machinery.
As a consequence, Dagster internals are now stricter about how pipelines are loaded. If you have
written scripts or tests in which a pipeline is defined and then passed across a process boundary
(e.g., using the multiprocess_executor
or dagstermill), you may now need to wrap the pipeline
in the reconstructable
utility function for it to be reconstructed across the process boundary.
In addition, rather than instantiate the RepositoryDefinition
class directly, users should now
prefer the @repository
decorator. As part of this change, the @scheduler
and
@repository_partitions
decorators have been removed, and their functionality subsumed under
@repository
.
Dagit organization The Dagit interface has changed substantially and is now oriented around pipelines. Within the context of each pipeline in an environment, the previous "Pipelines" and "Solids" tabs have been collapsed into the "Definition" tab; a new "Overview" tab provides summary information about the pipeline, its schedules, its assets, and recent runs; the previous "Playground" tab has been moved within the context of an individual pipeline. Related runs (e.g., runs created by re-executing subsets of previous runs) are now grouped together in the Playground for easy reference. Dagit also now includes more advanced support for display of scheduled runs that may not have executed ("schedule ticks"), as well as longitudinal views over scheduled runs, and asset-oriented views of historical pipeline runs.
Assets Assets are named materializations that can be generated by your pipeline solids, which support specialized views in Dagit. For example, if we represent a database table with an asset key, we can now index all of the pipelines and pipeline runs that materialize that table, and view them in a single place. To use the asset system, you must enable an asset-aware storage such as Postgres.
Run launchers The distinction between "starting" and "launching" a run has been effaced. All
pipeline runs instigated through Dagit now make use of the RunLauncher
configured on the
Dagster instance, if one is configured. Additionally, run launchers can now support termination of
previously launched runs. If you have written your own run launcher, you may want to update it to
support termination. Note also that as of 0.7.9, the semantics of RunLauncher.launch_run
have
changed; this method now takes the run_id
of an existing run and should no longer attempt to
create the run in the instance.
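For illustration, a minimal sketch of the new contract; the class name and the hand-off helper are hypothetical, and other abstract methods (e.g. termination support) are omitted:
from dagster.core.launcher import RunLauncher

class MyRunLauncher(RunLauncher):
    def launch_run(self, instance, run_id):
        # The run already exists in run storage; look it up instead of
        # creating it here, then hand it off to the execution substrate.
        run = instance.get_run_by_id(run_id)
        submit_to_my_substrate(run)  # hypothetical transport
        return run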
Flexible reexecution Pipeline re-execution from Dagit is now fully flexible. You may re-execute arbitrary subsets of a pipeline's execution steps, and the re-execution now appears in the interface as a child run of the original execution.
Support for historical runs Snapshots of pipelines and other Dagster objects are now persisted along with pipeline runs, so that historical runs can be loaded for review with the correct execution plans even when pipeline code has changed. This prepares the system to be able to diff pipeline runs and other objects against each other.
Step launchers and expanded support for PySpark on EMR and Databricks We've introduced a new
StepLauncher
abstraction that uses the resource system to allow individual execution steps to
be run in separate processes (and thus on separate execution substrates). This has made extensive
improvements to our PySpark support possible, including the option to execute individual PySpark
steps on EMR using the EmrPySparkStepLauncher
and on Databricks using the
DatabricksPySparkStepLauncher.
The emr_pyspark
example demonstrates how to use a step launcher.
Clearer names What was previously known as the environment dictionary is now called the
run_config
, and the previous environment_dict
argument to APIs such as execute_pipeline
is
now deprecated. We renamed this argument to focus attention on the configuration of the run
being launched or executed, rather than on an ambiguous "environment". We've also renamed the
config
argument to all use definitions to be config_schema
, which should reduce ambiguity
between the configuration schema and the value being passed in some particular case. We've also
consolidated and improved documentation of the valid types for a config schema.
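Concretely, a minimal sketch of the renamed arguments (pipeline and solid names hypothetical):
from dagster import execute_pipeline, pipeline, solid

@solid(config_schema={"iterations": int})  # formerly the `config` argument
def loop_solid(context):
    for _ in range(context.solid_config["iterations"]):
        context.log.info("iterating")

@pipeline
def my_pipeline():
    loop_solid()

# `run_config` replaces the deprecated `environment_dict` argument.
execute_pipeline(
    my_pipeline,
    run_config={"solids": {"loop_solid": {"config": {"iterations": 3}}}},
)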
Lakehouse We're pleased to introduce Lakehouse, an experimental, alternative programming model
for data applications, built on top of Dagster core. Lakehouse allows developers to define data
applications in terms of data assets, such as database tables or ML models, rather than in terms
of the computations that produce those assets. The simple_lakehouse
example gives a taste of
what it's like to program in Lakehouse. We'd love feedback on whether this model is helpful!
Airflow ingest We've expanded the tooling available to teams with existing Airflow installations
that are interested in incrementally adopting Dagster. Previously, we provided only injection
tools that allowed developers to write Dagster pipelines and then compile them into Airflow DAGs
for execution. We've now added ingestion tools that allow teams to move to Dagster for execution
without having to rewrite all of their legacy pipelines in Dagster. In this approach, Airflow
DAGs are kept in their own container/environment, compiled into Dagster pipelines, and run via
the Dagster orchestrator. See the airflow_ingest
example for details!
Breaking Changes
dagster
The @scheduler
and @repository_partitions
decorators have been removed. Instances of
ScheduleDefinition
and PartitionSetDefinition
belonging to a repository should be specified
using the @repository
decorator instead.
Support for the Dagster solid selection DSL, previously introduced in Dagit, is now uniform
throughout the Python codebase, with the previous solid_subset
arguments (--solid-subset
in
the CLI) being replaced by solid_selection
(--solid-selection
). In addition to the names of
individual solids, this argument now supports selection queries like *solid_name++
(i.e.,
solid_name
, all of its ancestors, its immediate descendants, and their immediate descendants).
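For example, a minimal sketch (my_pipeline is hypothetical):
from dagster import execute_pipeline

# Execute solid_name, all of its ancestors, and two generations of its
# descendants; the CLI equivalent is --solid-selection "*solid_name++".
execute_pipeline(my_pipeline, solid_selection=["*solid_name++"])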
The built-in Dagster type Path
has been removed.
PartitionSetDefinition
names, including those defined by a PartitionScheduleDefinition
,
must now be unique within a single repository.
Asset keys are now sanitized for non-alphanumeric characters. All characters besides
alphanumerics and _
are treated as path delimiters. Asset keys can also be specified using
AssetKey
, which accepts a list of strings as an explicit path. If you are running 0.7.10 or
later and using assets, you may need to migrate your historical event log data for asset keys
from previous runs to be attributed correctly. This event_log
data migration can be invoked
as follows:
from dagster.core.storage.event_log.migration import migrate_event_log_data
from dagster import DagsterInstance
migrate_event_log_data(instance=DagsterInstance.get())
The interface of the Scheduler
base class has changed substantially. If you've written a
custom scheduler, please get in touch!
The partitioned schedule decorators now generate PartitionSetDefinition
names using
the schedule name, suffixed with _partitions
.
The repository
property on ScheduleExecutionContext
is no longer available. If you were
using this property to pass to Scheduler
instance methods, this interface has changed
significantly. Please see the Scheduler
class documentation for details.
The CLI option --celery-base-priority
is no longer available for the command:
dagster pipeline backfill
. Use the tags option to specify the celery priority, (e.g.
dagster pipeline backfill my_pipeline --tags '{ "dagster-celery/run_priority": 3 }'
The execute_partition_set
API has been removed.
The deprecated is_optional
parameter to Field
and OutputDefinition
has been removed.
Use is_required
instead.
The deprecated runtime_type
property on InputDefinition
and OutputDefinition
has been
removed. Use dagster_type
instead.
The deprecated has_runtime_type
, runtime_type_named
, and all_runtime_types
methods on
PipelineDefinition
have been removed. Use has_dagster_type
, dagster_type_named
, and
all_dagster_types
instead.
The deprecated all_runtime_types
method on SolidDefinition
and CompositeSolidDefinition
has been removed. Use all_dagster_types
instead.
The deprecated metadata
argument to SolidDefinition
and @solid
has been removed. Use
tags
instead.
The graphviz-based DAG visualization in Dagster core has been removed. Please use Dagit!
dagit
dagit-cli has been removed, and dagit is now the only console entrypoint.
dagster-aws
dagster_aws.EmrRunJobFlowSolidDefinition has been removed.
dagster-bash
The bash_command_solid and bash_script_solid solid factory functions have been renamed to create_shell_command_solid and create_shell_script_solid.
dagster-celery
The CLI option --celery-base-priority is no longer available for the command: dagster pipeline backfill. Use the tags option to specify the celery priority, (e.g. dagster pipeline backfill my_pipeline --tags '{ "dagster-celery/run_priority": 3 }'
dagster-dask
The config for dagster_dask.dask_executor has changed. The previous config should now be nested under the key local.
dagster-gcp
The BigQueryClient has been removed. Use bigquery_resource instead.
dagster-dbt
dagster-spark
dagster_spark.SparkSolidDefinition has been removed - use create_spark_solid instead.
The SparkRDD Dagster type, which only worked with an in-memory engine, has been removed.
dagster-twilio
The TwilioClient has been removed. Use twilio_resource instead.
New
dagster
asset_key
on any Materialization
to use the new asset system. You will also
need to configure an asset-aware storage, such as Postgres. The longitudinal_pipeline
example
demonstrates this system.end_time
.dagit
/graphiql
as well as at /graphql
.dagster-aws
dagster_aws.S3ComputeLogManager
may now be configured to override the S3 endpoint and
associated SSL settings.dagster-azure
adls2_system_storage
or, for direct access, the adls2_resource
resource. (Thanks
@sd2k!)dagster-dask
dagster_dask.dask_executor
. For full support, you will need
to install extras with pip install dagster-dask[yarn, pbs, kube]
. (Thanks @DavidKatz-il!)dagster-databricks
databricks_pyspark_step_launcher
. (Thanks @sd2k!)dagster-gcp
dagster-k8s
CeleryK8sRunLauncher
to submit execution plan steps to Celery task queues for
execution as k8s Jobs.dagster-pandas
dagster-papertrail
papertrail_logger
may now be set using either
environment variables or literals.dagster-pyspark
emr_pyspark_step_launcher
, or on Databricks using
the new dagster-databricks package. The emr_pyspark
example demonstrates how to use a step
launcher.dagster-snowflake
snowflake_resource
may now be set using either
environment variables or literals.dagster-spark
dagster_spark.create_spark_solid
now accepts a required_resource_keys
argument, which
enables setting up a step launcher for Spark solids, like the emr_pyspark_step_launcher
.Bugfix
dagster pipeline execute
now sets a non-zero exit code when pipeline execution fails.Bugfix
NoOpComputeLogManager
to be configured as the compute_logs
implementation in
dagster.yaml
New
New
dagster schedule logs {schedule_name}
command will show the log file for a given schedule. This
helps uncover errors like missing environment variables and import errors.dagster schedule debug
command. As before, these
errors can be resolved using dagster schedule up
Bugfix
dagster.yaml
Breaking Changes
dagster pipeline backfill
command no longer takes a mode
flag. Instead, it uses the mode
specified on the PartitionSetDefinition
. Similarly, the runs created from the backfill also use
the solid_subset
specified on the PartitionSetDefinition
BugFix
dagster schedule
debug
command will display issues related to missing cron jobs, extraneous cron jobs, and duplicate cron
jobs. Running dagster schedule up
will fix any issues.New
dagster
package.Bugfix
__doc__
from the function they decorate.
Bugfix
dagster_celery
had introduced a spurious dependency on dagster_k8s
(#2435)
New
RepositoryDefinition
now takes schedule_defs
and partition_set_defs
directly. The loading
scheme for these definitions via repository.yaml
under the scheduler:
and partitions:
keys
is deprecated and expected to be removed in 0.8.0.make_dagster_repo_from_airflow_example_dags
).dagster-celery worker start -n my-worker -- --uid=42
will pass the
--uid
flag to celery.PresetDefinition
that has no environment defined.dagster schedule debug
command to help debug scheduler state.SystemCronScheduler
now verifies that a cron job has successfully been added to the crontab when turning a schedule on, and shows an error message if unsuccessful.
Breaking Changes
dagster instance migrate
is required for this release to support the new experimental assets
view.Path
is no longer valid in config schemas. Use str
or dagster.String
instead.@pyspark_solid
decorator - its functionality, which was experimental, is subsumed by
requiring a StepLauncher resource (e.g. emr_pyspark_step_launcher) on the solid.
Dagit
Experimental
asset_key
string parameter to Materializations and created a new “Assets” tab in Dagit
to view pipelines and runs associated with these keys. The API and UI of these asset-based views are
likely to change, but feedback is welcome and will be used to inform these changes.emr_pyspark_step_launcher
that enables launching PySpark solids in EMR. The
"simple_pyspark" example demonstrates how it’s used.Bugfix
CompositeSolidResult
objects.Breaking Changes
DagsterInstance.launch_run
, this method now takes a run id
instead of an instance of PipelineRun
. Additionally, DagsterInstance.create_run
and
DagsterInstance.create_empty_run
have been replaced by DagsterInstance.get_or_create_run
and
DagsterInstance.create_run_for_pipeline
.RunLauncher
, there are two required changes:RunLauncher.launch_run
takes a pipeline run that has already been created. You should remove
any calls to instance.create_run
in this method.startPipelineExecution
(defined in the
dagster_graphql.client.query.START_PIPELINE_EXECUTION_MUTATION
) in the run launcher, you
should call startPipelineExecutionForCreatedRun
(defined in
dagster_graphql.client.query.START_PIPELINE_EXECUTION_FOR_CREATED_RUN_MUTATION
).RemoteDagitRunLauncher
for an example implementation.New
Bugfix
Documentation
Breaking Changes
execute_pipeline_with_mode
and execute_pipeline_with_preset
APIs have been dropped in
favor of new top level arguments to execute_pipeline
, mode
and preset
.RunConfig
to pass options to execute_pipeline
has been deprecated, and RunConfig
will be removed in 0.8.0.execute_solid_within_pipeline
and execute_solids_within_pipeline
APIs, intended to support
tests, now take new top level arguments mode
and preset
.New
Runs
view will apply that tag as a filter.Bugfix
Experimental
make_dagster_pipeline_from_airflow_dag
). This is in the early experimentation phase.
Schedules
detailed view.Documentation
Breaking Changes
The default sqlite and dagster-postgres
implementations have been altered to extract the
event step_key
field as a column, to enable faster per-step queries. You will need to run
dagster instance migrate
to update the schema. You may optionally migrate your historical event
log data to extract the step_key
using the migrate_event_log_data
function. This will ensure
that your historical event log data will be captured in future step-key based views. This
event_log
data migration can be invoked as follows:
from dagster.core.storage.event_log.migration import migrate_event_log_data
from dagster import DagsterInstance
migrate_event_log_data(instance=DagsterInstance.get())
We have made pipeline metadata serializable and persist that along with run information.
While there are no user-facing features to leverage this yet, it does require an instance
migration. Run dagster instance migrate
. If you have already run the migration for the
event_log
changes above, you do not need to run it again. Any unforeseen errors related to the
new snapshot_id
in the runs
table or the new snapshots
table are related to this migration.
dagster-pandas ColumnTypeConstraint
has been removed in favor of ColumnDTypeFnConstraint
and
ColumnDTypeInSetConstraint
.
New
FileManager
machinery.
PandasColumn constructors now support pandas 1.0 dtypes.
Added support for env: to load from environment variables.
Bugfix
dagit
would not populate tags specified on the pipeline
definition.Failure
was not displayed in the error modal in
dagit
.dagstermill.get_context()
outside of
the parameters cell of a dagstermill notebook could lead to unexpected behavior.Experimental
Schedule
tab for scheduled, partitioned pipelines.
Includes views of run status, execution time, and materializations across partitions. The UI is
in flux and is currently optimized for daily schedules, but feedback is welcome.Breaking Changes
default_value
in Field
no longer accepts native instances of python enums. Instead
the underlying string representation in the config system must be used.
The default_value
in Field
no longer accepts callables.
The dagster_aws
imports have been reorganized; you should now import resources from
dagster_aws.<AWS service name>
. dagster_aws
provides s3
, emr
, redshift
, and cloudwatch
modules.
The dagster_aws
S3 resource no longer attempts to model the underlying boto3 API, and you can
now just use any boto3 S3 API directly on a S3 resource, e.g.
context.resources.s3.list_objects_v2
. (#2292)
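For example, a minimal sketch (the bucket name is hypothetical):
from dagster import solid

@solid(required_resource_keys={"s3"})
def list_bucket(context):
    # The resource is now a plain boto3 S3 client, so any boto3 API is available.
    return context.resources.s3.list_objects_v2(Bucket="my-bucket")
New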
Playground
view in dagit
showing an interactive config mapInputDefinition
dagster pipeline launch
to launch runs using a configured RunLauncher
pdb
utility to SolidExecutionContext
to help with debugging, available within a solid
as context.pdb
PresetDefinition.with_additional_config
to allow for config overridesBugfix
@weekly
partitioned schedule decoratordagstermill
--kernel
flag.dagster-dbt
dbt_solid
now has a Nothing
input to allow for sequencingdagster-k8s
get_celery_engine_config
to select celery engine, leveraging Celery infrastructureDocumentation
New
Added the IntSource
type, which lets integers be set from environment variables in config.
You may now set tags on pipeline definitions. These will resolve in the following cases:
execute_pipeline
api will create a run with the union
of pipeline tags and RunConfig
tags, with RunConfig
tags taking precedence.
Output materialization configs may now yield multiple Materializations, and the tutorial has been updated to reflect this.
We now export the SolidExecutionContext
in the public API so that users can correctly type hint
solid compute functions.
Dagit
Bugfix
None
.threads_per_worker
on Dask distributed clusters.dagster-postgres
dagster-aws
s3_resource
now exposes a list_objects_v2
method corresponding to the underlying boto3
API. (Thanks, @basilvetas!)redshift_resource
to access Redshift databases.dagster-k8s
K8sRunLauncher
config now includes the load_kubeconfig
and kubeconfig_file
options.Documentation
Dependencies
Community
We've added opt-in telemetry to Dagster so we can collect usage statistics in order to inform development priorities. Telemetry data will motivate projects such as adding features in frequently-used parts of the CLI and adding more examples in the docs in areas where users encounter more errors.
We will not see or store solid definitions (including generated context) or pipeline definitions (including modes and resources). We will not see or store any data that is processed within solids and pipelines.
If you'd like to opt in to telemetry, please add the following to $DAGSTER_HOME/dagster.yaml
:
telemetry:
enabled: true
Thanks to @basilvetas and @hspak for their contributions!
New
dagster_postgres.PostgresScheduleStorage
on the instance.execute_pipeline_with_mode
API to allow executing a pipeline in test with a specific
mode without having to specify RunConfig
.--celery-base-priority
to dagster pipeline backfill
.@weekly
schedule decorator.Deprecations
dagster-ge
library has been removed from this release due to drift from the underlying
Great Expectations implementation.dagster-pandas
PandasColumn
now includes an is_optional
flag, replacing the previous
ColumnExistsConstraint
.ignore_missing_values flag
to PandasColumn
in order to apply column
constraints only to the non-missing rows in a column.dagster-k8s
Documentation
New
dagster-k8s
Helm chart.dagster-k8s
Helm chart.Bugfix
SourceString
.dagster schedule up
would fail in certain scenarios
with the SystemCronScheduler
.Pandas
Dagstermill
Docs
Experimental
RetryRequested
exception, was added.
This API is experimental and likely to change.Other
runtime_type
to dagster_type
in definitions. The following are deprecated
and will be removed in a future version.
InputDefinition.runtime_type is deprecated. Use InputDefinition.dagster_type instead.
OutputDefinition.runtime_type is deprecated. Use OutputDefinition.dagster_type instead.
CompositeSolidDefinition.all_runtime_types is deprecated. Use CompositeSolidDefinition.all_dagster_types instead.
SolidDefinition.all_runtime_types is deprecated. Use SolidDefinition.all_dagster_types instead.
PipelineDefinition.has_runtime_type is deprecated. Use PipelineDefinition.has_dagster_type instead.
PipelineDefinition.runtime_type_named is deprecated. Use PipelineDefinition.dagster_type_named instead.
PipelineDefinition.all_runtime_types is deprecated. Use PipelineDefinition.all_dagster_types instead.
Docs
New
OutputDefinition
to take is_required
rather than is_optional
argument. This is to
remain consistent with changes to Field
in 0.7.1 and to avoid confusion
with python's typing and dagster's definition of Optional
, which indicates None-ability,
rather than existence. is_optional
is deprecated and will be removed in a future version.
Bugfixes
Dagit
dagster_pandas
dagster_pandas
dataframes.
dagster_aws
The s3_resource no longer uses an unsigned session by default.
Bugfixes
Documentation
Breaking Changes
There are a substantial number of breaking changes in the 0.7.0 release.
Please see 070_MIGRATION.md
for instructions regarding migrating old code.
Scheduler
The scheduler configuration has been moved from the @schedules
decorator to DagsterInstance
.
Existing schedules that have been running are no longer compatible with current storage. To
migrate, remove the scheduler
argument on all @schedules
decorators:
instead of:
@schedules(scheduler=SystemCronScheduler)
def define_schedules():
...
Remove the scheduler
argument:
@schedules
def define_schedules():
...
Next, configure the scheduler on your instance by adding the following to
$DAGSTER_HOME/dagster.yaml
:
scheduler:
module: dagster_cron.cron_scheduler
class: SystemCronScheduler
Finally, if you had any existing schedules running, delete the existing $DAGSTER_HOME/schedules
directory and run dagster schedule wipe && dagster schedule up
to re-instantiate schedules in a
valid state.
The should_execute
and environment_dict_fn
argument to ScheduleDefinition
now have a
required first argument context
, representing the ScheduleExecutionContext
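For example, a minimal sketch (the schedule and pipeline names are hypothetical):
from dagster import ScheduleDefinition

my_schedule = ScheduleDefinition(
    name="my_schedule",
    cron_schedule="0 9 * * *",
    pipeline_name="my_pipeline",
    # Both callbacks now receive a ScheduleExecutionContext as their first argument.
    environment_dict_fn=lambda context: {"storage": {"filesystem": {}}},
    should_execute=lambda context: True,
)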
Config System Changes
In the config system, Dict
has been renamed to Shape
; List
to Array
; Optional
to
Noneable
; and PermissiveDict
to Permissive
. The motivation here is to clearly delineate
config use cases versus cases where you are using types as the inputs and outputs of solids as
well as python typing types (for mypy and friends). We believe this will be clearer to users in
addition to simplifying our own implementation and internal abstractions.
Our recommended fix is not to use Shape
and Array
, but instead to use our new condensed
config specification API. This allows one to use bare dictionaries instead of Shape
, lists with
one member instead of Array
, bare types instead of Field
with a single argument, and python
primitive types (int
, bool
etc) instead of the dagster equivalents. These result in
dramatically less verbose config specs in most cases.
So instead of
from dagster import Shape, Field, Int, Array, String
# ... code
config=Shape({ # Dict prior to change
'some_int' : Field(Int),
'some_list': Field(Array[String]), # List prior to change
})
one can instead write:
config={'some_int': int, 'some_list': [str]}
No imports and much simpler, cleaner syntax.
config_field
is no longer a valid argument on solid
, SolidDefinition
, ExecutorDefinition
,
executor
, LoggerDefinition
, logger
, ResourceDefinition
, resource
, system_storage
, and
SystemStorageDefinition
. Use config
instead.
For composite solids, the config_fn
no longer takes a ConfigMappingContext
, and the context
has been deleted. To upgrade, remove the first argument to config_fn
.
So instead of
@composite_solid(config={}, config_fn=lambda context, config: {})
one must instead write:
@composite_solid(config={}, config_fn=lambda config: {})
Field
takes a is_required
rather than a is_optional
argument. This is to avoid confusion
with python's typing and dagster's definition of Optional
, which indicates None-ability,
rather than existence. is_optional
is deprecated and will be removed in a future version.
Required Resources
All solids, types, and config functions that use a resource must explicitly list that
resource using the argument required_resource_keys
. This is to enable efficient
resource management during pipeline execution, especially in a multiprocessing or
remote execution environment.
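For example (the database resource key and its execute_query method are hypothetical):
from dagster import solid

@solid(required_resource_keys={"database"})
def run_query(context):
    # Only resources listed in required_resource_keys are initialized
    # and made available on the context during execution.
    return context.resources.database.execute_query("SELECT 1")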
The @system_storage
decorator now requires argument required_resource_keys
, which was
previously optional.
Dagster Type System Changes
dagster.Set
and dagster.Tuple
can no longer be used within the config system.
Types are now instances of DagsterType, rather than a class that inherits from RuntimeType. Instead of dynamically generating a class to create a custom runtime type, just create an instance of a DagsterType. The type checking function is now an argument to the DagsterType, rather than an abstract method that has to be implemented in a subclass.
RuntimeType has been renamed to DagsterType and is now an encouraged API for type creation.
Type check functions may now return a bool in addition to a TypeCheck object.
The type_check_fn
on DagsterType
(formerly type_check
and RuntimeType
, respectively) now
takes a first argument context
of type TypeCheckContext
in addition to the second argument of
value
.
define_python_dagster_type has been eliminated in favor of PythonObjectDagsterType.
dagster_type has been renamed to usable_as_dagster_type.
as_dagster_type has been removed and similar capabilities added as make_python_type_usable_as_dagster_type.
PythonObjectDagsterType and usable_as_dagster_type no longer take a type_check argument. If a custom type_check is needed, use DagsterType.
If you are using dagster_pyspark or dagster_pandas and expecting Pyspark or Pandas types to work as Dagster types, e.g., in type annotations to functions decorated with @solid to indicate that they are input or output types for a solid, you will need to call make_python_type_usable_as_dagster_type from your code in order to map the Python types to the Dagster types, or just use the Dagster types (dagster_pandas.DataFrame instead of pandas.DataFrame) directly. A sketch of the new DagsterType API follows below.
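For example, a minimal sketch of creating a type as a DagsterType instance:
from dagster import DagsterType, TypeCheck

def even_check(context, value):
    # A check may return a bool, or a TypeCheck for richer metadata.
    if not isinstance(value, int):
        return TypeCheck(success=False, description="Expected an int")
    return value % 2 == 0

EvenDagsterType = DagsterType(name="EvenDagsterType", type_check_fn=even_check)
Other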
step_metadata_fn
has been removed from SolidDefinition
& @solid
.
SolidDefinition
& @solid
now takes tags
and enforces that values are strings or
are safely encoded as JSON. metadata
is deprecated and will be removed in a future version.
resource_mapper_fn
has been removed from SolidInvocation
.
New
Dagit now includes a much richer execution view, with a Gantt-style visualization of step execution and a live timeline.
Early support for Python 3.8 is now available, and Dagster/Dagit along with many of our libraries are now tested against 3.8. Note that several of our upstream dependencies have yet to publish wheels for 3.8 on all platforms, so running on Python 3.8 likely still involves building some dependencies from source.
dagster/priority
tags can now be used to prioritize the order of execution for the built-in
in-process and multiprocess engines.
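For example, a minimal sketch (the solid name is hypothetical; tag values are strings, per the tags contract described elsewhere in these notes):
from dagster import solid

@solid(tags={"dagster/priority": "3"})
def important_solid(_):
    # Steps carrying a higher priority value are picked up first by the
    # built-in in-process and multiprocess engines.
    ...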
dagster-postgres
storages can now be configured with separate arguments and environment
variables, such as:
run_storage:
module: dagster_postgres.run_storage
class: PostgresRunStorage
config:
postgres_db:
username: test
password:
env: ENV_VAR_FOR_PG_PASSWORD
hostname: localhost
db_name: test
Support for RunLauncher
s on DagsterInstance
allows for execution to be "launched" outside of
the Dagit/Dagster process. As one example, this is used by dagster-k8s
to submit pipeline
execution as a Kubernetes Job.
Added support for adding tags to runs initiated from the Playground
view in Dagit.
Added @monthly_schedule
decorator.
Added Enum.from_python_enum
helper to wrap Python enums for config. (Thanks @kdungs!)
[dagster-bash] The Dagster bash solid factory now passes along kwargs
to the underlying
solid construction, and now has a single Nothing
input by default to make it easier to create a
sequencing dependency. Also, logs are now buffered by default to make execution less noisy.
[dagster-aws] We've improved our EMR support substantially in this release. The
dagster_aws.emr
library now provides an EmrJobRunner
with various utilities for creating EMR
clusters, submitting jobs, and waiting for jobs/logs. We also now provide a
emr_pyspark_resource
, which together with the new @pyspark_solid
decorator makes moving
pyspark execution from your laptop to EMR as simple as changing modes.
[dagster-pandas] Added create_dagster_pandas_dataframe_type
, PandasColumn
, and
Constraint
API's in order for users to create custom types which perform column validation,
dataframe validation, summary statistics emission, and dataframe serialization/deserialization.
[dagster-gcp] GCS is now supported for system storage, as well as being supported with the Dask executor. (Thanks @habibutsu!) Bigquery solids have also been updated to support the new API.
Bugfix
RunStorage
clean up pipeline run tags when a run
is deleted. Requires a storage migration, using dagster instance migrate
.@solid
and @lambda_solid
decorators now correctly wrap their decorated functions, in the
sense of functools.wraps
.dagster_k8s
has been hardened against various failure modes and is now
compatible with Helm 2.raise_error
flag in execute_pipeline
now actually raises user exceptions instead
of a wrapper type.Documentation
Thank you
Thank you to all of the community contributors to this release!! In alphabetical order: @habibutsu, @kdungs, @vatervonacht, @zzztimbo.
Bugfix
Documentation
New
dagster-celery
Bugfix
@pyspark_solid
decorator now handles inputs correctly.Documentation
New
value
key.
Breaking
dagster instance migrate
. To check
what event log storages you are using, run dagster instance info
.InputMapping
or OutputMapping
are now enforced.New
Dagit
solid_name+*
and +solid_name+
. When viewing
very large DAGs, nothing is displayed by default and *
produces the original behavior.dagster-aws
region_name
and endpoint_url
.dagster-gcp
dagster-postgres
dagster-pyspark
dagster-spark
spark_outputs
must now be specified when initializing a SparkSolidDefinition
, rather than in
config.create_spark_solid
helper and new spark_resource
.Bugfix
SolidExecutionResult
(e.g., in test) for
dagster-pyspark solids.dagster_ssh.sftp_solid
.Documentation
Thank you
Thank you to all of the community contributors to this release!! In alphabetical order: @cclauss, @deem0n, @irabinovitch, @pseudoPixels, @Ramshackle-Jamathon, @rparrapy, @yamrzou.
Breaking
selector
argument to PipelineDefinition
has been removed. This API made it possible to
construct a PipelineDefinition
in an invalid state. Use PipelineDefinition.build_sub_pipeline
instead.New
dagster_prometheus
library, which exposes a basic Prometheus resource.Dagit
Bugfix
frozenlist
and frozendict
now pass Dagster's parameter type checks for list
and dict
.Nits
Breaking
type_check_fn
on a custom type was
required to return None (=passed) or else raise Failure
(=failed). Now, a type_check_fn
may
return True
/False
to indicate success/failure in the ordinary case, or else return a
TypeCheck
. The new success
field on TypeCheck
now indicates success/failure. This obviates
the need for the typecheck_metadata_fn
, which has been removed.CompositeSolidExecutionResult
rather than a SolidExecutionResult
.dagster.core.storage.sqlite_run_storage.SqliteRunStorage
has moved to
dagster.core.storage.runs.SqliteRunStorage
. Any persisted dagster.yaml
files should be updated
with the new classpath.is_secret
has been removed from Field
. It was not being used to any effect.environmentType
and configTypes
fields have been removed from the dagster-graphql
Pipeline
type. The configDefinition
field on SolidDefinition
has been renamed to
configField
.Bugfix
PresetDefinition.from_files
is now guaranteed to give identical results across all Python
minor versions.
DagsterKubernetesPodOperator has been fixed.
New
@pyspark_solid
decorator.Nits
features
in the dagster.yaml
will no longer have any effect.dagit
no longer prematurely returns control to terminal on Windowsraise_on_error
is now available on the execute_solid
test utilitycheck_dagster_type
added as a utility to help test type checks on custom typesSet
and Tuple
typesretryRunId
, stepKeys
execution parameters instead of a reexecutionConfig
input objectAdds a type_check
parameter to PythonObjectType
, as_dagster_type
, and @as_dagster_type
to
enable custom type checks in place of default isinstance
checks.
See documentation here:
https://dagster.readthedocs.io/en/latest/sections/learn/tutorial/types.html#custom-type-checks
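A minimal sketch, assuming the Failure-raising check convention of this era:
import pandas as pd

from dagster import Failure, as_dagster_type

def non_empty_check(value):
    # Replaces the default isinstance check; raise Failure to fail the check.
    if not isinstance(value, pd.DataFrame) or value.empty:
        raise Failure("Expected a non-empty pandas DataFrame")

DataFrame = as_dagster_type(pd.DataFrame, type_check=non_empty_check)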
Improved the type inference experience by automatically wrapping bare python types as dagster types.
Reworked our tutorial (now with more compelling/scary breakfast cereal examples) and public API documentation. See the new tutorial here: https://dagster.readthedocs.io/en/latest/sections/learn/tutorial/index.html
New solids explorer in Dagit allows you to browse and search for solids used across the repository.
Enabled solid dependency selection in the Dagit search filter.
+{solid_name}.
{solid_name}+.
+{solid_name}+.
Added a terminate button in Dagit to terminate an active run.
Added an --output
flag to dagster-graphql
CLI.
Added confirmation step for dagster run wipe
and dagster schedule wipe
commands (Thanks
@shahvineet98).
Fixed a wrong title in the dagster-snowflake
library README (Thanks @Step2Web).
@pipeline
and @composite_solid
to automatically give solids
aliases with an incrementing integer suffix when there are conflicts. This removes the need
to manually alias solid definitions that are used multiple times.dagster schedule wipe
command to delete all schedules and remove all schedule cron jobsexecute_solid
test util now works on composite solids.--remote
flag to dagster-graphql
for querying remote Dagit servers.latest
on Docker Hub were erroneously
published with an older version of Dagster (#1814)dagster schedule start --start-all
command (#1812)dagster schedule restart
. Also added a
flag to restart all running schedules: dagster schedule restart --restart-all-running
.New
This major release includes features for scheduling, operating, and executing pipelines that elevate Dagit and dagster from a local development tool to a deployable service.
DagsterInstance
introduced as centralized system to control run, event, compute log,
and local intermediates storage.Scheduler
abstraction has been introduced alongside an initial implementation of
SystemCronScheduler
in dagster-cron
.dagster-aws
has been extended with a CLI for deploying dagster to AWS. This can spin
up a Dagit node and all the supporting infrastructure—security group, RDS PostgreSQL
instance, etc.—without having to touch the AWS console, and for deploying your code
to that instance.Runs
: a completely overhauled Runs history page. Includes the ability to Retry
,
Cancel
, and Delete
pipeline runs from the new runs page.Scheduler
: a page for viewing and interacting with schedules.Compute Logs
: stdout and stderr are now viewable on a per execution step basis in each run.
This is available in real time for currently executing runs and for historical runs.Reload
button in the top right in Dagit restarts the web-server process and updates
the UI to reflect repo changes, including DAG structure, solid names, type names, etc.
This replaces the previous file system watching behavior.Breaking Changes
--log
and --log-dir
no longer supported as CLI args. Existing runs and events stored
via these flags are no longer compatible with current storage.raise_on_error
moved from in process executor config to argument to arguments in
python API methods such as execute_pipeline
ExecutorDefinition
and @executor
APIs are used to define in-process, multiprocess, and Dask executors, and may be used by users to
define new executors. Like loggers and storage, executors may be added to a ModeDefinition
and
may be selected and configured through the execution
field in the environment dict or YAML,
including through Dagit. Executors may no longer be configured through the RunConfig
.execute_pipeline
API, and the Dask executor is configured through the environment.
(See the dagster-dask README for details.)PresetDefinition.from_files
API for constructing a preset from a list of environment
files (replacing the old usage of this class). PresetDefinition
may now be directly
instantiated with an environment dict.--raise-on-error
or --no-raise-on-error
flag. Set this
option in executor config.MarkdownMetadataEntryData
class, so events yielded from client code may now render
markdown in their metadata.DAGIT_
, e.g. DAGIT_PORT
.valueRepr
field has been removed from ExecutionStepInputEvent
and
ExecutionStepOutputEvent
.define_python_dagster_type
function.metadata_fn
to typecheck_metadata_fn
in all runtime type creation APIs.result_value
and result_values
to output_value
and output_values
on
SolidExecutionResult
define_dagstermill_solid
, get_context
,
yield_event
, yield_result
, DagstermillExecutionContext
, DagstermillError
, and
DagstermillExecutionError
. Please see the new
guide
for details.