Tune Documentation

Introduction

Tune is an abstraction layer for general parameter tuning. It is built on Fugue, so it can seamlessly run on any backend Fugue supports, such as Spark, Dask, or local execution.

Installation

pip install tune

It is recommended to also install Scikit-Learn (for tuning Scikit-Learn compatible models) and Hyperopt (to enable Bayesian optimization):

pip install tune[hyperopt,sklearn]

Quick Start

To get started quickly, please go through these tutorials on Kaggle:

  1. Search Space

  2. Non-iterative Problems, such as Scikit-Learn model tuning

  3. Iterative Problems, such as Keras model tuning

Design Philosophy

Tune does not follow Scikit-Learn’s model selection APIs and does not provide a distributed backend for them. We believe parameter tuning is a general problem that is not limited to machine learning, so our abstractions are built from the ground up: the lower-level APIs do not assume the objective is a machine learning model, while the higher-level APIs are dedicated to specific problems, such as tuning Scikit-Learn compatible models and Keras models.

Although we didn’t base our solution on HyperOpt, Optuna, Ray Tune, Nevergrad or similar projects, we are truly inspired by these wonderful solutions and their designs. We have also integrated with several of them for deeper-level optimizations.

Tuning problems are never easy; here are our goals:

  • Provide the simplest and most intuitive APIs for major tuning cases. We always start from real tuning cases, figure out the minimal requirement for each of them, and then determine the layers of abstraction. Read this tutorial to see how minimal the interfaces can be.

  • Be scale agnostic and platform agnostic. We want you to worry less about distributed computing and focus on the tuning logic itself. Built on Fugue, Tune lets you develop your tuning process iteratively: you can test with small spaces on a local machine, then switch to larger spaces and run distributedly with no code change (see the sketch after this list). This can effectively save time and cost and make the process fun and rewarding. And to run any tuning logic distributedly, you only need the core framework itself (Spark, Dask, etc.); you do not need a database, a queue service or even an embedded cluster.

  • Be highly extendable and flexible on the lower level. For example:

    • you can extend on the Fugue level, for example creating an execution engine for Prefect to run the tuning jobs as a Prefect workflow

    • you can integrate third party optimizers and use Tune just as a distributed orchestrator.

    • you can start external instances (e.g. EC2 instances) for different training subtasks to fully utilize your cloud

    • you can combine it with distributed training as long as you have enough compute resources
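
A minimal sketch of this workflow, assuming Space, Rand and suggest_for_noniterative_objective are exported by the top-level tune package (see the API reference below) and that Fugue can resolve the string "spark" to a Spark execution engine in your environment:

from tune import Space, Rand, suggest_for_noniterative_objective

def objective(x: float) -> float:
    # a plain python function works as a non-iterative objective;
    # smaller is better by default
    return (x - 3) ** 2

space = Space(x=Rand(-10, 10)).sample(20, seed=0)

# develop and test locally (the default NativeExecutionEngine)
reports = suggest_for_noniterative_objective(objective, space, top_n=1)

# run the same tuning logic distributedly with no code change,
# only by specifying a different execution engine
reports = suggest_for_noniterative_objective(
    objective, space, top_n=1, execution_engine="spark"
)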

Current Focuses

Here are our current focuses:

  • A flexible space design that can describe a hybrid space of grid search, random search, and second-level optimization such as Bayesian optimization

  • Integration with third-party tuning frameworks. We have integrated HyperOpt and Optuna, and Nevergrad is on the way.

  • Create generalized and distributed versions of Successive Halving, Hyperband and Asynchronous Successive Halving.

Collaboration

We are looking for collaborators. If you are interested, please let us know.

Please join our Slack channel.

Top Level API Reference

The Space Concept

Space

class Space(*args, **kwargs)[source]

Bases: object

Search space object

Important

Please read Space Tutorial.

Parameters

kwargs (Any) – parameters in the search space

Space(a=1, b=1)  # static space
Space(a=1, b=Grid(1,2), c=Grid("a", "b"))  # grid search
Space(a=1, b=Grid(1,2), c=Rand(0, 1))  # grid search + level 2 search
Space(a=1, b=Grid(1,2), c=Rand(0, 1)).sample(10, seed=0)  # grid + random search

# union
Space(a=1, b=Grid(2,3)) + Space(b=Rand(1,5)).sample(10)

# cross product
Space(a=1, b=Grid(2,3)) * Space(c=Rand(1,5), d=Grid("a","b"))

# combo (grid + random + level 2)
space1 = Space(a=1, b=Grid(2,4))
space2 = Space(b=RandInt(10, 20))
space3 = Space(c=Rand(0,1)).sample(10)
space = (space1 + space2) * space3
assert Space(a=1, b=Rand(0,1)).has_stochastic
assert not Space(a=1, b=Rand(0,1)).sample(10).has_stochastic
assert not Space(a=1, b=Grid(0,1)).has_stochastic
assert not Space(a=1, b=1).has_stochastic

# get all configurations
space = Space(a=Grid(2,4), b=Rand(0,1)).sample(100)
for conf in space:
    print(conf)
all_conf = list(space)
property has_stochastic

Whether the space contains any StochasticExpression

sample(n, seed=None)[source]

Draw random samples from the current space. Please read Space Tutorial.

Parameters
  • n (int) – number of samples to draw

  • seed (Optional[Any]) – random seed, defaults to None

Returns

a new Space containing all samples

Return type

tune.concepts.space.spaces.Space

TuningParametersTemplate

class TuningParametersTemplate(raw)[source]

Bases: object

Parameter template to extract tuning parameter expressions from nested data structure

Parameters

raw (Dict[str, Any]) – the dictionary of input parameters.

Note

Please use to_template() to initialize this class.

# common cases
to_template(dict(a=1, b=1))
to_template(dict(a=Rand(0, 1), b=1))

# expressions may nest in dicts or arrays
template = to_template(
    dict(a=dict(x1=Rand(0, 1), x2=Rand(3,4)), b=[Grid("a", "b")]))

assert [Rand(0, 1), Rand(3, 4), Grid("a", "b")] == template.params
assert dict(
    p0=Rand(0, 1), p1=Rand(3, 4), p2=Grid("a", "b")
) == template.params_dict
assert dict(a=dict(x1=1, x2=3), b=["a"]) == template.fill([1, 3, "a"])
assert dict(a=dict(x1=1, x2=3), b=["a"]) == template.fill_dict(
    dict(p2="a", p1=3, p0=1)
)
concat(other)[source]

Concatenate with another template and generate a new template.

Note

The other template must not have any key existed in this template, otherwise ValueError will be raised

Returns

the merged template

Parameters

other (tune.concepts.space.parameters.TuningParametersTemplate) –

Return type

tune.concepts.space.parameters.TuningParametersTemplate
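
A small sketch of concat() using to_template from this module; the two templates must have disjoint keys:

from tune.concepts.space.parameters import Grid, Rand, to_template

t1 = to_template(dict(a=Rand(0, 1)))
t2 = to_template(dict(b=Grid("x", "y")))

merged = t1.concat(t2)  # succeeds because the key sets are disjoint
assert dict(a=0.5, b="x") == merged.fill([0.5, "x"])

# t1.concat(to_template(dict(a=1)))  # would raise ValueError: key "a" exists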

static decode(data)[source]

Retrieve the template from a base64 string

Parameters

data (str) –

Return type

tune.concepts.space.parameters.TuningParametersTemplate

property empty: bool

Whether the template contains any tuning expression

encode()[source]

Convert the template to a base64 string

Return type

str
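
A round-trip sketch of encode() and decode():

from tune.concepts.space.parameters import (
    Rand,
    TuningParametersTemplate,
    to_template,
)

s = to_template(dict(a=Rand(0, 1), b=2)).encode()  # base64 string
assert isinstance(s, str)
t = TuningParametersTemplate.decode(s)  # reconstructs an equal template
assert [Rand(0, 1)] == t.params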

fill(params)[source]

Fill the original data structure with values

Parameters
  • params (List[Any]) – the list of values to be filled into the original data structure, in depth-first order

  • copy – whether to return a deep copy of the parameters, defaults to False

Returns

the original data structure filled with values

Return type

Dict[str, Any]

fill_dict(params)[source]

Fill the original data structure with dictionary of values

Parameters
  • params (Dict[str, Any]) – the dictionary of values to be filled into the original data structure, keys must be p0, p1, p2, …

  • copy – whether to return a deep copy of the parameters, defaults to False

Returns

the original data structure filled with values

Return type

Dict[str, Any]

property has_grid: bool

Whether the template contains grid expressions

property has_stochastic: bool

Whether the template contains stochastic expressions

property params: List[tune.concepts.space.parameters.TuningParameterExpression]

Get all tuning parameter expressions in depth-first order

property params_dict: Dict[str, tune.concepts.space.parameters.TuningParameterExpression]

Get all tuning parameter expressions in depth-first order, with corresponding generated keys p0, p1, p2, …

product_grid()[source]

Cross product all grid parameters

Yield

new templates with the grid parameters filled

Return type

Iterable[tune.concepts.space.parameters.TuningParametersTemplate]

assert [dict(a=1, b=Rand(0, 1)), dict(a=2, b=Rand(0, 1))] == list(
    to_template(dict(a=Grid(1, 2), b=Rand(0, 1))).product_grid()
)
sample(n, seed=None)[source]

Sample all stochastic parameters

Parameters
  • n (int) – number of samples, must be a positive integer

  • seed (Optional[Any]) – random seed, defaults to None. It takes effect only when it is not None.

Yield

new templates with the stochastic parameters filled

Return type

Iterable[tune.concepts.space.parameters.TuningParametersTemplate]

assert [dict(a=1.1, b=Grid(0, 1)), dict(a=1.5, b=Grid(0, 1))] == list(
    to_template(dict(a=Rand(1, 2), b=Grid(0, 1))).sample(2, 0)
)
property simple_value: Dict[str, Any]

If the template contains no tuning expression, it is simple and this returns the parameters dictionary; otherwise, ValueError will be raised

property template: Dict[str, Any]

The template dictionary, all tuning expressions will be replaced by None
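
A short sketch tying together empty, template and simple_value:

from tune.concepts.space.parameters import Rand, to_template

t = to_template(dict(a=1, b=Rand(0, 1)))
assert dict(a=1, b=None) == t.template  # expressions replaced by None
# t.simple_value would raise ValueError because b is a tuning expression

simple = to_template(dict(a=1, b=2))
assert simple.empty  # contains no tuning expression
assert dict(a=1, b=2) == simple.simple_value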

Grid

class Grid(*args)[source]

Bases: tune.concepts.space.parameters.TuningParameterExpression

Grid search, every value will be used. Please read Space Tutorial.

Parameters

args (Any) – values for the grid search

Choice

class Choice(*args)[source]

Bases: tune.concepts.space.parameters.StochasticExpression

A random choice of values. Please read Space Tutorial.

Parameters

args (Any) – values to choose from

generate(seed=None)[source]

Return a randomly chosen value.

Parameters

seed (Optional[Any]) – if set, it will be used to call seed(), defaults to None

Return type

Any

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable

property values: List[Any]

values to choose from
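
A tiny usage sketch:

from tune.concepts.space.parameters import Choice

c = Choice("a", "b", "c")
v = c.generate(seed=0)  # reproducible for a fixed seed
assert v in c.values    # always one of the given values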

TransitionChoice

class TransitionChoice(*args)[source]

Bases: tune.concepts.space.parameters.Choice

An ordered random choice of values. Please read Space Tutorial.

Parameters

args (Any) – values to choose from

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable

Rand

class Rand(low, high, q=None, log=False, include_high=True)[source]

Bases: tune.concepts.space.parameters.RandBase

Continuous uniform random variables. Please read Space Tutorial.

Parameters
  • low (float) – range low bound (inclusive)

  • high (float) – range high bound (exclusive)

  • q (Optional[float]) – step between adjacent values, if set, the value will be rounded using q, defaults to None

  • log (bool) – whether to do uniform sampling in log space, defaults to False. If True, low must be positive and lower values get higher chance to be sampled

  • include_high (bool) –

generate(seed=None)[source]

Return a randomly chosen value.

Parameters

seed (Optional[Any]) – if set, it will be used to call seed(), defaults to None

Return type

float

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable
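
A short sketch of the options described above:

from tune.concepts.space.parameters import Rand

r = Rand(0.0, 1.0)
v = r.generate(seed=0)
assert 0.0 <= v <= 1.0  # a uniform value in the range

rq = Rand(0.0, 1.0, q=0.25)        # values rounded to steps of 0.25
rlog = Rand(0.001, 1.0, log=True)  # log-space sampling; low must be positive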

RandInt

class RandInt(low, high, q=1, log=False, include_high=True)[source]

Bases: tune.concepts.space.parameters.RandBase

Uniform distributed random integer values. Please read Space Tutorial.

Parameters
  • low (int) – range low bound (inclusive)

  • high (int) – range high bound (exclusive)

  • log (bool) – whether to do uniform sampling in log space, defaults to False. If True, low must be >=1 and lower values get higher chance to be sampled

  • q (int) –

  • include_high (bool) –

generate(seed=None)[source]

Return a randomly chosen value.

Parameters

seed (Optional[Any]) – if set, it will be used to call seed(), defaults to None

Return type

float

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable

General Non-Iterative Problems

suggest_for_noniterative_objective(objective, space, df=None, df_name='__tune__df_', temp_path='', partition_keys=None, top_n=1, local_optimizer=None, logger=None, monitor=None, stopper=None, stop_check_interval=None, distributed=None, shuffle_candidates=True, execution_engine=None, execution_engine_conf=None)[source]

Given non-iterative objective, space and (optional) dataframe, suggest the best parameter combinations.

Important

Please read Non-Iterative Tuning Guide

Parameters
  • objective (Any) – a simple python function or NonIterativeObjectiveFunc compatible object, please read Non-Iterative Objective Explained

  • space (tune.concepts.space.spaces.Space) – search space, please read Space Tutorial

  • df (Optional[Any]) – Pandas, Spark, Dask or any dataframe that can be converted to Fugue DataFrame, defaults to None

  • df_name (str) – dataframe name, defaults to the value of TUNE_DATASET_DF_DEFAULT_NAME

  • temp_path (str) – temp path for serialized dataframe partitions, defaults to “”. It can be empty if you preset it using TUNE_OBJECT_FACTORY.set_temp_path(). For details, read TuneDataset Tutorial.

  • partition_keys (Optional[List[str]]) – partition keys for df, defaults to None. For details, please read TuneDataset Tutorial

  • top_n (int) – number of best results to return, defaults to 1. If <=0 all results will be returned

  • local_optimizer (Optional[Any]) – an object that can be converted to NonIterativeObjectiveLocalOptimizer, please read Non-Iterative Optimizers, defaults to None

  • logger (Optional[Any]) – MetricLogger object or a function producing it, defaults to None

  • monitor (Optional[Any]) – realtime monitor, defaults to None. Read Monitoring Guide

  • stopper (Optional[Any]) – early stopper, defaults to None. Read Early Stopping Guide

  • stop_check_interval (Optional[Any]) – an object that can be converted to timedelta, defaults to None. For details, read to_timedelta()

  • distributed (Optional[bool]) – whether to use the execution engine to run different trials distributedly, defaults to None. If None, it is treated as True.

  • shuffle_candidates (bool) – whether to shuffle the candidate configurations, defaults to True. This has no effect on the final result.

  • execution_engine (Optional[Any]) – Fugue ExecutionEngine like object, defaults to None. If None, NativeExecutionEngine will be used and the task will run on the local machine.

  • execution_engine_conf (Optional[Any]) – Parameters like object, defaults to None

Returns

a list of best results

Return type

List[tune.concepts.flow.report.TrialReport]
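
A hedged usage sketch; the objective is a plain python function (accepted per the objective parameter above), and the imports assume top-level exports from tune:

from tune import Grid, Rand, Space, suggest_for_noniterative_objective

def objective(a: float, b: str) -> float:
    penalty = 0.0 if b == "l2" else 0.1  # toy metric, smaller is better
    return (a - 1) ** 2 + penalty

space = Space(b=Grid("l1", "l2"), a=Rand(-3, 3)).sample(10, seed=0)
reports = suggest_for_noniterative_objective(objective, space, top_n=1)

best = reports[0]  # a TrialReport
print(best.sort_metric)          # the comparison metric, smaller better
print(best.params.simple_value)  # the winning parameter set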

optimize_noniterative(objective, dataset, optimizer=None, distributed=None, logger=None, monitor=None, stopper=None, stop_check_interval=None)[source]
Parameters
  • objective (Any) –

  • dataset (tune.concepts.dataset.TuneDataset) –

  • optimizer (Optional[Any]) –

  • distributed (Optional[bool]) –

  • logger (Optional[Any]) –

  • monitor (Optional[Any]) –

  • stopper (Optional[Any]) –

  • stop_check_interval (Optional[Any]) –

Return type

tune.concepts.dataset.StudyResult

Level 2 Optimizers

Hyperopt

class HyperoptLocalOptimizer(max_iter, seed=0, kwargs_func=None)[source]

Bases: tune.noniterative.objective.NonIterativeObjectiveLocalOptimizer

Parameters
run(func, trial, logger)[source]
Parameters
Return type

tune.concepts.flow.report.TrialReport
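
A hedged usage sketch; the tune_hyperopt module name is an assumption based on the project layout, and objective and space stand for your own objective function and search space:

from tune import suggest_for_noniterative_objective
from tune_hyperopt import HyperoptLocalOptimizer  # assumed import path

reports = suggest_for_noniterative_objective(
    objective,
    space,  # keep Rand/Choice expressions unsampled for level-2 optimization
    local_optimizer=HyperoptLocalOptimizer(max_iter=30, seed=0),
)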

Optuna

class OptunaLocalOptimizer(max_iter, create_study=None)[source]

Bases: tune.noniterative.objective.NonIterativeObjectiveLocalOptimizer

Parameters
  • max_iter (int) –

  • create_study (Optional[Callable[[], optuna.study.study.Study]]) –

run(func, trial, logger)[source]
Parameters
Return type

tune.concepts.flow.report.TrialReport

General Iterative Problems

Successive Halving

suggest_by_sha(objective, space, plan, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • objective (Any) –

  • space (tune.concepts.space.spaces.Space) –

  • plan (List[Tuple[float, int]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]
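
plan is a list of (budget, n) tuples, one per rung. A hedged sketch under the usual Successive Halving reading, where each rung grants the surviving trials the stated budget and keeps the best n; iterative_objective stands for your own IterativeObjectiveFunc compatible object:

from tune import suggest_by_sha  # assumed top-level export

# 16 trials run with budget 1.0, the best 8 continue with budget 2.0, ...
plan = [(1.0, 16), (2.0, 8), (4.0, 4), (8.0, 2)]

reports = suggest_by_sha(iterative_objective, space, plan=plan, top_n=1)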

optimize_by_sha(objective, dataset, plan, checkpoint_path='', distributed=None, monitor=None)[source]
Parameters
  • objective (Any) –

  • dataset (tune.concepts.dataset.TuneDataset) –

  • plan (List[Tuple[float, int]]) –

  • checkpoint_path (str) –

  • distributed (Optional[bool]) –

  • monitor (Optional[Any]) –

Return type

tune.concepts.dataset.StudyResult

Hyperband

suggest_by_hyperband(objective, space, plans, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • objective (Any) –

  • space (tune.concepts.space.spaces.Space) –

  • plans (List[List[Tuple[float, int]]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]
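
plans is a list of SHA plans, one per Hyperband bracket; a hedged sketch with the same placeholders as above:

from tune import suggest_by_hyperband  # assumed top-level export

plans = [
    [(1.0, 27), (3.0, 9), (9.0, 3), (27.0, 1)],  # most exploratory bracket
    [(3.0, 9), (9.0, 3), (27.0, 1)],
    [(9.0, 3), (27.0, 1)],                       # most conservative bracket
]
reports = suggest_by_hyperband(iterative_objective, space, plans=plans)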

optimize_by_hyperband(objective, dataset, plans, checkpoint_path='', distributed=None, monitor=None)[source]
Parameters
  • objective (Any) –

  • dataset (tune.concepts.dataset.TuneDataset) –

  • plans (List[List[Tuple[float, int]]]) –

  • checkpoint_path (str) –

  • distributed (Optional[bool]) –

  • monitor (Optional[Any]) –

Return type

tune.concepts.dataset.StudyResult

Continuous ASHA

suggest_by_continuous_asha(objective, space, plan, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • objective (Any) –

  • space (tune.concepts.space.spaces.Space) –

  • plan (List[Tuple[float, int]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

optimize_by_continuous_asha(objective, dataset, plan, checkpoint_path='', always_checkpoint=False, study_early_stop=None, trial_early_stop=None, monitor=None)[source]
Parameters
Return type

tune.concepts.dataset.StudyResult

For Scikit-Learn

sk_space(model, **params)[source]
Parameters
  • model (str) –

  • params (Dict[str, Any]) –

Return type

tune.concepts.space.spaces.Space

suggest_sk_models_by_cv(space, train_df, scoring, cv=5, temp_path='', feature_prefix='', label_col='label', save_model=False, partition_keys=None, top_n=1, local_optimizer=None, monitor=None, stopper=None, stop_check_interval=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • space (tune.concepts.space.spaces.Space) –

  • train_df (Any) –

  • scoring (str) –

  • cv (int) –

  • temp_path (str) –

  • feature_prefix (str) –

  • label_col (str) –

  • save_model (bool) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • local_optimizer (Optional[tune.noniterative.objective.NonIterativeObjectiveLocalOptimizer]) –

  • monitor (Optional[Any]) –

  • stopper (Optional[Any]) –

  • stop_check_interval (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]
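
A hedged end-to-end sketch; the tune_sklearn import path and the dataframe layout (feature columns plus a label column) are assumptions:

import pandas as pd
from tune import Grid, RandInt
from tune_sklearn import sk_space, suggest_sk_models_by_cv  # assumed path

train_df = pd.DataFrame(
    dict(f1=[0, 1, 2, 3], f2=[1, 0, 1, 0], label=[0.1, 1.2, 2.1, 3.3])
)

# sk_space takes the full path of the model class as a string
space = sk_space(
    "sklearn.ensemble.RandomForestRegressor",
    n_estimators=Grid(50, 100),
    max_depth=RandInt(2, 6),
)

reports = suggest_sk_models_by_cv(
    space, train_df, scoring="neg_mean_absolute_error", cv=2, top_n=1
)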

suggest_sk_models(space, train_df, test_df, scoring, temp_path='', feature_prefix='', label_col='label', save_model=False, partition_keys=None, top_n=1, local_optimizer=None, monitor=None, stopper=None, stop_check_interval=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • space (tune.concepts.space.spaces.Space) –

  • train_df (Any) –

  • test_df (Any) –

  • scoring (str) –

  • temp_path (str) –

  • feature_prefix (str) –

  • label_col (str) –

  • save_model (bool) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • local_optimizer (Optional[tune.noniterative.objective.NonIterativeObjectiveLocalOptimizer]) –

  • monitor (Optional[Any]) –

  • stopper (Optional[Any]) –

  • stop_check_interval (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

For Tensorflow Keras

class KerasTrainingSpec(params, dfs)[source]

Bases: object

Parameters
  • params (Any) –

  • dfs (Dict[str, Any]) –

compile_model(**add_kwargs)[source]
Parameters

add_kwargs (Any) –

Return type

keras.engine.training.Model

compute_sort_metric(**add_kwargs)[source]
Parameters

add_kwargs (Any) –

Return type

float

property dfs: Dict[str, Any]
finalize()[source]
Return type

None

fit(**add_kwargs)[source]
Parameters

add_kwargs (Any) –

Return type

keras.callbacks.History

generate_sort_metric(metric)[source]
Parameters

metric (float) –

Return type

float

get_compile_params()[source]
Return type

Dict[str, Any]

get_fit_metric(history)[source]
Parameters

history (keras.callbacks.History) –

Return type

float

get_fit_params()[source]
Return type

Tuple[List[Any], Dict[str, Any]]

get_model()[source]
Return type

keras.engine.training.Model

load_checkpoint(fs, model)[source]
Parameters
  • fs (fs.base.FS) –

  • model (keras.engine.training.Model) –

Return type

None

property params: tune.concepts.space.parameters.TuningParametersTemplate
save_checkpoint(fs, model)[source]
Parameters
  • fs (fs.base.FS) –

  • model (keras.engine.training.Model) –

Return type

None

keras_space(model, **params)[source]
Parameters
  • model (Any) –

  • params (Any) –

Return type

tune.concepts.space.spaces.Space

suggest_keras_models_by_continuous_asha(space, plan, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • space (tune.concepts.space.spaces.Space) –

  • plan (List[Tuple[float, int]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

suggest_keras_models_by_hyperband(space, plans, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • space (tune.concepts.space.spaces.Space) –

  • plans (List[List[Tuple[float, int]]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

suggest_keras_models_by_sha(space, plan, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • space (tune.concepts.space.spaces.Space) –

  • plan (List[Tuple[float, int]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]
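
A hedged sketch of the Keras flow; MySpec is a placeholder for a user-defined KerasTrainingSpec subclass, and the tune_tensorflow import path is an assumption:

from tune import Grid
from tune_tensorflow import (  # assumed import path
    keras_space,
    suggest_keras_models_by_hyperband,
)

# MySpec subclasses KerasTrainingSpec and implements get_model,
# get_compile_params, get_fit_params, get_fit_metric, etc.
space = keras_space(MySpec, units=Grid(32, 64))

plans = [
    [(2.0, 4), (4.0, 2), (8.0, 1)],
    [(4.0, 2), (8.0, 1)],
]
reports = suggest_keras_models_by_hyperband(space, plans=plans, top_n=1)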

Complete API Reference

tune

tune.api

tune.api.factory
class TuneObjectFactory[source]

Bases: object

get_path_or_temp(path)[source]
Parameters

path (str) –

Return type

str

make_dataset(dag, dataset, df=None, df_name='__tune__df_', test_df=None, test_df_name='__tune__df__validation_', partition_keys=None, shuffle=True, temp_path='')[source]
Parameters
  • dag (fugue.workflow.workflow.FugueWorkflow) –

  • dataset (Any) –

  • df (Optional[Any]) –

  • df_name (str) –

  • test_df (Optional[Any]) –

  • test_df_name (str) –

  • partition_keys (Optional[List[str]]) –

  • shuffle (bool) –

  • temp_path (str) –

Return type

tune.concepts.dataset.TuneDataset

set_temp_path(path)[source]
Parameters

path (str) –

Return type

None

tune.api.optimize
optimize_by_continuous_asha(objective, dataset, plan, checkpoint_path='', always_checkpoint=False, study_early_stop=None, trial_early_stop=None, monitor=None)[source]
Parameters
Return type

tune.concepts.dataset.StudyResult

optimize_by_hyperband(objective, dataset, plans, checkpoint_path='', distributed=None, monitor=None)[source]
Parameters
  • objective (Any) –

  • dataset (tune.concepts.dataset.TuneDataset) –

  • plans (List[List[Tuple[float, int]]]) –

  • checkpoint_path (str) –

  • distributed (Optional[bool]) –

  • monitor (Optional[Any]) –

Return type

tune.concepts.dataset.StudyResult

optimize_by_sha(objective, dataset, plan, checkpoint_path='', distributed=None, monitor=None)[source]
Parameters
  • objective (Any) –

  • dataset (tune.concepts.dataset.TuneDataset) –

  • plan (List[Tuple[float, int]]) –

  • checkpoint_path (str) –

  • distributed (Optional[bool]) –

  • monitor (Optional[Any]) –

Return type

tune.concepts.dataset.StudyResult

optimize_noniterative(objective, dataset, optimizer=None, distributed=None, logger=None, monitor=None, stopper=None, stop_check_interval=None)[source]
Parameters
  • objective (Any) –

  • dataset (tune.concepts.dataset.TuneDataset) –

  • optimizer (Optional[Any]) –

  • distributed (Optional[bool]) –

  • logger (Optional[Any]) –

  • monitor (Optional[Any]) –

  • stopper (Optional[Any]) –

  • stop_check_interval (Optional[Any]) –

Return type

tune.concepts.dataset.StudyResult

tune.api.suggest
suggest_by_continuous_asha(objective, space, plan, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • objective (Any) –

  • space (tune.concepts.space.spaces.Space) –

  • plan (List[Tuple[float, int]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

suggest_by_hyperband(objective, space, plans, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • objective (Any) –

  • space (tune.concepts.space.spaces.Space) –

  • plans (List[List[Tuple[float, int]]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

suggest_by_sha(objective, space, plan, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • objective (Any) –

  • space (tune.concepts.space.spaces.Space) –

  • plan (List[Tuple[float, int]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

suggest_for_noniterative_objective(objective, space, df=None, df_name='__tune__df_', temp_path='', partition_keys=None, top_n=1, local_optimizer=None, logger=None, monitor=None, stopper=None, stop_check_interval=None, distributed=None, shuffle_candidates=True, execution_engine=None, execution_engine_conf=None)[source]

Given non-iterative objective, space and (optional) dataframe, suggest the best parameter combinations.

Important

Please read Non-Iterative Tuning Guide

Parameters
  • objective (Any) – a simple python function or NonIterativeObjectiveFunc compatible object, please read Non-Iterative Objective Explained

  • space (tune.concepts.space.spaces.Space) – search space, please read Space Tutorial

  • df (Optional[Any]) – Pandas, Spark, Dask or any dataframe that can be converted to Fugue DataFrame, defaults to None

  • df_name (str) – dataframe name, defaults to the value of TUNE_DATASET_DF_DEFAULT_NAME

  • temp_path (str) – temp path for serialized dataframe partitions, defaults to “”. It can be empty if you preset it using TUNE_OBJECT_FACTORY.set_temp_path(). For details, read TuneDataset Tutorial.

  • partition_keys (Optional[List[str]]) – partition keys for df, defaults to None. For details, please read TuneDataset Tutorial

  • top_n (int) – number of best results to return, defaults to 1. If <=0 all results will be returned

  • local_optimizer (Optional[Any]) – an object that can be converted to NonIterativeObjectiveLocalOptimizer, please read Non-Iterative Optimizers, defaults to None

  • logger (Optional[Any]) – MetricLogger object or a function producing it, defaults to None

  • monitor (Optional[Any]) – realtime monitor, defaults to None. Read Monitoring Guide

  • stopper (Optional[Any]) – early stopper, defaults to None. Read Early Stopping Guide

  • stop_check_interval (Optional[Any]) – an object that can be converted to timedelta, defaults to None. For details, read to_timedelta()

  • distributed (Optional[bool]) – whether to use the execution engine to run different trials distributedly, defaults to None. If None, it is treated as True.

  • shuffle_candidates (bool) – whether to shuffle the candidate configurations, defaults to True. This has no effect on the final result.

  • execution_engine (Optional[Any]) – Fugue ExecutionEngine like object, defaults to None. If None, NativeExecutionEngine will be used and the task will run on the local machine.

  • execution_engine_conf (Optional[Any]) – Parameters like object, defaults to None

Returns

a list of best results

Return type

List[tune.concepts.flow.report.TrialReport]

tune.concepts

tune.concepts.flow
tune.concepts.flow.judge
class Monitor[source]

Bases: object

finalize()[source]
Return type

None

initialize()[source]
Return type

None

on_get_budget(trial, rung, budget)[source]
Parameters
Return type

None

on_judge(decision)[source]
Parameters

decision (tune.concepts.flow.judge.TrialDecision) –

Return type

None

on_report(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

None

class NoOpTrailJudge(monitor=None)[source]

Bases: tune.concepts.flow.judge.TrialJudge

Parameters

monitor (Optional[Monitor]) –

can_accept(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

bool

get_budget(trial, rung)[source]
Parameters
Return type

float

judge(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

tune.concepts.flow.judge.TrialDecision

class RemoteTrialJudge(entrypoint)[source]

Bases: tune.concepts.flow.judge.TrialJudge

Parameters

entrypoint (Callable[[str, Dict[str, Any]], Any]) –

can_accept(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

bool

get_budget(trial, rung)[source]
Parameters
Return type

float

judge(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

tune.concepts.flow.judge.TrialDecision

property report: Optional[tune.concepts.flow.report.TrialReport]
class TrialCallback(judge)[source]

Bases: object

Parameters

judge (tune.concepts.flow.judge.TrialJudge) –

entrypoint(name, kwargs)[source]
Parameters

kwargs (Dict[str, Any]) –

Return type

Any

class TrialDecision(report, budget, should_checkpoint, reason='', metadata=None)[source]

Bases: object

Parameters
property budget: float
property metadata: Dict[str, Any]
property reason: str
property report: tune.concepts.flow.report.TrialReport
property should_checkpoint: bool
property should_stop: bool
property trial: tune.concepts.flow.trial.Trial
property trial_id: str
class TrialJudge(monitor=None)[source]

Bases: object

Parameters

monitor (Optional[Monitor]) –

can_accept(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

bool

get_budget(trial, rung)[source]
Parameters
Return type

float

judge(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

tune.concepts.flow.judge.TrialDecision

property monitor: tune.concepts.flow.judge.Monitor
reset_monitor(monitor=None)[source]
Parameters

monitor (Optional[tune.concepts.flow.judge.Monitor]) –

Return type

None

tune.concepts.flow.report
class TrialReport(trial, metric, params=None, metadata=None, cost=1.0, rung=0, sort_metric=None, log_time=None)[source]

Bases: object

The result from running the objective. It is immutable.

Parameters
  • trial (tune.concepts.flow.trial.Trial) – the original trial sent to the objective

  • metric (Any) – the raw metric from the objective output

  • params (Any) – updated parameters based on the trial input, defaults to None. If None, it means the params from the trial were not updated; otherwise it is an object convertible to TuningParametersTemplate by to_template()

  • metadata (Optional[Dict[str, Any]]) – metadata from the objective output, defaults to None

  • cost (float) – cost to run the objective, defaults to 1.0

  • rung (int) – number of rungs in the current objective, defaults to 0. This is for iterative problems

  • sort_metric (Any) – the metric for comparison, defaults to None. It must be smaller-is-better. If not set, it implies the metric is the sort_metric and is smaller-is-better

  • log_time (Any) – the time generating this report, defaults to None. If None, current time will be used

Attention

This class is not for users to construct directly.

copy()[source]

Copy the current object.

Returns

the copied object

Return type

tune.concepts.flow.report.TrialReport

Note

This is a shallow copy, but it is also used by __deepcopy__ of this object, because we disable deepcopy of TrialReport.

property cost: float

The cost to run the objective

fill_dict(data)[source]

Fill a row of StudyResult with the report information

Parameters

data (Dict[str, Any]) – a row (as dict) from StudyResult

Returns

the updated data

Return type

Dict[str, Any]

generate_sort_metric(min_better, digits)[source]

Construct a new report object with the newly derived sort_metric

Parameters
  • min_better (bool) – whether the current metric() is smaller better

  • digits (int) – number of digits to keep in sort_metric

Returns

a new object with the updated value

Return type

tune.concepts.flow.report.TrialReport

property log_time: datetime.datetime

The time generating this report

property metadata: Dict[str, Any]

The metadata from the objective output

property metric: float

The raw metric from the objective output

property params: tune.concepts.space.parameters.TuningParametersTemplate

The parameters used by the objective to generate the metric()

reset_log_time()[source]

Reset log_time() to now

Return type

tune.concepts.flow.report.TrialReport

property rung: int

The number of rungs in the current objective, defaults to 0. This is for iterative problems

property sort_metric: float

The metric for comparison

property trial: tune.concepts.flow.trial.Trial

The original trial sent to the objective

property trial_id: str

tune.concepts.flow.trial.Trial.trial_id()

with_cost(cost)[source]

Construct a new report object with the new cost

Parameters

cost (float) – new cost

Returns

a new object with the updated value

Return type

tune.concepts.flow.report.TrialReport

with_rung(rung)[source]

Construct a new report object with the new rung

Parameters

rung (int) – new rung

Returns

a new object with the updated value

Return type

tune.concepts.flow.report.TrialReport

with_sort_metric(sort_metric)[source]

Construct a new report object with the new sort_metric

Parameters

sort_metric (Any) – new sort_metric

Returns

a new object with the updated value

Return type

tune.concepts.flow.report.TrialReport

class TrialReportHeap(min_heap)[source]

Bases: object

Parameters

min_heap (bool) –

pop()[source]
Return type

tune.concepts.flow.report.TrialReport

push(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

None

values()[source]
Return type

Iterable[tune.concepts.flow.report.TrialReport]

class TrialReportLogger(new_best_only=False)[source]

Bases: object

Parameters

new_best_only (bool) –

property best: Optional[tune.concepts.flow.report.TrialReport]
log(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

None

on_report(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

bool

tune.concepts.flow.trial
class Trial(trial_id, params, metadata=None, keys=None, dfs=None)[source]

Bases: object

The input data collection for running an objective. It is immutable.

Parameters
  • trial_id (str) – the unique id for a trial

  • params (Any) – parameters for tuning, an object convertible to TuningParametersTemplate by to_template()

  • metadata (Optional[Dict[str, Any]]) – metadata for tuning, defaults to None. It is set during the construction of TuneDataset

  • keys (Optional[List[str]]) – partitions keys of the TuneDataset, defaults to None

  • dfs (Optional[Dict[str, Any]]) – dataframes extracted from TuneDataset, defaults to None

Attention

This class is not for users to construct directly. Use Space instead.

copy()[source]

Copy the current object.

Returns

the copied object

Return type

tune.concepts.flow.trial.Trial

Note

This is a shallow copy, but it is also used by __deepcopy__ of this object, because we disable deepcopy of Trial.

property dfs: Dict[str, Any]

Dataframes extracted from TuneDataset

property keys: List[str]

Partitions keys of the TuneDataset

property metadata: Dict[str, Any]

Metadata of the trial

property params: tune.concepts.space.parameters.TuningParametersTemplate

Parameters for tuning

property trial_id: str

The unique id of this trial

with_dfs(dfs)[source]

Set dataframes for the trial; a new Trial object will be constructed with the new dfs

Parameters

dfs (Dict[str, Any]) – dataframes to attach to the trial

Return type

tune.concepts.flow.trial.Trial

with_params(params)[source]

Set parameters for the trial; a new Trial object will be constructed with the new params

Parameters

params (Any) – parameters for tuning

Return type

tune.concepts.flow.trial.Trial

tune.concepts.space
tune.concepts.space.parameters
class Choice(*args)[source]

Bases: tune.concepts.space.parameters.StochasticExpression

A random choice of values. Please read Space Tutorial.

Parameters

args (Any) – values to choose from

generate(seed=None)[source]

Return a randomly chosen value.

Parameters

seed (Optional[Any]) – if set, it will be used to call seed(), defaults to None

Return type

Any

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable

property values: List[Any]

values to choose from

class FuncParam(func, *args, **kwargs)[source]

Bases: object

Function parameter. It defers the function call until all its parameters are no longer tuning parameters

Parameters
  • func (Callable) – function to generate parameter value

  • args (Any) – list arguments

  • kwargs (Any) – key-value arguments

s = Space(a=1, b=FuncParam(lambda x, y: x + y, x=Grid(0, 1), y=Grid(3, 4)))
assert [
    dict(a=1, b=3),
    dict(a=1, b=4),
    dict(a=1, b=4),
    dict(a=1, b=5),
] == list(s)
class Grid(*args)[source]

Bases: tune.concepts.space.parameters.TuningParameterExpression

Grid search, every value will be used. Please read Space Tutorial.

Parameters

args (Any) – values for the grid search

class NormalRand(mu, sigma, q=None)[source]

Bases: tune.concepts.space.parameters.RandBase

Continuous normally distributed random variables. Please read Space Tutorial.

Parameters
  • mu (float) – mean of the normal distribution

  • sigma (float) – standard deviation of the normal distribution

  • q (Optional[float]) – step between adjacent values, if set, the value will be rounded using q, defaults to None

generate(seed=None)[source]

Return a randomly chosen value.

Parameters

seed (Optional[Any]) – if set, it will be used to call seed(), defaults to None

Return type

float

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable

class NormalRandInt(mu, sigma, q=1)[source]

Bases: tune.concepts.space.parameters.RandBase

Normally distributed random integer values. Please read Space Tutorial.

Parameters
  • mu (int) – mean of the normal distribution

  • sigma (float) – standard deviation of the normal distribution

  • q (int) –

generate(seed=None)[source]

Return a randomly chosen value.

Parameters

seed (Optional[Any]) – if set, it will be used to call seed(), defaults to None

Return type

int

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable

class Rand(low, high, q=None, log=False, include_high=True)[source]

Bases: tune.concepts.space.parameters.RandBase

Continuous uniform random variables. Please read Space Tutorial.

Parameters
  • low (float) – range low bound (inclusive)

  • high (float) – range high bound (exclusive)

  • q (Optional[float]) – step between adjacent values, if set, the value will be rounded using q, defaults to None

  • log (bool) – whether to do uniform sampling in log space, defaults to False. If True, low must be positive and lower values get higher chance to be sampled

  • include_high (bool) –

generate(seed=None)[source]

Return a randomly chosen value.

Parameters

seed (Optional[Any]) – if set, it will be used to call seed(), defaults to None

Return type

float

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable

class RandBase(q=None, log=False)[source]

Bases: tune.concepts.space.parameters.StochasticExpression

Base class for continuous random variables. Please read Space Tutorial.

Parameters
  • q (Optional[float]) – step between adjacent values, if set, the value will be rounded using q, defaults to None

  • log (bool) – whether to do uniform sampling in log space, defaults to False. If True, lower values get higher chance to be sampled

class RandInt(low, high, q=1, log=False, include_high=True)[source]

Bases: tune.concepts.space.parameters.RandBase

Uniform distributed random integer values. Please read Space Tutorial.

Parameters
  • low (int) – range low bound (inclusive)

  • high (int) – range high bound (exclusive)

  • log (bool) – whether to do uniform sampling in log space, defaults to False. If True, low must be >=1 and lower values get higher chance to be sampled

  • q (int) –

  • include_high (bool) –

generate(seed=None)[source]

Return a randomly chosen value.

Parameters

seed (Optional[Any]) – if set, it will be used to call seed(), defaults to None

Return type

float

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable

class StochasticExpression[source]

Bases: tune.concepts.space.parameters.TuningParameterExpression

Stochastic search base class. Please read Space Tutorial.

generate(seed=None)[source]

Return a randomly chosen value.

Parameters

seed (Optional[Any]) – if set, it will be used to call seed(), defaults to None

Return type

Any

generate_many(n, seed=None)[source]

Generate n randomly chosen values

Parameters
  • n (int) – number of random values to generate

  • seed (Optional[Any]) – random seed, defaults to None

Returns

a list of values

Return type

List[Any]

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable

class TransitionChoice(*args)[source]

Bases: tune.concepts.space.parameters.Choice

An ordered random choice of values. Please read Space Tutorial.

Parameters

args (Any) – values to choose from

property jsondict: Dict[str, Any]

Dict representation of the expression that is json serializable

class TuningParameterExpression[source]

Bases: object

Base class of all tuning parameter expressions

class TuningParametersTemplate(raw)[source]

Bases: object

Parameter template to extract tuning parameter expressions from nested data structure

Parameters

raw (Dict[str, Any]) – the dictionary of input parameters.

Note

Please use to_template() to initialize this class.

# common cases
to_template(dict(a=1, b=1))
to_template(dict(a=Rand(0, 1), b=1))

# expressions may nest in dicts or arrays
template = to_template(
    dict(a=dict(x1=Rand(0, 1), x2=Rand(3,4)), b=[Grid("a", "b")]))

assert [Rand(0, 1), Rand(3, 4), Grid("a", "b")] == template.params
assert dict(
    p0=Rand(0, 1), p1=Rand(3, 4), p2=Grid("a", "b")
) == template.params_dict
assert dict(a=dict(x1=1, x2=3), b=["a"]) == template.fill([1, 3, "a"])
assert dict(a=dict(x1=1, x2=3), b=["a"]) == template.fill_dict(
    dict(p2="a", p1=3, p0=1)
)
concat(other)[source]

Concatenate with another template and generate a new template.

Note

The other template must not have any key existed in this template, otherwise ValueError will be raised

Returns

the merged template

Parameters

other (tune.concepts.space.parameters.TuningParametersTemplate) –

Return type

tune.concepts.space.parameters.TuningParametersTemplate

static decode(data)[source]

Retrieve the template from a base64 string

Parameters

data (str) –

Return type

tune.concepts.space.parameters.TuningParametersTemplate

property empty: bool

Whether the template contains any tuning expression

encode()[source]

Convert the template to a base64 string

Return type

str

fill(params)[source]

Fill the original data structure with values

Parameters
  • params (List[Any]) – the list of values to be filled into the original data structure, in depth-first order

  • copy – whether to return a deep copy of the parameters, defaults to False

Returns

the original data structure filled with values

Return type

Dict[str, Any]

fill_dict(params)[source]

Fill the original data structure with dictionary of values

Parameters
  • params (Dict[str, Any]) – the dictionary of values to be filled into the original data structure, keys must be p0, p1, p2, …

  • copy – whether to return a deep copy of the parameters, defaults to False

Returns

the original data structure filled with values

Return type

Dict[str, Any]

property has_grid: bool

Whether the template contains grid expressions

property has_stochastic: bool

Whether the template contains stochastic expressions

property params: List[tune.concepts.space.parameters.TuningParameterExpression]

Get all tuning parameter expressions in depth-first order

property params_dict: Dict[str, tune.concepts.space.parameters.TuningParameterExpression]

Get all tuning parameter expressions in depth-first order, with corresponding generated keys p0, p1, p2, …

product_grid()[source]

Cross product all grid parameters

Yield

new templates with the grid parameters filled

Return type

Iterable[tune.concepts.space.parameters.TuningParametersTemplate]

assert [dict(a=1, b=Rand(0, 1)), dict(a=2, b=Rand(0, 1))] == list(
    to_template(dict(a=Grid(1, 2), b=Rand(0, 1))).product_grid()
)
sample(n, seed=None)[source]

Sample all stochastic parameters

Parameters
  • n (int) – number of samples, must be a positive integer

  • seed (Optional[Any]) – random seed, defaults to None. It takes effect only when it is not None.

Yield

new templates with the stochastic parameters filled

Return type

Iterable[tune.concepts.space.parameters.TuningParametersTemplate]

assert [dict(a=1.1, b=Grid(0, 1)), dict(a=1.5, b=Grid(0, 1))] == list(
    to_template(dict(a=Rand(1, 2), b=Grid(0, 1))).sample(2, 0)
)
property simple_value: Dict[str, Any]

If the template contains no tuning expression, it is simple and this returns the parameters dictionary; otherwise, ValueError will be raised

property template: Dict[str, Any]

The template dictionary, all tuning expressions will be replaced by None

to_template(data)[source]

Convert an object to TuningParametersTemplate

Parameters

data (Any) – data object (dict or TuningParametersTemplate or str (encoded string))

Returns

the template object

Return type

tune.concepts.space.parameters.TuningParametersTemplate

tune.concepts.space.spaces
class Space(*args, **kwargs)[source]

Bases: object

Search space object

Important

Please read Space Tutorial.

Parameters

kwargs (Any) – parameters in the search space

Space(a=1, b=1)  # static space
Space(a=1, b=Grid(1,2), c=Grid("a", "b"))  # grid search
Space(a=1, b=Grid(1,2), c=Rand(0, 1))  # grid search + level 2 search
Space(a=1, b=Grid(1,2), c=Rand(0, 1)).sample(10, seed=0)  # grid + random search

# union
Space(a=1, b=Grid(2,3)) + Space(b=Rand(1,5)).sample(10)

# cross product
Space(a=1, b=Grid(2,3)) * Space(c=Rand(1,5), d=Grid("a","b"))

# combo (grid + random + level 2)
space1 = Space(a=1, b=Grid(2,4))
space2 = Space(b=RandInt(10, 20))
space3 = Space(c=Rand(0,1)).sample(10)
space = (space1 + space2) * space3
assert Space(a=1, b=Rand(0,1)).has_stochastic
assert not Space(a=1, b=Rand(0,1)).sample(10).has_stochastic
assert not Space(a=1, b=Grid(0,1)).has_stochastic
assert not Space(a=1, b=1).has_stochastic

# get all configurations
space = Space(a=Grid(2,4), b=Rand(0,1)).sample(100)
for conf in space:
    print(conf)
all_conf = list(space)
property has_stochastic

Whether the space contains any StochasticExpression

sample(n, seed=None)[source]

Draw random samples from the current space. Please read Space Tutorial.

Parameters
  • n (int) – number of samples to draw

  • seed (Optional[Any]) – random seed, defaults to None

Returns

a new Space containing all samples

Return type

tune.concepts.space.spaces.Space

tune.concepts.checkpoint
class Checkpoint(fs)[source]

Bases: object

An abstraction for tuning checkpoint

Parameters

fs (fs.base.FS) – the file system

Attention

Normally you don’t need to create a checkpoint by yourself. Please read Checkpoint Tutorial if you want to understand how it works.

create()[source]

Create a new checkpoint

Return type

tune.concepts.checkpoint.NewCheckpoint

property latest: fs.base.FS

latest checkpoint folder

Raises

AssertionError – if there was no checkpoint

class NewCheckpoint(checkpoint)[source]

Bases: object

A helper class for adding new checkpoints

Parameters

checkpoint (tune.concepts.checkpoint.Checkpoint) – the parent checkpoint

Attention

Do not construct this class directly; please read Checkpoint Tutorial for details

tune.concepts.dataset
class StudyResult(dataset, result)[source]

Bases: object

A collection of the input TuneDataset and the tuning result

Parameters

Attention

Do not construct this class directly.

next_tune_dataset(best_n=0)[source]

Convert the result back to a new TuneDataset to be used by the next steps.

Parameters

best_n (int) – top n result to extract, defaults to 0 (entire result)

Returns

a new dataset for tuning

Return type

tune.concepts.dataset.TuneDataset

result(best_n=0)[source]

Get the top n results sorted by tune.concepts.flow.report.TrialReport.sort_metric()

Parameters

best_n (int) – number of results to get, defaults to 0. If <=0, it will return the entire result

Returns

result subset

Return type

fugue.workflow.workflow.WorkflowDataFrame

union_with(other)[source]

Union with another result set and update itself

Parameters

other (tune.concepts.dataset.StudyResult) – the other result dataset

Return type

None

Note

This method also removes duplicated reports based on tune.concepts.flow.trial.Trial.trial_id(). Each trial will have only the best report in the updated result

class TuneDataset(data, dfs, keys)[source]

Bases: object

A Fugue WorkflowDataFrame with metadata representing all dataframes required for a tuning task.

Parameters

Attention

Do not construct this class directly; please read TuneDataset Tutorial to find the right way

property data: fugue.workflow.workflow.WorkflowDataFrame

the Fugue WorkflowDataFrame containing all required dataframes

property dfs: List[str]

All dataframe names (you can also find them as part of the column names of data())

property keys: List[str]

Partition keys (columns) of data()

split(weights, seed)[source]

Split the dataset randomly into small partitions. This is useful for algorithms such as Hyperband, because it needs different subsets to run successive halving with different parameters.

Parameters
  • weights (List[float]) – a list of numeric values. The length represents the number of split partitions, and the values represent the proportion of each partition

  • seed (Any) – random seed for the split

Returns

a list of sub-datasets

Return type

List[tune.concepts.dataset.TuneDataset]

# randomly split the data to two partitions 25% and 75%
dataset.split([1, 3], seed=0)
# same because weights will be normalized
dataset.split([10, 30], seed=0)
class TuneDatasetBuilder(space, path='')[source]

Bases: object

Builder of TuneDataset. For details please read TuneDataset Tutorial

Parameters
add_df(name, df, how='')[source]

Add a dataframe to the dataset

Parameters
  • name (str) – name of the dataframe, it will also create a __tune_df__<name> column in the dataset dataframe

  • df (fugue.workflow.workflow.WorkflowDataFrame) – the dataframe to add.

  • how (str) – join type, can accept semi, left_semi, anti, left_anti, inner, left_outer, right_outer, full_outer, cross

Returns

the builder itself

Return type

tune.concepts.dataset.TuneDatasetBuilder

Note

For the first dataframe you add, how should be empty. For every subsequent dataframe, how must be set.

Note

If df is prepartitioned, the partition key will be used to join with the added dataframes. Read TuneDataset Tutorial for more details

add_dfs(dfs, how='')[source]

Add multiple dataframes with the same join type

Parameters
  • dfs (fugue.workflow.workflow.WorkflowDataFrames) – dictionary-like dataframe collection. The keys will be used as the dataframe names

  • how (str) – join type, can accept semi, left_semi, anti, left_anti, inner, left_outer, right_outer, full_outer, cross

Returns

the builder itself

Return type

tune.concepts.dataset.TuneDatasetBuilder

build(wf, batch_size=1, shuffle=True, trial_metadata=None)[source]

Build TuneDataset. For details please read TuneDataset Tutorial

Parameters
  • wf (fugue.workflow.workflow.FugueWorkflow) – the workflow associated with the dataset

  • batch_size (int) – how many configurations as a batch, defaults to 1

  • shuffle (bool) – whether to shuffle the entire dataset, defaults to True. This makes the tuning process more even; it may slightly benefit speed and has no effect on the result.

  • trial_metadata (Optional[Dict[str, Any]]) – metadata to pass to each Trial, defaults to None

Returns

the dataset for tuning

Return type

tune.concepts.dataset.TuneDataset
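
A hedged sketch of using the builder directly (the suggest functions normally build the dataset for you):

import pandas as pd
from fugue import FugueWorkflow
from tune import Grid, Space
from tune.concepts.dataset import TuneDatasetBuilder

dag = FugueWorkflow()
builder = TuneDatasetBuilder(Space(a=Grid(0, 1)), path="/tmp")
builder.add_df("train", dag.df(pd.DataFrame(dict(x=[0, 1], y=[1, 2]))))
dataset = builder.build(dag, batch_size=1, shuffle=True)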

tune.iterative

tune.iterative.asha
class ASHAJudge(schedule, always_checkpoint=False, study_early_stop=None, trial_early_stop=None, monitor=None)[source]

Bases: tune.concepts.flow.judge.TrialJudge

Parameters
property always_checkpoint: bool
can_accept(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

bool

get_budget(trial, rung)[source]
Parameters
Return type

float

judge(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

tune.concepts.flow.judge.TrialDecision

property schedule: List[Tuple[float, int]]
class RungHeap(n)[source]

Bases: object

Parameters

n (int) –

property best: float
property bests: List[float]
property capacity: int
property full: bool
push(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

bool

values()[source]
Return type

Iterable[tune.concepts.flow.report.TrialReport]

tune.iterative.objective
class IterativeObjectiveFunc[source]

Bases: object

copy()[source]
Return type

tune.iterative.objective.IterativeObjectiveFunc

property current_trial: tune.concepts.flow.trial.Trial
finalize()[source]
Return type

None

generate_sort_metric(value)[source]
Parameters

value (float) –

Return type

float

initialize()[source]
Return type

None

load_checkpoint(fs)[source]
Parameters

fs (fs.base.FS) –

Return type

None

run(trial, judge, checkpoint_basedir_fs)[source]
Parameters
Return type

None

run_single_iteration()[source]
Return type

tune.concepts.flow.report.TrialReport

run_single_rung(budget)[source]
Parameters

budget (float) –

Return type

tune.concepts.flow.report.TrialReport

property rung: int
save_checkpoint(fs)[source]
Parameters

fs (fs.base.FS) –

Return type

None

validate_iterative_objective(func, trial, budgets, validator, continuous=False, checkpoint_path='', monitor=None)[source]
Parameters
Return type

None

tune.iterative.sha
tune.iterative.study
class IterativeStudy(objective, checkpoint_path)[source]

Bases: object

Parameters
optimize(dataset, judge)[source]
Parameters
Return type

tune.concepts.dataset.StudyResult

tune.noniterative

tune.noniterative.convert
noniterative_objective(func=None, min_better=True)[source]
Parameters
  • func (Optional[Callable]) –

  • min_better (bool) –

Return type

Callable[[Any], tune.noniterative.objective.NonIterativeObjectiveFunc]

to_noniterative_objective(obj, min_better=True, global_vars=None, local_vars=None)[source]
Parameters
  • obj (Any) –

  • min_better (bool) –

  • global_vars (Optional[Dict[str, Any]]) –

  • local_vars (Optional[Dict[str, Any]]) –

Return type

tune.noniterative.objective.NonIterativeObjectiveFunc

tune.noniterative.objective
class NonIterativeObjectiveFunc[source]

Bases: object

generate_sort_metric(value)[source]
Parameters

value (float) –

Return type

float

run(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

tune.concepts.flow.report.TrialReport

safe_run(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

tune.concepts.flow.report.TrialReport

class NonIterativeObjectiveLocalOptimizer[source]

Bases: object

property distributable: bool
run(func, trial, logger)[source]
Parameters
Return type

tune.concepts.flow.report.TrialReport

run_monitored_process(func, trial, stop_checker, logger, interval='60sec')[source]
Parameters
Return type

tune.concepts.flow.report.TrialReport

validate_noniterative_objective(func, trial, validator, optimizer=None, logger=None)[source]
Parameters
Return type

None

tune.noniterative.stopper
class NonIterativeStopper(log_best_only=False)[source]

Bases: tune.concepts.flow.judge.TrialJudge

Parameters

log_best_only (bool) –

can_accept(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

bool

get_reports(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

List[tune.concepts.flow.report.TrialReport]

judge(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

tune.concepts.flow.judge.TrialDecision

on_report(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

bool

should_stop(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

bool

property updated: bool
class NonIterativeStopperCombiner(left, right, is_and)[source]

Bases: tune.noniterative.stopper.NonIterativeStopper

Parameters
get_reports(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

List[tune.concepts.flow.report.TrialReport]

on_report(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

bool

should_stop(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

bool

class SimpleNonIterativeStopper(partition_should_stop, log_best_only=False)[source]

Bases: tune.noniterative.stopper.NonIterativeStopper

Parameters
on_report(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

bool

should_stop(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

bool

class TrialReportCollection(new_best_only=False)[source]

Bases: tune.concepts.flow.report.TrialReportLogger

Parameters

new_best_only (bool) –

log(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

None

property reports: List[tune.concepts.flow.report.TrialReport]
n_samples(n)[source]
Parameters

n (int) –

Return type

tune.noniterative.stopper.SimpleNonIterativeStopper

n_updates(n)[source]
Parameters

n (int) –

Return type

tune.noniterative.stopper.SimpleNonIterativeStopper

no_update_period(period)[source]
Parameters

period (Any) –

Return type

tune.noniterative.stopper.SimpleNonIterativeStopper

small_improvement(threshold, updates)[source]
Parameters
  • threshold (float) –

  • updates (int) –

Return type

tune.noniterative.stopper.SimpleNonIterativeStopper
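
Each of these helpers returns a SimpleNonIterativeStopper, and stoppers can be combined with logical operators (demonstrated in the Early Stopping tutorial below). A minimal sketch:

from tune import n_updates, small_improvement

# stop when the best metric has been updated at least 5 times AND the last
# update improved the best by less than 0.1
stopper = n_updates(5) & small_improvement(0.1, 1)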

tune.noniterative.study
class NonIterativeStudy(objective, optimizer)[source]

Bases: object

Parameters
optimize(dataset, distributed=None, monitor=None, stopper=None, stop_check_interval=None, logger=None)[source]
Parameters
Return type

tune.concepts.dataset.StudyResult

tune.constants

tune.exceptions

exception TuneCompileError[source]

Bases: fugue.exceptions.FugueWorkflowCompileError

exception TuneInterrupted[source]

Bases: tune.exceptions.TuneRuntimeError

exception TuneRuntimeError[source]

Bases: fugue.exceptions.FugueWorkflowRuntimeError

tune_hyperopt

tune_hyperopt.optimizer

class HyperoptLocalOptimizer(max_iter, seed=0, kwargs_func=None)[source]

Bases: tune.noniterative.objective.NonIterativeObjectiveLocalOptimizer

Parameters
run(func, trial, logger)[source]
Parameters
Return type

tune.concepts.flow.report.TrialReport

tune_optuna

tune_optuna.optimizer

class OptunaLocalOptimizer(max_iter, create_study=None)[source]

Bases: tune.noniterative.objective.NonIterativeObjectiveLocalOptimizer

Parameters
  • max_iter (int) –

  • create_study (Optional[Callable[[], optuna.study.study.Study]]) –

run(func, trial, logger)[source]
Parameters
Return type

tune.concepts.flow.report.TrialReport

tune_sklearn

tune_sklearn.objective

class SKCVObjective(scoring, cv=5, feature_prefix='', label_col='label', checkpoint_path=None)[source]

Bases: tune_sklearn.objective.SKObjective

Parameters
  • scoring (Any) –

  • cv (int) –

  • feature_prefix (str) –

  • label_col (str) –

  • checkpoint_path (Optional[str]) –

Return type

None

run(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

tune.concepts.flow.report.TrialReport

class SKObjective(scoring, feature_prefix='', label_col='label', checkpoint_path=None)[source]

Bases: tune.noniterative.objective.NonIterativeObjectiveFunc

Parameters
  • scoring (Any) –

  • feature_prefix (str) –

  • label_col (str) –

  • checkpoint_path (Optional[str]) –

Return type

None

generate_sort_metric(value)[source]
Parameters

value (float) –

Return type

float

run(trial)[source]
Parameters

trial (tune.concepts.flow.trial.Trial) –

Return type

tune.concepts.flow.report.TrialReport

tune_sklearn.suggest

suggest_sk_models(space, train_df, test_df, scoring, temp_path='', feature_prefix='', label_col='label', save_model=False, partition_keys=None, top_n=1, local_optimizer=None, monitor=None, stopper=None, stop_check_interval=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • space (tune.concepts.space.spaces.Space) –

  • train_df (Any) –

  • test_df (Any) –

  • scoring (str) –

  • temp_path (str) –

  • feature_prefix (str) –

  • label_col (str) –

  • save_model (bool) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • local_optimizer (Optional[tune.noniterative.objective.NonIterativeObjectiveLocalOptimizer]) –

  • monitor (Optional[Any]) –

  • stopper (Optional[Any]) –

  • stop_check_interval (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

suggest_sk_models_by_cv(space, train_df, scoring, cv=5, temp_path='', feature_prefix='', label_col='label', save_model=False, partition_keys=None, top_n=1, local_optimizer=None, monitor=None, stopper=None, stop_check_interval=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • space (tune.concepts.space.spaces.Space) –

  • train_df (Any) –

  • scoring (str) –

  • cv (int) –

  • temp_path (str) –

  • feature_prefix (str) –

  • label_col (str) –

  • save_model (bool) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • local_optimizer (Optional[tune.noniterative.objective.NonIterativeObjectiveLocalOptimizer]) –

  • monitor (Optional[Any]) –

  • stopper (Optional[Any]) –

  • stop_check_interval (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

tune_sklearn.utils

sk_space(model, **params)[source]
Parameters
  • model (str) –

  • params (Dict[str, Any]) –

Return type

tune.concepts.space.spaces.Space
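
A hedged sketch of typical usage, assuming the model is expressed as its full class path string (the Grid expression for n_estimators is just an example):

from tune_sklearn.utils import sk_space
from tune import Grid

space = sk_space("sklearn.ensemble.RandomForestRegressor", n_estimators=Grid(100, 200))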

to_sk_model(obj)[source]
Parameters

obj (Any) –

Return type

Type

to_sk_model_expr(model)[source]
Parameters

model (Any) –

Return type

Any

tune_tensorflow

tune_tensorflow.objective

class KerasObjective(type_dict)[source]

Bases: tune.iterative.objective.IterativeObjectiveFunc

Parameters

type_dict (Dict[str, Type[tune_tensorflow.spec.KerasTrainingSpec]]) –

Return type

None

copy()[source]
Return type

tune_tensorflow.objective.KerasObjective

finalize()[source]
Return type

None

generate_sort_metric(value)[source]
Parameters

value (float) –

Return type

float

initialize()[source]
Return type

None

load_checkpoint(fs)[source]
Parameters

fs (fs.base.FS) –

Return type

None

property model: keras.engine.training.Model
run_single_rung(budget)[source]
Parameters

budget (float) –

Return type

tune.concepts.flow.report.TrialReport

save_checkpoint(fs)[source]
Parameters

fs (fs.base.FS) –

Return type

None

property spec: tune_tensorflow.spec.KerasTrainingSpec

tune_tensorflow.spec

class KerasTrainingSpec(params, dfs)[source]

Bases: object

Parameters
  • params (Any) –

  • dfs (Dict[str, Any]) –

compile_model(**add_kwargs)[source]
Parameters

add_kwargs (Any) –

Return type

keras.engine.training.Model

compute_sort_metric(**add_kwargs)[source]
Parameters

add_kwargs (Any) –

Return type

float

property dfs: Dict[str, Any]
finalize()[source]
Return type

None

fit(**add_kwargs)[source]
Parameters

add_kwargs (Any) –

Return type

keras.callbacks.History

generate_sort_metric(metric)[source]
Parameters

metric (float) –

Return type

float

get_compile_params()[source]
Return type

Dict[str, Any]

get_fit_metric(history)[source]
Parameters

history (keras.callbacks.History) –

Return type

float

get_fit_params()[source]
Return type

Tuple[List[Any], Dict[str, Any]]

get_model()[source]
Return type

keras.engine.training.Model

load_checkpoint(fs, model)[source]
Parameters
  • fs (fs.base.FS) –

  • model (keras.engine.training.Model) –

Return type

None

property params: tune.concepts.space.parameters.TuningParametersTemplate
save_checkpoint(fs, model)[source]
Parameters
  • fs (fs.base.FS) –

  • model (keras.engine.training.Model) –

Return type

None

tune_tensorflow.suggest

suggest_keras_models_by_continuous_asha(space, plan, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • space (tune.concepts.space.spaces.Space) –

  • plan (List[Tuple[float, int]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

suggest_keras_models_by_hyperband(space, plans, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • space (tune.concepts.space.spaces.Space) –

  • plans (List[List[Tuple[float, int]]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

suggest_keras_models_by_sha(space, plan, train_df=None, temp_path='', partition_keys=None, top_n=1, monitor=None, distributed=None, execution_engine=None, execution_engine_conf=None)[source]
Parameters
  • space (tune.concepts.space.spaces.Space) –

  • plan (List[Tuple[float, int]]) –

  • train_df (Optional[Any]) –

  • temp_path (str) –

  • partition_keys (Optional[List[str]]) –

  • top_n (int) –

  • monitor (Optional[Any]) –

  • distributed (Optional[bool]) –

  • execution_engine (Optional[Any]) –

  • execution_engine_conf (Optional[Any]) –

Return type

List[tune.concepts.flow.report.TrialReport]

tune_tensorflow.utils

extract_keras_spec(params, type_dict)[source]
Parameters
Return type

Type[tune_tensorflow.spec.KerasTrainingSpec]

keras_space(model, **params)[source]
Parameters
  • model (Any) –

  • params (Any) –

Return type

tune.concepts.space.spaces.Space

to_keras_spec(obj)[source]
Parameters

obj (Any) –

Return type

Type[tune_tensorflow.spec.KerasTrainingSpec]

to_keras_spec_expr(spec)[source]
Parameters

spec (Any) –

Return type

str

tune_notebook

tune_notebook.monitors

class NotebookSimpleChart(interval='1sec', best_only=True, always_update=False)[source]

Bases: tune.concepts.flow.judge.Monitor

Parameters
  • interval (Any) –

  • best_only (bool) –

  • always_update (bool) –

finalize()[source]
Return type

None

on_report(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

None

plot(df)[source]
Parameters

df (pandas.core.frame.DataFrame) –

Return type

None

class NotebookSimpleHist(interval='1sec')[source]

Bases: tune_notebook.monitors.NotebookSimpleChart

Parameters

interval (Any) –

plot(df)[source]
Parameters

df (pandas.core.frame.DataFrame) –

Return type

None

class NotebookSimpleRungs(interval='1sec')[source]

Bases: tune_notebook.monitors.NotebookSimpleChart

Parameters

interval (Any) –

plot(df)[source]
Parameters

df (pandas.core.frame.DataFrame) –

Return type

None

class NotebookSimpleTimeSeries(interval='1sec')[source]

Bases: tune_notebook.monitors.NotebookSimpleChart

Parameters

interval (Any) –

plot(df)[source]
Parameters

df (pandas.core.frame.DataFrame) –

Return type

None

class PrintBest[source]

Bases: tune.concepts.flow.judge.Monitor

on_report(report)[source]
Parameters

report (tune.concepts.flow.report.TrialReport) –

Return type

None

tune_test

tune_test.local_optmizer

class NonIterativeObjectiveLocalOptimizerTests[source]

Bases: object

Local optimizer level general test suite. All new NonIterativeObjectiveLocalOptimizer implementations should pass this test suite.

class Tests(methodName='runTest')[source]

Bases: unittest.case.TestCase

make_optimizer(**kwargs)[source]
Parameters

kwargs (Any) –

Return type

tune.noniterative.objective.NonIterativeObjectiveLocalOptimizer

test_choice()[source]
test_optimization()[source]
test_optimization_dummy()[source]
test_optimization_nested_param()[source]
test_rand()[source]
test_randint()[source]
test_transition_choice()[source]

Short Tutorials

Search Space

THIS IS THE MOST IMPORTANT CONCEPT OF TUNE, MUST READ

Tune defines its own search space concept and expressions. It inherits the Fugue philosophy: one expression for all frameworks. For the underlying optimizers (e.g. HyperOpt, Optuna), tune unifies the behaviors. For example, Rand(1.0, 5.0, q=1.5) will uniformly search on [1.0, 2.5, 4.0] no matter whether you use HyperOpt or Optuna as the underlying optimizer.
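
You can verify this unified behavior directly from the expression itself (a quick sketch using the expression’s own sampler):

from tune import Rand

# only the grid points 1.0, 2.5 and 4.0 can ever be drawn
print(set(Rand(1.0, 5.0, q=1.5).generate_many(100, seed=0)))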

In Tune, spaces are predefined before the search, which is the opposite of Optuna, where you get variables inside objectives at runtime. This way, your space definition is totally separated from your objective definition, and your objectives can be simple python functions independent from Tune.

[1]:
from tune import Space, Grid, Rand, RandInt, Choice
import pandas as pd

Simple Cases

The simplest cases are spaces with only static variables, so the space always generates a single configuration.

[2]:
space = Space(a=1, b=1)
print(list(space))
[{'a': 1, 'b': 1}]
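
Grid is the counterpart for grid search: each Grid expression enumerates its values, and multiple Grid expressions expand to their cross product. A small sketch (using the imports above):

space = Space(a=Grid(1, 2), b=Grid("x", "y"))
print(list(space))  # 4 configurations: the cross product of the two Grid expressions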

Random Expressions

Random search requires calling the .sample method after you define the original space, to specify how many random combinations to draw from the expressions.

Choice

Choice refers to a discrete, unordered set of values, so Choice(1, 2, 3) is equivalent to Choice(2, 1, 3). When you randomly sample from Choice, every value has an equal chance. When you do advanced search such as Bayesian Optimization, it also assumes no relation between the values.

[4]:
space = Space(a=1, b=Choice("aa", "bb", "cc")).sample(2, seed=1)
print(list(space))
[{'a': 1, 'b': 'bb'}, {'a': 1, 'b': 'aa'}]
Rand

Rand is the most common expression for a variable. It refers to sampling from a range of values.

Rand(low, high)

uniformly search between [low, high)

[5]:
samples = Rand(10.1, 20.2).generate_many(10000, seed=0)
pd.DataFrame(samples).hist();
Rand(low, high, log=True)

search in log space, but still within [low, high), so smaller values get a higher chance of being selected.

For log space searching, low must be greater than or equal to 1.

The algorithm: exp(uniform(log(low), log(high)))
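
For intuition, the formula can be reproduced directly with numpy (a sketch, not Tune’s internal implementation):

import numpy as np

rng = np.random.RandomState(0)
low, high = 10.1, 1000.0
samples = np.exp(rng.uniform(np.log(low), np.log(high), size=10000))
# all samples fall in [low, high); smaller values are more likely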

[6]:
samples = Rand(10.1, 1000, log=True).generate_many(10000, seed=0)
pd.DataFrame(samples).hist();
Rand(low, high, q, include_high)

uniformly search between low and high with step q. include_high (default True) indicates whether the high value can be a candidate.

[7]:
print(Rand(-1.0,4.0,q=2.5).generate_many(10, seed=0))
print(Rand(-1.0,4.0,q=2.5,include_high=False).generate_many(10, seed=0))

samples = Rand(1.0,2.0,q=0.3).generate_many(10000, seed=0)
pd.DataFrame(samples).hist();
[1.5, 4.0, 1.5, 1.5, 1.5, 1.5, 1.5, 4.0, 4.0, 1.5]
[1.5, 1.5, 1.5, 1.5, -1.0, 1.5, -1.0, 1.5, 1.5, -1.0]
Rand(low, high, q, include_high, log=True)

search between low and high with step q in log space. include_high (default True) indicates whether the high value can be a candidate.

[8]:
samples = Rand(1.0,16.0,q=5, log=True).generate_many(10000, seed=0)
pd.DataFrame(samples).hist()

samples = Rand(1.0,16.0,q=5, log=True, include_high=False).generate_many(10000, seed=0)
pd.DataFrame(samples).hist();
RandInt

RandInt can be considered a special case of Rand where low, high and q are all integers.

RandInt(low, high, include_high)
[9]:
samples = RandInt(-2,2).generate_many(10000, seed=0)
pd.DataFrame(samples).hist()

samples = RandInt(-2,2,include_high=False).generate_many(10000, seed=0)
pd.DataFrame(samples).hist();
RandInt(low, high, include_high, q)

Search from low to high with step q.

[10]:
samples = RandInt(-2,4,q=2).generate_many(10000, seed=0)
pd.DataFrame(samples).hist()

samples = RandInt(-2,4,include_high=False,q=2).generate_many(10000, seed=0)
pd.DataFrame(samples).hist();
RandInt(low, high, include_high, q, log)

Search from low to high with step q. The difference is that it’s in log space, so lower values get a higher chance.

Also, for log search space, low must be >= 1.

[11]:
samples = RandInt(1,7,q=2,log=True).generate_many(10000, seed=0)
pd.DataFrame(samples).hist()

samples = RandInt(1,7,include_high=False,q=2,log=True).generate_many(10000, seed=0)
pd.DataFrame(samples).hist();

Non-Iterative Tuning Guide

Hello World

Let’s do hybrid parameter tuning with grid search + random search, and run it distributedly.

[1]:
def objective(a, b) -> float:
    return a**2 + b**2
[2]:
from tune import Space, Grid, Rand, RandInt, Choice

space = Space(a=Grid(-1,0,1), b=Rand(-10,10)).sample(100, seed=0)
[4]:
from tune import suggest_for_noniterative_objective

result = suggest_for_noniterative_objective(objective, space, top_n=1)[0]
print(result.sort_metric, result)
NativeExecutionEngine doesn't respect num_partitions ROWCOUNT
0.1909396653178624 {'trial': {'trial_id': '58c94f4f-011e-53da-a85b-7e696ced6600', 'params': {'a': 0, 'b': 0.43696643500143395}, 'metadata': {}, 'keys': []}, 'metric': 0.1909396653178624, 'params': {'a': 0, 'b': 0.43696643500143395}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 0.1909396653178624, 'log_time': datetime.datetime(2021, 10, 6, 23, 35, 53, 24547)}

Now let’s run it distributedly, using Dask as the example.

[6]:
from fugue_dask import DaskExecutionEngine

result = suggest_for_noniterative_objective(
    objective, space, top_n=1,
    execution_engine = DaskExecutionEngine
)[0]

print(result.sort_metric, result)
0.1909396653178624 {'trial': {'trial_id': '58c94f4f-011e-53da-a85b-7e696ced6600', 'params': {'a': 0, 'b': 0.43696643500143395}, 'metadata': {}, 'keys': []}, 'metric': 0.1909396653178624, 'params': {'a': 0, 'b': 0.43696643500143395}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 0.1909396653178624, 'log_time': datetime.datetime(2021, 10, 6, 23, 36, 16, 996725)}

To use tune in a more elegant and easier way, let’s first see how to configure the system.

Configuration

Configuring the system is not necessary, but it greatly simplifies your subsequent work.

suggest_for_noniterative_objective and optimize_noniterative have a lot of parameters due to the complexity of tuning operations. But tune lets you do global configuration, so you don’t need to repeat the same settings for every tuning task.

Customize Optimizer Converter
[7]:
from tune import TUNE_OBJECT_FACTORY
from tune import NonIterativeObjectiveLocalOptimizer
from tune_hyperopt import HyperoptLocalOptimizer
from tune_optuna import OptunaLocalOptimizer
import optuna

optuna.logging.disable_default_handler()

def to_optimizer(obj):
    if isinstance(obj, NonIterativeObjectiveLocalOptimizer):
        return obj
    if obj is None or "hyperopt"==obj:
        return HyperoptLocalOptimizer(max_iter=20, seed=0)
    if "optuna" == obj:
        return OptunaLocalOptimizer(max_iter=20)
    raise NotImplementedError

# make HyperoptLocalOptimizer the default level 2 optimizer, so you will not need to set it again
TUNE_OBJECT_FACTORY.set_noniterative_local_optimizer_converter(to_optimizer)
Customize Monitor

A Monitor collects and renders information in real time. There are builtin monitors, and you can also create your own.

[9]:
from typing import Optional

from tune import TUNE_OBJECT_FACTORY
from tune import Monitor
from tune_notebook import (
    NotebookSimpleHist,
    NotebookSimpleRungs,
    NotebookSimpleTimeSeries,
    PrintBest,
)

def to_monitor(obj) -> Optional[Monitor]:
    if obj is None:
        return None
    if isinstance(obj, Monitor):
        return obj
    if isinstance(obj, str):
        if obj == "hist":
            return NotebookSimpleHist()
        if obj == "rungs":
            return NotebookSimpleRungs()
        if obj == "ts":
            return NotebookSimpleTimeSeries()
        if obj == "text":
            return PrintBest()
    raise NotImplementedError(obj)

TUNE_OBJECT_FACTORY.set_monitor_converter(to_monitor)
Set Temp Path For Tuning

The temp path can be used to store serialized partitions or checkpoints. Most top level API usage requires a valid temp path. We can use the factory method to set a global value.

Notice that if you want to tune distributedly, you should set the path to a distributed file system, for example S3.

[10]:
TUNE_OBJECT_FACTORY.set_temp_path("/tmp")

Tuning Examples

Sometimes, your objective function requires an input dataframe. There are two ways to use dataframes in general:

  1. Take them as real dataframes, for example pandas dataframes.
     Pros: simple and intuitive.
     Cons: either the data size can’t scale or you have to couple with a distributed solution such as Spark.

  2. Take them from parameters, for example paths as parameters.
     Pros: you have full control over how, when and whether to load the data; more scalable.
     Cons: more code to make it work.

In general, the second way is a better idea (see the sketch below). But if your case can fit the first scenario, then tune has a simple solution letting you take pandas dataframes as input.
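
For reference, here is a hedged sketch of the second approach; it needs no special support from tune because the path is just another parameter (the parquet path and the target column are hypothetical):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def evaluate_from_path(train_path: str, **kwargs) -> float:
    # only a small string travels to the workers; the data is loaded lazily
    train_df = pd.read_parquet(train_path)  # hypothetical file with a "target" column
    x, y = train_df.drop("target", axis=1), train_df["target"]
    model = RandomForestRegressor(**kwargs)
    return -np.mean(cross_val_score(model, x, y, scoring="neg_mean_absolute_error", cv=4))

The rest of this section demonstrates the first approach.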

[11]:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import numpy as np

diabetes = load_diabetes(as_frame=True)["frame"]

def evaluate(train_df:pd.DataFrame, **kwargs) -> float:
    x, y = train_df.drop("target", axis=1), train_df["target"]
    model = RandomForestRegressor(**kwargs)
    # pay attention here: the score is larger-is-better, so we return the negative value
    return -np.mean(cross_val_score(model, x, y, scoring="neg_mean_absolute_error", cv=4))

evaluate(diabetes)

[11]:
46.646344389844394

With the given diabetes dataset and the objective function evaluate, let’s tune it in different ways.

Hybrid Tuning
[13]:
# Grid search only
space = Space(n_estimators=Grid(100,200), random_state=0)

result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df = diabetes, df_name = "train_df"
)[0]

print(result.sort_metric, result)
NativeExecutionEngine doesn't respect num_partitions ROWCOUNT
46.63103787878788 {'trial': {'trial_id': '5d719fa7-9537-58b1-86cd-fa69a4e75272', 'params': {'n_estimators': 100, 'random_state': 0}, 'metadata': {}, 'keys': []}, 'metric': 46.63103787878788, 'params': {'n_estimators': 100, 'random_state': 0}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 46.63103787878788, 'log_time': datetime.datetime(2021, 10, 6, 23, 37, 11, 450017)}
[14]:
# grid + random
space = Space(n_estimators=Grid(100,200), max_depth=RandInt(2,10), random_state=0).sample(3, seed=0)

result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df = diabetes, df_name = "train_df"
)[0]

print(result.sort_metric, result)
NativeExecutionEngine doesn't respect num_partitions ROWCOUNT
46.52677715635581 {'trial': {'trial_id': '0a53519f-576b-5a9f-8ef9-4a7e7f69de1a', 'params': {'n_estimators': 200, 'max_depth': 6, 'random_state': 0}, 'metadata': {}, 'keys': []}, 'metric': 46.52677715635581, 'params': {'n_estimators': 200, 'max_depth': 6, 'random_state': 0}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 46.52677715635581, 'log_time': datetime.datetime(2021, 10, 6, 23, 37, 26, 492058)}
[16]:
# random + bayesian optimization (hyperopt is used by default)
space = Space(n_estimators=RandInt(50,200))* Space(max_depth=RandInt(2,10), random_state=0).sample(2, seed=0)

result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df = diabetes, df_name = "train_df"
)[0]

print(result.sort_metric, result)

result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df = diabetes, df_name = "train_df",
    local_optimizer="optuna" # switch to optuna for bayesian optimization
)[0]

print(result.sort_metric, result)
NativeExecutionEngine doesn't respect num_partitions ROWCOUNT
NativeExecutionEngine doesn't respect num_partitions ROWCOUNT
46.419699856089416 {'trial': {'trial_id': '52919031-4f17-58d2-8cfc-e4a1d0e4555a', 'params': {'n_estimators': 175, 'max_depth': 6, 'random_state': 0}, 'metadata': {}, 'keys': []}, 'metric': 46.419699856089416, 'params': {'n_estimators': 175, 'max_depth': 6, 'random_state': 0}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 46.419699856089416, 'log_time': datetime.datetime(2021, 10, 6, 23, 38, 37, 355059)}
46.41622613826187 {'trial': {'trial_id': '52919031-4f17-58d2-8cfc-e4a1d0e4555a', 'params': {'n_estimators': 176, 'max_depth': 6, 'random_state': 0}, 'metadata': {}, 'keys': []}, 'metric': 46.41622613826187, 'params': {'n_estimators': 176, 'max_depth': 6, 'random_state': 0}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 46.41622613826187, 'log_time': datetime.datetime(2021, 10, 6, 23, 39, 9, 442020)}
Partition And Train And Tune

This is a very important feature of tune. Sometimes, partitioning the data and training and tuning small independent models separately can generate better results. This is not necessarily true, but at least we make it very simple for you to try: you only need to specify partition_keys. And with a distributed engine, all independent tasks are fully parallelized.

[17]:
space = Space(n_estimators=Grid(50,200), max_depth=RandInt(2,10), random_state=0).sample(2, seed=0)

result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df = diabetes, df_name = "train_df",
    partition_keys = ["sex"]  # for male and females, we train and tune separately
)

for r in result:
    print(r.trial.keys, r.sort_metric, r)
NativeExecutionEngine doesn't respect num_partitions ROWCOUNT
[0.0506801187398187] 42.48208345425722 {'trial': {'trial_id': '83f593dd-a3a2-5ac0-b389-ee19f8cc1134', 'params': {'n_estimators': 200, 'max_depth': 8, 'random_state': 0}, 'metadata': {}, 'keys': [0.0506801187398187]}, 'metric': 42.48208345425722, 'params': {'n_estimators': 200, 'max_depth': 8, 'random_state': 0}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 42.48208345425722, 'log_time': datetime.datetime(2021, 10, 6, 23, 40, 38, 579320)}
[-0.044641636506989] 46.66399292343497 {'trial': {'trial_id': '1759366d-de55-5418-b1b5-48cf91f529a0', 'params': {'n_estimators': 50, 'max_depth': 8, 'random_state': 0}, 'metadata': {}, 'keys': [-0.044641636506989]}, 'metric': 46.66399292343497, 'params': {'n_estimators': 50, 'max_depth': 8, 'random_state': 0}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 46.66399292343497, 'log_time': datetime.datetime(2021, 10, 6, 23, 40, 33, 356186)}
Distributed Tuning

tune is based on Fugue, so it can run seamlessly on all Fugue supported execution engines, in the same way Fugue uses them.

[18]:
# This space is a combination of grid and random search
# all level 1 searches, so it can be fully distributed
space = Space(n_estimators=Grid(50,200), max_depth=RandInt(2,10), random_state=0).sample(2, seed=0)

result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df = diabetes, df_name = "train_df",
    partition_keys = ["sex"],
    execution_engine = DaskExecutionEngine  # this makes the tuning process distributed
)

for r in result:
    print(r.trial.keys, r.sort_metric, r)
[0.0506801187398187] 42.79742975473356 {'trial': {'trial_id': '0f2053de-71b2-514d-b4ff-8495b93a042b', 'params': {'n_estimators': 200, 'max_depth': 6, 'random_state': 0}, 'metadata': {}, 'keys': [0.0506801187398187]}, 'metric': 42.79742975473356, 'params': {'n_estimators': 200, 'max_depth': 6, 'random_state': 0}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 42.79742975473356, 'log_time': datetime.datetime(2021, 10, 6, 23, 40, 57, 795165)}
[-0.044641636506989] 47.480845528260254 {'trial': {'trial_id': '46da77b5-089d-57b9-8036-0ca2e3646fdb', 'params': {'n_estimators': 200, 'max_depth': 6, 'random_state': 0}, 'metadata': {}, 'keys': [-0.044641636506989]}, 'metric': 47.480845528260254, 'params': {'n_estimators': 200, 'max_depth': 6, 'random_state': 0}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 47.480845528260254, 'log_time': datetime.datetime(2021, 10, 6, 23, 41, 0, 714602)}

Realtime Monitoring

The Fugue framework can let workers communicate with the driver in realtime (see this), so tune leverages this feature for monitoring and iterative problems.

[19]:
space = Space(n_estimators=RandInt(1,20), max_depth=RandInt(2,10), random_state=0).sample(100, seed=0)

result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df = diabetes, df_name = "train_df",
    monitor="ts"
)

for r in result:
    print(r.trial.keys, r.sort_metric, r)
[] 46.84555314021837 {'trial': {'trial_id': '2c9456ad-f8a7-56df-9195-3266ffabd941', 'params': {'n_estimators': 20, 'max_depth': 3, 'random_state': 0}, 'metadata': {}, 'keys': []}, 'metric': 46.84555314021837, 'params': {'n_estimators': 20, 'max_depth': 3, 'random_state': 0}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 46.84555314021837, 'log_time': datetime.datetime(2021, 10, 6, 23, 41, 19, 488640)}
[] 46.84555314021837 {'trial': {'trial_id': '2c9456ad-f8a7-56df-9195-3266ffabd941', 'params': {'n_estimators': 20, 'max_depth': 3, 'random_state': 0}, 'metadata': {}, 'keys': []}, 'metric': 46.84555314021837, 'params': {'n_estimators': 20, 'max_depth': 3, 'random_state': 0}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 46.84555314021837, 'log_time': datetime.datetime(2021, 10, 6, 23, 41, 23, 761028)}

To enable monitoring on a distributed engine, you must also enable remote callbacks. Without a shortcut, you have to set multiple configs. Here is an example with the fuggle package, which sets the shortcuts for callbacks on Kaggle, where it’s as simple as one config: callback: True

[20]:
space = Space(n_estimators=RandInt(1,20), max_depth=RandInt(2,10), random_state=0, n_jobs=1).sample(200, seed=0)

callback_conf = {
    "fugue.rpc.server": "fugue.rpc.flask.FlaskRPCServer",
    "fugue.rpc.flask_server.host": "0.0.0.0",
    "fugue.rpc.flask_server.port": "1234",
    "fugue.rpc.flask_server.timeout": "2 sec",
}

result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df = diabetes, df_name = "train_df",
    monitor="ts",
    execution_engine = DaskExecutionEngine,
    execution_engine_conf=callback_conf
)

for r in result:
    print(r.trial.keys, r.sort_metric, r)
[] 46.89339381813802 {'trial': {'trial_id': 'af51195c-3da6-59e5-a4ab-9802041ab314', 'params': {'n_estimators': 20, 'max_depth': 5, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'keys': []}, 'metric': 46.89339381813802, 'params': {'n_estimators': 20, 'max_depth': 5, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 46.89339381813802, 'log_time': datetime.datetime(2021, 10, 6, 23, 42, 0, 265059)}
[] 46.89339381813802 {'trial': {'trial_id': 'af51195c-3da6-59e5-a4ab-9802041ab314', 'params': {'n_estimators': 20, 'max_depth': 5, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'keys': []}, 'metric': 46.89339381813802, 'params': {'n_estimators': 20, 'max_depth': 5, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 46.89339381813802, 'log_time': datetime.datetime(2021, 10, 6, 23, 42, 0, 265059)}

Here are the shortcuts for monitoring:

  1. ts to monitor the up-to-date best metric collected

  2. hist to monitor the histogram of metrics collected

Early Stopping

When you enable monitoring, you often see the curve flatten quickly, so it can save significant time if tune stops trying the remaining trials. To do early stopping on a distributed engine, you must enable callbacks, just like for monitoring (if you neither monitor nor stop early, you don’t need to enable callbacks).

In tune, you can also combine stoppers with logical operators

[21]:
from tune import small_improvement, n_updates

space = Space(n_estimators=RandInt(1,20), max_depth=RandInt(2,10), random_state=0, n_jobs=1).sample(200, seed=0)

callback_conf = {
    "fugue.rpc.server": "fugue.rpc.flask.FlaskRPCServer",
    "fugue.rpc.flask_server.host": "0.0.0.0",
    "fugue.rpc.flask_server.port": "1234",
    "fugue.rpc.flask_server.timeout": "2 sec",
}

result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df = diabetes, df_name = "train_df",
    monitor="ts",
    # stop if at least 5 updates on best
    # AND the last update on best improved less than 0.1 (abs value)
    stopper= n_updates(5) & small_improvement(0.1,1),
    execution_engine = DaskExecutionEngine,
    execution_engine_conf=callback_conf
)

for r in result:
    print(r.trial.keys, r.sort_metric, r)
[] 47.01773216903467 {'trial': {'trial_id': 'f84ce5f5-207b-5ab7-a81a-be80879d5431', 'params': {'n_estimators': 19, 'max_depth': 4, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'keys': []}, 'metric': 47.01773216903467, 'params': {'n_estimators': 19, 'max_depth': 4, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 47.01773216903467, 'log_time': datetime.datetime(2021, 10, 6, 23, 42, 40, 406040)}
[] 47.01773216903467 {'trial': {'trial_id': 'f84ce5f5-207b-5ab7-a81a-be80879d5431', 'params': {'n_estimators': 19, 'max_depth': 4, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'keys': []}, 'metric': 47.01773216903467, 'params': {'n_estimators': 19, 'max_depth': 4, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 47.01773216903467, 'log_time': datetime.datetime(2021, 10, 6, 23, 42, 40, 406040)}

The above example combined a warmup period n_updates(5) with an improvement check small_improvement(0.1,1), so it stops neither too early nor too late.

You can also customize a simple stopper

[22]:
from typing import List
from tune.noniterative.stopper import SimpleNonIterativeStopper
from tune import TrialReport

def less_than(v: float) -> SimpleNonIterativeStopper:
    def func(current: TrialReport, updated: bool, reports: List[TrialReport]) -> bool:
        return current.sort_metric <= v

    return SimpleNonIterativeStopper(func, log_best_only=True)

[23]:
result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df = diabetes, df_name = "train_df",
    monitor="ts",
    stopper= less_than(49),
    execution_engine = DaskExecutionEngine,
    execution_engine_conf=callback_conf
)

for r in result:
    print(r.trial.keys, r.sort_metric, r)
[] 47.74170052753941 {'trial': {'trial_id': 'b9ab0d11-991d-53d2-ad41-246dcbe23c22', 'params': {'n_estimators': 17, 'max_depth': 2, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'keys': []}, 'metric': 47.74170052753941, 'params': {'n_estimators': 17, 'max_depth': 2, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 47.74170052753941, 'log_time': datetime.datetime(2021, 10, 6, 23, 43, 15, 891806)}
[] 47.74170052753941 {'trial': {'trial_id': 'b9ab0d11-991d-53d2-ad41-246dcbe23c22', 'params': {'n_estimators': 17, 'max_depth': 2, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'keys': []}, 'metric': 47.74170052753941, 'params': {'n_estimators': 17, 'max_depth': 2, 'random_state': 0, 'n_jobs': 1}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 47.74170052753941, 'log_time': datetime.datetime(2021, 10, 6, 23, 43, 15, 891806)}

The stopper tries to stop gracefully, so after the stop criteria is met, some running trials may still finish and report back when using a distributed engine; that is normal. If you want to stop faster, set for example stop_check_interval="5sec". But if you have a lot of workers, the frequent checks may become a burden on the driver side; it also depends on how compute-heavy your custom stopper is.
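
For example (a sketch, assuming suggest_for_noniterative_objective accepts stop_check_interval like suggest_sk_models in the API reference above):

result = suggest_for_noniterative_objective(
    evaluate, space, top_n=1,
    df=diabetes, df_name="train_df",
    stopper=less_than(49),
    stop_check_interval="5sec",  # check the stop criteria more frequently
)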

Notice: you must create a new stopper every time you call suggest_for_noniterative_objective because SimpleNonIterativeStopper is stateful.

Non-Iterative Objective

Non-Iterative Objective refers to objective functions with a single iteration. They do not report progress during execution to get a pruning decision.

Interfaceless

The simplest way to construct a Tune compatible non-iterative objective is to write a native python function with type annotations.

[3]:
from typing import Tuple, Dict, Any

def objective1(a, b) -> float:
    return a**2 + b**2

def objective2(a, b) -> Tuple[float, Dict[str, Any]]:
    return a**2 + b**2, {"metadata":"x"}

If your function has float or Tuple[float, Dict[str, Any]] as the output annotation, it is a valid non-iterative objective for tune.

Tuple[float, Dict[str, Any]] is for returning both the metric and the metadata.

The following code demos how tune converts your simple functions to compatible objects on the backend. You normally don’t need to do this yourself.

[5]:
from tune import to_noniterative_objective, Trial

f1 = to_noniterative_objective(objective1)
f2 = to_noniterative_objective(objective2, min_better=False)

trial = Trial("id", params=dict(a=1,b=1))
report1 = f1.safe_run(trial)
report2 = f2.safe_run(trial)

print(type(f1))
print(report1.metric, report1.sort_metric, report1.metadata)
print(report2.metric, report2.sort_metric, report2.metadata)
<class 'tune.noniterative.convert._NonIterativeObjectiveFuncWrapper'>
2.0 2.0 {}
2.0 -2.0 {'metadata': 'x'}

Decorator Approach

It is equivalent to use decorators on top of the functions, but then your functions depend on the tune package.

[7]:
from tune import noniterative_objective

@noniterative_objective
def objective_3(a, b) -> float:
    return a**2 + b**2

@noniterative_objective(min_better=False)
def objective_4(a, b) -> Tuple[float, Dict[str, Any]]:
    return a**2 + b**2, {"metadata":"x"}

report3 = objective_3.safe_run(trial)
report4 = objective_4.safe_run(trial)

print(report3.metric, report3.sort_metric, report3.metadata)
print(report4.metric, report4.sort_metric, report4.metadata)
2.0 2.0 {}
2.0 -2.0 {'metadata': 'x'}

Interface Approach

With the interface approach, you can access all properties of a trial. You can also use more flexible logic to generate the sort metric.

[9]:
from tune import NonIterativeObjectiveFunc, TrialReport

class Objective(NonIterativeObjectiveFunc):
    def generate_sort_metric(self, value: float) -> float:
        return - value * 10

    def run(self, trial: Trial) -> TrialReport:
        params = trial.params.simple_value
        metric = params["a"]**2 + params["b"]**2
        return TrialReport(trial, metric, metadata=dict(m="x"))

report = Objective().safe_run(trial)
print(report.metric, report.sort_metric, report.metadata)

2.0 -20.0 {'m': 'x'}

Factory Method

Almost all higher level APIs of tune are using TUNE_OBJECT_FACTORY to convert various objects to NonIterativeObjectiveFunc.

[10]:
from tune import TUNE_OBJECT_FACTORY

assert isinstance(TUNE_OBJECT_FACTORY.make_noniterative_objective(objective1), NonIterativeObjectiveFunc)
assert isinstance(TUNE_OBJECT_FACTORY.make_noniterative_objective(objective_4), NonIterativeObjectiveFunc)
assert isinstance(TUNE_OBJECT_FACTORY.make_noniterative_objective(Objective()), NonIterativeObjectiveFunc)

That is why in the higher level APIs, you can pass in a very simple python function as the objective and tune is still able to recognize it.

Actually you can make it even more flexible by configuring the factory.

[11]:
def to_obj(obj):
    if obj == "test":
        return to_noniterative_objective(objective1, min_better=False)
    if isinstance(obj, NonIterativeObjectiveFunc):
        return obj
    raise NotImplementedError

TUNE_OBJECT_FACTORY.set_noniterative_objective_converter(to_obj)  # use to_obj to replace the built-in default converter

assert isinstance(TUNE_OBJECT_FACTORY.make_noniterative_objective("test"), NonIterativeObjectiveFunc)

If you customize in this way, then you can pass "test" to the higher level tuning APIs, and it will be recognized as a compatible objective.

This is a common approach in Fugue projects. It enables you to use mostly primitive data types to represent what you want to do. For advanced users, if you spend some time on such configuration (a one-time effort), you will find the code even simpler and less dependent on fugue and tune.
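
As a hedged sketch, with the converter above registered, a higher level call could look like this (the space here is arbitrary):

from tune import suggest_for_noniterative_objective, Space, Grid

report = suggest_for_noniterative_objective(
    "test",  # resolved by to_obj to objective1 wrapped with min_better=False
    Space(a=Grid(-1, 0, 1), b=Grid(-1, 0, 1)),
    top_n=1,
)[0]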


Non-Iterative Optimizers

Non-iterative optimizers, AKA level 2 optimizers, are unified 3rd party solutions for random expressions. Look at this space:

[1]:
from tune import Space, Grid, Rand

space = Space(a=Grid(1,2), b=Rand(0,1))
list(space)
[1]:
[{'a': 1, 'b': Rand(low=0, high=1, q=None, log=False, include_high=True)},
 {'a': 2, 'b': Rand(low=0, high=1, q=None, log=False, include_high=True)}]

Grid is for level 1 optimization: all level 1 parameters will be converted to static values before execution, and level 2 parameters will be optimized at runtime using level 2 optimizers. So for the above example, if we have a Spark cluster and Hyperopt, we can use Hyperopt to search for the best b on each of the 2 configurations, and the 2 jobs are parallelized by Spark.

[3]:
from tune import noniterative_objective, Trial

@noniterative_objective
def objective(a, b) -> float:
    return a**2 + b**2

trial = Trial("dummy", params=list(space)[0])

Use Directly

Notice that normally you don’t use them directly; instead you should use them through the top level APIs. This is just to demo how they work.

Hyperopt
[5]:
from tune_hyperopt import HyperoptLocalOptimizer

hyperopt_optimizer = HyperoptLocalOptimizer(max_iter=200, seed=0)
report = hyperopt_optimizer.run(objective, trial)

print(report.sort_metric, report)

1.0000000001665414 {'trial': {'trial_id': 'dummy', 'params': {'a': 1, 'b': 1.2905089873156781e-05}, 'metadata': {}, 'keys': []}, 'metric': 1.0000000001665414, 'params': {'a': 1, 'b': 1.2905089873156781e-05}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 1.0000000001665414, 'log_time': datetime.datetime(2021, 10, 6, 23, 30, 51, 970344)}
Optuna
[7]:
from tune_optuna import OptunaLocalOptimizer
import optuna

optuna.logging.disable_default_handler()

optuna_optimizer = OptunaLocalOptimizer(max_iter=200)
report = optuna_optimizer.run(objective, trial)

print(report.sort_metric, report)
1.0000000003655019 {'trial': {'trial_id': 'dummy', 'params': {'a': 1, 'b': 1.9118105424729645e-05}, 'metadata': {}, 'keys': []}, 'metric': 1.0000000003655019, 'params': {'a': 1, 'b': 1.9118105424729645e-05}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 1.0000000003655019, 'log_time': datetime.datetime(2021, 10, 6, 23, 31, 26, 6566)}

As you see, we have unified the interfaces for these frameworks. In addition, we also unified the semantics of the random expressions, so the random sampling behavior will be highly consistent across different 3rd party solutions.

Use Top Level API

In the following example, we directly use the entire space where you can mix grid search, random search and Bayesian Optimization.

[8]:
from tune import suggest_for_noniterative_objective

report = suggest_for_noniterative_objective(
    objective, space, top_n=1,
    local_optimizer=hyperopt_optimizer
)[0]

print(report.sort_metric, report)

NativeExecutionEngine doesn't respect num_partitions ROWCOUNT
1.0000000001665414 {'trial': {'trial_id': '971ef4a5-71a9-5bf2-b2a4-f0f1acd02b78', 'params': {'a': 1, 'b': 1.2905089873156781e-05}, 'metadata': {}, 'keys': []}, 'metric': 1.0000000001665414, 'params': {'a': 1, 'b': 1.2905089873156781e-05}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 1.0000000001665414, 'log_time': datetime.datetime(2021, 10, 6, 23, 31, 43, 784128)}

You can also provide only random expressions in the space and use it in the same way, so it looks like a common case similar to the earlier examples.

[14]:
report = suggest_for_noniterative_objective(
    objective, Space(a=Rand(-1,1), b=Rand(-100,100)), top_n=1,
    local_optimizer=optuna_optimizer
)[0]

print(report.sort_metric, report)
NativeExecutionEngine doesn't respect num_partitions ROWCOUNT
0.04085386621249434 {'trial': {'trial_id': '45179c01-7358-5546-8f41-d7c6f120523f', 'params': {'a': 0.01604913454189394, 'b': 0.20148521408021614}, 'metadata': {}, 'keys': []}, 'metric': 0.04085386621249434, 'params': {'a': 0.01604913454189394, 'b': 0.20148521408021614}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 0.04085386621249434, 'log_time': datetime.datetime(2021, 10, 6, 23, 34, 47, 379901)}

Factory Method

In the above example, if we don’t set local_optimizer, then the default level 2 optimizer will be used, which can’t handle a configuration with random expressions.

So we have a nice way to make a certain optimizer the default one.

[10]:
from tune import NonIterativeObjectiveLocalOptimizer, TUNE_OBJECT_FACTORY

def to_optimizer(obj):
    if isinstance(obj, NonIterativeObjectiveLocalOptimizer):
        return obj
    if obj is None or "hyperopt"==obj:
        return HyperoptLocalOptimizer(max_iter=200, seed=0)
    if "optuna" == obj:
        return OptunaLocalOptimizer(max_iter=200)
    raise NotImplementedError

TUNE_OBJECT_FACTORY.set_noniterative_local_optimizer_converter(to_optimizer)

Now Hyperopt becomes the default level 2 optimizer, and you can switch to Optuna by specifying a string parameter

[16]:
report = suggest_for_noniterative_objective(
    objective, Space(a=Rand(-1,1), b=Rand(-100,100)), top_n=1
)[0]  # using hyperopt

print(report.sort_metric, report)

report = suggest_for_noniterative_objective(
    objective, Space(a=Rand(-1,1), b=Rand(-100,100)), top_n=1,
    local_optimizer="optuna"
)[0]  # using optuna

print(report.sort_metric, report)
NativeExecutionEngine doesn't respect num_partitions ROWCOUNT
NativeExecutionEngine doesn't respect num_partitions ROWCOUNT
0.02788888054657708 {'trial': {'trial_id': '45179c01-7358-5546-8f41-d7c6f120523f', 'params': {'a': -0.13745463941867586, 'b': -0.09484251498594332}, 'metadata': {}, 'keys': []}, 'metric': 0.02788888054657708, 'params': {'a': -0.13745463941867586, 'b': -0.09484251498594332}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 0.02788888054657708, 'log_time': datetime.datetime(2021, 10, 6, 23, 35, 19, 961138)}
0.010490219126635992 {'trial': {'trial_id': '45179c01-7358-5546-8f41-d7c6f120523f', 'params': {'a': 0.06699961867542388, 'b': -0.07746786575079878}, 'metadata': {}, 'keys': []}, 'metric': 0.010490219126635992, 'params': {'a': 0.06699961867542388, 'b': -0.07746786575079878}, 'metadata': {}, 'cost': 1.0, 'rung': 0, 'sort_metric': 0.010490219126635992, 'log_time': datetime.datetime(2021, 10, 6, 23, 35, 21, 593974)}

Tune Dataset

TuneDataset contains the search space and all related dataframes with metadata for a tuning task.

TuneDataset should not be constructed by users directly. Instead, you should use TuneDatasetBuilder or the factory method to construct TuneDataset.

[1]:
from fugue_notebook import setup

setup(is_lab=True)

import pandas as pd
from tune import TUNE_OBJECT_FACTORY, TuneDatasetBuilder, Space, Grid
from fugue import FugueWorkflow

TUNE_OBJECT_FACTORY.make_dataset is a wrapper of TuneDatasetBuilder, making the dataset construction even easier. But TuneDatasetBuilder still has the most flexibility. For example, it can add multiple dataframes with different join types, while TUNE_OBJECT_FACTORY.make_dataset can add at most two dataframes (normally the train and validation dataframes).

[2]:
with FugueWorkflow() as dag:
    builder = TuneDatasetBuilder(Space(a=1, b=2))
    dataset = builder.build(dag)
    dataset.data.show();

with FugueWorkflow() as dag:
    dataset = TUNE_OBJECT_FACTORY.make_dataset(dag, Space(a=1, b=2))
    dataset.data.show();
__tune_trials__
0 gASVXwEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
schema: __tune_trials__:str
__tune_trials__
0 gASVXwEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
schema: __tune_trials__:str

Here are the equivalent ways to construct TuneDataset with space and two dataframes.

In TuneDataset, every dataframe will be partitioned by certain keys, and each partition will be saved into a temp parquet file. The temp path must be specified. Using the factory, you can call set_temp_path once so you no longer need to provide the temp path explicitly; if you still provide a path, it will be used.

[3]:
pdf1 = pd.DataFrame([[0,1],[1,1],[0,2]], columns = ["a", "b"])
pdf2 = pd.DataFrame([[0,0.5],[2,0.1],[0,0.1],[1,0.3]], columns = ["a", "c"])
space = Space(a=1, b=Grid(1,2,3))

with FugueWorkflow() as dag:
    builder = TuneDatasetBuilder(space, path="/tmp")
    # here we must convert pdf1 and pdf2 to WorkflowDataFrames, and they
    # both need to be partitioned by the same keys so each partition
    # will be saved to a temp parquet file; the chunks of data are
    # replaced by file paths before the join.
    builder.add_df("df1", dag.df(pdf1).partition_by("a"))
    builder.add_df("df2", dag.df(pdf2).partition_by("a"), how="inner")
    dataset = builder.build(dag)
    dataset.data.show();


TUNE_OBJECT_FACTORY.set_temp_path("/tmp")

with FugueWorkflow() as dag:
    # this method is significantly simpler; as long as you don't have more
    # than 2 dataframes for a tuning task, use this.
    dataset = TUNE_OBJECT_FACTORY.make_dataset(
        dag, space,
        df_name="df1", df=pdf1,
        test_df_name="df2", test_df=pdf2,
        partition_keys=["a"],
    )
    dataset.data.show();
a __tune_df__df1 __tune_df__df2 __tune_trials__
0 0 /tmp/01b823d6-2d65-43be-898d-ed4d5b1ab582.parquet /tmp/5c35d480-6fa8-4776-a0f9-770974b73bb4.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
1 0 /tmp/01b823d6-2d65-43be-898d-ed4d5b1ab582.parquet /tmp/5c35d480-6fa8-4776-a0f9-770974b73bb4.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
2 0 /tmp/01b823d6-2d65-43be-898d-ed4d5b1ab582.parquet /tmp/5c35d480-6fa8-4776-a0f9-770974b73bb4.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
3 1 /tmp/15f2ec83-3494-4ba8-80a5-fa7c558c273c.parquet /tmp/2fe00d9c-b690-49c6-87a5-d365d59066c6.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
4 1 /tmp/15f2ec83-3494-4ba8-80a5-fa7c558c273c.parquet /tmp/2fe00d9c-b690-49c6-87a5-d365d59066c6.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
5 1 /tmp/15f2ec83-3494-4ba8-80a5-fa7c558c273c.parquet /tmp/2fe00d9c-b690-49c6-87a5-d365d59066c6.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
schema: a:long,__tune_df__df1:str,__tune_df__df2:str,__tune_trials__:str
a __tune_df__df1 __tune_df__df2 __tune_trials__
0 0 /tmp/943302c8-2704-4b29-a2ac-64946352a90d.parquet /tmp/9084e1ad-2156-4f3a-be36-52cf55d5c2fb.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
1 0 /tmp/943302c8-2704-4b29-a2ac-64946352a90d.parquet /tmp/9084e1ad-2156-4f3a-be36-52cf55d5c2fb.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
2 0 /tmp/943302c8-2704-4b29-a2ac-64946352a90d.parquet /tmp/9084e1ad-2156-4f3a-be36-52cf55d5c2fb.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
3 1 /tmp/74fa6215-116d-4828-a49c-f58358a9b4e7.parquet /tmp/0aa2aae2-3ab7-46e7-82e2-34a14ded2f0f.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
4 1 /tmp/74fa6215-116d-4828-a49c-f58358a9b4e7.parquet /tmp/0aa2aae2-3ab7-46e7-82e2-34a14ded2f0f.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
5 1 /tmp/74fa6215-116d-4828-a49c-f58358a9b4e7.parquet /tmp/0aa2aae2-3ab7-46e7-82e2-34a14ded2f0f.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
schema: a:long,__tune_df__df1:str,__tune_df__df2:str,__tune_trials__:str

We got 6 rows because the space contains 3 configurations, and since the dataframes were partitioned by a and inner joined, there are 2 partitions. So in total there are 3 * 2 = 6 rows in the TuneDataset.

Notice that the number of rows of a TuneDataset determines the max parallelism. In this case, if you assign 10 workers, 4 will always be idle.

Actually, a more common case is that we don’t partition any of the dataframes at all. For TUNE_OBJECT_FACTORY.make_dataset we just need to remove the partition_keys.

[4]:
with FugueWorkflow() as dag:
    dataset = TUNE_OBJECT_FACTORY.make_dataset(
        dag, space,
        df_name="df1", df=pdf1,
        test_df_name="df2", test_df=pdf2,
    )
    dataset.data.show();
__tune_df__df1 __tune_df__df2 __tune_trials__
0 /tmp/a774965e-d0df-417c-84d0-bb693ac337d1.parquet /tmp/2f9a93cd-121b-4697-8fe9-0513aa6bcd82.parquet gASVXwEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
1 /tmp/a774965e-d0df-417c-84d0-bb693ac337d1.parquet /tmp/2f9a93cd-121b-4697-8fe9-0513aa6bcd82.parquet gASVXwEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
2 /tmp/a774965e-d0df-417c-84d0-bb693ac337d1.parquet /tmp/2f9a93cd-121b-4697-8fe9-0513aa6bcd82.parquet gASVXwEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
schema: __tune_df__df1:str,__tune_df__df2:str,__tune_trials__:str

But what if we want to partition on df1 but not on df2? Then again, you can use TuneDatasetBuilder.

[5]:
with FugueWorkflow() as dag:
    builder = TuneDatasetBuilder(space, path="/tmp")
    builder.add_df("df1", dag.df(pdf1).partition_by("a"))
    # use cross join because there is no common key
    builder.add_df("df2", dag.df(pdf2), how="cross")
    dataset = builder.build(dag)
    dataset.data.show();
a __tune_df__df1 __tune_df__df2 __tune_trials__
0 0 /tmp/4e16f5d7-1dc2-438c-86c7-504502c3e1ad.parquet /tmp/3b92a6f2-31aa-485e-a608-58dcdc925a3c.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
1 0 /tmp/4e16f5d7-1dc2-438c-86c7-504502c3e1ad.parquet /tmp/3b92a6f2-31aa-485e-a608-58dcdc925a3c.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
2 0 /tmp/4e16f5d7-1dc2-438c-86c7-504502c3e1ad.parquet /tmp/3b92a6f2-31aa-485e-a608-58dcdc925a3c.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
3 1 /tmp/058862d5-4c24-437e-ae38-c4810d071a11.parquet /tmp/3b92a6f2-31aa-485e-a608-58dcdc925a3c.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
4 1 /tmp/058862d5-4c24-437e-ae38-c4810d071a11.parquet /tmp/3b92a6f2-31aa-485e-a608-58dcdc925a3c.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
5 1 /tmp/058862d5-4c24-437e-ae38-c4810d071a11.parquet /tmp/3b92a6f2-31aa-485e-a608-58dcdc925a3c.parquet gASVYgEAAAAAAABdlIwYdHVuZS5jb25jZXB0cy5mbG93Ln...
schema: a:long,__tune_df__df1:str,__tune_df__df2:str,__tune_trials__:str

Checkpoint

A Checkpoint is normally constructed and provided to you, but if you are interested, this section gives you some details.

[4]:
from tune import Checkpoint
from triad import FileSystem

root = FileSystem()
fs = root.makedirs("/tmp/test", recreate=True)
checkpoint = Checkpoint(fs)
print(len(checkpoint))
0
[5]:
!ls /tmp/test
[6]:
with checkpoint.create() as folder:
    folder.writetext("a.txt", "test")
[7]:
!ls /tmp/test
STATE  d9ed2530-20f1-42b3-8818-7fbf1b8eedf3

Here is how to create a new checkpoint under /tmp/test

[8]:
with checkpoint.create() as folder:
    folder.writetext("a.txt", "test2")
[9]:
!ls /tmp/test/*/
/tmp/test/8d4e7fed-2a4c-4789-a732-0cb46294e704/:
a.txt

/tmp/test/d9ed2530-20f1-42b3-8818-7fbf1b8eedf3/:
a.txt

Here is how to get the latest checkpoint folder

[10]:
print(len(checkpoint))
print(checkpoint.latest.readtext("a.txt"))
2
test2
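
Inside an IterativeObjectiveFunc (see the API reference above), save_checkpoint and load_checkpoint receive such a folder as an fs.base.FS object. A minimal sketch, where the _iter attribute is hypothetical state:

from fs.base import FS
from tune.iterative.objective import IterativeObjectiveFunc

class MyIterativeFunc(IterativeObjectiveFunc):
    def initialize(self) -> None:
        self._iter = 0  # hypothetical state to persist across rungs

    def save_checkpoint(self, fs: FS) -> None:
        fs.writetext("state.txt", str(self._iter))

    def load_checkpoint(self, fs: FS) -> None:  # called only when a checkpoint exists
        self._iter = int(fs.readtext("state.txt"))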