training

superduper.ext.transformers.training

create_quantization_config

create_quantization_config(config: superduper.ext.transformers.training.LLMTrainer)
Parameters:
  • config: The configuration to use.

Create quantization config for LLM training.
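
As a rough illustration of what such a config contains, here is a minimal sketch assuming a bitsandbytes 4-bit setup built with transformers.BitsAndBytesConfig; the field values shown are illustrative, not the values this helper actually reads from LLMTrainer.

```python
# Minimal sketch, assuming a bitsandbytes 4-bit setup: the kind of object a
# quantization-config helper like this typically returns. The field values
# below are illustrative, not the values actually read from LLMTrainer.
import torch
from transformers import BitsAndBytesConfig


def make_4bit_config() -> BitsAndBytesConfig:
    return BitsAndBytesConfig(
        load_in_4bit=True,                     # quantize weights to 4 bit
        bnb_4bit_compute_dtype=torch.float16,  # run compute in fp16
        bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
        bnb_4bit_use_double_quant=True,        # nested (double) quantization
    )
```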

handle_ray_results

handle_ray_results(db,
llm,
results)
Parameters:
  • db: Datalayer, used for saving the checkpoint.
  • llm: LLM model, used for saving the checkpoint.
  • results: The Ray training results, containing the checkpoint.

Handle the ray results.

Saves the checkpoint to db if db and llm are provided.
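
A hypothetical usage sketch follows: it wires the documented signature into a Ray Train run. The TorchTrainer setup, the placeholder training loop, and the db / llm objects are assumptions for illustration; only the final call mirrors the documented parameters.

```python
# Hypothetical usage sketch: run a Ray Train job, then persist its checkpoint
# to the datalayer. The training loop is a placeholder, and `db` / `llm` are
# assumed to be an existing Datalayer and LLM component.
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

from superduper.ext.transformers.training import handle_ray_results


def train_loop_per_worker(config):
    ...  # placeholder: a real loop trains the model and reports a checkpoint


trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
)
results = trainer.fit()  # ray.train.Result carrying the checkpoint

# Save the checkpoint to db and attach it to the LLM component.
handle_ray_results(db=db, llm=llm, results=results)
```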

prepare_lora_training

prepare_lora_training(model,
config: superduper.ext.transformers.training.LLMTrainer)
Parameters:
  • model: The model to prepare for LoRA training.
  • config: The configuration to use.

Prepare LoRA training for the model.

Gets the LoRA target modules and converts the model to a PEFT model.
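
As a rough sketch of this step, assuming the peft library is used for the conversion: the rank, alpha, dropout, and target modules below are illustrative values, not the ones the helper derives from the LLMTrainer config.

```python
# Minimal sketch, assuming the peft library, of the conversion this helper
# performs. The hyperparameters and target modules are illustrative values,
# not the ones actually derived from LLMTrainer.
from peft import LoraConfig, TaskType, get_peft_model


def to_lora(model):
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,                                  # LoRA rank
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # typical attention projections
    )
    return get_peft_model(model, lora_config)  # wrap the model as a PeftModel
```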

train_func

train_func(training_args: superduper.ext.transformers.training.LLMTrainer,
train_dataset: 'Dataset',
eval_datasets: Union[ForwardRef('Dataset'), Dict[str, ForwardRef('Dataset')]],
model_kwargs: dict,
tokenizer_kwargs: dict,
trainer_prepare_func: Optional[Callable] = None,
callbacks=None,
**kwargs)
Parameters:
  • training_args: Training arguments, see LLMTrainingArguments.
  • train_dataset: Training dataset; can be a huggingface datasets.Dataset or a ray.data.Dataset.
  • eval_datasets: Evaluation dataset; can be a dict of datasets.
  • model_kwargs: Model kwargs for AutoModelForCausalLM.
  • tokenizer_kwargs: Tokenizer kwargs for AutoTokenizer.
  • trainer_prepare_func: Function to prepare the trainer. It is called after the trainer is created, so custom settings can be applied to the trainer.
  • callbacks: List of callbacks to add to the trainer.
  • kwargs: Other kwargs for Trainer. All of these are passed to Trainer; make sure the Trainer supports them.

Base training function for LLM model.
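
A hypothetical call sketch under stated assumptions: the LLMTrainer fields (identifier, output_dir) and the model name are placeholders for illustration, not values confirmed by the signature above.

```python
# Hypothetical usage sketch of train_func. The LLMTrainer fields and the
# model name are illustrative assumptions; see LLMTrainingArguments for the
# real options.
from datasets import Dataset

from superduper.ext.transformers.training import LLMTrainer, train_func

train_dataset = Dataset.from_dict({"text": ["hello world", "goodbye world"]})

training_args = LLMTrainer(
    identifier="llm-finetune",  # assumed identifier field
    output_dir="output/llm",    # assumed HF-style output directory
)

train_func(
    training_args=training_args,
    train_dataset=train_dataset,
    eval_datasets={"val": train_dataset},
    model_kwargs={"pretrained_model_name_or_path": "facebook/opt-125m"},
    tokenizer_kwargs={"pretrained_model_name_or_path": "facebook/opt-125m"},
)
```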

tokenize

tokenize(tokenizer,
example,
X,
y)
Parameters:
  • tokenizer: The tokenizer to use.
  • example: The example to tokenize.
  • X: The input key.
  • y: The output key.

Function to tokenize the example.
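
A hypothetical usage sketch based on the documented signature; the example record, key names, and tokenizer model are illustrative assumptions.

```python
# Hypothetical usage sketch of tokenize(); the record and key names are
# placeholders chosen for illustration.
from transformers import AutoTokenizer

from superduper.ext.transformers.training import tokenize

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # assumed model
example = {"prompt": "1 + 1 = ", "completion": "2"}

# Tokenize the record, treating "prompt" as the input key and "completion"
# as the output key.
tokens = tokenize(tokenizer, example, "prompt", "completion")
```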

train

train(training_args: superduper.ext.transformers.training.LLMTrainer,
train_dataset: datasets.arrow_dataset.Dataset,
eval_datasets: Union[datasets.arrow_dataset.Dataset, Dict[str, datasets.arrow_dataset.Dataset]],
model_kwargs: dict,
tokenizer_kwargs: dict,
db: Optional[ForwardRef('Datalayer')] = None,
llm: Optional[ForwardRef('LLM')] = None,
ray_configs: Optional[dict] = None,
**kwargs)
Parameters:
  • training_args: Training arguments, see LLMTrainingArguments.
  • train_dataset: Training dataset.
  • eval_datasets: Evaluation dataset; can be a dict of datasets.
  • model_kwargs: Model kwargs for AutoModelForCausalLM.
  • tokenizer_kwargs: Tokenizer kwargs for AutoTokenizer.
  • db: Datalayer, used for creating LLMCallback.
  • llm: LLM model, used for creating LLMCallback.
  • ray_configs: Ray configs; must be provided if using Ray.
  • kwargs: Other kwargs for Trainer.

Train LLM model on specified dataset.

The training process can be run in the following modes:

  • Local node without Ray: supports only a single GPU
  • Local node with Ray: supports multiple nodes and GPUs
  • Remote node with Ray: supports multiple nodes and GPUs

If run locally, train_func is used to train the model. The training process can be logged to db if db and llm are provided; in this case the db and llm from the current process are reused. If run on Ray, ray_train is used to train the model. The training process can again be logged to db if db and llm are provided, but the db and llm are rebuilt in the new process so that it can access the db. The Ray cluster must be able to access the db.
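
A hypothetical usage sketch for the local, single-GPU mode; the LLMTrainer fields and model name are placeholders, and ray_configs is left unset to stay off Ray.

```python
# Hypothetical usage sketch of train() in the local, single-GPU mode. The
# LLMTrainer fields and the model name are placeholders; pass ray_configs
# (and a db reachable from the cluster) to run on Ray instead.
from datasets import Dataset

from superduper.ext.transformers.training import LLMTrainer, train

dataset = Dataset.from_dict({"text": ["example one", "example two"]})

train(
    training_args=LLMTrainer(identifier="llm-finetune", output_dir="output/llm"),
    train_dataset=dataset,
    eval_datasets=dataset,
    model_kwargs={"pretrained_model_name_or_path": "facebook/opt-125m"},
    tokenizer_kwargs={"pretrained_model_name_or_path": "facebook/opt-125m"},
    db=None,           # no logging to a datalayer in this sketch
    llm=None,
    ray_configs=None,  # local run without Ray
)
```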

Checkpoint

Checkpoint(self,
identifier: str,
db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
uuid: str = None,
*,
artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
path: Optional[str],
step: int) -> None
Parameters:
  • identifier: Identifier of the leaf.
  • db: Datalayer instance.
  • uuid: UUID of the leaf.
  • artifacts: A dictionary of artifact paths and DataType objects.
  • path: The path to the checkpoint.
  • step: The step of the checkpoint.

Checkpoint component for saving the model checkpoint.
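
A short construction sketch based on the documented signature; the identifier, path, and step values are illustrative.

```python
# Sketch of constructing a Checkpoint component from the documented
# signature; the identifier, path, and step values are illustrative.
from superduper.ext.transformers.training import Checkpoint

checkpoint = Checkpoint(
    identifier="llm-finetune-checkpoint",
    path="output/llm/checkpoint-100",  # directory written by the trainer
    step=100,                          # training step of this save
)
```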

LLMCallback

LLMCallback(self,
cfg: Optional[ForwardRef('Config')] = None,
identifier: Optional[str] = None,
db: Optional[ForwardRef('Datalayer')] = None,
llm: Optional[ForwardRef('LLM')] = None,
experiment_id: Optional[str] = None)
Parameters:
  • cfg: The configuration to use.
  • identifier: The identifier to use.
  • db: The datalayer to use.
  • llm: The LLM model to use.
  • experiment_id: The experiment id to use.

LLM callback for logging the training process to db.

This callback saves the checkpoint to db after each epoch. If save_total_limit is set, the oldest checkpoint is removed.
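
A hypothetical sketch of attaching the callback via the callbacks argument of train_func; db and llm are assumed to be an existing Datalayer and LLM component, and the remaining arguments mirror the train_func sketch above.

```python
# Sketch: attach an LLMCallback so the run is logged to the datalayer. `db`
# and `llm` are assumed to be an existing Datalayer and LLM component; the
# other arguments mirror the train_func sketch above.
from superduper.ext.transformers.training import LLMCallback, train_func

callback = LLMCallback(db=db, llm=llm, experiment_id="llm-finetune-001")

train_func(
    training_args=training_args,  # an LLMTrainer, as in the sketch above
    train_dataset=train_dataset,
    eval_datasets={"val": train_dataset},
    model_kwargs={"pretrained_model_name_or_path": "facebook/opt-125m"},
    tokenizer_kwargs={"pretrained_model_name_or_path": "facebook/opt-125m"},
    callbacks=[callback],         # checkpoints are logged to db during training
)
```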