diff --git a/examples/llama-factory/finetune-zh.md b/examples/llama-factory/finetune-zh.md
new file mode 100644
index 0000000000000000000000000000000000000000..f119a060d3439825019527a7bdf754065b67ab50
--- /dev/null
+++ b/examples/llama-factory/finetune-zh.md
@@ -0,0 +1,190 @@
+# Fine-tuning Qwen Models with LLaMA-Factory
+
+## Introduction to LLaMA-Factory
+[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) is an easy-to-use and efficient framework for training large models. It supports training over a hundred large models, and its main features include:
+- Models: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM, Phi, and more.
+- Training methods: (continued) pre-training, (multimodal) supervised instruction fine-tuning, reward model training, PPO training, DPO training, KTO training, ORPO training, and more.
+- Precision: 16-bit full-parameter fine-tuning, freeze fine-tuning, LoRA fine-tuning, and 2/3/4/5/6/8-bit QLoRA fine-tuning based on AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
+- Optimization algorithms: GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ, and PiSSA.
+- Acceleration operators: FlashAttention-2 and Unsloth.
+- Inference engines: Transformers and vLLM.
+- Experiment dashboards: LlamaBoard, TensorBoard, Wandb, MLflow, and more.
+
+This document describes how to fine-tune Qwen2 series models with LLaMA-Factory (the same steps also apply to Qwen1.5 series models). For more features, see [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
+
+## Installing LLaMA-Factory
+Download and install LLaMA-Factory:
+```bash
+git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
+cd LLaMA-Factory
+pip install -e ".[torch,metrics]"
+```
+
+After installation, run `llamafactory-cli version`. If the following message appears, the installation succeeded:
+```
+----------------------------------------------------------
+| Welcome to LLaMA Factory, version 0.8.4.dev0           |
+|                                                        |
+| Project page: https://github.com/hiyouga/LLaMA-Factory |
+----------------------------------------------------------
+```
+
+## Preparing the Training Data
+Custom training data should be saved as a JSONL file, with one record per line in the following format (pretty-printed here for readability):
+```json
+{
+    "messages": [
+        {
+            "role": "system",
+            "content": "You are a helpful assistant."
+        },
+        {
+            "role": "user",
+            "content": "Tell me something about large language models."
+        },
+        {
+            "role": "assistant",
+            "content": "Large language models are a type of language model that is trained on a large corpus of text data. They are capable of generating human-like text and are used in a variety of natural language processing tasks..."
+        },
+        {
+            "role": "user",
+            "content": "How about Qwen2?"
+        },
+        {
+            "role": "assistant",
+            "content": "Qwen2 is a large language model developed by Alibaba Cloud..."
+        }
+
+    ]
+}
+```
+
+Register the custom training data in `data/dataset_info.json` under the LLaMA-Factory folder by appending the following entry to the end of the file:
+```
+"qwen_train_data": {
+    "file_name": "PATH-TO-YOUR-TRAIN-DATA",
+    "formatting": "sharegpt",
+    "columns": {
+        "messages": "messages"
+    },
+    "tags": {
+        "role_tag": "role",
+        "content_tag": "content",
+        "user_tag": "user",
+        "assistant_tag": "assistant",
+        "system_tag": "system"
+    }
+}
+```
+
+## Configuring the Training Parameters
+Set up a configuration file with the training parameters. We provide example files for full-parameter, LoRA, and QLoRA training, which you can modify to suit your needs. See the corresponding files in this directory for details:
+- `qwen2-7b-full-sft.yaml`: full-parameter training
+- `qwen2-7b-lora-sft.yaml`: LoRA training
+- `qwen2-7b-qlora-sft.yaml`: QLoRA training
+
+For full-parameter training, example DeepSpeed configuration files are available [here](https://github.com/hiyouga/LLaMA-Factory/tree/main/examples/deepspeed).
+
+Descriptions of selected training parameters:
+
+| Parameter                   | Description                                                                                                      |
+|-----------------------------|------------------------------------------------------------------------------------------------------------------|
+| model_name_or_path          | Model name or path                                                                                               |
+| stage                       | Training stage. Options: rm (reward modeling), pt (pretrain), sft (supervised fine-tuning), PPO, DPO, KTO, ORPO  |
+| do_train                    | `true` for training, `false` for evaluation                                                                      |
+| finetuning_type             | Fine-tuning method. Options: freeze, lora, full                                                                  |
+| lora_target                 | Target modules for LoRA. Defaults to `all`.                                                                      |
+| dataset                     | Dataset(s) to use; separate multiple datasets with `,`                                                           |
+| template                    | Dataset template; make sure it matches the model.                                                                |
+| output_dir                  | Output path                                                                                                      |
+| logging_steps               | Number of steps between log outputs                                                                              |
+| save_steps                  | Number of steps between checkpoint saves                                                                         |
+| overwrite_output_dir        | Whether to allow overwriting the output directory                                                                |
+| per_device_train_batch_size | Training batch size per device                                                                                   |
+| gradient_accumulation_steps | Number of gradient accumulation steps                                                                            |
+| learning_rate               | Learning rate                                                                                                    |
+| lr_scheduler_type           | Learning rate schedule. Options include linear, cosine, polynomial, constant, etc.                               |
+| num_train_epochs            | Number of training epochs                                                                                        |
+| bf16                        | Whether to use the bf16 format                                                                                   |
+
+## Starting Training
+
+Full-parameter training:
+```bash
+FORCE_TORCHRUN=1 llamafactory-cli train qwen2-7b-full-sft.yaml
+```
+
+LoRA training:
+```bash
+llamafactory-cli train qwen2-7b-lora-sft.yaml
+```
+
+QLoRA training:
+```bash
+llamafactory-cli train qwen2-7b-qlora-sft.yaml
+```
+
+With the training configurations above, the measured GPU memory usage of each method is listed below. Memory usage during training depends heavily on the training parameters, so adjust them to your actual needs.
+- Full-parameter training: 42.18GB
+- LoRA training: 20.17GB
+- QLoRA training: 10.97GB
+
+## Merging Model Weights
+If you train with LoRA or QLoRA, the script saves only the LoRA weights, which must be merged before inference. **This step is not needed for full-parameter training.**
+
+
+```bash
+llamafactory-cli export qwen2-7b-merge-lora.yaml
+```
+
+Descriptions of selected weight-merging parameters:
+
+| Parameter            | Description                             |
+|----------------------|-----------------------------------------|
+| model_name_or_path   | Name or path of the pre-trained model   |
+| template             | Model template                          |
+| export_dir           | Export path                             |
+| export_size          | Maximum size of an exported model file  |
+| export_device        | Export device                           |
+| export_legacy_format | Whether to export in the legacy format  |
+
+Notes:
+- When merging Qwen2 model weights, be sure to set `template` to `qwen`. Whether you trained with LoRA or QLoRA, `finetuning_type` is `lora` when merging weights.
+- `adapter_name_or_path` must match the adapter output path `output_dir` used during fine-tuning.
+
+## Model Inference
+After training and merging the model weights, you can load the full model weights for inference. An example inference script:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+device = "cuda" # the device to load the model onto
+model_name_or_path = "YOUR-MODEL-PATH"  # placeholder: path to the merged model
+
+model = AutoModelForCausalLM.from_pretrained(
+    model_name_or_path,
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
+
+prompt = "Give me a short introduction to large language model."
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(device)
+
+generated_ids = model.generate(
+    model_inputs.input_ids,
+    max_new_tokens=512
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+```
diff --git a/examples/llama-factory/qwen2-7b-full-sft.yaml b/examples/llama-factory/qwen2-7b-full-sft.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..59a9d77ffde5663671b0635e156cfa297f15d82c
--- /dev/null
+++ b/examples/llama-factory/qwen2-7b-full-sft.yaml
@@ -0,0 +1,38 @@
+### model
+model_name_or_path: Qwen/Qwen2-7B-Instruct
+
+### method
+stage: sft
+do_train: true
+finetuning_type: full
+deepspeed: PATH-TO-DS-CONFIG
+
+### dataset
+dataset: qwen_train_data
+template: qwen
+cutoff_len: 1024
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+### output
+output_dir: saves/qwen2-7b/full/sft
+logging_steps: 10
+save_steps: 100
+plot_loss: true
+overwrite_output_dir: true
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 16
+learning_rate: 1.0e-5
+num_train_epochs: 1.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+val_size: 0.1
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 500
diff --git a/examples/llama-factory/qwen2-7b-lora-sft.yaml b/examples/llama-factory/qwen2-7b-lora-sft.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..d5c0adb2819f0f270708ea70254139ab050536dd
--- /dev/null
+++ b/examples/llama-factory/qwen2-7b-lora-sft.yaml
@@ -0,0 +1,41 @@
+### model
+model_name_or_path: Qwen/Qwen2-7B-Instruct
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: all
+lora_rank: 16
+lora_alpha: 16
+lora_dropout: 0.05
+
+### dataset
+dataset: qwen_train_data
+template: qwen
+cutoff_len: 1024
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+### output
+output_dir: saves/qwen2-7b/lora/sft
+logging_steps: 100
+save_steps: 100
+plot_loss: true
+overwrite_output_dir: true
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 16
+learning_rate: 1.0e-4
+num_train_epochs: 1.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+val_size: 0.1
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 500
diff --git a/examples/llama-factory/qwen2-7b-merge-lora.yaml b/examples/llama-factory/qwen2-7b-merge-lora.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..6c6298827afc236f5b16973cfea9fe81946c1a0b
--- /dev/null
+++ b/examples/llama-factory/qwen2-7b-merge-lora.yaml
@@ -0,0 +1,13 @@
+### Note: DO NOT use quantized model or quantization_bit when merging lora adapters
+
+### model
+model_name_or_path: Qwen/Qwen2-7B-Instruct
+adapter_name_or_path: PATH-TO-LORA
+template: qwen
+finetuning_type: lora
+
+### export
+export_dir: models/qwen2-7b-sft-lora-merged
+export_size: 2
+export_device: cpu
+export_legacy_format: false
\ No newline at end of file
diff --git a/examples/llama-factory/qwen2-7b-qlora-sft.yaml b/examples/llama-factory/qwen2-7b-qlora-sft.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..2f26d9c8fd4ac06e7fccc5d00d6430393e54f1e9
--- /dev/null
+++ b/examples/llama-factory/qwen2-7b-qlora-sft.yaml
@@ -0,0 +1,43 @@
+### model
+model_name_or_path: Qwen/Qwen2-7B-Instruct
+
+### method
+stage: sft
+do_train: true
+finetuning_type: lora
+lora_target: all
+quantization_bit: 4
+quantization_method: bitsandbytes # choices: [bitsandbytes (4/8), hqq (2/3/4/5/6/8), eetq (8)]
+lora_rank: 16
+lora_alpha: 16
+lora_dropout: 0.05
+
+### dataset
+dataset: qwen_train_data
+template: qwen
+cutoff_len: 1024
+overwrite_cache: true
+preprocessing_num_workers: 16
+
+### output
+output_dir: saves/qwen2-7b/qlora/sft
+logging_steps: 100
+save_steps: 100
+plot_loss: true
+overwrite_output_dir: true
+
+### train
+per_device_train_batch_size: 1
+gradient_accumulation_steps: 16
+learning_rate: 1.0e-4
+num_train_epochs: 1.0
+lr_scheduler_type: cosine
+warmup_ratio: 0.1
+bf16: true
+ddp_timeout: 180000000
+
+### eval
+val_size: 0.1
+per_device_eval_batch_size: 1
+eval_strategy: steps
+eval_steps: 500
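
All three training configs in this diff set `per_device_train_batch_size: 1` and `gradient_accumulation_steps: 16`. As a rough sketch of what those two values imply, the snippet below computes the effective batch size per optimizer step and an approximate number of optimizer steps per epoch. The helper names, the GPU counts, and the training-set size are hypothetical values chosen for illustration; only the two batch settings come from the configs.

```python
# Illustrative sketch only: helper names, GPU counts, and dataset size
# are hypothetical and do not come from the configs in this diff.


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Samples consumed per optimizer step across all devices."""
    return per_device_batch * grad_accum_steps * num_devices


def approx_optimizer_steps(num_train_samples: int, eff_batch: int) -> float:
    """Rough optimizer steps per epoch; exact rounding depends on the trainer."""
    return num_train_samples / eff_batch


if __name__ == "__main__":
    # From the configs: per_device_train_batch_size: 1, gradient_accumulation_steps: 16
    for gpus in (1, 8):  # hypothetical GPU counts
        eff = effective_batch_size(1, 16, gpus)
        steps = approx_optimizer_steps(9000, eff)  # 9000 is a hypothetical train-set size
        print(f"{gpus} GPU(s): effective batch size {eff}, ~{steps:.0f} optimizer steps per epoch")
```

Since `save_steps` and `eval_steps` are counted in optimizer steps, a small dataset or a large GPU count may finish the single epoch before the first evaluation at step 500, so those intervals may need adjusting.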