| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Transformers Repo Install Error | 9 | 34 | June 6, 2025 |
| Stopiteration error | 3 | 97 | June 6, 2025 |
| How many GPU resources do I need for full-fine tuning of the 7b model? | 2 | 5048 | June 5, 2025 |
| Generate: using k-v cache is faster but no difference to memory usage | 5 | 15642 | June 3, 2025 |
| Distributed Training w/ Trainer | 11 | 8799 | June 3, 2025 |
| Grouping by length makes training loss oscillate and makes evaluation loss worse | 2 | 223 | June 3, 2025 |
| How can LLMs be fine-tuned for specialized domain knowledge? | 2 | 205 | June 3, 2025 |
| Implementing Triplet loss in Vit | 1 | 15 | June 3, 2025 |
| Using Huggingface for computer vision (Tensorflow)? | 3 | 404 | June 2, 2025 |
| valueError: Supplied state dict for layers does not contain `bitsandbytes__*` and possibly other `quantized_stats` (when load saved quantized model) | 4 | 696 | May 30, 2025 |
| RGBA -> RGB default background color vs padding color | 1 | 7 | May 30, 2025 |
| Why is Static Cache latency high? | 2 | 12 | May 29, 2025 |
| Error using Trainer with Colab notebook, anyone have a solution? | 1 | 35 | May 29, 2025 |
| LoRA training with accelerate / deepspeed | 3 | 2241 | May 28, 2025 |
| How does Q, K, V differ in LLM? | 1 | 19 | May 28, 2025 |
| The effect of padding_side | 13 | 14289 | May 27, 2025 |
| Prompt caching in pipelines | 1 | 31 | May 27, 2025 |
| GETTING ERROR >> AttributeError: 'InferenceClient' object has no attribute 'post' | 5 | 243 | May 27, 2025 |
| How does Llama For Sequence Classification determine what class corresponds to what label? | 10 | 4796 | May 25, 2025 |
| Best practice for usage of Data Collator For CompletionOnlyLM in multi-turn chat | 2 | 602 | May 25, 2025 |
| How to merge fine-tuned LLaMA-3.1-8B (via LLaMA-Factory) into a single GGUF for LM Studio? | 1 | 27 | May 25, 2025 |
| Generate keeps increasing memory usage on ubuntu | 6 | 33 | May 25, 2025 |
| How does Transformers Library work under the hood? | 1 | 15 | May 22, 2025 |
| Identical Evaluation Metrics for SFT & DPO–Fine-Tuned LoRA Adapter on SeaLLMs-v3-7B | 1 | 14 | May 22, 2025 |
| Create a weighted loss function to handle imbalance? | 3 | 1005 | May 21, 2025 |
| Incorrect total train batch size when using tp_size > 1 and deepspeed | 1 | 33 | May 20, 2025 |
| How do I load a trained checkpoint model? | 1 | 31 | May 20, 2025 |
| Fine tuning on qwen3 | 2 | 308 | May 19, 2025 |
| TokenClassificationPipeline produce entities with "##" characters | 6 | 24 | May 19, 2025 |
| PPO Training does not improve SFT model outputs (Metrics identical before and after PPO) | 1 | 34 | May 19, 2025 |