Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifecstjpb7qxc7hewcjcgcrtehjyo5dsyux4ptemvmsv3422r3hlm",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mjek6ns6jw32"
  },
  "path": "/t/continous-increase-in-memory-usage/127891#post_14",
  "publishedAt": "2026-04-13T07:36:33.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "@router.post"
  ],
  "textContent": "I have the same issue with a Wav2Vec2 model deployed on my local GPU system: RAM usage grows continuously until the system crashes. The model is loaded using the Hugging Face pipeline. Below is the code I used:\n\n    import gc\n    import io\n    import os\n\n    import psutil\n    import torch\n    import torchaudio\n    from fastapi import File, HTTPException, UploadFile\n    from fastapi.responses import JSONResponse\n\n    # router, logger, ai_models, and libc are defined elsewhere in the application\n\n    @router.post(\"/transcribe\")\n    async def quran(audio_file: UploadFile = File(...)):\n        process = psutil.Process()\n        start_ram = process.memory_info().rss / (1024**2)\n\n        # Track paths for strict cleanup\n        temp_wav_path = None\n\n        try:\n            audio_bytes = await audio_file.read()\n\n            # 2. Load with Torchaudio\n            # waveform shape: [channels, time]\n            waveform, sample_rate = torchaudio.load(io.BytesIO(audio_bytes))\n\n            # 3. Pre-processing: Convert to Mono if Stereo\n            if waveform.shape[0] > 1:\n                waveform = torch.mean(waveform, dim=0, keepdim=True)\n\n            # 4. Resample to 16kHz (Standard for most AI speech models)\n            target_sample_rate = 16000\n            if sample_rate != target_sample_rate:\n                resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sample_rate)\n                waveform = resampler(waveform)\n\n            # 5. Squeeze to 1D if your model expects [samples] instead of [1, samples]\n            # Most transformers models prefer a flat 1D array/tensor\n            input_tensor = waveform.squeeze()\n\n            # 6. Inference\n            with torch.no_grad():  # Reduce memory usage during inference\n                transcript_raw = ai_models[\"wav2vec\"](input_tensor)\n                transcript = transcript_raw.get(\"text\", \"\")\n\n            return JSONResponse(content={\"transcript\": transcript}, status_code=200)\n\n        except Exception as e:\n            logger.error(f\"Transcription Error: {e}\")\n            raise HTTPException(status_code=500, detail=\"Internal processing error\")\n\n        finally:\n            # --- AGGRESSIVE CLEANUP ---\n            # 1. Delete file immediately\n            if temp_wav_path and os.path.exists(temp_wav_path):\n                try:\n                    os.remove(temp_wav_path)\n                except OSError:\n                    pass\n\n            # 2. Clear local tensors (guarded, since either may be unset if an earlier step failed)\n            if 'waveform' in locals():\n                del waveform\n            if 'audio_bytes' in locals():\n                del audio_bytes\n\n            # 3. GPU and Python GC\n            if torch.cuda.is_available():\n                torch.cuda.empty_cache()\n            gc.collect()\n\n            # 4. FORCE OS RELEASE (The \"Malloc Trim\")\n            # This tells Linux to actually take the memory back\n            if libc:\n                libc.malloc_trim(0)\n\n            final_ram = process.memory_info().rss / (1024**2)\n            logger.info(f\"RAM Status: {start_ram:.1f}MB -> {final_ram:.1f}MB\")\n\nCan anyone help?",
  "title": "Continuous increase in Memory usage"
}