Help with a Local Document RAG System (Storage + Ingestion + Query + Highlighting)

Hugging Face Forums [Unofficial] June 22, 2026

Thank you very much for the in-depth explanation of what to do.

I have another question about structured extraction. For queries like “Compare this” or “In which FY did I earn more?”, we need to extract the text into a typed model, right?

So do we first send the text of the chunk to the LLM, ask it to extract that information into a typed model, and then store everything as embeddings?

Or should we use a parser to extract the information into a typed model?

And during retrieval, do we query based on the embeddings in pgvector, fetch the column along with the typed models, and then send them on for answering?
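To make the parser option concrete, here is a rough sketch of what I have in mind. Everything in it is made up for illustration: the `IncomeRecord` fields, the regex, and the chunk ids are my own assumptions, not something from a library or from your earlier reply. The idea is that the comparison is computed in code from the typed records instead of being left to the LLM.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IncomeRecord:
    """Hypothetical typed model for one extracted fact."""
    fiscal_year: str   # e.g. "FY2024"
    amount: float      # numeric value parsed from the text
    chunk_id: str      # pointer back to the source chunk for highlighting


# Assumed text layout: "FY 2023: 1,200,000". Real documents will need more patterns.
_PATTERN = re.compile(r"FY\s*(\d{4})\D+?([\d,]+(?:\.\d+)?)")


def parse_income(text: str, chunk_id: str) -> Optional[IncomeRecord]:
    """Return a typed record, or None when the chunk does not match the pattern."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    year, raw_amount = match.groups()
    return IncomeRecord(f"FY{year}", float(raw_amount.replace(",", "")), chunk_id)


records = [
    parse_income("Total income for FY 2023: 1,200,000", "doc1#p3"),
    parse_income("Total income for FY 2024: 1,450,500.50", "doc2#p1"),
]
# "In which FY did I earn more?" becomes a plain comparison over typed records.
best = max((r for r in records if r is not None), key=lambda r: r.amount)
```

If this is roughly right, the LLM would only be needed as a fallback for chunks where `parse_income` returns `None`.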

One more question: how would .xlsx and .csv files work? You mentioned storing rows and columns with a cell_id etc., but with Excel such comparison queries will come up more frequently, right?

So is it recommended to build a generalized typed model for them as well, or to parse them into a dataframe or something similar?

Will locally run models be good enough for this kind of extraction into typed models?

Do we have to worry about chunk overlap in the case of .xlsx? Sending long sheets to the model will raise token-limit errors, right?
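For the long-sheet case, this is the kind of chunking I was picturing, as a minimal sketch using only the standard `csv` module. The function names and the `rows_per_chunk` parameter are hypothetical. Instead of overlapping chunks, each chunk repeats the header row so every piece is readable on its own; I am not sure whether that is enough or whether overlap is still needed.

```python
import csv
import io
from typing import Iterator, List


def _render(header: List[str], rows: List[List[str]]) -> str:
    """Serialize a header plus a batch of rows back into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def chunk_rows(csv_text: str, rows_per_chunk: int) -> Iterator[str]:
    """Yield CSV chunks of at most rows_per_chunk data rows, each with the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return  # empty sheet: nothing to yield
    batch: List[List[str]] = []
    for row in reader:
        batch.append(row)
        if len(batch) == rows_per_chunk:
            yield _render(header, batch)
            batch = []
    if batch:
        yield _render(header, batch)  # final partial batch


sample = "fy,income\n2022,100\n2023,150\n2024,175\n"
chunks = list(chunk_rows(sample, rows_per_chunk=2))
```

With the sample above, the three data rows are split into two chunks, and both start with the `fy,income` header line.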

Sorry for continuing to bug you about this.
