Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibs6upw2h3un6ad43yn3wn2ltoa3nwuxv2apgiclkeoh2s6tdlyn4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mj45ml7qhx22"
  },
  "path": "/t/hcae-v1-1-bridging-local-context-and-global-attention-in-efficient-text-embeddings/175127#post_1",
  "publishedAt": "2026-04-09T18:19:54.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "**Note: The Show and Tell # HCAE v1.1 Technical Report: Advancing Hybrid Architectures for Efficient Text Embeddings**\n\n**## 1. Abstract**\n\nThe HCAE (Hybrid Convolutional-Attention Encoder) series investigates the synergy between local feature extraction and global contextual modeling. The v1.1 release represents a significant architectural stabilization over the v1.0 baseline. By reconfiguring the layer distribution to a symmetric 4+4 structure and implementing robust normalization techniques, we demonstrate a measurable improvement in semantic representation, achieving a Spearman correlation of 0.656 on the STS Benchmark and an NDCG@10 of 0.413 on the SciFact dataset with only 21.1 million parameters.\n\n**## 2. Introduction and Motivation**\n\nContemporary text embedding models often rely on pure Self-Attention mechanisms (Transformers), which, while powerful, exhibit quadratic complexity and can be parameter-inefficient when deployed at sub-100M scales for specific retrieval tasks. HCAE v1.1 addresses these constraints by leveraging Depthwise Separable Convolutions in the initial stages to capture local structural dependencies, followed by Self-Attention blocks to refine global semantic relations. This hybrid approach significantly reduces the computational overhead while maintaining high fidelity in the embedding space.\n\n**## 3. Architectural Refinement**\n\nThe transition from v1.0 to v1.1 involved several critical design decisions aimed at improving gradient flow and representational capacity:\n\n**### 3.1 Symmetric Layer Distribution**\n\nIn HCAE v1.1, we transitioned from a 5-layer Convolution / 3-layer Attention split to a symmetric ****4+4 configuration****. 
This adjustment ensures that the model devotes sufficient capacity to both low-level linguistic features (phonetic/syntactic patterns) and high-level semantic abstractions.\n\n**### 3.2 Stability and Non-linearity**\n\n- ****LayerScale Integration:**** We implemented LayerScale with an initial value of 1e-5. This gating mechanism allows for deeper gradient penetration during the early phases of training, preventing the vanishing gradient issues common in hybrid models with heterogeneous layer types.\n\n- ****SwiGLU Activation:**** Replacing standard GELU with SwiGLU (Shazeer, 2020) allowed the model to achieve more precise non-linear mapping. The gated linear unit structure provides a better approximation of complex semantic boundaries, which is reflected in the improved performance on the SciFact technical retrieval task.\n\n**## 4. Empirical Evaluation**\n\nThe models were evaluated using the Massive Text Embedding Benchmark (MTEB) across several key dimensions: Semantic Textual Similarity (STS) and Information Retrieval.\n\n**### 4.1 Performance Analysis (STSBenchmark)**\n\nHCAE v1.1-Instruct achieved a ****0.656 Spearman coefficient**** , representing an 11% relative improvement over the v1.0 baseline (0.591). This gain suggests that the architectural refinements successfully resolved previous bottlenecks in linear semantic mapping.\n\n**### 4.2 Retrieval Performance (SciFact)**\n\nOn the SciFact dataset, which requires high precision in scientific domain retrieval, HCAE v1.1-Instruct reached an ****NDCG@10 of 0.413**** and a ****Recall@10 of 0.523****. For a 21M parameter model, this performance is highly competitive, approaching results typically seen in models with 100M+ parameters.\n\n**## 5. Training Methodology and Instruction Tuning**\n\nHCAE v1.1 utilizes a multi-stage curriculum learning approach:\n\n1. ****Base Pre-training:**** Optimized for general-purpose semantic similarity using massive corpora.\n\n2. 
****Instruction Tuning:**** Fine-tuned on a curated set of NLI and domain-specific technical datasets (SciFact, Med-Tech).\n\n3. ****Task-Specific Prefixes:**** Integration of `query:` and `passage:` instructions allows the model to differentiate between asymmetric roles in a retrieval pipeline, effectively orienting its vector space based on the user’s intent.\n\n**## 6. Implementation and Deployment**\n\nTo ensure seamless integration with modern research workflows:\n\n- ****Serialization:**** Models are provided in the `safetensors` format, ensuring rapid loading and enhanced security against arbitrary code execution.\n\n- ****Transformers API:**** Native support for `AutoModel` is provided through a custom mapping, allowing for integration with a single line of code (`trust_remote_code=True`).\n\n- ****Standardized Tokenization:**** Utilization of the BERT-base-uncased vocabulary ensures compatibility with existing pre-processing pipelines.\n\n-–\n\n****HeavensHackDev Research****\n\n**Technical Pre-Release Note - HCAE v1.1** **category is for sharing and discussing projects, showcasing your Spaces, Models, Datasets and more. We value open-source and technical details over promotional content, so focus on sharing the intricate aspects of your work.**",
  "title": "HCAE v1.1: Bridging Local Context and Global Attention in Efficient Text Embeddings"
}
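
Section 3.2 of the report in `textContent` names LayerScale (initial value 1e-5) and SwiGLU but does not show how the two combine inside a block. The following is a minimal pure-Python sketch, not the HCAE implementation. It assumes LayerScale is a per-channel scale applied to the residual branch and that SwiGLU takes the form `W_down(SiLU(W_gate x) * (W_up x))` described by Shazeer (2020). All function names, weights, and dimensions are hypothetical toy values chosen for illustration.

```python
import math


def silu(x: float) -> float:
    """SiLU (Swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward branch: W_down (SiLU(W_gate x) * (W_up x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def layerscale_residual(x, branch_out, gamma):
    """Residual connection with a per-channel LayerScale factor gamma."""
    return [xi + gi * bi for xi, gi, bi in zip(x, gamma, branch_out)]


# Toy dimensions and deterministic toy weights (assumptions, not HCAE values).
dim, hidden_dim = 4, 8
x = [0.5, -1.0, 2.0, 0.25]
w_gate = [[0.1 * ((i + j) % 3 - 1) for j in range(dim)] for i in range(hidden_dim)]
w_up = [[0.05 * (i - j) for j in range(dim)] for i in range(hidden_dim)]
w_down = [[0.02 * (i + j + 1) for j in range(hidden_dim)] for i in range(dim)]

# LayerScale initial value taken from the report: 1e-5 for every channel.
gamma = [1e-5] * dim

branch = swiglu_ffn(x, w_gate, w_up, w_down)
y = layerscale_residual(x, branch, gamma)

# With gamma this small, the block output stays very close to its input.
max_shift = max(abs(yi - xi) for yi, xi in zip(y, x))
print("largest change introduced by the block:", max_shift)
```

Because `gamma` starts at 1e-5, the block is nearly an identity mapping at initialization. During training `gamma` would be a learned parameter, so each block can gradually increase the contribution of its branch.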