Update README.md
README.md
CHANGED
@@ -14,18 +14,8 @@ pipeline_tag: video-text-to-text
 <img src="https://github.com/allenai/SAGE/blob/main/assets/sage.png" alt="SAGE Teaser" width="800"/>
 </div>
 
-
-
-* **Developed by:** SHI Labs @ Georgia Tech, Allen Institute for AI (AllenAI), University of Washington
-* **Model Type:** Multimodal Agent Orchestrator / MLLM
-* **Base Architectures:**
-* Qwen2.5-VL-7B-Instruct
-* Qwen3-VL-4B-Instruct
-* Qwen3-VL-8B-Instruct
-* **Language(s):** English
-* **License:** Apache 2.0 (Subject to base model license constraints)
-* **Paper:** [SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning](https://arxiv.org/abs/25xx.xxxxx)
-* **Code Repository:** [https://github.com/allenai/SAGE](https://github.com/allenai/SAGE)
+* **GitHub Repo:** [https://github.com/allenai/SAGE](https://github.com/allenai/SAGE)
+* **Project Page:** [https://praeclarumjj3.github.io/sage/](https://praeclarumjj3.github.io/sage/)
 
 ## System Capabilities
 
@@ -42,9 +32,7 @@ The model is trained to generate JSON-formatted actions to invoke the following
 * `ground-event`: Locate start/end timestamps for specific visual events.
 * `extract-video-parts`: Extract high-resolution frames or subclips from specific timestamps.
 * `analyze`: Perform detailed visual analysis on extracted media.
-
-* **Efficiency:** Despite being agentic, the inference runtime is roughly **8.6s/sample**, comparable to standard VLMs processing 512 frames, but with significantly higher accuracy.
-
+
 ## Usage
 
 **Note:** SAGE-MM outputs JSON action strings. It requires a runtime environment (provided in our [GitHub repo](https://github.com/allenai/SAGE)) to parse these strings, execute the tools, and feed the observation back to the model.
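
The loop described in the Usage note (parse the JSON action string, execute the tool, feed the observation back) could be sketched roughly as below. This is only an illustration, not the runtime from the GitHub repo. The tool names come from the README. The action schema (`tool` and `arguments` keys), the stub tool functions, the `model_step` callable, and the rule that any non-action output counts as the final answer are all assumptions made for the example.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical stubs: the real tool implementations live in the SAGE repo.
def ground_event(args: dict) -> str:
    return f"stub: would locate timestamps for {args.get('query')!r}"


def extract_video_parts(args: dict) -> str:
    return f"stub: would extract media for {args.get('timestamps')!r}"


def analyze(args: dict) -> str:
    return f"stub: would analyze {args.get('media')!r}"


# Tool names as listed in the README; the argument names above are assumed.
TOOLS: Dict[str, Callable[[dict], str]] = {
    "ground-event": ground_event,
    "extract-video-parts": extract_video_parts,
    "analyze": analyze,
}


def parse_action(text: str) -> Optional[Tuple[str, dict]]:
    """Return (tool_name, arguments) for a known JSON action, else None."""
    try:
        action = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Assumed schema: {"tool": <name>, "arguments": {...}}
    if not isinstance(action, dict) or action.get("tool") not in TOOLS:
        return None
    args = action.get("arguments", {})
    if not isinstance(args, dict):
        return None
    return action["tool"], args


def run_episode(
    model_step: Callable[[List[str]], str],
    question: str,
    max_turns: int = 5,
) -> str:
    """Alternate model outputs and tool observations until a non-action output."""
    history = [question]
    for _ in range(max_turns):
        output = model_step(history)
        parsed = parse_action(output)
        if parsed is None:
            # Assumption: anything that is not a tool action is the answer.
            return output
        name, args = parsed
        observation = TOOLS[name](args)
        # Feed both the action and its observation back to the model.
        history.extend([output, observation])
    # Turn budget exhausted: fall back to the last observation.
    return history[-1]
```

Here `model_step` stands in for a call to the model with the conversation so far; swapping in a real inference call and the repo's actual tool implementations would be the obvious next step.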