praeclarumjj3 committed on
Commit 0c8f333 · verified · 1 Parent(s): 9d2fad0

Update README.md

Files changed (1)
  1. README.md +3 -15
README.md CHANGED
@@ -14,18 +14,8 @@ pipeline_tag: video-text-to-text
  <img src="https://github.com/allenai/SAGE/blob/main/assets/sage.png" alt="SAGE Teaser" width="800"/>
  </div>
 
- ## Model Details
-
- * **Developed by:** SHI Labs @ Georgia Tech, Allen Institute for AI (AllenAI), University of Washington
- * **Model Type:** Multimodal Agent Orchestrator / MLLM
- * **Base Architectures:**
- * Qwen2.5-VL-7B-Instruct
- * Qwen3-VL-4B-Instruct
- * Qwen3-VL-8B-Instruct
- * **Language(s):** English
- * **License:** Apache 2.0 (Subject to base model license constraints)
- * **Paper:** [SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning](https://arxiv.org/abs/25xx.xxxxx)
- * **Code Repository:** [https://github.com/allenai/SAGE](https://github.com/allenai/SAGE)
+ * **GitHub Repo:** [https://github.com/allenai/SAGE](https://github.com/allenai/SAGE)
+ * **Project Page:** [https://praeclarumjj3.github.io/sage/](https://praeclarumjj3.github.io/sage/)
 
  ## System Capabilities
 
@@ -42,9 +32,7 @@ The model is trained to generate JSON-formatted actions to invoke the following
  * `ground-event`: Locate start/end timestamps for specific visual events.
  * `extract-video-parts`: Extract high-resolution frames or subclips from specific timestamps.
  * `analyze`: Perform detailed visual analysis on extracted media.
- * **Long Video Expert:** Achieves an **8.2% improvement** on videos longer than 10 minutes compared to direct inference.
- * **Efficiency:** Despite being agentic, the inference runtime is roughly **8.6s/sample**, comparable to standard VLMs processing 512 frames, but with significantly higher accuracy.
-
+
  ## Usage
 
  **Note:** SAGE-MM outputs JSON action strings. It requires a runtime environment (provided in our [GitHub repo](https://github.com/allenai/SAGE)) to parse these strings, execute the tools, and feed the observation back to the model.
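
The Usage note describes a parse, execute, feed-back cycle. A minimal sketch of one step of that cycle might look like the following. It is not the runtime from the SAGE repository: the JSON keys `tool` and `arguments`, the stub handlers, and the shape of the returned observation are all assumptions made for illustration. Only the three tool names come from the README.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical stand-ins for the real tools; the actual implementations
# and their argument schemas live in the SAGE repository.
def ground_event(args: Dict[str, Any]) -> Dict[str, Any]:
    return {"query": args.get("query"), "start": None, "end": None}

def extract_video_parts(args: Dict[str, Any]) -> Dict[str, Any]:
    return {"timestamps": args.get("timestamps", []), "media": []}

def analyze(args: Dict[str, Any]) -> Dict[str, Any]:
    return {"question": args.get("question"), "answer": None}

# Tool names as listed in the README, mapped to the stub handlers above.
TOOLS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "ground-event": ground_event,
    "extract-video-parts": extract_video_parts,
    "analyze": analyze,
}

def run_action(action_str: str) -> Dict[str, Any]:
    """Parse one JSON action string, run the named tool, return an observation."""
    try:
        action = json.loads(action_str)
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON: {exc.msg}"}
    if not isinstance(action, dict):
        return {"error": "action must be a JSON object"}
    name = action.get("tool")  # assumed key name
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    # Assumed key name for the tool arguments.
    return {"tool": name, "observation": handler(action.get("arguments", {}))}
```

In a full loop, the returned observation would be serialized and appended to the model's context before the next generation step. Returning errors as observations, rather than raising, lets the model see and recover from a malformed action.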