  1. safearena/README.md at main · McGill-NLP/safearena · GitHub

    SafeArena is a benchmark for assessing the harmful capabilities of web agents - safearena/README.md at main · McGill-NLP/safearena

  2. Network Graph · McGill-NLP/safearena · GitHub

    SafeArena is a benchmark for assessing the harmful capabilities of web agents - Network Graph · McGill-NLP/safearena

  3. Releases · McGill-NLP/safearena · GitHub

    SafeArena is a benchmark for assessing the harmful capabilities of web agents - Releases · McGill-NLP/safearena

  4. Code frequency · McGill-NLP/safearena · GitHub

    SafeArena is a benchmark for assessing the harmful capabilities of web agents - Code frequency · McGill-NLP/safearena

  5. Community Standards · GitHub

    SafeArena is a benchmark for assessing the harmful capabilities of web agents - Community Standards · McGill-NLP/safearena

  6. Forks · McGill-NLP/safearena · GitHub

    SafeArena is a benchmark for assessing the harmful capabilities of web agents - Forks · McGill-NLP/safearena

  7. Timeout error · Issue #11 · McGill-NLP/safearena · GitHub

    Jun 27, 2025 · Hi there, thanks for the great work! When I was running evaluation I found a lot of TimeoutErrors like the following one. Do you have some intuition about why this is happening? If this …

  8. Issues · McGill-NLP/safearena - GitHub

    SafeArena is a benchmark for assessing the harmful capabilities of web agents - McGill-NLP/safearena

  9. Could the author share the AMI of the container used in SafeArena ...

    Jun 15, 2025 · Hi, I noticed that SafeArena uses Docker containers that are different from those used in WebArena. Would it be possible to share the specific AMI or other setup details used for the …

  10. Milestones - McGill-NLP/safearena · GitHub

    SafeArena is a benchmark for assessing the harmful capabilities of web agents - Milestones - McGill-NLP/safearena