A curated collection of open-source legal AI and legal technology (LegalTech) tools suitable for government, legal, and institutional use. It is the only bilingual (Hebrew/English) directory of its kind, curated by Lawcal. Tools are organized by category, jurisdiction, and license. All tools are free and open-source.
104 tools
Tool
Description
Category
License
Stars
n8n
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
Agentic AI & Automation
NOASSERTION
182.7K
Ollama
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
RAG & AI Infrastructure
MIT
167.5K
Hugging Face Transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
NLP & Models
Apache-2.0
158.9K
LangChain
The agent engineering platform
RAG & AI Infrastructure
MIT
132.5K
Open WebUI
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
RAG & AI Infrastructure
NOASSERTION
130.3K
Whisper
General-purpose speech recognition by OpenAI
Transcription
MIT
97.2K
markitdown
Convert documents to Markdown
PDF & OCR
MIT
93.3K
Browser Use
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
Agentic AI & Automation
MIT
86.2K
AFFiNE
There can be more than Notion and Miro. AFFiNE (pronounced [ə'fain]) is a next-gen knowledge base that brings planning, sorting and creating all together. Privacy first, open-source, customizable and ready to use.
Knowledge Management
NOASSERTION
67K
Memos
Open-source, self-hosted note-taking tool built for quick capture. Markdown-native, lightweight, and fully yours.
Knowledge Management
MIT
58.6K
Tesseract
Industry-standard OCR engine
PDF & OCR
Apache-2.0
57.9K
Docling
Get your documents ready for gen AI
Doc Intelligence
MIT
57.1K
AutoGen
A programming framework for agentic AI
Agentic AI & Automation
CC-BY-4.0
56.8K
Stirling-PDF
Local web-based PDF toolbox
PDF & OCR
GPL-3.0
49.6K
LlamaIndex
LlamaIndex is the leading document agent and OCR platform
RAG & AI Infrastructure
MIT
48.3K
CrewAI
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
Agentic AI & Automation
MIT
48.2K
SiYuan
A privacy-first, self-hosted, fully open-source personal knowledge management tool, written in TypeScript and Go.
Knowledge Management
AGPL-3.0
42.4K
Logseq
A privacy-first, open-source platform for knowledge management and collaboration. Download link: http://github.com/logseq/logseq/releases. Roadmap: https://discuss.logseq.com/t/logseq-product-roadmap/34267
Knowledge Management
AGPL-3.0
41.9K
spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
NLP & Models
MIT
33.4K
Qdrant
Qdrant: high-performance, massive-scale vector database and vector search engine for the next generation of AI. Also available in the cloud: https://cloud.qdrant.io/
RAG & AI Infrastructure
Apache-2.0
30.1K
Chroma
Data infrastructure for AI
RAG & AI Infrastructure
Apache-2.0
27.2K
Label Studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
E-Discovery
Apache-2.0
26.9K
Haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.
RAG & AI Infrastructure
Apache-2.0
24.7K
EasyOCR
Ready-to-use OCR with 80+ languages
PDF & OCR
Apache-2.0
23.9K
paperless-ngx
Self-hosted document management system
PDF & OCR
GPL-3.0
23.9K
Marker
Fast PDF to Markdown conversion
Doc Intelligence
Other
23.2K
Activepieces
AI agents and AI workflow automation, with ~400 MCP servers for AI agents
Agentic AI & Automation
NOASSERTION
21.6K
pgvector
Open-source vector similarity search for Postgres
RAG & AI Infrastructure
NOASSERTION
20.6K
Sentence Transformers
State-of-the-Art Text Embeddings
NLP & Models
Apache-2.0
18.5K
Obsidian
Community plugins list, theme list, and releases of Obsidian.
Knowledge Management
—
16.1K
Notesnook
A fully open source & end-to-end encrypted note taking alternative to Evernote.
Knowledge Management
GPL-3.0
13.9K
WhisperX
Fast ASR with word-level timestamps and speaker diarization
Transcription
BSD-4-Clause
13.8K
OCRmyPDF
Add OCR text layer to scanned PDFs
PDF & OCR
MPL-2.0
12.7K
Nougat
Neural OCR for academic documents
Doc Intelligence
MIT
12.7K
Gotenberg
A developer-friendly API for converting many document formats into PDF files, and more!
PDF & OCR
MIT
11.7K
Docmost
Collaborative wiki and documentation software
Knowledge Management
AGPL-3.0
11.3K
doccano
Open source annotation tool for machine learning practitioners.
E-Discovery
MIT
10.6K
pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
PDF & OCR
MIT
10K
faster-whisper
Optimized Whisper implementation
Transcription
MIT
9.7K
Unstructured
Pre-processing for RAG pipelines
Doc Intelligence
Apache-2.0
9.5K
insanely-fast-whisper
Ultra-fast Whisper implementation
Transcription
MIT
9.4K
PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
PDF & OCR
AGPL-3.0
9.4K
WeasyPrint
The awesome document factory
PDF & OCR
BSD-3-Clause
8.8K
Documenso
Open-source DocuSign alternative
Signing
AGPL-3.0
8.8K
DocuSeal
Document filling and signing platform
Signing
AGPL-3.0
7.9K
OpenSign
🔥 The free & Open Source DocuSign alternative
Signing
NOASSERTION
6.2K
docTR
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
PDF & OCR
Apache-2.0
6K
ExifTool
Read/write metadata in files
PDF & OCR
Other
5.3K
Layout Parser
Deep learning layout detection
Doc Intelligence
Apache-2.0
4K
Vibe
Desktop transcription app with Whisper
Transcription
MIT
4K
Apache Tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
E-Discovery
Apache-2.0
3.7K
GROBID
ML extraction of document structure
Doc Intelligence
Apache-2.0
3K
python-docx-template
Use a docx as a jinja2 template
Assembly
LGPL-2.1
2.6K
whisper-diarization
Speaker diarization with Whisper
Transcription
MIT
2.3K
Paperless-AI
AI addon for paperless-ngx
PDF & OCR
MIT
1.2K
Catala
Programming language for statute implementation
Assembly
Apache-2.0
1.2K
Scriberr
Transcription and note-taking tool
Transcription
MIT
1.2K
docassemble
A free, open-source expert system for guided interviews and document assembly, based on Python, YAML, and Markdown.
Assembly
MIT
931
LexNLP
LexNLP by LexPredict
NLP & Models
AGPL-3.0
771
Blackstone
spaCy pipeline for long-form legal text processing
NLP & Models
Apache-2.0
680
CUAD
Contract clause annotations
Datasets
CC-BY-4.0
673
Awesome Legal NLP
Curated academic research
Communities
—
527
paperless-gpt
ChatGPT integration for paperless-ngx
PDF & OCR
MIT
431
CourtListener
Primary legal data & research platform
Research & APIs
AGPL-3.0
401
Juriscraper
Scrapers for opinions, oral arguments, PACER content
Research & APIs
BSD-2-Clause
385
ContraxSuite
Full contract analytics & document platform (AGPL)
Analytics
AGPL-3.0
369
OpenContracts
Enterprise document analytics with AI-powered analysis (GPL-3)
Analytics
Apache-2.0
328
LegalBench
Legal reasoning tasks
Datasets
MIT
277
LexGLUE
Multi-task benchmark
Datasets
Apache-2.0
239
OpenFisca
OpenFisca core engine. See other repositories for country-specific code & data.
Assembly
AGPL-3.0
218
Awesome Legal Data
A collection of datasets and other resources for legal text processing.
Communities
—
210
Pile of Law
Legal/administrative texts
Datasets
MIT
143
OpenEDGAR
Framework for searchable EDGAR filings databases
Analytics
MIT
143
FreeDiscovery
Information retrieval engine based on scikit-learn
E-Discovery
MIT
143
OpenAlex
A collection of Jupyter notebooks, each walking you through a common example of bibliometric analysis using scholarly data from the OpenAlex API.
A project aimed at making the Israeli Knesset more transparent. Python and Django based
Research & APIs
BSD-3-Clause
106
Legal ML Datasets
Comprehensive collection of legal ML datasets and tasks
Communities
—
95
Open Legal Data
German legal data platform & API
Research & APIs
MIT
92
hebrew_whisper
GUI for Hebrew transcription using ivrit.ai Whisper models
Transcription
MIT
86
Eyecite
Fast, robust legal citation extractor
Research & APIs
BSD-3-Clause
67
Blawx
Visual Rules-as-Code environment
Assembly
Apache-2.0
65
CaseHOLD
Case holdings analysis
Datasets
Apache-2.0
65
WhisperLiveKit
Real-time speech recognition with Whisper
Transcription
MIT
55
FOIAMachine
Manage and send FOIA requests with agency directory
E-Discovery
AGPL-3.0
52
LawGlance
Free, open-source RAG-based AI legal assistant
Analytics
MIT
40
LegalBench-RAG
Contract retrieval benchmark
Datasets
Apache-2.0
35
ivrit.ai datasets
Hebrew speech dataset creation platform
Datasets
MIT
23
AssemblyLine
Court-form automation toolkit
Assembly
MIT
22
LeXLMs
Corpora and probing tasks for legal language models
NLP & Models
—
22
InLegalBERT
BERT models and recipes for Indian law corpora
NLP & Models
MIT
14
Legal-HeBERT
BERT model for Hebrew legal and legislative domains
NLP & Models
MIT
9
Stanford CodeX FutureLaw
Stanford CodeX FutureLaw — open source tool
Communities
—
—
EUR-Lex SPARQL
EUR-Lex SPARQL — open source tool
Research & APIs
—
—
Free Law Project
Open legal data ecosystem
Communities
—
—
LEGAL-BERT
Pretrained BERT variants for legal corpora (contracts, ECHR, EU law)
NLP & Models
—
—
Caselaw Access Project
6.7M+ U.S. court decisions with API
Research & APIs
—
—
UK National Archives
Public API for UK court judgments
Research & APIs
—
—
ivrit.ai Whisper Turbo
Optimized Hebrew Whisper model with 388 hours of training data
Transcription
—
—
LEOS
Legislative editing platform for the Akoma Ntoso XML format
Assembly
—
—
MultiLegalPile
Multilingual legal corpus
Datasets
—
—
LEXTREME
Multilingual legal tasks
Datasets
—
—
crowd-transcribe-v5
Hebrew speech dataset with 388 hours of transcribed data
Datasets
—
—
EOLE Conference
European Open Source & Free Software Law Event
Communities
—
—