hugging-mapper Documentation#
Hugging-Mapper
A lightweight Python tool for easy text similarity scoring using Hugging Face models.
Installation#
pip install hugging-mapper
Features#
Easily compare how similar two pieces of text are
Customizable model selection at initialization
Works with Hugging Face models that create sentence embeddings
Batch scoring for lists of sentence pairs
Usage#
Embedding text using Hugging Face models
from hugger.mapper import HuggingMapper
# initialize the mapper
# if model_name is omitted, the default is 'cambridgeltl/SapBERT-from-PubMedBERT-fulltext'
mapper = HuggingMapper(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
# generate embedding
embedding = mapper.embed_text("I hope you'll find this helpful.")
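Similarity between two texts is commonly scored as the cosine similarity of their embeddings. This README does not state what type `embed_text` returns or which metric the library uses internally, so the following is only a minimal sketch under the assumption that an embedding is a one-dimensional sequence of numbers. The helper `cosine_similarity` and the two stand-in vectors are hypothetical and are not part of hugging-mapper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hard-coded stand-in vectors, NOT real model output.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 1.0, 0.0]
score = cosine_similarity(vec_a, vec_b)
```

Real sentence embeddings have hundreds of dimensions, but the computation is the same: values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.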
Similarity search of given data
from hugger.mapper import NodeMapper
import pandas as pd
# demo data
data = pd.DataFrame({
    "id": ["node1", "node2", "node3"],
    "text": ["Disease", "Gene", "Drug"]
})
# generate embeddings for the data using the default Hugging Face model
node_mapper = NodeMapper(data)
# get most similar
# threshold 0 returns all data sorted by similarity to the given term
most_similar = node_mapper.get_similar("protein", threshold=0)
# get matching node
node_id, metadata = node_mapper.get_match("genetics", threshold=0.7)
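The role of `threshold` in the calls above can be illustrated with a small standalone sketch. It assumes that a threshold keeps only candidates whose score is at least that value and that results are ordered best first. Whether the library compares with `>=` or `>`, and what `get_match` returns when nothing qualifies, is not stated here, so those choices are assumptions. The functions `rank_similar` and `best_match` and the scores are hypothetical and are not part of hugging-mapper.

```python
# Illustrative similarity scores per node id; made-up numbers, NOT model output.
scores = {"node1": 0.42, "node2": 0.81, "node3": 0.35}


def rank_similar(scores, threshold=0.0):
    """Return (node_id, score) pairs with score >= threshold, best first."""
    kept = [(node_id, s) for node_id, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def best_match(scores, threshold=0.0):
    """Return the top (node_id, score) pair, or None if nothing qualifies."""
    ranked = rank_similar(scores, threshold)
    return ranked[0] if ranked else None


all_ranked = rank_similar(scores, threshold=0)   # every node, sorted by score
match = best_match(scores, threshold=0.7)        # only node2 clears 0.7
no_match = best_match(scores, threshold=0.9)     # nothing clears 0.9
```

With a threshold of 0 every non-negative score passes, which is why the README describes that setting as returning all data sorted by similarity.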
Documentation#
Tutorials and documentation are available on Read the Docs :)
License#
This project is licensed under the MIT License.