What open-source datasets are there that would also have a good sentence-encoder on hugging face?

Last active 5 months ago

5 replies

20 views

  • GE

    What open-source datasets are there that would also have a good sentence-encoder on hugging face?

  • JO

    Try https://huggingface.co/datasets/agnews with distilbert-base-uncased. It's my go-to combination for development and quick experiments 🙂 I usually work with a reduced set, though: agnews is large, so I randomly sample 3k records. You can also pick that example from our sample projects in the application.

  • JO

    That combination is already available in our sample projects, encoding included.
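
For anyone who wants to try the combination outside the application, here is a minimal sketch of the recipe described above: randomly sample 3k rows, then encode them with distilbert-base-uncased. It is not the sample project's actual code. The Hub id `ag_news`, the `text` column name, the mean pooling over token vectors, and the helper names `sample_indices` and `embed_sample` are all assumptions made for illustration. distilbert-base-uncased is a general-purpose encoder, so mean pooling is just one common way to get a single vector per text.

```python
import random


def sample_indices(n_rows, k=3000, seed=42):
    """Pick k distinct row indices uniformly at random (all rows if k >= n_rows)."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(range(n_rows), min(k, n_rows)))


def embed_sample(k=3000, batch_size=64):
    """Load AG News, keep k random rows, and mean-pool DistilBERT token vectors."""
    # Third-party imports stay local so sample_indices works without them installed.
    import torch
    from datasets import load_dataset
    from transformers import AutoModel, AutoTokenizer

    # Assumption: "ag_news" is the dataset id on the Hugging Face Hub.
    ds = load_dataset("ag_news", split="train")
    ds = ds.select(sample_indices(len(ds), k))

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModel.from_pretrained("distilbert-base-uncased").eval()

    chunks = []
    for start in range(0, len(ds), batch_size):
        # Assumption: the article text lives in a column named "text".
        texts = ds[start:start + batch_size]["text"]
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state  # (batch, tokens, hidden_size)
        # Average only over real tokens; padding positions are masked out.
        mask = enc["attention_mask"].unsqueeze(-1).to(hidden.dtype)
        chunks.append((hidden * mask).sum(dim=1) / mask.sum(dim=1))
    return ds, torch.cat(chunks)
```

Calling `embed_sample()` downloads the dataset and model on first use and returns the sampled rows together with one embedding vector per row.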

  • GE

    @mention it was just this mini piece https://medium.com/@george.pearse/vector-databases-for-data-centric-ai-part-2-ba995053ce05

  • JO

    Awesome, really cool demo showcase! 🙂
    And major thanks for the shout-out in the article, that caught me by surprise and put a smile on my face 😄
