feat(elastic-search): improved default search #3284

Collaborator

@martijnvdbrug martijnvdbrug commented Dec 20, 2024

Description

Minor tweaks to improve the out-of-the-box search results from Elasticsearch.

It was a bit demotivating to see that my search results were worse than with the default plugin, while ES is such a powerful engine. In my case this was due to:

  1. Description being weighted just as heavily as the name fields
  2. No typo tolerance (fuzziness)

Most consumers probably define their own queries, but for those starting with the defaults this gives them a better experience.
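
The two tweaks described above (weighting name fields above the description, and tolerating typos) could be sketched as a `multi_match` query body along these lines. This is only an illustration, not the code from this PR: the field names (`productName`, `productVariantName`, `description`, `sku`), the boost values, and the helper name `buildTermQuery` are assumptions.

```typescript
// Hypothetical field boosts: names and weights are illustrative assumptions,
// not the values used by the plugin.
interface BoostFields {
  productName: number;
  productVariantName: number;
  description: number;
  sku: number;
}

interface MultiMatchBody {
  multi_match: {
    query: string;
    type: 'best_fields' | 'most_fields';
    fuzziness: 'AUTO' | number;
    fields: string[];
  };
}

const defaultBoosts: BoostFields = {
  productName: 3,
  productVariantName: 2,
  description: 1,
  sku: 1,
};

// Builds the `query` part of a search request. `field^N` is the
// Elasticsearch syntax for boosting a field's score by a factor of N.
function buildTermQuery(term: string, boosts: BoostFields = defaultBoosts): MultiMatchBody {
  const fieldNames = Object.keys(boosts) as (keyof BoostFields)[];
  const fields = fieldNames.map(name => `${name}^${boosts[name]}`);
  return {
    multi_match: {
      query: term,
      type: 'best_fields',
      // 'AUTO' derives the allowed edit distance from the term length:
      // very short terms must match exactly, longer ones tolerate typos.
      fuzziness: 'AUTO',
      fields,
    },
  };
}

// A misspelled term that can still match "camera" thanks to fuzziness.
console.log(JSON.stringify(buildTermQuery('camrea'), null, 2));
```

Note that Elasticsearch does not accept `fuzziness` with the `cross_fields`, `phrase`, or `phrase_prefix` multi-match types, so a sketch like this only applies to types such as `best_fields` or `most_fields`.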

Breaking changes

No

Checklist

📌 Always:

  • I have set a clear title
  • My PR is small and contains a single feature
  • I have checked my own PR

👍 Most of the time:

  • I have added or updated test cases
  • I have updated the README if needed

@martijnvdbrug
Collaborator Author

@monrostar I know you guys have done a lot of work on this plugin, so perhaps you can take a look if this doesn't conflict with any of your use cases?

{ productId: 'T_3', enabled: false },
]);
const t3 = result.search.items.find(i => i.productId === 'T_3');
expect(t3?.enabled).toEqual(false);
Collaborator Author

@martijnvdbrug martijnvdbrug Dec 24, 2024

Fuzzy matching now returns multiple results, but this test only cares whether T_3 is disabled, so we should ignore the other results.

Member

ok

'Camera Lens',
'Instant Camera',
Collaborator Author

Camera Lens is now the first result because the name field is weighted more heavily than the description. In most cases this is desired, but this test case is debatable... WDYT?

Member

I think this is fine.

@monrostar
Contributor

monrostar commented Dec 25, 2024

> @monrostar I know you guys have done a lot of work on this plugin, so perhaps you can take a look if this doesn't conflict with any of your use cases?

Hi, sorry for the late reply. You can do what you want. We currently use our own plugin for Elasticsearch, which stores one document per variant covering all channels, translations, and currencies. Unfortunately, I had to completely rewrite the original plugin. I'd like to contribute this code, but we don't have plans for that yet...

Here's a small example of the new structure:

const defaultAvailableLanguages = [LanguageCode.en]

const languageAnalyzerMap: Partial<Record<LanguageCode, string>> & { default: string } = {
  [LanguageCode.ar]: 'arabic',
  [LanguageCode.hy]: 'armenian',
  [LanguageCode.eu]: 'basque',
  [LanguageCode.bn]: 'bengali',
  [LanguageCode.pt_BR]: 'brazilian',
  [LanguageCode.bg]: 'bulgarian',
  [LanguageCode.ca]: 'catalan',
  [LanguageCode.cs]: 'czech',
  [LanguageCode.da]: 'danish',
  [LanguageCode.nl]: 'dutch',
  [LanguageCode.en]: 'english',
  [LanguageCode.en_AU]: 'english',
  [LanguageCode.en_CA]: 'english',
  [LanguageCode.en_GB]: 'english',
  [LanguageCode.en_US]: 'english',
  [LanguageCode.et]: 'estonian',
  [LanguageCode.fi]: 'finnish',
  [LanguageCode.fr]: 'french',
  [LanguageCode.gl]: 'galician',
  [LanguageCode.de]: 'german',
  [LanguageCode.el]: 'greek',
  [LanguageCode.hu]: 'hungarian',
  [LanguageCode.id]: 'indonesian',
  [LanguageCode.ga]: 'irish',
  [LanguageCode.it]: 'italian',
  [LanguageCode.lv]: 'latvian',
  [LanguageCode.lt]: 'lithuanian',
  [LanguageCode.nb]: 'norwegian',
  [LanguageCode.nn]: 'norwegian',
  [LanguageCode.pt]: 'portuguese',
  [LanguageCode.ro]: 'romanian',
  [LanguageCode.ru]: 'russian',
  [LanguageCode.sr]: 'serbian',
  [LanguageCode.es]: 'spanish',
  [LanguageCode.sv]: 'swedish',
  default: 'standard',
}

function getAnalyzerForLanguage(languageCode: LanguageCode): string {
  return languageAnalyzerMap[languageCode] || languageAnalyzerMap.default
}

export const buildIndexName = (prefix: string, name: string, postfix = ''): estypes.IndexName => `${prefix}${name}${postfix}`
export const buildAliasName = (prefix: string, name: string, postfix = ''): estypes.IndexAlias => `${prefix}${name}${postfix}`

export function TranslatedTextKeywordMappingField(): estypes.MappingObjectProperty {
  return {
    type: 'object',
    properties: defaultAvailableLanguages.reduce((acc, lang) => {
      acc[lang] = {
        type: 'text',
        analyzer: `${getAnalyzerForLanguage(lang)}_analyzer`,
        fields: {
          keyword: {
            type: 'keyword',
          },
        },
      }
      return acc
    }, {} as Record<LanguageCode, estypes.MappingProperty>),
  }
}

export function TranslatedTextMappingField(): estypes.MappingObjectProperty {
  return {
    type: 'object',
    properties: defaultAvailableLanguages.reduce((acc, lang) => {
      acc[lang] = {
        type: 'text',
        analyzer: `${getAnalyzerForLanguage(lang)}_analyzer`,
      }
      return acc
    }, {} as Record<LanguageCode, estypes.MappingProperty>),
  }
}

const priceMappingField: estypes.MappingProperty = {
  type: 'nested',
  properties: {
    id: { type: 'keyword' },
    channelId: { type: 'keyword' },
    currencyCode: { type: 'keyword' },
    price: { type: 'integer' },
  },
}

function generateDynamicTemplatesAndAnalyzers() {
  const dynamicTemplates: Record<string, MappingDynamicTemplate> | Record<string, MappingDynamicTemplate>[] = []
  const analyzers: Record<string, AnalysisAnalyzer> = {
    standard_analyzer: {
      type: 'custom',
      tokenizer: 'standard',
      filter: ['lowercase', 'asciifolding'],
    },
  }
  const filters: Record<string, AnalysisTokenFilter> = {}

  for (const langCode of Object.values(LanguageCode)) {
    const analyzerName = getAnalyzerForLanguage(langCode)
    const effectiveAnalyzer = analyzerName ? `${analyzerName}_analyzer` : 'standard_analyzer'

    dynamicTemplates.push({
      [`language_analyzer_${langCode}_productName`]: {
        match_mapping_type: 'string',
        path_match: `productName.${langCode}`,
        mapping: {
          type: 'text',
          analyzer: effectiveAnalyzer,
          fields: {
            keyword: {
              type: 'keyword',
              ignore_above: 256,
            },
          },
        },
      },
    })

    dynamicTemplates.push({
      [`language_analyzer_${langCode}_variantName`]: {
        match_mapping_type: 'string',
        path_match: `variantName.${langCode}`,
        mapping: {
          type: 'text',
          analyzer: effectiveAnalyzer,
          fields: {
            keyword: {
              type: 'keyword',
              ignore_above: 256,
            },
          },
        },
      },
    })

    dynamicTemplates.push({
      [`language_analyzer_${langCode}_productDescription`]: {
        match_mapping_type: 'string',
        path_match: `productDescription.${langCode}`,
        mapping: {
          type: 'text',
          analyzer: effectiveAnalyzer,
        },
      },
    })

    if (analyzerName && analyzerName !== 'standard') {
      analyzers[`${analyzerName}_analyzer`] = {
        type: 'custom',
        tokenizer: 'standard',
        filter: ['lowercase', 'asciifolding', `${analyzerName}_stemmer`],
      }
      filters[`${analyzerName}_stemmer`] = {
        type: 'stemmer',
        language: analyzerName,
      }
    }
  }

  return { dynamicTemplates, analyzers, filters }
}

const ProductVariantIndexMappingProperties: { [key in keyof VariantIndexItem]: estypes.MappingProperty } = {
  // index date
  lastSyncedAt: { type: 'date' },
  productUpdatedAt: { type: 'date' },
  productCreatedAt: { type: 'date' },
  // product fields
  productId: { type: 'keyword' },

  productChannelIds: { type: 'keyword' },
  productCollectionIds: { type: 'keyword' },
  productFacetValueIds: { type: 'keyword' },
  productFacetIds: { type: 'keyword' },

  productOptions: { type: 'flattened' },
  productOptionsGroups: {
    type: 'nested',
    properties: {
      code: { type: 'keyword' },
      id: { type: 'keyword' },
      name: TranslatedTextKeywordMappingField(),
      options: {
        type: 'nested',
        properties: {
          id: { type: 'keyword' },
          name: TranslatedTextKeywordMappingField(),
          code: { type: 'keyword' },
        },
      },
    },
  },
  productEnabled: { type: 'boolean' },
  productInStock: { type: 'boolean' },

  productName: TranslatedTextKeywordMappingField(),
  productSlug: TranslatedTextKeywordMappingField(),
  productDescription: TranslatedTextMappingField(),

  productPriceMax: priceMappingField,
  productPriceMin: priceMappingField,

  productAssetId: { type: 'keyword' },
  productPreview: { type: 'keyword' },
  productPreviewFocalPoint: { type: 'flattened' },
  productAssets: { type: 'flattened' },

  // variant fields
  variantUpdatedAt: { type: 'date' },
  variantCreatedAt: { type: 'date' },
  variantId: { type: 'keyword' },

  variantChannelIds: { type: 'keyword' },
  variantCollectionIds: { type: 'keyword' },
  variantFacetIds: { type: 'keyword' },
  variantFacetValueIds: { type: 'keyword' },

  variantEnabled: { type: 'boolean' },
  variantInStock: { type: 'boolean' },
  variantDisplayStockLevel: { type: 'keyword' },

  variantName: TranslatedTextKeywordMappingField(),
  variantSku: { type: 'keyword' },

  variantOptions: {
    type: 'nested',
    properties: {
      code: { type: 'keyword' },
      id: { type: 'keyword' },
      name: TranslatedTextKeywordMappingField(),
      group: {
        type: 'object',
        properties: {
          id: { type: 'keyword' },
          name: TranslatedTextKeywordMappingField(),
          code: { type: 'keyword' },
        },
      },
    },
  },

  variantPrice: priceMappingField,

  variantAssetId: { type: 'keyword' },
  variantPreview: { type: 'keyword' },
  variantPreviewFocalPoint: { type: 'flattened' },
  variantAssets: { type: 'flattened' },
}
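
To illustrate how a mapping like the one above might be queried, here is a hedged sketch of a search body that matches a translated name field and filters on the nested price entries for a single channel and currency. Only the field paths come from the mapping above; the function name `buildVariantQuery`, its parameters, and the choice of filters are assumptions made for illustration.

```typescript
// Hypothetical search parameters (not part of the snippet above).
interface VariantSearchParams {
  term: string;
  languageCode: string; // e.g. 'en'; selects the per-language sub-field
  channelId: string;
  currencyCode: string;
  maxPrice: number; // assumed to use the same integer unit as the indexed price
}

// Builds the `query` part of a search request as a plain object.
function buildVariantQuery(p: VariantSearchParams): Record<string, unknown> {
  return {
    bool: {
      must: [
        // Full-text match on the name in the requested language only.
        { match: { [`productName.${p.languageCode}`]: { query: p.term, fuzziness: 'AUTO' } } },
      ],
      filter: [
        { term: { productEnabled: true } },
        { term: { productChannelIds: p.channelId } },
        {
          // `variantPrice` is mapped as `nested`, so a nested query is needed
          // to make all three conditions apply to the same price entry.
          nested: {
            path: 'variantPrice',
            query: {
              bool: {
                filter: [
                  { term: { 'variantPrice.channelId': p.channelId } },
                  { term: { 'variantPrice.currencyCode': p.currencyCode } },
                  { range: { 'variantPrice.price': { lte: p.maxPrice } } },
                ],
              },
            },
          },
        },
      ],
    },
  };
}

console.log(
  JSON.stringify(
    buildVariantQuery({ term: 'camera', languageCode: 'en', channelId: '1', currencyCode: 'USD', maxPrice: 50000 }),
    null,
    2,
  ),
);
```

Without the `nested` wrapper, a variant with one price in the right channel and another price in the right currency could match even though no single price entry satisfies both conditions.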

@martijnvdbrug
Collaborator Author

@monrostar Thanks for your reply! In that case, I might also take a look at the indexing process: I see that it has been optimized quite a bit over the past years for products with a very large number of variants. This does seem to slow down indexing for 'normal' stores.

It takes almost 10 minutes to index a store with ~600 variants and ~200 products. I am no ES expert, but doesn't that seem a bit long?

Member

@michaelbromley michaelbromley left a comment

Looks good, thanks! Potential optimizations to the overall architecture and indexing performance can be tackled separately.

@michaelbromley michaelbromley merged commit b8112be into vendure-ecommerce:master Jan 22, 2025
31 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 22, 2025