Migrate to automatic README translation #2053

Open
rickstaa opened this issue Sep 25, 2022 · 27 comments
Labels
enhancement New feature or request. help wanted Extra attention is needed.

Comments

@rickstaa
Collaborator

Is your feature request related to a problem? Please describe.

Keeping the documentation up to date and managing the PRs would be easier if we switched from manual to automatic README translations (see https://github.com/dephraiim/translate-readme). The downside is that the translations might contain some errors, but this shouldn't matter for understanding how to use GRS. Google has become quite good at translating languages in the last few years. The upside is that we no longer need to review translation PRs, we can support more languages, and the translations stay up to date. We can add flags to the README so people can choose their language.

@rickstaa rickstaa added enhancement New feature or request. help wanted Extra attention is needed. labels Sep 25, 2022
@Pranav2612000
Contributor

Hey @rickstaa
I would like to work on this one. Is it fine?

@rickstaa
Collaborator Author

rickstaa commented Oct 1, 2022

@Pranav2612000 First of all, welcome to the community! It is amazing that you want to help us improve the maintainability of the repository. I am unsure how hard it is to implement this feature and whether https://github.com/dephraiim/translate-readme serves our needs. The implementation found in translate-readme is quite basic (see https://github.com/dephraiim/translate-readme/blob/main/index.js). It could therefore be that it does not filter out the query parameters found in the code blocks.

It could therefore be that we need to improve this action or create a new action.

My original idea

  • I wanted to eliminate the manual translations that users add since these are outdated and hard to maintain.
  • I wanted a GitHub action that translates the README into other languages and adds these translations to the repository.
  • This action should keep code blocks that contain the query parameters in English.
  • I planned to add these translations to the main README using flags on macOS and Linux and country codes on Windows (see missive/emoji-mart#559, "Issue with country flag and some emojis are not visible").
🇳🇱 🇫🇷 🇺🇸 🇩🇪 🇮🇹
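
A minimal sketch of how the flag-or-country-code labels could be generated. Flag emojis are pairs of Unicode regional indicator symbols, so a two-letter code maps directly to a flag. The function names and the `useFlags` option are hypothetical and not part of GRS.

```javascript
// Convert a two-letter country code (e.g. "NL") into its flag emoji.
// Each letter A-Z maps to a regional indicator symbol starting at U+1F1E6.
function countryCodeToFlag(code) {
  const upper = code.toUpperCase();
  if (!/^[A-Z]{2}$/.test(upper)) {
    throw new Error(`Expected a two-letter country code, got "${code}"`);
  }
  const REGIONAL_INDICATOR_A = 0x1f1e6;
  return String.fromCodePoint(
    ...[...upper].map((ch) => REGIONAL_INDICATOR_A + ch.charCodeAt(0) - 65),
  );
}

// Hypothetical helper: fall back to the plain country code where flag
// emojis are not rendered (assumed here to be decided by the caller).
function languageLabel(code, { useFlags = true } = {}) {
  return useFlags ? countryCodeToFlag(code) : code.toUpperCase();
}
```

Since a static README cannot detect the reader's platform, the fallback would in practice mean showing both the flag and the code next to each other.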

@rickstaa
Collaborator Author

rickstaa commented Oct 1, 2022

@Pranav2612000 It looks like https://github.com/dephraiim/translate-readme does not yet support HTML, so it might not suit our needs (see ephraimduncan/translate-readme#1). In that case, we might need to improve this action or build our own. This would likely require us to add some regex that filters out the HTML and code blocks before translation and puts them back afterwards. If you want, you can run some tests to see how good the action is and decide if you still want to take on this challenge 👍🏻.
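
The filter-and-restore idea could be sketched roughly as follows. This is only an illustration under assumptions: the `@@SEGn@@` placeholder format and both function names are made up here, the regex is deliberately simple (it will miss edge cases such as nested fences or `>` inside attribute values), and it is not known whether a given translation service leaves such placeholders untouched.

```javascript
// Matches, in order: fenced code blocks, inline code spans, and HTML tags.
const PROTECTED = /`{3}[\s\S]*?`{3}|`[^`\n]+`|<\/?[a-zA-Z][^>\n]*>/g;

// Replace every protected segment with a numbered placeholder and
// remember the original text so it can be restored after translation.
function protectSegments(markdown) {
  const segments = [];
  const masked = markdown.replace(PROTECTED, (match) => {
    segments.push(match);
    return `@@SEG${segments.length - 1}@@`;
  });
  return { masked, segments };
}

// Put the original segments back; unknown placeholders are left as-is.
function restoreSegments(translated, segments) {
  return translated.replace(/@@SEG(\d+)@@/g, (token, index) =>
    segments[Number(index)] ?? token,
  );
}
```

The masked text would be sent to the translator and the result passed through `restoreSegments`, so query parameters inside code blocks stay in English.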

@Pranav2612000
Contributor

Yeah. Took a look at https://github.com/dephraiim/translate-readme and I agree we'll need to modify this a bit. I'll see if I can come up with something so that we don't translate the query params and only translate the non-code text.

@rickstaa
Collaborator Author

rickstaa commented Oct 2, 2022

@Pranav2612000 I did some research, and the paid Google Translate API does handle HTML (see https://cloud.google.com/translate/docs/advanced/translating-text-v3). However, it does not filter markdown code blocks and will likely translate the code in those blocks. These blocks therefore have to be filtered out and re-injected using regex. Further, Google charges $10 per million characters once the free 500000 characters per month have been used up. Users also have to set up an API key to get it to work.

In contrast, https://github.com/dephraiim/translate-readme uses https://github.com/iamtraction/google-translate/blob/master/src/tokenGenerator.js#L73, which simply makes a call to translate.google.com. The results are therefore less stable, limited to 5000 characters per request, and require more regex filtering before they can be used.
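
To stay under a per-request limit like the 5000 characters mentioned above, the README could be split at paragraph boundaries before translation. The sketch below is an assumption about how that might look, not something the existing action does; `splitIntoChunks` is a hypothetical name, and the length is counted in JavaScript string units, which may differ from how the service counts characters.

```javascript
// Split text into chunks of at most maxLength characters, breaking only
// at blank lines so that paragraphs are never cut in half.
function splitIntoChunks(text, maxLength = 5000) {
  const chunks = [];
  let current = [];
  let currentLength = 0;
  for (const paragraph of text.split("\n\n")) {
    if (paragraph.length > maxLength) {
      throw new Error("A single paragraph exceeds the chunk limit");
    }
    // +2 accounts for the "\n\n" separator re-added when joining.
    const added = paragraph.length + (current.length > 0 ? 2 : 0);
    if (currentLength + added > maxLength) {
      chunks.push(current.join("\n\n"));
      current = [paragraph];
      currentLength = paragraph.length;
    } else {
      current.push(paragraph);
      currentLength += added;
    }
  }
  chunks.push(current.join("\n\n"));
  return chunks;
}
```

Joining the translated chunks with a blank line between them gives back a document with the original paragraph layout.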

Therefore, I think this should be possible both with the free and paid versions, but it does require significant development time to filter out markdown code blocks and HTML.

@rickstaa
Collaborator Author

rickstaa commented Oct 2, 2022

Still, feel free to tackle this if you think it can be done in the time you had set aside for implementing this feature. 🤔 I think both versions (paid and free) would require some parsing to ensure that markdown code blocks and HTML are still valid after translation. I have not searched yet, but there might be packages that already provide this ability.

@andrii-bodnar

Hey everyone,

How about using a localization platform? I like Crowdin, a cloud-based solution that streamlines localization management for your team. It's free for open-source. Crowdin allows the community to collaborate on content translation, and automatic translation synchronization can be set up using Crowdin's native GitHub integration or GitHub Actions.

Node.js CLI Apps Best Practices - an excellent example of a project using Crowdin for translating content by a community + GH Action for automatic synchronization.

I would be happy to help with the setup.

@parinzee

parinzee commented Oct 8, 2022

Hey there! @rickstaa, I think I can tackle this. If you could assign this to me, that would be great. Also, you would be willing to use paid solutions, right?

@rickstaa
Collaborator Author

rickstaa commented Oct 8, 2022

@andrii-bodnar Thanks for your message. I appreciate you trying to provide me with a solution. 👍🏻

I checked your profile and see you are a software engineer at Crowdin. I don't mind since you offer a valid solution, but some people might take issue with that. Maybe next time, add a disclaimer to your comment.

That said, I checked your documentation, videos and platform, and I have to say that I'm impressed by the tool you created. I think it is beneficial for streamlining translations for big projects. Thanks for bringing it to my attention. For our small project, however, I think it does not offer much improvement over the translations.js we are currently using.

The main thing I am trying to solve with #2053 is to eliminate the manual README translations we currently use, since these are often incorrect and outdated and clutter our PR backlog. I am therefore looking for an action that uses a service like the Google Translate API or the free Google Translate website to do the translation. I found https://github.com/dephraiim/translate-readme, but as explained above, it does not support our README because of the HTML and markdown code blocks.

@rickstaa
Collaborator Author

rickstaa commented Oct 8, 2022

@parinzee Thanks for offering to help implement this feature. Since https://github.com/anuraghazra/github-readme-stats is a free, open-source project, we unfortunately cannot rely on paid solutions. The reason I mention the Google Translate API is that it offers 500000 free translation characters per month, which should be enough to translate the README (which has 21707 chars) into 23 languages every month.
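
The numbers quoted in this thread can be checked with a short calculation: 21707 characters times 23 languages is 499261 characters, which is just under the 500000 free characters. The sketch below uses the $10 per million characters price mentioned earlier in the thread; the function name and the `runsPerMonth` parameter are only illustrative.

```javascript
const FREE_CHARS_PER_MONTH = 500000; // free tier quoted in this thread
const PRICE_PER_MILLION_USD = 10; // price quoted in this thread

// Estimate the monthly character usage and cost of translating the
// README into a number of languages a given number of times per month.
function monthlyTranslationCost(readmeChars, languages, runsPerMonth = 1) {
  const totalChars = readmeChars * languages * runsPerMonth;
  const billableChars = Math.max(0, totalChars - FREE_CHARS_PER_MONTH);
  const costUsd = (billableChars / 1e6) * PRICE_PER_MILLION_USD;
  return { totalChars, billableChars, costUsd };
}

const oncePerMonth = monthlyTranslationCost(21707, 23);
// oncePerMonth.totalChars is 499261, so nothing is billable.
```

The margin is only 739 characters, so a second translation run in the same month, or a slightly longer README, would exceed the free tier.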

@rickstaa
Collaborator Author

rickstaa commented Oct 9, 2022

@parinzee, @Pranav2612000 I just removed the hacktoberfest label since this issue is not self-contained (it requires building a new translation action) and therefore does not adhere to the Hacktoberfest maintainer guidelines. That does not mean that I will not accept pull requests for this issue as Hacktoberfest submissions, but simply that this issue is quite involved, and I want to prevent people from seeing it when they search for Hacktoberfest issues. If you still want to tackle this issue, feel free to let me know, and I will assign you.

@rickstaa
Collaborator Author

rickstaa commented Nov 20, 2022

@andrii-bodnar I had some time to look at Crowdin and implemented it on one of my other open-source repositories. For the card translations, I think it is a significant improvement over manual translation PRs. If @anuraghazra is okay with it, we can use Crowdin for the card translations (i.e. https://github.com/anuraghazra/github-readme-stats/blob/master/src/translations.js). Maybe we can also add the README translations later, as I'm still thinking about creating an automated solution using the Google Translate API.

If you could set it up, that would be great 🎉. We can then add a note to both the README.md and CONTRIBUTING.md to explain how users can add card translations. My Crowdin account is rickstaa, or do you need @anuraghazra's account to set it up?

TODOs

@rickstaa
Collaborator Author

@anuraghazra, what are your thoughts on using Crowdin for our card translations? I think it improves the translation procedure. Or do you think it is a bit overkill for only the https://github.com/anuraghazra/github-readme-stats/blob/master/src/translations.js file 🤔?

@andrii-bodnar

andrii-bodnar commented Nov 22, 2022

Hi @rickstaa, happy to hear about your success with Crowdin implementation in the GitHub Emoji Picker project!

Just checked the translations.js and it seems like it requires some refactoring to be ready for automatic localization.

The main issue here is that all the languages are located inside a single file. It would be great to split these languages into separate files and ideally store them in JSON files.
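
A rough sketch of such a split, assuming the translations are shaped as an object that maps each string key to an object of locale codes and texts. The sample keys and strings below are invented for illustration and are not taken from translations.js; `splitByLocale` and `toLocaleFiles` are hypothetical names.

```javascript
// Regroup { key: { locale: text } } into { locale: { key: text } }.
function splitByLocale(translations) {
  const perLocale = {};
  for (const [key, byLocale] of Object.entries(translations)) {
    for (const [locale, text] of Object.entries(byLocale)) {
      if (!(locale in perLocale)) perLocale[locale] = {};
      perLocale[locale][key] = text;
    }
  }
  return perLocale;
}

// Map each locale to a file name and its JSON content (one file per language).
function toLocaleFiles(translations) {
  const files = {};
  for (const [locale, strings] of Object.entries(splitByLocale(translations))) {
    files[`${locale}.json`] = JSON.stringify(strings, null, 2) + "\n";
  }
  return files;
}

// Invented sample data, for illustration only.
const sampleTranslations = {
  "card.title": { en: "Stats", nl: "Statistieken" },
  "card.commits": { en: "Total Commits", nl: "Totaal Commits" },
};
const sampleFiles = toLocaleFiles(sampleTranslations);
```

Each entry of the returned object could then be written to disk, for example with `fs.writeFileSync`, giving one JSON file per language.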

From my perspective, Crowdin could be used here for translating both card texts and Readme. Readme files could be translated through the automatic workflows via MT engines. That will also give the possibility for translators to suggest better translations since MT engines might provide bad results.

@rickstaa
Collaborator Author

rickstaa commented Nov 22, 2022

I'm okay with splitting the files into multiple files as I did for the GitHub Emoji Picker. We can try it out for both the card translations and READMEs. 🔥 I, however, will leave the ultimate decision to @anuraghazra, so let's wait for his thoughts on the change.

@anuraghazra
Owner

Crowdin seems good! Yeah, I think storing locale files as JSON will be the standard way to go.

@rickstaa
Collaborator Author

rickstaa commented Jan 24, 2023

@andrii-bodnar, does Crowdin also offer a way to automatically translate the README into other languages using third-party translators like the Google Translate API while keeping code blocks and HTML from being translated? 🤔

@andrii-bodnar

@rickstaa Sure, the best option here is an automated workflow in Crowdin Enterprise. There is an MT Pre-translation step that can be configured to use an MT engine of your choice. New strings will then be translated automatically. In addition, it's possible to manually translate or correct strings. Crowdin Workflows are very flexible.

A similar flow is also possible on crowdin.com with Custom Workflows. It's simpler than Crowdin Enterprise Workflows, but it also has an automatic MT Pre-Translation feature.

@rickstaa
Collaborator Author

@andrii-bodnar Amazing to hear that Crowdin Enterprise provides this possibility. Maybe we can arrange a partnership between your company and GRS if you and @anuraghazra are open to that.

Such a partnership can benefit both parties since it will give more exposure to your service and make the GRS repository easier to maintain. 🚀 I don't think the load on your systems would be extreme since we update the README.md or card translations maybe once every two months. 🤔

@andrii-bodnar

andrii-bodnar commented Jan 26, 2023

@rickstaa Crowdin is free for Open-Source projects 🙂

It's very easy to apply for the Open-Source plan. First, the project owner needs to create a Crowdin or Crowdin Enterprise account.

Then, submit an Open-source project setup request form.

Of course, we would be extremely happy if you added a badge to your project README 🙂 (but it's up to you)

@andrii-bodnar

@rickstaa The only thing I'm worried about is the upload of the existing translations to Crowdin.

The point here is that Crowdin uses ML technology to upload translations of HTML-based files. Sometimes this still requires some manual work. For more details see this article. As far as I can see, the README is already translated into a bunch of languages.

By the way, how is it going with the extraction of the JS translations into separate JSON files?

@rickstaa
Collaborator Author

@andrii-bodnar, unfortunately, I haven't had the time to perform the JS translation extraction.

I just discussed this with @anuraghazra, and if you are willing to implement the automatic README translations for us, that would be amazing! We are more than willing to put a Crowdin badge somewhere on the readme. 👍🏻 As explained above, this might be a very beneficial (symbiotic) partnership. 🚀

If you think these automatic README translations are currently impossible or you don't have the resources available to implement them, no problem. 👍🏻 In that case, I think we will likely remove the translated READMEs since maintaining them manually is not doable anymore, given the scale of this project. 😅

@rickstaa
Collaborator Author

rickstaa commented Jan 28, 2023

@andrii-bodnar Feel free to join my Discord server, which can be found on my GitHub README, if you want an easier way to discuss 👍.

@andrii-bodnar

@rickstaa @anuraghazra I'll try to prepare a demo Crowdin project and GH Actions Workflow for you 🙂

@rickstaa
Collaborator Author

rickstaa commented Jan 29, 2023

Amazing, thanks! I'm looking forward to seeing your solution. 🚀

@andrii-bodnar

Hi @rickstaa @anuraghazra,

Just prepared a Demo Crowdin project and created a PR with integration - #2489

Please check it out 🙂

@rickstaa
Collaborator Author

Related to #3364.

5 participants