What does that process gain over, say, just removing all of the old translations when a diagnostic message changes? Coming up with a new ID for a minor change in text seems like overkill, and will potentially pollute the ID namespace with my_diag_2 or my_diag_updated_5_13_2020.
Yeah, it seems like changing a diagnostic's format would effectively invalidate that diagnostic in all languages, which means we could just remove the translations in one place and re-translate. But in practice this is not that common: argument types could be adjusted, e.g. from Identifier to DeclName, but mostly once a diagnostic is added it doesn't get changed that often.
Sure, that also works. I was just giving an example of a flow - leaving around dead translations serves no useful purpose. Having a decoupling like I described is mostly useful if the translations get updated asynchronously or separately. If they are all in a single monorepo, then there is no advantage I can see.
-Chris
Your translators will ask "One hump or two" if you start talking about yaml files. You need to use a simple and standard format for the translations themselves, like a simple strings file or XLIFF or something else. You then need a tool that generates that file format for the translator to use, and another tool that imports the translations back from the translator into either your yaml files or whatever your canonical file or files for the translations are.
Asking translators to edit a yaml file in a text editor is only asking for trouble. Translators have software that lets them edit XLIFF files without the possibility of messing up the xml. Even simple strings files will come back from the translators with formatting errors.
Firstly, I think this would be overkill. Secondly, I don't agree that "Asking translators to edit a yaml file in a text editor is only asking for trouble." If we used a YAML schema like the one @Chris_Lattner3 and @xedin proposed, the translator would clearly know what to do. What do you think?
I already thought of that. I came to the conclusion that formatting errors will most likely occur only in RTL languages, and supporting RTL languages is currently out of the project's scope.
Continuing the discussion from Localization of Compiler Diagnostic Messages:
I think, however, that using only one file that has all the diagnostics, without splitting them up, would be a better user experience. If you are someone who knows English and French, for example, and want to translate diagnostic messages, I don't think it's convenient to open two files just to look up the original message in English and then translate it in the other file. What do you think?
Sure! I think we can start there and decide as we go.
IMHO it is more straightforward and convenient to open 2 files in an editor and have them side-by-side than figure out how to have side-by-side sections from the same file.
This is 100% true. In game development this is a no-brainer. You might say in a coder only software development environment you handle this more "hardcore" to avoid the tooling efforts. After all there are people who think an IDE is for losers ; )
Not a chance. The translator is not a technical person. They don't know what is important in that file and they have no compiler giving them error messages when they make a mistake. They don't need to know about yaml file formats. That's your job.
My experience with strings files and translators is that when I get them back I always have to verify that the format is correct. Often enough a quote is missing or some other part of the formatting is wrong.
Why not go with strings files like this:
"english string 1" = "english string 1" // To the translator A
"english string 2" = "english string 2"
"english string 1" = "translated string 1" // From the translator B
"english string 2" = "translated string 2"
You generate a file containing lines like A with all the English strings and send it to the translator. They translate all the strings and send it back. You import B into your yaml file. You need automated ways to export A and import B. (For strings that were previously translated, file A contains those translated strings on the right side instead of the English strings.)
Oh, and notice the translator doesn't need to open two text editor windows.
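As a rough illustration of the export A / import B round trip described above, here is a minimal Python sketch. The function names are hypothetical, the line format is taken only from the example lines in this post, and it assumes strings contain no unescaped double quotes. It is not an existing tool.

```python
import re

# One `"source" = "target"` pair per line, as in the A/B example above.
LINE_RE = re.compile(r'^"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*$')

def export_strings(english, translated):
    """Build file A: one line per English string. Strings that already have
    a translation carry it on the right-hand side."""
    lines = []
    for text in english:
        rhs = translated.get(text, text)
        lines.append(f'"{text}" = "{rhs}"')
    return "\n".join(lines) + "\n"

def import_strings(contents):
    """Parse file B. Returns (translations, bad_line_numbers) so malformed
    lines, such as one with a missing quote, are reported and not imported."""
    translations, bad_lines = {}, []
    for number, line in enumerate(contents.splitlines(), start=1):
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        if match is None:
            bad_lines.append(number)
            continue
        translations[match.group(1)] = match.group(2)
    return translations, bad_lines
```

The import step returning the bad line numbers is the part that addresses the "a quote is missing" problem: the file is verified automatically when it comes back.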
If you get only developers to translate it can work, but still decent tooling improves the process, regardless who does it.
I'm sorry, I don't think I fully understand you. How is it going to be "side-by-side sections from the same file" when you are editing in the same file as the following:
- <diagnostic_id>:
    en: "..."
    fr: "..."
    ...
You will only be looking at a single diagnostic and won't need to open anything side by side... Is that right, or am I missing something?
My bad, I misunderstood the format, though I'd consider it worthwhile to avoid loading all the translations unnecessarily (as @xedin mentioned).
Week 1 Progress Update
Hi everyone, I enjoyed reading your comments about my approach regarding my project.
Here's what I've done last week:
- Opened a discussion on the forums.
- Decided on a YAML file format.
- Ported diagnostics from .def files to the YAML file.
- Implemented a -locale flag to use diagnostic messages from the YAML file.
Currently doing the following:
- Implementing a diagnostic-messages-path flag to get the directory of the diagnostic message folders, for development purposes.
- Implementing a diagnostic message retrieval method that works for both .def and .yaml files.
Here's my approach to what I'm currently working on. I'd love to hear your opinion on it:
Implementing a diagnostic-messages-path flag
Currently, I have created a directory at include/swift/AST/Diagnostics and added two files there (en.yaml & fr.yaml). My current method of retrieving the files is hard-coding their full path on my hard disk, which of course isn't the right way.
After discussing this with @owenv, we think the best approach is to add a frontend flag to the compiler that overrides the path to the YAML files directory, and to use it during development to prevent the compiler from looking relative to the main executable. This is like what the -diagnostic-documentation-path option does for educational notes.
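To make the lookup order concrete, here is a small Python sketch of the idea, not the compiler's actual C++ code. `override_dir` stands in for the value of the proposed flag, and the default location relative to the executable is a made-up layout used only for illustration.

```python
from pathlib import Path

def locale_file(locale, executable, override_dir=None):
    """Return the path of the YAML file for `locale`.

    If the (hypothetical) override directory is given, use it as-is.
    Otherwise look relative to the executable; the share/swift/diagnostics
    layout here is an assumption, not the real install layout.
    """
    if override_dir is not None:
        base = Path(override_dir)
    else:
        base = Path(executable).parent.parent / "share" / "swift" / "diagnostics"
    return base / f"{locale}.yaml"
```

During development one would pass the source-tree directory as the override, so a locally built compiler does not depend on where its binary happens to live.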
Implementing a diagnostic message retrieval method
The current way of retrieving diagnostic messages is by querying by position in diagnosticStrings[]. However, this won't work with the YAML file, because when adding a new diagnostic message you will need to add it at the right index in both the .def and the .yaml files.
My solution for this is to retrieve the diagnostic message not by position but by ID. So, maybe create a map that maps IDs to messages.
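The difference between the two lookups can be shown with a toy Python example. The IDs and messages are invented, and plain Python lists and dicts stand in for the .def order and the parsed YAML file.

```python
# Hypothetical IDs, in the order the .def file declares them.
def_order = ["error_a", "error_b", "error_c"]

# A translation file whose author listed the entries in a different order.
french_in_file_order = [
    ("error_c", "message C (fr)"),
    ("error_a", "message A (fr)"),
    ("error_b", "message B (fr)"),
]

def lookup_by_position(diag_id):
    # Mirrors indexing an array by position: only correct if both files
    # keep exactly the same order.
    index = def_order.index(diag_id)
    return french_in_file_order[index][1]

# Keyed by ID, the order of entries in the translation file stops mattering.
french_by_id = dict(french_in_file_order)

def lookup_by_id(diag_id):
    return french_by_id[diag_id]
```

Here `lookup_by_position("error_a")` silently returns the message for a different diagnostic, while `lookup_by_id("error_a")` returns the intended one.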
Finally, I'd love it if you could give me any feedback on my approach and my progress so far!
Great progress, @HassanElDesouky!
If you look at DiagnosticEngine.h you'll see that DiagID is defined as enum : uint32_t, so when diagnostic ids are loaded from .def (in DiagnosticList.cpp) they are stored as cases in that enum. That's why it's easy to just build an array of them at the moment.
I think we can extend that scheme to YAML as well since, as we discussed, diagnostic ids and signatures are going to be loaded/verified using the .def file(s). It seems like the abstract interface for diagnostic retrieval should be based on DiagID because it's easy to convert it to a number when needed...
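A rough Python analogue of that suggestion, assuming three made-up diagnostic names: an integer-backed enum plays the role of DiagID, and translations keyed by name are placed into an array at the enum's numeric value. The real code is C++; this only illustrates the shape of the interface.

```python
from enum import IntEnum

# Stand-in for the cases generated from the .def file(s); names are made up.
class DiagID(IntEnum):
    error_a = 0
    error_b = 1
    error_c = 2

def build_table(messages_by_name):
    """Place each message at the numeric value of its DiagID.
    Names that are not declared in the enum are ignored; IDs without a
    message keep a None slot."""
    table = [None] * len(DiagID)
    for name, text in messages_by_name.items():
        if name in DiagID.__members__:
            table[DiagID[name]] = text
    return table

def message(table, diag_id):
    # The retrieval interface takes a DiagID and converts it to an index.
    return table[int(diag_id)]
```

With this shape, callers never deal with positions or string keys; they pass a DiagID, and the .def file(s) remain the single source of which IDs exist.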
Week 2 Progress Update
Hi everyone, first of all, Eid Fitr Mubarak to you all!
This will be a very quick weekly update just to keep you posted. I successfully completed the two things I wanted to do last week, which were:
- Implemented a diagnostic-messages-path flag to get the directory of the diagnostic message folder.
- Implemented a diagnostic message retrieval method that works for both .def and .yaml files.
Because of Eid El-Fitr, I won't do much this week. So, if I have time I'll try to do the following:
- Refactor DiagnosticEngine and implement a YAMLDiagnosticProvider field in DiagnosticEngine.
Community period
First of all, I've had a really good time engaging with the community and I do love it! I think I made pretty good progress in the first month, and I'm excited to continue working on the project! I think I'm almost done with the main deliverables.
In brief, here's what I did last month:
- Engaged more with the Swift community, e.g. posted weekly updates on the forums and communicated effectively with people in the community.
- Got more familiar with the code base, especially how the DiagnosticEngine works!
- Learned new OOP concepts in C++.
- Implemented the compiler's frontend flags, e.g. -locale and -diagnostic-messages-path.
- Implemented a way of retrieving diagnostic messages from the YAML file.
- Took care of diagnostic message fallbacks if the diagnostic message's language isn't supported.
- Refactored DiagnosticEngine to support the new YAML diagnostics format.
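The fallback item in the list above could look roughly like the following Python sketch. The post only mentions falling back when a language isn't supported; falling back per message when a single translation is missing is an extra assumption here, and all names and data are invented.

```python
def message_for(diag_id, locale, available, default="en"):
    """Return the message for `diag_id` in `locale`.

    `available` maps a locale name to a dict of ID -> message. If the locale
    is not supported at all, or (an assumption of this sketch) the locale
    lacks this particular ID, the default-language message is used.
    """
    table = available.get(locale, {})
    return table.get(diag_id, available[default][diag_id])
```

For example, with only `en` and a partial `fr` table loaded, a request for `de` returns the English text for every diagnostic.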
Plans for this month:
Disclaimer: these are my own stretch goals, which I may not be able to finish this month (but I'll try).
- Start working on PRs, and try to get the changes merged.
- Handle tests. e.g. how to test and what to test.
- Get familiar with LLVM data structure libraries.
- Tackle the stretch goals.
- Start implementing the YAML serialization.
Hi everyone,
Here's an update about what I've been doing in the last two weeks:
- I've been focusing on improving my C++ skills.
- Worked mainly on PR #32239 for refactoring the DiagnosticEngine to use YAML files.
- Discussed testing, lint and prune tools with my mentor.
- Got a little bit familiar with the LLVM data structure libraries and the LLVM code base in general.
Plans for the rest of the month:
- Write tests for localization.
- Open a PR for frontend flags and hopefully get it merged.
Finally, I'd like to thank my mentor. He's been a great help for me and he's always been very responsive and nice. So, thank you @xedin :)
Hi everyone, the first month of the GSoC coding period is almost finished so here's what I did.
Most of my time went into getting the first PR merged, PR #32483, in which I introduced localization support for diagnostics via a file-per-language store in YAML format.
Now I'm working on my second PR, #32568, which will introduce frontend flags for localization as well as some tests to make sure we are getting something other than the normal English messages.
For the next month of the coding period, I think I'll work on creating lint and prune tools for localization, and maybe start tackling the stretch goals if I have time.
Hi everyone, this month (second coding period) I was mainly working on creating a serialized format for the yaml files.
https://github.com/apple/swift/pull/33022
In this PR I'm serializing YAML to an LLVM::OnDiskHashTable format. I also created a tool that will handle serialization of the YAML file to the .db OnDiskHashTable format. I think this PR should be merged today or tomorrow.
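To give a feel for why a serialized file helps, here is a deliberately simplified Python sketch. It is not the OnDiskHashTable layout used in the PR: it swaps in a plain offset table indexed by numeric diagnostic ID, which is enough to show how one message can be read without parsing the whole YAML file. The function names and the layout are assumptions of this sketch.

```python
import struct

def serialize(messages):
    """Pack messages (indexed by numeric diagnostic ID) as:
    a 4-byte count, then count + 1 little-endian offsets into a UTF-8 blob,
    then the blob itself."""
    encoded = [m.encode("utf-8") for m in messages]
    offsets = [0]
    for chunk in encoded:
        offsets.append(offsets[-1] + len(chunk))
    header = struct.pack("<I", len(encoded))
    table = struct.pack(f"<{len(offsets)}I", *offsets)
    return header + table + b"".join(encoded)

def lookup(data, index):
    """Read a single message without decoding any of the others."""
    (count,) = struct.unpack_from("<I", data, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    # Offsets i and i + 1 delimit message i inside the blob.
    start, end = struct.unpack_from("<2I", data, 4 + 4 * index)
    blob = 4 + 4 * (count + 1)
    return data[blob + start:blob + end].decode("utf-8")
```

A hash-table format additionally supports lookup by string key; the offset table above only works because diagnostic IDs can be converted to numbers.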
What's still remaining in the project is:
- Removing text messages from .def files and only using the new formats for retrieving diagnostics.
- Creating prune and lint tools for the YAML file.
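The post does not say what the lint and prune tools will check, so the following Python sketch is only one plausible reading: lint reports IDs declared in the .def file(s) that have no translation, and prune drops translations whose ID no longer exists. Names and data shapes are hypothetical.

```python
def lint(def_ids, translated):
    """Return the declared IDs that are missing from the translation."""
    return [diag_id for diag_id in def_ids if diag_id not in translated]

def prune(def_ids, translated):
    """Return a copy of the translation without entries for IDs that are
    no longer declared (dead translations)."""
    known = set(def_ids)
    return {diag_id: text for diag_id, text in translated.items() if diag_id in known}
```

Under this reading, prune is what removes the dead translations discussed at the top of the thread, and lint tells a translator what is still left to do.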