Remove is_bare_repository_cfg global state #1826

Open: wants to merge 3 commits into master

john-cai (Contributor) commented Nov 4, 2024

This patch series removes the global state introduced by the is_bare_repository_cfg variable by moving it into the repository struct. Patch 1 does most of the refactoring. Patch 2 initializes the member in code paths that left it uninitialized, and patch 3 adds a safety measure by calling BUG() when the member has not been properly initialized.

cc: shejialuo <shejialuo@gmail.com>

john-cai (Contributor Author) commented Nov 4, 2024

/preview

Preview email sent as pull.1826.git.git.1730755181.gitgitgadget@gmail.com

john-cai changed the title from "Put is_bare_repository_cfg global into repository struct" to "Remove is_bare_repository_cfg global state" on Nov 4, 2024

The is_bare_repository_cfg global variable is used for storing a bare
repository setting, either through the config, an env var, or the
commandline. This variable is global, and hence introduces global state
everywhere it is used.

In order to reduce global state, add a member to the repository struct
to keep track of the setting there. For now, the_repository is what's
used to set the member, which still represents global state. However,
there is a parallel effort to replace calls to the_repository with a
repository struct that is passed into builtins, see [1]. Hence, this
change will help the overall effort in reducing global state.

1. 9b1cb50 (builtin: add a repository parameter for builtin
   functions, Fri Sep 13 21:16:14 2024 +0000)

Signed-off-by: John Cai <johncai86@gmail.com>
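A minimal sketch of the move described in this commit message, assuming a trimmed-down struct: `struct repo_sketch` and `handle_core_bare()` are hypothetical names, not Git's. The singleton starts out "unspecified" (-1), as the old global did, while any other repository instance now carries its own copy of the setting instead of sharing one process-wide variable.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for struct repository. */
struct repo_sketch {
	int is_bare_cfg; /* -1 unspecified, 0 non-bare, 1 bare */
};

/* Like the patch: the singleton starts "unspecified", as the old global did. */
static struct repo_sketch the_repo = { .is_bare_cfg = -1 };
struct repo_sketch *the_repository = &the_repo;

/* Config handling writes to a specific repository rather than a global. */
void handle_core_bare(struct repo_sketch *repo, int value)
{
	assert(value == 0 || value == 1);
	repo->is_bare_cfg = value;
}
```

Callers that still pass the_repository keep the old behaviour; the gain only materializes once a caller can hand in a different instance.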
john-cai force-pushed the jc/remove_is_bare_global branch from d1b0026 to cc002ee on November 4, 2024 21:26
John Cai added 2 commits November 4, 2024 20:36
A subsequent commit will BUG() when the is_bare_cfg member is
uninitialized. Since some codepaths still leave the is_bare_cfg
member uninitialized, initialize it in those codepaths.

Signed-off-by: John Cai <johncai86@gmail.com>
The is_bare_cfg member of the repository struct should be properly
initialized when setting up a repository. Call BUG() when repo_is_bare()
sees that the member has not been set.

Signed-off-by: John Cai <johncai86@gmail.com>
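The guard added by this commit can be sketched as follows. This is a simplified model, not Git's code: `struct repo_sketch` and `repo_is_bare_checked()` are hypothetical names, and where Git's BUG() reports the problem and aborts, this sketch prints a message and returns -1 so the behaviour can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for struct repository. */
struct repo_sketch {
	const char *work_tree;	/* NULL when there is no working tree */
	int is_bare_cfg;	/* -1 unspecified, 0 non-bare, 1 bare */
};

/*
 * Models repo_is_bare() with the guard from this commit.  Git's BUG()
 * would abort here; this sketch reports and returns -1 instead.
 */
int repo_is_bare_checked(const struct repo_sketch *repo)
{
	assert(repo != NULL);
	if (repo->is_bare_cfg < 0) {
		fprintf(stderr, "BUG: is_bare_cfg unspecified\n");
		return -1;
	}
	/* if core.bare is not 'false', let's see if there is a work tree */
	return repo->is_bare_cfg && !repo->work_tree;
}
```

Under this guard, every setup path has to choose 0 or 1 before the first query, which is what the previous commit tries to arrange.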
john-cai force-pushed the jc/remove_is_bare_global branch from cc002ee to 749ba7f on November 5, 2024 01:44
john-cai (Contributor Author) commented Nov 6, 2024

/submit

dscho (Member) commented Nov 6, 2024

Aargh. This is another instance of gitgitgadget/gitgitgadget#1747. I'll investigate immediately.

Submitted as pull.1826.git.git.1730926082.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1826/john-cai/jc/remove_is_bare_global-v1

To fetch this version to local tag pr-git-1826/john-cai/jc/remove_is_bare_global-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1826/john-cai/jc/remove_is_bare_global-v1

@@ -716,7 +716,7 @@ static enum git_attr_direction direction;

On the Git mailing list, Junio C Hamano wrote:

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The is_bare_repository_cfg global variable is used for storing a bare
> repository setting, either through the config, an env var, or the
> commandline.

I found it curious that the above enumeration does not include the
case where we go through the repository discovery process and find
that we are in a bare repository.  Looking at the original
implementation of is_bare_repository() call, we do check if the
directory structure does have a working tree when these three
sources you listed above say "we are bare" or "we do not know yet".

So it would be helpful if we made these two points

 - The above enumeration is not meant to be exhaustive.

 - The answer to anybody who asks "is this repository bare?" is more
   subtle than just reading the variable.

clear to readers.
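To make the second point concrete, here is a sketch of the decision that is_bare_repository() made and that the patch keeps in repo_is_bare(), with both inputs passed explicitly. `bare_answer()` is a hypothetical helper; its `work_tree` argument stands in for what repo_get_work_tree() returns.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Same expression as is_bare_repository()/repo_is_bare(), inputs explicit.
 *   cfg:       -1 unspecified, 0 non-bare, 1 bare (config, env or command line)
 *   work_tree: NULL when no working tree was found or set up
 */
int bare_answer(int cfg, const char *work_tree)
{
	assert(cfg >= -1 && cfg <= 1);
	/* if core.bare is not 'false', let's see if there is a work tree */
	return cfg && !work_tree;
}
```

Only a configured 0 settles the answer by itself; -1 and 1 both defer to whether a working tree exists, so reading the variable alone does not answer "is this repository bare?".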

> This variable is global, and hence introduces global state
> everywhere it is used.
>
> In order to reduce global state, add a member to the repository struct
> to keep track of the setting there. For now, the_repository is what's
> used to set the member, which still represents global state. However,
> there is a parallel effort to replace calls to the_repository with a
> repository struct that is passed into builtins, see [1]. Hence, this
> change will help the overall effort in reducing global state.

OK.

> diff --git a/attr.c b/attr.c
> index c605d2c1703..053cd59af26 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -716,7 +716,7 @@ static enum git_attr_direction direction;
>  
>  void git_attr_set_direction(enum git_attr_direction new_direction)
>  {
> -	if (is_bare_repository() && new_direction != GIT_ATTR_INDEX)
> +	if (repo_is_bare(the_repository) && new_direction != GIT_ATTR_INDEX)
>  		BUG("non-INDEX attr direction in a bare repo");

So everybody called is_bare_repository() which implicitly relied on
the global variable now calls repo_is_bare() on the_repository,
where the new member in the struct serves the purpose of the old
global variable.

This replacement to repo_is_bare(the_repository) from
is_bare_repository() is a recurring pattern in this patch, so I'll
remove them from my quoting.

I've used coccinelle to apply this semantic patchlet

    - is_bare_repository()
    + repo_is_bare(the_repository)

and then compared the result with applying this patch to see what
else this patch contains, so I can comment on them.

> diff --git a/builtin/clone.c b/builtin/clone.c
> index 59fcb317a68..80b594c6011 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1415,7 +1415,7 @@ int cmd_clone(int argc,
>  		repo_clear(the_repository);
>  
>  		/* At this point, we need the_repository to match the cloned repo. */
> -		if (repo_init(the_repository, git_dir, work_tree))
> +		if (repo_init(the_repository, git_dir, work_tree, -1))
>  			warning(_("failed to initialize the repo, skipping bundle URI"));
>  		else if (fetch_bundle_uri(the_repository, bundle_uri, &has_heuristic))
>  			warning(_("failed to fetch objects from bundle URI '%s'"),
> @@ -1446,7 +1446,7 @@ int cmd_clone(int argc,
>  			repo_clear(the_repository);
>  
>  			/* At this point, we need the_repository to match the cloned repo. */
> -			if (repo_init(the_repository, git_dir, work_tree))
> +			if (repo_init(the_repository, git_dir, work_tree, -1))
>  				warning(_("failed to initialize the repo, skipping bundle URI"));
>  			else if (fetch_bundle_list(the_repository,
>  						   transport->bundles))

OK, so our repo_init() now takes one extra parameter.  We'll see what
the new parameter means when we look at the changes to repository.c.

> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 7e00d57d654..901bf30b508 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -89,7 +89,7 @@ int cmd_init_db(int argc,
>  	const struct option init_db_options[] = {
>  		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
>  				N_("directory from which templates will be used")),
> -		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
> +		OPT_SET_INT(0, "bare", &the_repository->is_bare_cfg,
>  				N_("create a bare repository"), 1),
>  		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
>  			N_("permissions"),

As you said, this depends on the fact that the_repository is a
pointer pointing at the static singleton variable the_repo at
compile time, so while it is already safe to take the address of
the_repository->is_bare_cfg to prepare the array of options here, we
haven't really solved the "we shouldn't be using this global
variable" problem yet.  But we can go one step at a time.

The remaining hunks in the file now all access the "global variable"
via the_repository pointer, but the fact remains that the address of
the thing being accessed is determined at compile time, so it is
just like accessing a global variable.

Which is naturally expected.
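The point about a fixed address can be sketched as below. `struct set_int_option` and `parse_init_args()` are hypothetical stand-ins for a parse-options OPT_SET_INT() entry and cmd_init_db(); the array is built inside the function, as init_db_options[] is, and the pointer it stores always lands on the singleton.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct repo_sketch { int is_bare_cfg; };
static struct repo_sketch the_repo = { .is_bare_cfg = -1 };
struct repo_sketch *the_repository = &the_repo;

/* Hypothetical stand-in for an OPT_SET_INT() entry. */
struct set_int_option {
	const char *long_name;
	int *value;
	int set_to;
};

/* Apply the first option whose name matches 'arg'; return 0 on a match. */
static int apply_option(const struct set_int_option *opts, size_t nr,
			const char *arg)
{
	for (size_t i = 0; i < nr; i++) {
		if (!strcmp(opts[i].long_name, arg)) {
			*opts[i].value = opts[i].set_to;
			return 0;
		}
	}
	return -1;
}

int parse_init_args(const char *arg)
{
	/* Built at run time, but the stored address never varies. */
	const struct set_int_option opts[] = {
		{ "bare", &the_repository->is_bare_cfg, 1 },
	};
	/* Same address as the singleton's member: global state in disguise. */
	assert(opts[0].value == &the_repo.is_bare_cfg);
	return apply_option(opts, sizeof(opts) / sizeof(opts[0]), arg);
}
```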

> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index b6b5f1ebde7..7bff99bf08f 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -1591,7 +1591,7 @@ static int add_possible_reference_from_superproject(
>  		struct strbuf err = STRBUF_INIT;
>  		strbuf_add(&sb, odb->path, len);
>  
> -		if (repo_init(&alternate, sb.buf, NULL) < 0)
> +		if (repo_init(&alternate, sb.buf, NULL, the_repository->is_bare_cfg) < 0)

OK.  I do not recall what the original repo_init() did, but I
presume that it initialized the new one depending on what the global
variable said.  We now propagate the setting from the superproject
down to the submodule, which amounts to the same thing but probably
is better as the "inheritance" is more explicitly visible here?

> diff --git a/config.c b/config.c
> index a11bb85da30..c1b14c89947 100644
> --- a/config.c
> +++ b/config.c
> @@ -1441,7 +1441,7 @@ static int git_default_core_config(const char *var, const char *value,
>  	}
>  
>  	if (!strcmp(var, "core.bare")) {
> -		is_bare_repository_cfg = git_config_bool(var, value);
> +		the_repository->is_bare_cfg = git_config_bool(var, value);
>  		return 0;
>  	}

OK.  This is the same as what init-db did.

> diff --git a/dir.c b/dir.c
> index e3ddd5b5296..c995668e54c 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -4008,7 +4008,7 @@ static void connect_wt_gitdir_in_nested(const char *sub_worktree,
>  	const struct submodule *sub;
>  
>  	/* If the submodule has no working tree, we can ignore it. */
> -	if (repo_init(&subrepo, sub_gitdir, sub_worktree))
> +	if (repo_init(&subrepo, sub_gitdir, sub_worktree, the_repository->is_bare_cfg))
>  		return;

Same logic for submodule inheriting from the superproject?

> diff --git a/environment.c b/environment.c
> index a2ce9980818..9af20d5e34e 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -34,7 +34,6 @@ int has_symlinks = 1;
>  int minimum_abbrev = 4, default_abbrev = -1;
>  int ignore_case;
>  int assume_unchanged;
> -int is_bare_repository_cfg = -1; /* unspecified */

This is now gone.  We'll see a corresponding change in repository.c
where the_repo instance is initialized, I presume.

> @@ -146,12 +145,6 @@ const char *getenv_safe(struct strvec *argv, const char *name)
>  	return argv->v[argv->nr - 1];
>  }
>  
> -int is_bare_repository(void)
> -{
> -	/* if core.bare is not 'false', let's see if there is a work tree */
> -	return is_bare_repository_cfg && !repo_get_work_tree(the_repository);
> -}

This is now gone, and we'll see a corresponding change in
repository.c, I presume.

It is somewhat curious that in a repository where core.bare says
true, we countermand it if we can figure out where its working
tree is and say "core.bare is lying; we are in a non-bare repository".

The curiosity is not the fault of this patch, of course.  The
updated code in repository.c would hopefully have the same
curiosity (or we'd be looking at an unintended behaviour change,
if it didn't).

> diff --git a/environment.h b/environment.h
> index 923e12661e1..23f29a4df05 100644
> --- a/environment.h
> +++ b/environment.h
> @@ -144,8 +144,7 @@ void set_shared_repository(int value);
>  int get_shared_repository(void);
>  void reset_shared_repository(void);
>  
> -extern int is_bare_repository_cfg;
> -int is_bare_repository(void);
> +int is_bare_repository(struct repository *repo);

Curious.  I somehow thought is_bare_repository() will be gone, and
everybody is supposed to call repo_is_bare(the_repository), instead.

What makes a caller pick one over the other?


> diff --git a/git.c b/git.c
> index c2c1b8e22c2..c8ed29b2295 100644
> --- a/git.c
> +++ b/git.c
> @@ -251,7 +251,7 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
>  				*envchanged = 1;
>  		} else if (!strcmp(cmd, "--bare")) {
>  			char *cwd = xgetcwd();
> -			is_bare_repository_cfg = 1;
> +			the_repository->is_bare_cfg = 1;

OK.  The same as how init-db and config now access the new member of
the global singleton the_repository instead of the global variable.

> diff --git a/repository.c b/repository.c
> index f988b8ae68a..96608058b61 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -25,7 +25,9 @@
>  extern struct repository *the_repository;
>  
>  /* The main repository */
> -static struct repository the_repo;
> +static struct repository the_repo = {
> +	.is_bare_cfg = -1,
> +};

OK, this is just as expected by reading the patch so far.

> @@ -263,10 +265,13 @@ static int read_and_verify_repository_format(struct repository_format *format,
>  /*
>   * Initialize 'repo' based on the provided 'gitdir'.
>   * Return 0 upon success and a non-zero value upon failure.
> + * is_bare can be passed to indicate whether or not the repository should be
> + * treated as bare when repo_init() is used to initiate a secondary repository.

"initiate" -> "initialize" perhaps?

>  int repo_init(struct repository *repo,
>  	      const char *gitdir,
> -	      const char *worktree)
> +	      const char *worktree,
> +	      int is_bare)
>  {
>  	struct repository_format format = REPOSITORY_FORMAT_INIT;
>  	memset(repo, 0, sizeof(*repo));
> @@ -283,6 +288,8 @@ int repo_init(struct repository *repo,
>  	repo_set_compat_hash_algo(repo, format.compat_hash_algo);
>  	repo_set_ref_storage_format(repo, format.ref_storage_format);
>  	repo->repository_format_worktree_config = format.worktree_config;
> +	if (is_bare > 0)
> +		repo->is_bare_cfg = is_bare;

When repo_init() is called with anything other than &the_repo, who
initializes repo->is_bare_cfg?  If the answer is "nobody", shouldn't
this function be doing something like

	repo->is_bare_cfg = (0 <= is_bare) ? is_bare : -1;

which actually amounts to an unconditional

	repo->is_bare_cfg = is_bare;

as is_bare can only take one of (-1, 0, 1).

Perhaps I am missing some subtleties in the original construction
you wrote?  You leave repo->is_bare_cfg unset when is_bare parameter
explicitly says "false", which I suspect might be related to the
source of confusion I am having.
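The difference between the posted conditional assignment and the suggested unconditional one can be sketched as below. This assumes, as the quoted hunk shows, that repo_init() begins with memset(repo, 0, sizeof(*repo)); `init_as_posted()` and `init_as_suggested()` are hypothetical reductions of it. In this model an incoming 0 survives only because memset() already wrote 0, while an incoming -1 ends up as 0, so "unspecified" silently becomes "non-bare".

```c
#include <assert.h>
#include <string.h>

struct repo_sketch {
	const char *gitdir;
	int is_bare_cfg; /* -1 unspecified, 0 non-bare, 1 bare */
};

/* Mirrors the posted logic: clear everything, then copy only when positive. */
void init_as_posted(struct repo_sketch *repo, const char *gitdir, int is_bare)
{
	assert(is_bare >= -1 && is_bare <= 1);
	memset(repo, 0, sizeof(*repo));
	repo->gitdir = gitdir;
	if (is_bare > 0)
		repo->is_bare_cfg = is_bare;
}

/* The unconditional assignment suggested in review. */
void init_as_suggested(struct repo_sketch *repo, const char *gitdir, int is_bare)
{
	assert(is_bare >= -1 && is_bare <= 1);
	memset(repo, 0, sizeof(*repo));
	repo->gitdir = gitdir;
	repo->is_bare_cfg = is_bare;
}
```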

> +int repo_is_bare(struct repository *repo)
> +{
> +	/* if core.bare is not 'false', let's see if there is a work tree */
> +	return repo->is_bare_cfg && !repo_get_work_tree(repo);
> +}

The curiosity we saw in the original implementation of
is_bare_repository() above is faithfully reproduced, which is good.

> diff --git a/repository.h b/repository.h
> index 24a66a496a6..c243653492b 100644
> --- a/repository.h
> +++ b/repository.h
> @@ -153,6 +153,14 @@ struct repository {
>  
>  	/* Indicate if a repository has a different 'commondir' from 'gitdir' */
>  	unsigned different_commondir:1;
> +
> +	/*
> +	 * Indicates if the repository is set to be treated as a bare repository,
> +	 * through a command line argument, configuration, or environment
> +	 * variable.
> +	 * -1 means unspecified, 0 indicates non-bare, and 1 indicates bare.
> +	 */
> +	int is_bare_cfg;
>  };

I am very happy with the above phrasing.  The member tells us what
the repo is "set to be treated as", which implies that the code does
a bit more on top of that setting.

> diff --git a/scalar.c b/scalar.c
> index ac0cb579d3f..c2ec1f3e745 100644
> --- a/scalar.c
> +++ b/scalar.c
> @@ -722,7 +722,7 @@ static int cmd_reconfigure(int argc, const char **argv)
>  
>  		git_config_clear();
>  
> -		if (repo_init(&r, gitdir.buf, commondir.buf))
> +		if (repo_init(&r, gitdir.buf, commondir.buf, the_repository->is_bare_cfg))
>  			goto loop_end;

Given this caller, if is_bare_cfg of the current state says "I am
set to be treated as a non-bare repository" by having 0, shouldn't
repo_init() copy it to the repository at r?  IOW, this yells at me
saying that repo_init() patch we saw earlier is somewhat buggy.

> diff --git a/setup.c b/setup.c
> index 7b648de0279..6bc4aef3a8b 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -766,8 +766,8 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
>  
>  	if (!has_common) {
>  		if (candidate->is_bare != -1) {
> -			is_bare_repository_cfg = candidate->is_bare;
> -			if (is_bare_repository_cfg == 1)
> +			the_repository->is_bare_cfg = candidate->is_bare;
> +			if (the_repository->is_bare_cfg == 1)
>  				inside_work_tree = -1;

OK, this is as expected.

All other hunks to this file, except for the last one, follow the
same pattern as init-db and config to access the member of the
singleton struct instead of global variable.  And the last one is to
call repo_is_bare(the_repository) instead of is_bare_repository().

Makes sense.

> diff --git a/transport.c b/transport.c
> index 47fda6a7732..d72b8380846 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -1428,7 +1428,7 @@ int transport_push(struct repository *r,
>  
>  	if ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND |
>  		      TRANSPORT_RECURSE_SUBMODULES_ONLY)) &&
> -	    !is_bare_repository()) {
> +	    !repo_is_bare(r)) {
>  		struct ref *ref = remote_refs;
>  		struct oid_array commits = OID_ARRAY_INIT;
>  
> @@ -1455,7 +1455,7 @@ int transport_push(struct repository *r,
>  	if (((flags & TRANSPORT_RECURSE_SUBMODULES_CHECK) ||
>  	     ((flags & (TRANSPORT_RECURSE_SUBMODULES_ON_DEMAND |
>  			TRANSPORT_RECURSE_SUBMODULES_ONLY)) &&
> -	      !pretend)) && !is_bare_repository()) {
> +	      !pretend)) && !repo_is_bare(r)) {
>  		struct ref *ref = remote_refs;
>  		struct string_list needs_pushing = STRING_LIST_INIT_DUP;
>  		struct oid_array commits = OID_ARRAY_INIT;

OK, these are better than the mechanical "singleton the_repository
is the new home for the global".  It makes it even more important
for us to answer "Who initializes the repository r?  Is is_bare_cfg
initialized to -1 just like the_repo.is_bare_cfg is?  Is repo_init()
doing the right thing by updating it only when the incoming parameter
is 1 and ignoring -1 and 0?" questions, though.

> diff --git a/worktree.c b/worktree.c
> index 77ff484d3ec..c9d5b228959 100644
> --- a/worktree.c
> +++ b/worktree.c
> @@ -85,8 +85,8 @@ static struct worktree *get_main_worktree(int skip_reading_head)
>  	 * This means that worktree->is_bare may be set to 0 even if the main
>  	 * worktree is configured to be bare.
>  	 */
> -	worktree->is_bare = (is_bare_repository_cfg == 1) ||
> -		is_bare_repository();
> +
> +	worktree->is_bare = the_repository->is_bare_cfg == 1;

If this changes the behaviour subtly without explaining, it needs to
be justified, I suspect.

We used to pay attention to what is_bare_repository() says, which is
a bit more than "set to be treated as" with config.  We no longer
do so, and the unconfigured case is now always treated as non-bare.
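The behaviour change can be sketched by evaluating both expressions on the same inputs. The helpers are hypothetical; the old one inlines is_bare_repository() as `cfg && !work_tree`, following the environment.c hunk quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Old: worktree->is_bare = (is_bare_repository_cfg == 1) || is_bare_repository(); */
int main_worktree_is_bare_old(int cfg, const char *work_tree)
{
	assert(cfg >= -1 && cfg <= 1);
	int is_bare_repository = cfg && !work_tree;
	return (cfg == 1) || is_bare_repository;
}

/* New (v1): worktree->is_bare = the_repository->is_bare_cfg == 1; */
int main_worktree_is_bare_new(int cfg, const char *work_tree)
{
	assert(cfg >= -1 && cfg <= 1);
	(void)work_tree; /* no longer consulted */
	return cfg == 1;
}
```

With core.bare unconfigured (-1) and no working tree, the old expression reports a bare main worktree and the new one does not.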

On the Git mailing list, shejialuo wrote:

On Thu, Nov 07, 2024 at 02:46:15PM +0900, Junio C Hamano wrote:

[snip]

> >  int repo_init(struct repository *repo,
> >  	      const char *gitdir,
> > -	      const char *worktree)
> > +	      const char *worktree,
> > +	      int is_bare)
> >  {
> >  	struct repository_format format = REPOSITORY_FORMAT_INIT;
> >  	memset(repo, 0, sizeof(*repo));
> > @@ -283,6 +288,8 @@ int repo_init(struct repository *repo,
> >  	repo_set_compat_hash_algo(repo, format.compat_hash_algo);
> >  	repo_set_ref_storage_format(repo, format.ref_storage_format);
> >  	repo->repository_format_worktree_config = format.worktree_config;
> > +	if (is_bare > 0)
> > +		repo->is_bare_cfg = is_bare;
> 
> When repo_init() is called with anything other than &the_repo, who
> initializes repo->is_bare_cfg?

I also want to ask this question. Actually, I find it quite strange that
we need to add a new parameter `is_bare` to the `repo_init` function.

For this call:

    repo_init(the_repository, git_dir, work_tree, -1);

We add a new field "is_bare_cfg" to the "struct repository". So now,
`the_repository` variable should contain the information about whether
the repo is bare(1), is not bare(0) or unknown(-1). However, in this
call, we pass "-1" to the parameter `is_bare` for "repo_init" function.

When I first looked at this code, I thought that we would set
"repo->is_bare_cfg = -1" to indicate that we cannot tell whether the
repo is bare or not. But it only sets "repo->is_bare_cfg = is_bare"
if `is_bare > 0`. Junio has already commented on this.

This raises a question: why do we need to set up `is_bare_cfg` in the
`repo_init` function? I guess this is because we need to set up other
"struct repository" instances like the following:

    if (repo_init(&alternate, sb.buf, NULL, the_repository->is_bare_cfg) < 0)

And I think it's better for us to do the following:

    alternate.is_bare_cfg = the_repository->is_bare_cfg;
    if (repo_init(&alternate, sb.buf, NULL))

And we may create a function called `repo_copy_settings` to set up the
common settings inherited from an existing repo:

    repo_copy_settings(&alternate, the_repository);
    if (repo_init(&alternate, sb.buf, NULL))

I agree that we could put `is_bare_cfg` into "struct repository". But I
don't agree with the idea that we need to pass `is_bare` to `repo_init`.
I think we should know whether the repo is bare or not before calling
`repo_init`. And from my understanding, this is what we are doing now.

Also, I think we may add an enum type instead of using (-1, 0, 1).
(However, this is not the main point of this patch.)
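A sketch of the alternative proposed above, with every name hypothetical: `enum bare_setting` replaces the -1/0/1 convention, and `repo_copy_settings()` copies inherited settings explicitly. Because repo_init() as quoted earlier starts with memset(), this sketch assumes the copy has to happen after initialization; its init clears the struct and marks the setting unspecified, so a copy made before the call would be wiped.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum replacing the -1/0/1 convention. */
enum bare_setting {
	BARE_UNSPECIFIED = -1,
	BARE_NO = 0,
	BARE_YES = 1,
};

struct repo_sketch {
	const char *gitdir;
	enum bare_setting bare;
};

/* Simplified repo_init() without an is_bare parameter. */
int repo_init_sketch(struct repo_sketch *repo, const char *gitdir)
{
	memset(repo, 0, sizeof(*repo));
	repo->gitdir = gitdir;
	repo->bare = BARE_UNSPECIFIED;
	return 0;
}

/* Hypothetical repo_copy_settings(): copy settings inherited from 'src'. */
void repo_copy_settings(struct repo_sketch *dst, const struct repo_sketch *src)
{
	assert(dst != src);
	dst->bare = src->bare;
}
```

Under these assumptions the calling order is init first, then copy; the enum also makes an inherited "unspecified" visibly distinct from "not bare".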

Thanks,
Jialuo

On the Git mailing list, Junio C Hamano wrote:

shejialuo <shejialuo@gmail.com> writes:

> I also want to ask this question. Actually, I find it quite strange that
> we need to add a new parameter `is_bare` to the `repo_init` function.
>
> For this call:
>
>     repo_init(the_repository, git_dir, work_tree, -1);
>
> We add a new field "is_bare_cfg" to the "struct repository". So now,
> `the_repository` variable should contain the information about whether
> the repo is bare(1), is not bare(0) or unknown(-1). However, in this
> call, we pass "-1" to the parameter `is_bare` for "repo_init" function.

Isn't this merely trying to be faithful to the original to avoid
unintended behaviour change?  We initialize the global variable
is_bare_repository_cfg to unspecified(-1) in the original, and
for a rewrite to move the global to a member in the singleton
instance of the_repo, it would need to be able to do the same.

And for callers of repo_init() that prepares _another_ in-core
repository instance, which is different from the_repository, because
the original has a process-wide singleton global variable, copying
the value from the_repository->is_bare_cfg to a newly initialized one
would hopefully give us the most faithful rewrite to avoid
unintended behaviour change.

At least, that is how I understood why the patch does it this way.
As you noticed, too, there are ...

> When I first looked at this code, I thought that we would set
> "repo->is_bare_cfg = -1" to indicate that we cannot tell whether the
> repo is bare or not. But it only sets "repo->is_bare_cfg = is_bare"
> if `is_bare > 0`. Junio has already commented on this.

... places in the updated code that make it unclear what the
is_bare member really means.  The corresponding global variable used
to be "this is what we were told by config or env or command line",
but it is unclear, with conditional assignments like the above, what
it means in the updated code.

Thanks.

On the Git mailing list, shejialuo wrote:

On Fri, Nov 08, 2024 at 10:24:31AM +0900, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > I also want to ask this question. Actually, I find it quite strange that
> > we need to add a new parameter `is_bare` to the `repo_init` function.
> >
> > For this call:
> >
> >     repo_init(the_repository, git_dir, work_tree, -1);
> >
> > We add a new field "is_bare_cfg" to the "struct repository". So now,
> > `the_repository` variable should contain the information about whether
> > the repo is bare(1), is not bare(0) or unknown(-1). However, in this
> > call, we pass "-1" to the parameter `is_bare` for "repo_init" function.
> 
> Isn't this merely trying to be faithful to the original to avoid
> unintended behaviour change?  We initialize the global variable
> is_bare_repository_cfg to unspecified(-1) in the original, and
> for a rewrite to move the global to a member in the singleton
> instance of the_repo, it would need to be able to do the same.
> 
> And for callers of repo_init() that prepares _another_ in-core
> repository instance, which is different from the_repository, because
> the original has a process-wide singleton global variable, copying
> the value from the_repository->is_bare_cfg to a newly initialized one
> would hopefully give us the most faithful rewrite to avoid
> unintended behaviour change.
> 

Yes, I agree that this is the most faithful way to ensure
consistency when we create a new `repo`, instead of letting the
caller do this itself.

So, what I find strange is that we need to do this assignment at all.
Because we make a global variable non-global by incorporating it into
"struct repository *", we have to maintain this state whenever we create
a new "repo".

It lets me think whether we should place "is_bare_cfg" into "struct
repository" in the first place. I will explain why in the later
comments.

> At least, that is how I understood why the patch does it this way.
> As you noticed, too, there are ...
> 
> > When I first looked at this code, I thought that we would set
> > "repo->is_bare_cfg = -1" to indicate that we cannot tell whether the
> > repo is bare or not. But it only sets "repo->is_bare_cfg = is_bare"
> > if `is_bare > 0`. Junio has already commented on this.
> 
> ... places in the updated code that make it unclear what the
> is_bare member really means.  The corresponding global variable used
> to be "this is what we were told by config or env or command line",
> but it is unclear, with conditional assignments like the above, what
> it means in the updated code.
> 

Yes, John has changed the corresponding code paths by setting the global
variable "the_repository->is_bare_cfg". So, we will refactor this later.

Earlier, Kousik wanted to make "builtin/mailinfo" not rely on
"the_repository". I commented in

    https://lore.kernel.org/git/Zw6SsUyZ0oA0XqMK@ArchLinux/

In that thread, I argued that we should not incorporate the global
variables "git_commit_encoding" and "git_log_output_encoding" in
"environment.c" into "struct repository *", because these two configs
can be used outside of a repo.

So, I don't think it's a good idea to put "is_bare_cfg" into
"struct repository". Furthermore, we should not put the global
variables in the "struct repository" structure, for the following reasons:

  1. These variables are used across the whole lifecycle of the process,
     not only in relation to the repository. Some variables could be used
     outside of a repo.
  2. Currently, the config functions which set up these variables don't
     have parameters to access the "struct repository *". Of course, we
     could add the parameter, but as 1 shows, some variables could be
     used outside of a repo, and handling that would take considerable
     effort.
  3. We need to maintain consistency whenever we create a new "struct
     repository", because these variables are no longer global.

So, from my perspective, we could instead create a new structure called
"struct env" to hold all these variables in "environment.c", just like
what we have done for "struct repository *". But that introduces
another overhead: we may need to pass this structure to every function
during setup.
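A rough sketch of the "struct env" idea, with every name hypothetical: process-wide settings live in one struct that is passed explicitly, so they can be read with or without a repository. The overhead mentioned above shows up in the signature, since every function that needs a setting must accept the struct. Only core.bare is handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical process-wide settings, decoupled from any repository. */
struct env_sketch {
	const char *commit_encoding;	/* i18n.commitEncoding */
	const char *log_output_encoding;	/* i18n.logOutputEncoding */
	int is_bare_cfg;		/* -1 unspecified, 0 non-bare, 1 bare */
};

#define ENV_SKETCH_INIT { \
	.commit_encoding = NULL, \
	.log_output_encoding = NULL, \
	.is_bare_cfg = -1, \
}

/* Config callback that receives the env explicitly instead of using globals. */
int env_sketch_config(struct env_sketch *env, const char *var, int bool_value)
{
	assert(env != NULL);
	if (!strcmp(var, "core.bare")) {
		env->is_bare_cfg = bool_value;
		return 0;
	}
	return 1; /* not handled in this sketch */
}
```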

> Thanks.

Thanks,
Jialuo

@@ -741,6 +741,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_

On the Git mailing list, Junio C Hamano wrote:

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <jcai@gitlab.com>
>
> A subsequent commit will BUG() when the is_bare_cfg member is
> uninitialized. Since some codepaths still leave the is_bare_cfg
> member uninitialized, initialize it in those codepaths.
>
> Signed-off-by: John Cai <johncai86@gmail.com>
> ---
>  setup.c | 7 +++++++
>  1 file changed, 7 insertions(+)

I am not sure about the wisdom of this step (and the next one).
Before this step, it used to be that the global variable (or
the_repository->is_bare_cfg) can be inspected to see if there is an
explicit "set to be treated as", or nobody told us if the repository
ought to be bare (or not).  With this and the next step, that is no
longer possible, yet we still do the "core.bare says it is either
true or unconfigured, so we ask repo_get_work_tree() and it returns
NULL, so it is bare", which feels awfully inconsistent.  Especially
the change from the next patch

> diff --git a/repository.c b/repository.c
> index 96608058b61..cd1d59ea1b9 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -464,5 +464,7 @@ int repo_hold_locked_index(struct repository *repo,
>  int repo_is_bare(struct repository *repo)
>  {
>  	/* if core.bare is not 'false', let's see if there is a work tree */
> +	if (repo->is_bare_cfg < 0 )
> +		BUG("is_bare_cfg unspecified");
>  	return repo->is_bare_cfg && !repo_get_work_tree(repo);
>  }

the returned value does not make much sense anymore.  One half of it
used to be "if not configured, we can ask if there is worktree and
the lack of one by definition means we are bare", which made perfect
sense, but now what remains is "the configuration says it is, but
when we ask if there is a worktree, there is, so it is not bare
after all", which is somewhat dubious.

And if the goal of steps 2 & 3 is to redefine what is_bare_cfg means
and make it "this is the only thing we need to check if the
repository is bare" (which by itself is not a bad thing), shouldn't
the worktree check be done where the code assigns true to
repo->is_bare_cfg?
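A sketch of the alternative raised here, under the assumption that is_bare_cfg becomes the final answer: the setup code folds the working tree check into the assignment, and the query becomes a plain read. `setup_resolve_bare()` and `repo_is_bare_resolved()` are hypothetical names, and `configured` stands for whatever config, env or command line said (-1 when nothing did).

```c
#include <assert.h>
#include <stddef.h>

struct repo_sketch {
	const char *work_tree;
	int is_bare_cfg; /* after setup: 0 or 1, the final answer */
};

/* Hypothetical setup step: decide once, at assignment time. */
void setup_resolve_bare(struct repo_sketch *repo, int configured,
			const char *work_tree)
{
	assert(configured >= -1 && configured <= 1);
	repo->work_tree = work_tree;
	/* Same truth table as repo_is_bare(), evaluated once during setup. */
	repo->is_bare_cfg = configured && !work_tree;
}

/* With the decision made at setup, the query is a plain read. */
int repo_is_bare_resolved(const struct repo_sketch *repo)
{
	assert(repo->is_bare_cfg == 0 || repo->is_bare_cfg == 1);
	return repo->is_bare_cfg;
}
```

The trade-off is the one raised in this review: once resolved, the member can no longer express "nobody told us".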

> diff --git a/setup.c b/setup.c
> index 6bc4aef3a8b..5680976c598 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -741,6 +741,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
>  
>  	if (verify_repository_format(candidate, &err) < 0) {
>  		if (nongit_ok) {
> +			the_repository->is_bare_cfg = 1;

It is unclear how we can be certain that we are looking at a bare
repository in this case.  We do not even understand the repository
format, and the GIT_DIR we were given, in which we found a file called
"config", may not even be a repository.  We are losing a bit of
information (i.e., "nobody has told us whether we ought to treat the
repository as a bare one or a non-bare one") by overriding the value here.

> @@ -1017,6 +1018,7 @@ static const char *setup_explicit_git_dir(const char *gitdirenv,
>  		if (nongit_ok) {
>  			*nongit_ok = 1;
>  			free(gitfile);
> +			the_repository->is_bare_cfg = 0;
>  			return NULL;

Ditto.

> @@ -1069,6 +1071,7 @@ static const char *setup_explicit_git_dir(const char *gitdirenv,
>  
>  	/* set_git_work_tree() must have been called by now */
>  	worktree = repo_get_work_tree(the_repository);
> +	the_repository->is_bare_cfg = 0;

What if worktree is NULL?  Wouldn't it be more meaningful to say
is_bare_cfg is true in such a case?
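The alternative raised here can be sketched as deriving the value from the work tree instead of hard-coding 0. The struct and helper are hypothetical and only model the idea.

```c
#include <assert.h>
#include <stddef.h>

struct repo_sketch3 {
	int is_bare_cfg;
	const char *work_tree;
};

/* Hypothetical: no work tree means bare, a work tree means non-bare. */
static void set_bare_from_work_tree(struct repo_sketch3 *r, const char *worktree)
{
	r->work_tree = worktree;
	r->is_bare_cfg = (worktree == NULL);
}
```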

> @@ -1125,6 +1128,9 @@ static const char *setup_discovered_git_dir(const char *gitdir,
>  
>  	/* #0, #1, #5, #8, #9, #12, #13 */
>  	set_git_work_tree(".");
> +
> +	if (the_repository->is_bare_cfg < 0)
> +		the_repository->is_bare_cfg = 0;

OK.  We did discovery, is_bare_cfg did not say true (it would have
returned before we got here if is_bare_cfg were set to true).  We
decided to treat the current directory as the top of the working tree,
so by definition, we are not treating the repository as bare.

But this makes me wonder what should happen if
the_repository->is_bare_cfg is already set to true.  Shouldn't that
be a BUG()?
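A minimal model of the discovery tail with the suggested BUG() looks like this. The helper name is hypothetical. It returns -1 where git would call BUG(), so the sketch can run without aborting.

```c
#include <assert.h>

/*
 * Hypothetical model of what runs after set_git_work_tree(".") in the
 * discovery path.  Returns -1 where git would BUG(), 0 otherwise.
 */
static int settle_bare_after_discovery(int *is_bare_cfg)
{
	if (*is_bare_cfg > 0)
		return -1;        /* configured bare, yet we just chose "." as work tree */
	if (*is_bare_cfg < 0)
		*is_bare_cfg = 0; /* unspecified: having a work tree makes it non-bare */
	return 0;
}
```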

> @@ -1767,6 +1773,7 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  			die(_("not a git repository (or any of the parent directories): %s"),
>  			    DEFAULT_GIT_DIR_ENVIRONMENT);
>  		*nongit_ok = 1;
> +		the_repository->is_bare_cfg = 1;

This is neither bare nor non-bare; we simply did not find any usable
git repository, and we lose the single bit of information "nobody
told us to treat the repository as bare or non-bare".

Not that the loss of information is a huge deal.  But having to make
an arbitrary choice like the above (and similar ones in previous
hunks where we didn't have any repository to begin with) is an
indication that the entire "is_bare_cfg must mean if our repository
is bare or non-bare" premise patch 3/3 wants to enforce may be
misguided, I am afraid.
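One way to avoid the arbitrary choice is to leave "unspecified" intact on the nongit paths. The "must be initialized" invariant would then be enforced only when a repository was actually found. This is a hypothetical sketch of that reading, not something proposed in the thread. The have_repository flag and both helpers are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum bare_cfg { BARE_UNSPECIFIED = -1, BARE_NO = 0, BARE_YES = 1 };

struct setup_sketch {
	bool have_repository;       /* hypothetical: did setup find a usable repo? */
	enum bare_cfg is_bare_cfg;
};

/* nongit path: note that no repository exists, but keep "nobody told us". */
static void mark_nongit(struct setup_sketch *s)
{
	s->have_repository = false;
}

/* The invariant only means something when there is a repository. */
static bool bare_cfg_is_consistent(const struct setup_sketch *s)
{
	return !s->have_repository || s->is_bare_cfg != BARE_UNSPECIFIED;
}
```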

>  		break;
>  	case GIT_DIR_HIT_MOUNT_POINT:
>  		if (!nongit_ok)

This patch series was integrated into seen via 4013ec9.

gitgitgadget-git bot added the seen label Nov 7, 2024

User shejialuo <shejialuo@gmail.com> has been added to the cc: list.

This patch series was integrated into seen via cf16a00, 43c4349, 9babb06, 5739e7d, 63c999f, 0bd9e19, f638992, ae1e149, 4f6b5f1, e2c9d75, 2ffdeee, 4c9e2ba, 19b0ea3, 206a680, efbbd8c, and 6c507d3.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This patch series removes the global state introduced by the
> is_bare_repository_cfg variable by moving it into the repository struct.
> Most of the refactor is done by patch 1. Patch 2 initializes the member in
> places that left it uninitialized, while patch 3 adds a safety measure by
> BUG()ing when the variable has not been properly initialized.

I think these patches go in the right direction in general, but the
topic hasn't seen much activity for a few weeks since they received
review messages.  Is a new revision being worked on, or is the topic
being backburnered?

Thanks.

This patch series was integrated into seen via 10454b4, f7c4c41, 62cee9f, e159ae2, 269a0bf, f713e1c, 63617d3, and 4e74555.

BernaUschi mentioned this pull request Dec 4, 2024

This patch series was integrated into seen via 6de2fa4 and da95eaf.

This patch series was integrated into seen via d7d4df0, 963a1dc, b445e3d, cbae1fe, 656b03d, b3f3f60, 6b7b494, and 2819ac8.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This patch series removes the global state introduced by the
> is_bare_repository_cfg variable by moving it into the repository struct.
> Most of the refactor is done by patch 1. Patch 2 initializes the member in
> places that left it uninitialized, while patch 3 adds a safety measure by
> BUG()ing when the variable has not been properly initialized.
>
> John Cai (3):
>   git: remove is_bare_repository_cfg global variable
>   setup: initialize is_bare_cfg
>   repository: BUG when is_bare_cfg is not initialized

We've been seeing a job "win test (5)" fail on 'seen' for a while,
and I happened to have rebuilt 'seen' without this topic (first by
accident) and the job started passing.

The topic comes from GGG, so I'd assume that it passes the tests
(including the Windows ones) by itself, and I suspect the failure is
caused by some interaction with other topics in 'seen'.

As I do not have Windows environment to test and dig into any
problem, often pushing 'seen' with suspect topic(s) removed is the
only way for me to isolate which topic might be causing a problem,
and after doing so, I'll have to leave it up to the author of the
topic to dig further with help from others.

(failing) https://github.com/git/git/actions/runs/12279217687/job/34263221584
(passing) https://github.com/git/git/actions/runs/12286174648/job/34286039276

The difference between these is that the former (failing) one has
this topic with three patches merged at the tip of 'seen', and the
latter (passing) one is the result of tentatively dropping this
topic from the CI run.

Thanks.

This branch is now known as jc/move-is-bare-repository-cfg-variable-to-repo.

There was a status update in the "Cooking" section about the branch jc/move-is-bare-repository-cfg-variable-to-repo on the Git mailing list:

Code rewrite to turn the is_bare_repository_cfg global variable
into a member of the_repo, the singleton repository object.

Waiting for response to reviews.
cf. <xmqqy116xvr3.fsf@gitster.g>
Seems to break t0021-conversion on Windows.
cf. https://lore.kernel.org/git/xmqqzfl1hl52.fsf@gitster.g/
source: <pull.1826.git.git.1730926082.gitgitgadget@gmail.com>
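The general shape of this rewrite can be shown as a global setting becoming a per-repository member. This is a simplified sketch. struct repository_sketch, the_repo_sketch, and bare_cfg_of() are hypothetical stand-ins for git's struct repository and the_repository. The "before" state appears only as a comment.

```c
#include <assert.h>

/* Before (simplified): one process-wide value, -1 meaning "unspecified".
 *     int is_bare_repository_cfg = -1;
 */

/* After: the setting travels with the repository object. */
struct repository_sketch {
	int is_bare_cfg;  /* -1: unspecified, 0: false, 1: true */
};

/* The singleton still exists for now, so global state is reduced, not gone. */
static struct repository_sketch the_repo_sketch = { .is_bare_cfg = -1 };

/* Callers read the member of the repository they were handed. */
static int bare_cfg_of(const struct repository_sketch *repo)
{
	return repo->is_bare_cfg;
}
```

Once callers receive a repository pointer instead of reaching for the singleton, two repository objects can hold different values, which a single global could not.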
