Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh : Create physical string descriptor to be used with allocatable strings #4918

Merged
merged 26 commits into from
Oct 31, 2024

Conversation

assem2002
Copy link
Contributor

@assem2002 assem2002 commented Sep 26, 2024

Towards #2257
In separate PR I'll utilize size and capacity in lfortran_alloc() , lfortran_strcpy(), lfortran_string_write()

@assem2002 assem2002 force-pushed the descriptor_string_FEEDBACK branch from 06b154f to fba28db Compare September 26, 2024 21:42
@assem2002 assem2002 marked this pull request as ready for review September 26, 2024 23:27
@assem2002 assem2002 changed the title enh : Create physical string descriptor to be use with allocatable strings enh : Create physical string descriptor to be used with allocatable strings Sep 26, 2024
@gxyd
Copy link
Contributor

gxyd commented Sep 27, 2024

Can you please help me understand the below ASR changes? (I've put in comments in ASR itself for what my question is)

On your branch, for the below program:

program main
    implicit none
    character(len=:), allocatable :: str1
    character(len=:), allocatable :: str2(:)
end program main

below is the ASR generated using lfortran test.f90 --show-asr:

(TranslationUnit
    (SymbolTable
        1
        {
            main:
                (Program
                    (SymbolTable
                        2
                        {
                            str1:
                                (Variable
                                    2
                                    str1
                                    []
                                    Local
                                    ()
                                    ()
                                    Default
                                    (Allocatable
                                        (Character 1 -2 () DescriptorString)  ;; this seems alright to me
                                    )
                                    ()
                                    Source
                                    Public
                                    Required
                                    .false.
                                ),
                            str2:
                                (Variable
                                    2
                                    str2
                                    []
                                    Local
                                    ()
                                    ()
                                    Default
                                    (Allocatable
                                        (Array
                                            (Character 1 -2 () PointerString)   ;; is this expected to be DescriptorArray or PointerString?
                                            [(()
                                            ())]
                                            DescriptorArray
                                        )
                                    )
                                    ()
                                    Source
                                    Public
                                    Required
                                    .false.
                                )
                        })
                    main
                    []
                    [(ImplicitDeallocate
                        [(Var 2 str1)
                        (Var 2 str2)]
                    )]
                )
        })
    []
)

@@ -196,6 +196,12 @@ static inline ASR::ttype_t* type_get_past_array_pointer_allocatable(
type_get_past_pointer(f)));
}

static inline bool is_allocatable_string(ASR::ttype_t* t){
return ASR::is_a<ASR::Character_t>(*ASRUtils::type_get_past_array_pointer_allocatable(t)) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also maybe put an assert that it's an Allocatable?

Copy link
Contributor Author

@assem2002 assem2002 Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's much safer. I'll do that.
Do you think We should add such an assertion in ASR verify? to make sure any character type with physicalType = DescriptorString is an allocatable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think We should add such an assertion in ASR verify? to make sure any character type with physicalType = DescriptorString is an allocatable

Yes, that would be nice. If my understanding is right, that should be the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes definitely put this into verify(). All "assumptions" should be checked by verify().

src/libasr/codegen/asr_to_llvm.cpp Outdated Show resolved Hide resolved
src/libasr/codegen/asr_to_llvm.cpp Outdated Show resolved Hide resolved
src/libasr/codegen/asr_to_llvm.cpp Outdated Show resolved Hide resolved
@assem2002
Copy link
Contributor Author

assem2002 commented Sep 27, 2024

(Allocatable
                                        (Array
                                            (Character 1 -2 () PointerString)   ;; is this expected to be DescriptorArray or PointerString?
                                            [(()
                                            ())]
                                            DescriptorArray
                                        )
                                    )
                                    ()

It's a design detail. but I didn't see a benefit for size and capacity here as you would do something like that to allocate the array allocate(character(2) :: str2(2) ) which allocates a an array of 2 strings and each string is allocated to be of fixed length 2 and not to be extended unless you reallocate the array with new string length. If we use a DescriptorString instead of PointerString, I think it won't benefit us in terms of performance.

@certik
Copy link
Contributor

certik commented Sep 27, 2024

@assem2002 let me know when this is ready for review from me.

@assem2002
Copy link
Contributor Author

assem2002 commented Sep 27, 2024

@certik it's ready. the new physical type works, I'll need to utilize size and capacity in another PR.
maybe you can wait till I finish some of @gxyd suggestions and resolve the conflicts.

@certik
Copy link
Contributor

certik commented Sep 27, 2024

Go ahead and resolve the conflicts.

@certik
Copy link
Contributor

certik commented Sep 27, 2024

Ok, so there is one fundamental problem here that we need to fix. Consider, let's say, this test: https://github.com/assem2002/lfortran/edit/descriptor_string_FEEDBACK/tests/reference/asr-string_04-2ebc0e6.stdout?pr=%2Flfortran%2Flfortran%2Fpull%2F4918

In here, we have the str variable, declared like this:

                            str:
                                (Variable
                                    2
                                    str
                                    []
                                    Local
                                    ()
                                    ()
                                    Default
                                    (Allocatable
                                        (Character 1 -2 () DescriptorString)
                                    )
                                    ()
                                    Source
                                    Public
                                    Required
                                    .false.
                                ),

And then we assign to it like this:

                    (Assignment
                        (Var 2 str)
                        (StringConstant
                            "abcd"
                            (Character 1 4 () PointerString)
                        )
                        ()
                    )

The types here do not match. The LHS of the assignment is of type (Allocatable (Character 1 -2 () DescriptorString)), the RHS is of type (Character 1 4 () PointerString), these are fundamentally not compatible types. There is one issue with the Allocatable handling, let's ignore that for now, I think we allow the mismatch (LHS can be Allocatable, while RHS is not), that's a separate issue. But the types themselves are (Character 1 -2 () DescriptorString) and (Character 1 4 () PointerString) and these are not compatible. The -2 is compatible with 4, but DescriptorString is not compatible with PointerString. You can't just assign it. You have to insert a cast from a PointerString to DescriptorString.

So there are two issues to fix:

  • check in verify() that the types are compatible and that LHS and RHS always agrees with the physical type. Including passing strings to functions.
  • And then fix the PR to pass verify() by inserting casts.

@assem2002
Copy link
Contributor Author

assem2002 commented Sep 27, 2024

@certik so the key goal here is to make clean ASR, right? as using assignment statement with incompatible sides aren't an issue in the backend we just use runtime function lfortran_strcpy(char** lhs, char* rhs, int64 size_lhs[if descriptorString], int64 capacity_lhs[if descriptorString]). and we usually fetch the char* out of the stringDescriptor to be used freely in the codebase and with the runtime function unless we encounter lfortran_alloc() , lfortran_strcpy(), lfortran_string_write() which require fetching size and capacity to be extended.

@certik
Copy link
Contributor

certik commented Sep 27, 2024

Clean ASR and clean backend. When converting descriptor to pointer, you just grab the pointer, but you do it in the visit_cast in the backend (when casting from pointer to descriptor, you need to create a descriptor, in the visit_cast). So in assignment you only assign either two pointers or two descriptors, which is straightforward.

@assem2002
Copy link
Contributor Author

assem2002 commented Sep 28, 2024

Yeah I think explicit ASR casting is better way to to assignments and argument passing. This may even can get rid of the visit_expr_wrapper changes that Gaurav and I were talking about above.
So..

  • I'll make casting node that's similar to how array cast node is structured.
  • I'll do casting when needed. AST To ASR should be sufficient to catch the needed places where I should cast, right?

@certik
Copy link
Contributor

certik commented Sep 28, 2024

Yes, exactly.

AST To ASR should be sufficient to catch the needed places where I should cast, right?

Yes.

@assem2002 assem2002 marked this pull request as draft September 28, 2024 20:54
@assem2002 assem2002 force-pushed the descriptor_string_FEEDBACK branch from efd1f1f to 929c571 Compare October 3, 2024 15:41
ASRUtils::type_get_past_array_pointer_allocatable(t))->m_physical_type == ASR::string_physical_typeType::DescriptorString;
}

static inline ASR::expr_t* cast_string_pointer_to_descriptor(Allocator& al, ASR::expr_t* string){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add documentation:

This functions casts if necessary.

@certik
Copy link
Contributor

certik commented Oct 3, 2024

Let's document in the String type in doc/... when each physical type is used. Currently:

  • if it is allocatalbe, it is DescriptorString
  • otherwise it is PointerString

@assem2002 assem2002 force-pushed the descriptor_string_FEEDBACK branch 3 times, most recently from 3457583 to 4123d10 Compare October 8, 2024 05:16
@assem2002 assem2002 marked this pull request as ready for review October 8, 2024 06:24
@assem2002
Copy link
Contributor Author

assem2002 commented Oct 8, 2024

The PR is ready for review. I'm pushing documentation, It'd be helpful to read it before reviewing this.

@gxyd
Copy link
Contributor

gxyd commented Oct 8, 2024

I'm sorry I forgot about this today, I'll take a look tomorrow morning.

@assem2002 assem2002 force-pushed the descriptor_string_FEEDBACK branch from d23cf94 to 63e376f Compare October 9, 2024 05:24

- **DescriptorString :**
- It's an LLVM struct to hold information about data, size and capacity.
- It's represented in LLVM IR as : `{char*, int64, int64}`. The benefit from having `size` and `capacity` is that it gives us the ability to have dynamic string that's not computationally expensive, just like `std::vector` in c++. So the key points are:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- It's represented in LLVM IR as : `{char*, int64, int64}`. The benefit from having `size` and `capacity` is that it gives us the ability to have dynamic string that's not computationally expensive, just like `std::vector` in c++. So the key points are:
- It's represented in LLVM IR as : `{char*, int64, int64}`. The benefit from having `size` and `capacity` is that it gives us the ability to have dynamic string that's not computationally expensive, just like `std::string` (and `std::vector`) in c++. So the key points are:

- It's represented in LLVM IR as : `{char*, int64, int64}`. The benefit from having `size` and `capacity` is that it gives us the ability to have dynamic string that's not computationally expensive, just like `std::vector` in c++. So the key points are:
- `size` avoids calling `strlen` when we want to know the size of a string.
- `capacity` gives us the flexibility of extending the string without losing performance by doubling the memory location every time we run out of allocated memory space.
- DescriptorString must always be an allocatable.
Copy link
Contributor

@certik certik Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add here when in ASR the DescriptorString is (or should be) used, and when PointerString should be used. See #4918 (comment).

Currently:

  • if it is allocatable, it is DescriptorString
  • otherwise it is PointerString

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a fortran example would be nice as well, just the way we've it for StringPhysicalCast above (that was quite easy to understand)

@certik certik requested review from gxyd and Pranavchiku October 10, 2024 16:02
@certik
Copy link
Contributor

certik commented Oct 10, 2024

This is ready for review. This PR does one change: for allocatable strings it uses DescriptorString physical type now.

@gxyd, @Pranavchiku can you please carefully review this PR? At high level it looks good to me, we just need to check all details.

…g Types + minor change in string_section_assign
…not value (function return by value not pointer to avoid dangling pointer)
@assem2002 assem2002 force-pushed the descriptor_string_FEEDBACK branch from 7b2a524 to 5e64d08 Compare October 28, 2024 17:56
@assem2002
Copy link
Contributor Author

ready to review.

@certik
Copy link
Contributor

certik commented Oct 30, 2024

@gxyd, @Pranavchiku can you please review this?

Copy link
Contributor

@certik certik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is good. We can improve upon it later as needed.

@certik certik enabled auto-merge (squash) October 31, 2024 16:11
@certik certik merged commit edb2255 into lfortran:main Oct 31, 2024
36 checks passed
@gxyd
Copy link
Contributor

gxyd commented Nov 1, 2024

Sorry, I didn't get the time to review it.

@certik
Copy link
Contributor

certik commented Nov 1, 2024

No worries, we reviewed it at the phone call yesterday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants