ValueError raised in pandas when lazy validating DataFrame with MultiIndexed Columns

**Describe the bug**

A `ValueError` is raised in pandas when a `pandas.DataFrame` object with MultiIndexed Columns is lazily validated (using the parameter `lazy=True`) by a `pandera.DataFrameSchema` object, and there is at least one failed check for the columns.

Running the code below, the following exception is raised:

```
Traceback (most recent call last):
  line 18, in <module>
    print(schema.validate(df, lazy=True))
  File "Y:\Python39\lib\site-packages\pandera\schemas.py", line 613, in validate
    raise errors.SchemaErrors(
  File "Y:\Python39\lib\site-packages\pandera\errors.py", line 87, in __init__
    error_counts, failure_cases = self._parse_schema_errors(schema_errors)
  File "Y:\Python39\lib\site-packages\pandera\errors.py", line 172, in _parse_schema_errors
    failure_cases = err.failure_cases.assign(
  File "Y:\Python39\lib\site-packages\pandas\core\frame.py", line 3699, in assign
    data[k] = com.apply_if_callable(v, data)
  File "Y:\Python39\lib\site-packages\pandas\core\frame.py", line 3044, in __setitem__
    self._set_item(key, value)
  File "Y:\Python39\lib\site-packages\pandas\core\frame.py", line 3120, in _set_item
    value = self._sanitize_column(key, value)
  File "Y:\Python39\lib\site-packages\pandas\core\frame.py", line 3768, in _sanitize_column
    value = sanitize_index(value, self.index)
  File "Y:\Python39\lib\site-packages\pandas\core\internals\construction.py", line 747, in sanitize_index
    raise ValueError(
ValueError: Length of values (2) does not match length of index (1)
```

Line 172 of `errors.py` in `pandera` is:

```python
failure_cases = err.failure_cases.assign(
    schema_context=err.schema.__class__.__name__,
    check=check_identifier,
    check_number=err.check_index,
    column=column,
)
```

The column name `("foo", "baz")` is a `tuple`, so `pandas` does not treat it as a single scalar value. Instead of broadcasting it to every row of `err.failure_cases`, `assign` interprets it as a sequence of two values. That sequence does not match the length of the index (one failing row), which raises the `ValueError`.
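The same behaviour can be reproduced with `pandas` alone. The snippet below is a minimal sketch, not pandera code: the one-row frame is a hypothetical stand-in for `err.failure_cases`, and its contents are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for err.failure_cases with a single failing row.
failure_cases = pd.DataFrame({"failure_case": ["object"]})

message = None
try:
    # The tuple is treated as two row values, not as one label.
    failure_cases.assign(column=("foo", "baz"))
except ValueError as exc:
    message = str(exc)

print(message)
```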

- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandera.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandera.

#### Code Sample, a copy-pastable example

```python
import pandas as pd
from pandera import Column, DataFrameSchema

schema = DataFrameSchema({
    ("foo", "bar"): Column(int),
    ("foo", "baz"): Column(int)
})

df = pd.DataFrame({
    ("foo", "bar"): [1, 2, 3],
    ("foo", "baz"): ["a", "b", "c"],
})

print(schema.validate(df, lazy=True))
```

#### Expected behavior

A `pandera.errors.SchemaErrors` exception should be raised that reports the type mismatch on the column `("foo", "baz")`, with the value `("foo", "baz")` in the **column** column of the failure cases.

#### Desktop:

 - OS: Windows 10
 - Version: Python 3.9.0, with pandera 0.7.0, pandas 1.1.4 installed

#### Additional context
If we change the code above to

```python
import pandas as pd
from pandera import Column, DataFrameSchema, Check

schema = DataFrameSchema({
    ("foo", "bar"): Column(int, checks=Check(lambda s: s == 1)),
    ("foo", "baz"): Column(str, name="b")
})

df = pd.DataFrame({
    ("foo", "bar"): [1, 2, 3],
    ("foo", "baz"): ["a", "b", "c"],
})

try:
    schema.validate(df, lazy=True)
except Exception as e:
    print(e.failure_cases)
```

The output would be

```
  schema_context column     check  check_number  failure_case  index
0         Column    foo  <lambda>             0             2      1
1         Column    bar  <lambda>             0             3      2
```

which shows that the column name `("foo", "bar")` is interpreted as a sequence of row values rather than as a single label. Because `err.failure_cases` happens to have two rows here, `assign` silently spreads the two elements of the tuple across them instead of raising an error.
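This silent mis-assignment can also be shown with `pandas` alone. The sketch below uses a hypothetical two-row frame that mirrors the output above; it is not the actual `err.failure_cases` object.

```python
import pandas as pd

# Hypothetical stand-in for err.failure_cases with two failing rows.
failure_cases = pd.DataFrame({"failure_case": [2, 3], "index": [1, 2]})

# The tuple length matches the number of rows, so no error is raised
# and each element lands in a different row.
result = failure_cases.assign(column=("foo", "bar"))
column_values = result["column"].tolist()
print(column_values)
```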

#### Potential Fix

A band-aid fix would be to manually broadcast the value for the **column** column before assigning it to `err.failure_cases`, i.e.

```python
failure_cases = err.failure_cases.assign(
    schema_context=err.schema.__class__.__name__,
    check=check_identifier,
    check_number=err.check_index,
    column=[column] * len(err.failure_cases),
)
```

which seems to have fixed the problem.
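The effect of the manual broadcast can be checked in isolation. This is a sketch under assumptions: `with_column_label` is a hypothetical helper written for this example, not part of pandera, and the frames are made-up stand-ins for `err.failure_cases`.

```python
import pandas as pd


def with_column_label(failure_cases, column):
    """Hypothetical helper: repeat the label once per row so that a
    tuple label is stored whole in every row."""
    return failure_cases.assign(column=[column] * len(failure_cases))


one_row = pd.DataFrame({"failure_case": ["object"]})
two_rows = pd.DataFrame({"failure_case": [2, 3]})

one_row_labels = with_column_label(one_row, ("foo", "baz"))["column"].tolist()
two_row_labels = with_column_label(two_rows, ("foo", "bar"))["column"].tolist()
plain_labels = with_column_label(two_rows, "foo")["column"].tolist()

print(one_row_labels)
print(two_row_labels)
print(plain_labels)
```

With the list wrapper, both the one-row and the two-row cases keep the full tuple in each row, and plain string column names are unaffected.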
