"""Facilities to analyze entire programs, including imported modules.
Parse and analyze the source files of a program in the correct order
(based on file dependencies), and collect the results.
This module only directs a build, which is performed in multiple passes per
file. The individual passes are implemented in separate modules.
The function build() is the main interface to this module.
"""
# TODO: More consistent terminology, e.g. path/fnam, module/id, state/file
import binascii
import collections
import contextlib
import hashlib
import json
import os
import os.path
import sys
import time
from os.path import dirname, basename
from typing import (AbstractSet, Dict, Iterable, Iterator, List,
NamedTuple, Optional, Set, Tuple, Union, Mapping)
from mypy.types import Type
from mypy.nodes import (MypyFile, Node, Import, ImportFrom, ImportAll,
SymbolTableNode, MODULE_REF)
from mypy.semanal import FirstPass, SemanticAnalyzer, ThirdPass
from mypy.checker import TypeChecker
from mypy.indirection import TypeIndirectionVisitor
from mypy.errors import Errors, CompileError, DecodeError, report_internal_error
from mypy import fixup
from mypy.report import Reports
from mypy import defaults
from mypy import moduleinfo
from mypy import util
from mypy.fixup import fixup_module_pass_one, fixup_module_pass_two
from mypy.options import Options
from mypy.parse import parse
from mypy.stats import dump_type_stats
from mypy.version import __version__
# We need to know the location of this file to load data, but
# until Python 3.4, __file__ is relative.
__file__ = os.path.realpath(__file__)
PYTHON_EXTENSIONS = ['.pyi', '.py']
Graph = Dict[str, 'State']
class BuildResult:
"""The result of a successful build.
Attributes:
manager: The build manager.
files: Dictionary from module name to related AST node.
types: Dictionary from parse tree node to its inferred type.
errors: List of error messages.
"""
def __init__(self, manager: 'BuildManager') -> None:
self.manager = manager
self.files = manager.modules
self.types = manager.type_checker.type_map
self.errors = manager.errors.messages()
class BuildSource:
def __init__(self, path: Optional[str], module: Optional[str],
text: Optional[str]) -> None:
self.path = path
self.module = module or '__main__'
self.text = text
@property
def effective_path(self) -> str:
"""Return the effective path (ie, <string> if its from in memory)"""
return self.path or '<string>'
class BuildSourceSet:
"""Efficiently test a file's membership in the set of build sources."""
def __init__(self, sources: List[BuildSource]) -> None:
self.source_text_present = False
self.source_modules = set() # type: Set[str]
self.source_paths = set() # type: Set[str]
for source in sources:
if source.text is not None:
self.source_text_present = True
elif source.path:
self.source_paths.add(source.path)
else:
self.source_modules.add(source.module)
def is_source(self, file: MypyFile) -> bool:
if file.path and file.path in self.source_paths:
return True
elif file._fullname in self.source_modules:
return True
elif file.path is None and self.source_text_present:
return True
else:
return False
def build(sources: List[BuildSource],
options: Options,
alt_lib_path: str = None,
bin_dir: str = None) -> BuildResult:
"""Analyze a program.
A single call to build performs parsing, semantic analysis and optionally
type checking for the program *and* all imported modules, recursively.
Return BuildResult if successful or only non-blocking errors were found;
otherwise raise CompileError.
Args:
sources: list of sources to build
options: build options
alt_lib_path: an additional directory for looking up library modules
(takes precedence over other directories)
bin_dir: directory containing the mypy script, used for finding data
directories; if omitted, the data directory is derived from the
location of this module (see default_data_dir)
"""
data_dir = default_data_dir(bin_dir)
find_module_clear_caches()
# Determine the default module search path.
lib_path = default_lib_path(data_dir, options.python_version)
if options.use_builtins_fixtures:
# Use stub builtins (to speed up test cases and to make them easier to
# debug). This is a test-only feature, so assume our files are laid out
# as in the source tree.
root_dir = dirname(dirname(__file__))
lib_path.insert(0, os.path.join(root_dir, 'test-data', 'unit', 'lib-stub'))
else:
for source in sources:
if source.path:
# Include directory of the program file in the module search path.
dir = remove_cwd_prefix_from_path(dirname(source.path))
if dir not in lib_path:
lib_path.insert(0, dir)
# Do this even if running as a file, for sanity (mainly because with
# multiple builds, there could be a mix of files/modules, so it's easier
# to just define the semantics that we always add the current directory
# to the lib_path).
lib_path.insert(0, os.getcwd())
# Add MYPYPATH environment variable to front of library path, if defined.
lib_path[:0] = mypy_path()
# If provided, insert the caller-supplied extra module path to the
# beginning (highest priority) of the search path.
if alt_lib_path:
lib_path.insert(0, alt_lib_path)
reports = Reports(data_dir, options.report_dirs)
source_set = BuildSourceSet(sources)
# Construct a build manager object to hold state during the build.
#
# Ignore current directory prefix in error messages.
manager = BuildManager(data_dir, lib_path,
ignore_prefix=os.getcwd(),
source_set=source_set,
reports=reports,
options=options,
version_id=__version__,
)
try:
dispatch(sources, manager)
return BuildResult(manager)
finally:
manager.log("Build finished in %.3f seconds with %d modules, %d types, and %d errors" %
(time.time() - manager.start_time,
len(manager.modules),
len(manager.type_checker.type_map),
manager.errors.num_messages()))
# Finish the HTML or XML reports even if CompileError was raised.
reports.finish()
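# Illustrative usage sketch (hedged): how a caller might drive a build through
# build().  The file name and option values here are hypothetical, and this
# helper exists only as an example; it is never called by the build itself.
def _example_build_usage() -> None:
    options = Options()
    sources = [BuildSource('program.py', '__main__', None)]  # hypothetical path
    try:
        result = build(sources, options)
        # Non-blocking errors are collected on the BuildResult.
        print('Errors:', result.errors)
    except CompileError as err:
        # Blocking errors are raised as CompileError with the messages attached.
        print('\n'.join(err.messages))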
def default_data_dir(bin_dir: Optional[str]) -> str:
"""Returns directory containing typeshed directory
Args:
bin_dir: directory containing the mypy script
"""
if not bin_dir:
mypy_package = os.path.dirname(__file__)
parent = os.path.dirname(mypy_package)
if (os.path.basename(parent) == 'site-packages' or
os.path.basename(parent) == 'dist-packages'):
# Installed in site-packages or dist-packages, but invoked with python3 -m mypy;
# __file__ is .../blah/lib/python3.N/site-packages/mypy/build.py
# or .../blah/lib/python3.N/dist-packages/mypy/build.py (Debian)
# or .../blah/lib/site-packages/mypy/build.py (Windows)
# blah may be a virtualenv or /usr/local. We want .../blah/lib/mypy.
lib = parent
for i in range(2):
lib = os.path.dirname(lib)
if os.path.basename(lib) == 'lib':
return os.path.join(lib, 'mypy')
subdir = os.path.join(parent, 'lib', 'mypy')
if os.path.isdir(subdir):
# If installed via buildout, the __file__ is
# somewhere/mypy/__init__.py and what we want is
# somewhere/lib/mypy.
return subdir
# Default to directory containing this file's parent.
return parent
base = os.path.basename(bin_dir)
dir = os.path.dirname(bin_dir)
if (sys.platform == 'win32' and base.lower() == 'scripts'
and not os.path.isdir(os.path.join(dir, 'typeshed'))):
# Installed, on Windows.
return os.path.join(dir, 'Lib', 'mypy')
elif base == 'scripts':
# Assume that we have a repo checkout or an unpacked source tarball.
return dir
elif base == 'bin':
# Installed to somewhere (can be under /usr/local or anywhere).
return os.path.join(dir, 'lib', 'mypy')
elif base == 'python3':
# Assume we installed python3 with Homebrew on OS X.
return os.path.join(os.path.dirname(dir), 'lib', 'mypy')
elif dir.endswith('python-exec'):
# Gentoo uses a python wrapper in /usr/lib to which mypy is a symlink.
return os.path.join(os.path.dirname(dir), 'mypy')
else:
# Don't know where to find the data files!
raise RuntimeError("Broken installation: can't determine base dir")
def mypy_path() -> List[str]:
path_env = os.getenv('MYPYPATH')
if not path_env:
return []
return path_env.split(os.pathsep)
def default_lib_path(data_dir: str, pyversion: Tuple[int, int]) -> List[str]:
"""Return default standard library search paths."""
# IDEA: Make this more portable.
path = [] # type: List[str]
auto = os.path.join(data_dir, 'stubs-auto')
if os.path.isdir(auto):
data_dir = auto
# We allow a module for e.g. version 3.5 to be in 3.4/. The assumption
# is that a module added with 3.4 will still be present in Python 3.5.
versions = ["%d.%d" % (pyversion[0], minor)
for minor in reversed(range(pyversion[1] + 1))]
# E.g. for Python 3.2, try 3.2/, 3.1/, 3.0/, 3/, 2and3/.
# (Note that 3.1 and 3.0 aren't really supported, but we don't care.)
for v in versions + [str(pyversion[0]), '2and3']:
for lib_type in ['stdlib', 'third_party']:
stubdir = os.path.join(data_dir, 'typeshed', lib_type, v)
if os.path.isdir(stubdir):
path.append(stubdir)
# Add fallback path that can be used if we have a broken installation.
if sys.platform != 'win32':
path.append('/usr/local/lib/mypy')
return path
CacheMeta = NamedTuple('CacheMeta',
[('id', str),
('path', str),
('mtime', float),
('size', int),
('dependencies', List[str]), # names of imported modules
('data_mtime', float), # mtime of data_json
('data_json', str), # path of <id>.data.json
('suppressed', List[str]), # dependencies that weren't imported
('child_modules', List[str]), # all submodules of the given module
('options', Optional[Dict[str, bool]]), # build options
('dep_prios', List[int]),
('interface_hash', str), # hash representing the public interface
('version_id', str), # mypy version for cache invalidation
])
# NOTE: dependencies + suppressed == all reachable imports;
# suppressed contains those reachable imports that were prevented by
# --silent-imports or simply not found.
# Priorities used for imports. (Here, top-level includes inside a class.)
# These are used to determine a more predictable order in which the
# nodes in an import cycle are processed.
PRI_HIGH = 5 # top-level "from X import blah"
PRI_MED = 10 # top-level "import X"
PRI_LOW = 20 # either form inside a function
PRI_INDIRECT = 30 # an indirect dependency
PRI_ALL = 99 # include all priorities
class BuildManager:
"""This class holds shared state for building a mypy program.
It is used to coordinate parsing, import processing, semantic
analysis and type checking. The actual build steps are carried
out by dispatch().
Attributes:
data_dir: Mypy data directory (contains stubs)
lib_path: Library path for looking up modules
modules: Mapping of module ID to MypyFile (shared by the passes)
semantic_analyzer:
Semantic analyzer, pass 2
semantic_analyzer_pass3:
Semantic analyzer, pass 3
type_checker: Type checker
errors: Used for reporting all errors
options: Build options
missing_modules: Set of modules encountered so far that could not be imported
stale_modules: Set of modules that needed to be rechecked
version_id: The current mypy version (based on commit id when possible)
"""
def __init__(self, data_dir: str,
lib_path: List[str],
ignore_prefix: str,
source_set: BuildSourceSet,
reports: Reports,
options: Options,
version_id: str) -> None:
self.start_time = time.time()
self.data_dir = data_dir
self.errors = Errors(options.hide_error_context, options.show_column_numbers)
self.errors.set_ignore_prefix(ignore_prefix)
self.lib_path = tuple(lib_path)
self.source_set = source_set
self.reports = reports
self.options = options
self.version_id = version_id
self.semantic_analyzer = SemanticAnalyzer(lib_path, self.errors)
self.modules = self.semantic_analyzer.modules
self.semantic_analyzer_pass3 = ThirdPass(self.modules, self.errors)
self.type_checker = TypeChecker(self.errors, self.modules)
self.indirection_detector = TypeIndirectionVisitor()
self.missing_modules = set() # type: Set[str]
self.stale_modules = set() # type: Set[str]
self.rechecked_modules = set() # type: Set[str]
def maybe_swap_for_shadow_path(self, path: str) -> str:
if (self.options.shadow_file and
os.path.samefile(self.options.shadow_file[0], path)):
path = self.options.shadow_file[1]
return path
def get_stat(self, path: str) -> os.stat_result:
return os.stat(self.maybe_swap_for_shadow_path(path))
def all_imported_modules_in_file(self,
file: MypyFile) -> List[Tuple[int, str, int]]:
"""Find all reachable import statements in a file.
Return list of tuples (priority, module id, import line number)
for all modules imported in file; lower numbers == higher priority.
"""
def correct_rel_imp(imp: Union[ImportFrom, ImportAll]) -> str:
"""Function to correct for relative imports."""
file_id = file.fullname()
rel = imp.relative
if rel == 0:
return imp.id
if os.path.basename(file.path).startswith('__init__.'):
rel -= 1
if rel != 0:
file_id = ".".join(file_id.split(".")[:-rel])
new_id = file_id + "." + imp.id if imp.id else file_id
return new_id
res = [] # type: List[Tuple[int, str, int]]
for imp in file.imports:
if not imp.is_unreachable:
if isinstance(imp, Import):
pri = PRI_MED if imp.is_top_level else PRI_LOW
for id, _ in imp.ids:
ancestor_parts = id.split(".")[:-1]
ancestors = []
for part in ancestor_parts:
ancestors.append(part)
res.append((PRI_LOW, ".".join(ancestors), imp.line))
res.append((pri, id, imp.line))
elif isinstance(imp, ImportFrom):
cur_id = correct_rel_imp(imp)
pos = len(res)
all_are_submodules = True
# Also add any imported names that are submodules.
pri = PRI_MED if imp.is_top_level else PRI_LOW
for name, __ in imp.names:
sub_id = cur_id + '.' + name
if self.is_module(sub_id):
res.append((pri, sub_id, imp.line))
else:
all_are_submodules = False
# If all imported names are submodules, don't add
# cur_id as a dependency. Otherwise (i.e., if at
# least one imported name isn't a submodule)
# cur_id is also a dependency, and we should
# insert it *before* any submodules.
if not all_are_submodules:
pri = PRI_HIGH if imp.is_top_level else PRI_LOW
res.insert(pos, (pri, cur_id, imp.line))
elif isinstance(imp, ImportAll):
pri = PRI_HIGH if imp.is_top_level else PRI_LOW
res.append((pri, correct_rel_imp(imp), imp.line))
return res
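# Worked example (hedged, module names hypothetical): for a top-level
# "import foo.bar.baz" the result contains the ancestor packages at low
# priority followed by the module itself at medium priority:
#     [(PRI_LOW, 'foo', line), (PRI_LOW, 'foo.bar', line), (PRI_MED, 'foo.bar.baz', line)]
# For a top-level "from foo import bar" where foo.bar is itself a module,
# only (PRI_MED, 'foo.bar', line) is added; if bar is a plain name instead,
# (PRI_HIGH, 'foo', line) is recorded as the dependency.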
def is_module(self, id: str) -> bool:
"""Is there a file in the file system corresponding to module id?"""
return find_module(id, self.lib_path) is not None
def parse_file(self, id: str, path: str, source: str) -> MypyFile:
"""Parse the source of a file with the given name.
Raise CompileError if there is a parse error.
"""
num_errs = self.errors.num_messages()
tree = parse(source, path, self.errors, options=self.options)
tree._fullname = id
if self.errors.num_messages() != num_errs:
self.log("Bailing due to parse errors")
self.errors.raise_error()
self.errors.set_file_ignored_lines(path, tree.ignored_lines)
return tree
def module_not_found(self, path: str, line: int, id: str) -> None:
self.errors.set_file(path)
stub_msg = "(Stub files are from https://github.com/python/typeshed)"
if ((self.options.python_version[0] == 2 and moduleinfo.is_py2_std_lib_module(id)) or
(self.options.python_version[0] >= 3 and moduleinfo.is_py3_std_lib_module(id))):
self.errors.report(
line, 0, "No library stub file for standard library module '{}'".format(id))
self.errors.report(line, 0, stub_msg, severity='note', only_once=True)
elif moduleinfo.is_third_party_module(id):
self.errors.report(line, 0, "No library stub file for module '{}'".format(id))
self.errors.report(line, 0, stub_msg, severity='note', only_once=True)
else:
self.errors.report(line, 0, "Cannot find module named '{}'".format(id))
self.errors.report(line, 0, '(Perhaps setting MYPYPATH '
'or using the "--silent-imports" flag would help)',
severity='note', only_once=True)
def report_file(self, file: MypyFile) -> None:
if self.source_set.is_source(file):
self.reports.file(file, type_map=self.type_checker.type_map)
def log(self, *message: str) -> None:
if self.options.verbosity >= 1:
print('LOG: ', *message, file=sys.stderr)
sys.stderr.flush()
def trace(self, *message: str) -> None:
if self.options.verbosity >= 2:
print('TRACE:', *message, file=sys.stderr)
sys.stderr.flush()
def remove_cwd_prefix_from_path(p: str) -> str:
"""Remove current working directory prefix from p, if present.
Also crawl up until a directory without __init__.py is found.
If the result would be empty, return '.' instead.
"""
cur = os.getcwd()
# Add separator to the end of the path, unless one is already present.
if basename(cur) != '':
cur += os.sep
# Compute root path.
while (p and
(os.path.isfile(os.path.join(p, '__init__.py')) or
os.path.isfile(os.path.join(p, '__init__.pyi')))):
dir, base = os.path.split(p)
if not base:
break
p = dir
# Remove current directory prefix from the path, if present.
if p.startswith(cur):
p = p[len(cur):]
# Avoid returning an empty path; replace that with '.'.
if p == '':
p = '.'
return p
# Cache find_module: (id, lib_path) -> result.
find_module_cache = {} # type: Dict[Tuple[str, Tuple[str, ...]], Optional[str]]
# Cache some repeated work within distinct find_module calls: finding which
# elements of lib_path have even the subdirectory they'd need for the module
# to exist. This is shared among different module ids when they differ only
# in the last component.
find_module_dir_cache = {} # type: Dict[Tuple[str, Tuple[str, ...]], List[str]]
# Cache directory listings. We assume that while one os.listdir()
# call may be more expensive than one os.stat() call, a small number
# of os.stat() calls is quickly more expensive than caching the
# os.listdir() outcome, and the advantage of the latter is that it
# gives us the case-correct filename on Windows and Mac.
find_module_listdir_cache = {} # type: Dict[str, Optional[List[str]]]
def find_module_clear_caches() -> None:
find_module_cache.clear()
find_module_dir_cache.clear()
find_module_listdir_cache.clear()
def list_dir(path: str) -> Optional[List[str]]:
"""Return a cached directory listing.
Returns None if the path doesn't exist or isn't a directory.
"""
if path in find_module_listdir_cache:
return find_module_listdir_cache[path]
try:
res = os.listdir(path)
except OSError:
res = None
find_module_listdir_cache[path] = res
return res
def is_file(path: str) -> bool:
"""Return whether path exists and is a file.
On case-insensitive filesystems (like Mac or Windows) this returns
False if the case of the path's last component does not exactly
match the case found in the filesystem.
"""
head, tail = os.path.split(path)
if not tail:
return False
names = list_dir(head)
if not names:
return False
if tail not in names:
return False
return os.path.isfile(path)
def find_module(id: str, lib_path_arg: Iterable[str]) -> Optional[str]:
"""Return the path of the module source file, or None if not found."""
lib_path = tuple(lib_path_arg)
def find() -> Optional[str]:
# If we're looking for a module like 'foo.bar.baz', it's likely that most of the
# many elements of lib_path don't even have a subdirectory 'foo/bar'. Discover
# that only once and cache it for when we look for modules like 'foo.bar.blah'
# that will require the same subdirectory.
components = id.split('.')
dir_chain = os.sep.join(components[:-1]) # e.g., 'foo/bar'
if (dir_chain, lib_path) not in find_module_dir_cache:
dirs = []
for pathitem in lib_path:
# e.g., '/usr/lib/python3.4/foo/bar'
dir = os.path.normpath(os.path.join(pathitem, dir_chain))
if os.path.isdir(dir):
dirs.append(dir)
find_module_dir_cache[dir_chain, lib_path] = dirs
candidate_base_dirs = find_module_dir_cache[dir_chain, lib_path]
# If we're looking for a module like 'foo.bar.baz', then candidate_base_dirs now
# contains just the subdirectories 'foo/bar' that actually exist under the
# elements of lib_path. This is probably much shorter than lib_path itself.
# Now just look for 'baz.pyi', 'baz/__init__.py', etc., inside those directories.
seplast = os.sep + components[-1] # so e.g. '/baz'
sepinit = os.sep + '__init__'
for base_dir in candidate_base_dirs:
base_path = base_dir + seplast # so e.g. '/usr/lib/python3.4/foo/bar/baz'
# Prefer package over module, i.e. baz/__init__.py* over baz.py*.
for extension in PYTHON_EXTENSIONS:
path = base_path + sepinit + extension
if is_file(path) and verify_module(id, path):
return path
# No package, look for module.
for extension in PYTHON_EXTENSIONS:
path = base_path + extension
if is_file(path) and verify_module(id, path):
return path
return None
key = (id, lib_path)
if key not in find_module_cache:
find_module_cache[key] = find()
return find_module_cache[key]
def find_modules_recursive(module: str, lib_path: List[str]) -> List[BuildSource]:
module_path = find_module(module, lib_path)
if not module_path:
return []
result = [BuildSource(module_path, module, None)]
if module_path.endswith(('__init__.py', '__init__.pyi')):
# Subtle: this code prefers the .pyi over the .py if both
# exist, and also prefers packages over modules if both x/
# and x.py* exist. How? We sort the directory items, so x
# comes before x.py and x.pyi. But the preference for .pyi
# over .py is encoded in find_module(); even though we see
# x.py before x.pyi, find_module() will find x.pyi first. We
# use hits to avoid adding it a second time when we see x.pyi.
# This also avoids both x.py and x.pyi when x/ was seen first.
hits = set() # type: Set[str]
for item in sorted(os.listdir(os.path.dirname(module_path))):
abs_path = os.path.join(os.path.dirname(module_path), item)
if os.path.isdir(abs_path) and \
(os.path.isfile(os.path.join(abs_path, '__init__.py')) or
os.path.isfile(os.path.join(abs_path, '__init__.pyi'))):
hits.add(item)
result += find_modules_recursive(module + '.' + item, lib_path)
elif item != '__init__.py' and item != '__init__.pyi' and \
item.endswith(('.py', '.pyi')):
mod = item.split('.')[0]
if mod not in hits:
hits.add(mod)
result += find_modules_recursive(
module + '.' + mod, lib_path)
return result
def verify_module(id: str, path: str) -> bool:
"""Check that all packages containing id have a __init__ file."""
if path.endswith(('__init__.py', '__init__.pyi')):
path = dirname(path)
for i in range(id.count('.')):
path = dirname(path)
if not any(os.path.isfile(os.path.join(path, '__init__{}'.format(extension)))
for extension in PYTHON_EXTENSIONS):
return False
return True
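# Worked example (hedged, paths hypothetical): verify_module('foo.bar',
# '/src/foo/bar.py') climbs one directory (one '.' in the id) and requires
# /src/foo/__init__.py or /src/foo/__init__.pyi to exist; for a package path
# such as '/src/foo/bar/__init__.pyi' the trailing '__init__' file's directory
# is stripped first, so the same check is applied to /src/foo.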
def read_with_python_encoding(path: str, pyversion: Tuple[int, int]) -> str:
"""Read the Python file with while obeying PEP-263 encoding detection"""
source_bytearray = bytearray()
encoding = 'utf8' if pyversion[0] >= 3 else 'ascii'
with open(path, 'rb') as f:
# read first two lines and check if PEP-263 coding is present
source_bytearray.extend(f.readline())
source_bytearray.extend(f.readline())
# check for BOM UTF-8 encoding and strip it out if present
if source_bytearray.startswith(b'\xef\xbb\xbf'):
encoding = 'utf8'
source_bytearray = source_bytearray[3:]
else:
_encoding, _ = util.find_python_encoding(source_bytearray, pyversion)
# Check that the coding isn't 'mypy'; we skip it since the mypy
# codec may not have been registered yet.
if _encoding != 'mypy':
encoding = _encoding
source_bytearray.extend(f.read())
try:
source_bytearray.decode(encoding)
except LookupError as lookuperr:
raise DecodeError(str(lookuperr))
return source_bytearray.decode(encoding)
def get_cache_names(id: str, path: str, cache_dir: str,
pyversion: Tuple[int, int]) -> Tuple[str, str]:
"""Return the file names for the cache files.
Args:
id: module ID
path: module path (used to recognize packages)
cache_dir: cache directory
pyversion: Python version (major, minor)
Returns:
A tuple with the file names to be used for the meta JSON and the
data JSON, respectively.
"""
prefix = os.path.join(cache_dir, '%d.%d' % pyversion, *id.split('.'))
is_package = os.path.basename(path).startswith('__init__.py')
if is_package:
prefix = os.path.join(prefix, '__init__')
return (prefix + '.meta.json', prefix + '.data.json')
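# Worked example (hedged, values hypothetical): with cache_dir='.mypy_cache',
# pyversion=(3, 5) and id='foo.bar', a plain module path such as 'foo/bar.py'
# yields ('.mypy_cache/3.5/foo/bar.meta.json', '.mypy_cache/3.5/foo/bar.data.json'),
# while a package path ending in '__init__.py(i)' yields
# '.mypy_cache/3.5/foo/bar/__init__.meta.json' and the matching .data.json.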
def find_cache_meta(id: str, path: str, manager: BuildManager) -> Optional[CacheMeta]:
"""Find cache data for a module.
Args:
id: module ID
path: module path
manager: the build manager (for pyversion, log/trace, and build options)
Returns:
A CacheMeta instance if the cache data was found and appears
valid; otherwise None.
"""
# TODO: May need to take more build options into account
meta_json, data_json = get_cache_names(
id, path, manager.options.cache_dir, manager.options.python_version)
manager.trace('Looking for {} {}'.format(id, data_json))
if not os.path.exists(meta_json):
manager.trace('Could not load cache for {}: could not find {}'.format(id, meta_json))
return None
with open(meta_json, 'r') as f:
meta_str = f.read()
manager.trace('Meta {} {}'.format(id, meta_str.rstrip()))
meta = json.loads(meta_str) # TODO: Errors
if not isinstance(meta, dict):
manager.trace('Could not load cache for {}: meta cache is not a dict'.format(id))
return None
path = os.path.abspath(path)
m = CacheMeta(
meta.get('id'),
meta.get('path'),
meta.get('mtime'),
meta.get('size'),
meta.get('dependencies', []),
meta.get('data_mtime'),
data_json,
meta.get('suppressed', []),
meta.get('child_modules', []),
meta.get('options'),
meta.get('dep_prios', []),
meta.get('interface_hash', ''),
meta.get('version_id'),
)
if (m.id != id or m.path != path or
m.mtime is None or m.size is None or
m.dependencies is None or m.data_mtime is None):
manager.trace('Metadata abandoned for {}: attributes are missing'.format(id))
return None
# Ignore cache if generated by an older mypy version.
if (m.version_id != manager.version_id
or m.options is None
or len(m.dependencies) != len(m.dep_prios)):
manager.trace('Metadata abandoned for {}: new attributes are missing'.format(id))
return None
# Ignore cache if (relevant) options aren't the same.
cached_options = m.options
current_options = manager.options.select_options_affecting_cache()
if cached_options != current_options:
manager.trace('Metadata abandoned for {}: options differ'.format(id))
return None
return m
def is_meta_fresh(meta: CacheMeta, id: str, path: str, manager: BuildManager) -> bool:
if meta is None:
return False
# TODO: Share stat() outcome with find_module()
st = manager.get_stat(path) # TODO: Errors
if st.st_mtime != meta.mtime or st.st_size != meta.size:
manager.log('Metadata abandoned for {}: file {} is modified'.format(id, path))
return False
# It's a match on (id, path, mtime, size).
# Check data_json; assume if its mtime matches it's good.
# TODO: stat() errors
if os.path.getmtime(meta.data_json) != meta.data_mtime:
manager.log('Metadata abandoned for {}: data cache is modified'.format(id))
return False
manager.log('Found {} {} (metadata is fresh)'.format(id, meta.data_json))
return True
def random_string() -> str:
return binascii.hexlify(os.urandom(8)).decode('ascii')
def compute_hash(text: str) -> str:
# We use md5 instead of the builtin hash(...) function because the output of hash(...)
# can differ between runs due to hash randomization (enabled by default in Python 3.3).
# See the note in https://docs.python.org/3/reference/datamodel.html#object.__hash__.
return hashlib.md5(text.encode('utf-8')).hexdigest()
def write_cache(id: str, path: str, tree: MypyFile,
dependencies: List[str], suppressed: List[str],
child_modules: List[str], dep_prios: List[int],
old_interface_hash: str, manager: BuildManager) -> str:
"""Write cache files for a module.
Args:
id: module ID
path: module path
tree: the fully checked module data
dependencies: module IDs on which this module depends
suppressed: module IDs which were suppressed as dependencies
child_modules: module IDs of all submodules of this module
dep_prios: priorities (parallel array to dependencies)
old_interface_hash: the hash from the previous version of the data cache file
manager: the build manager (for pyversion, log/trace)
Returns:
The new interface hash based on the serialized tree
"""
# Obtain file paths
path = os.path.abspath(path)
meta_json, data_json = get_cache_names(
id, path, manager.options.cache_dir, manager.options.python_version)
manager.log('Writing {} {} {} {}'.format(id, path, meta_json, data_json))
# Make sure directory for cache files exists
parent = os.path.dirname(data_json)
if not os.path.isdir(parent):
os.makedirs(parent)
assert os.path.dirname(meta_json) == parent
# Construct temp file names
nonce = '.' + random_string()
data_json_tmp = data_json + nonce
meta_json_tmp = meta_json + nonce
# Serialize data and analyze interface
data = tree.serialize()
if manager.options.debug_cache:
data_str = json.dumps(data, indent=2, sort_keys=True)
else:
data_str = json.dumps(data, sort_keys=True)
interface_hash = compute_hash(data_str)
# Write data cache file, if applicable
if old_interface_hash == interface_hash:
# If the interface is unchanged, the cached data is guaranteed
# to be equivalent, and we only need to update the metadata.
data_mtime = os.path.getmtime(data_json)
manager.trace("Interface for {} is unchanged".format(id))
else:
with open(data_json_tmp, 'w') as f:
f.write(data_str)
f.write('\n')
data_mtime = os.path.getmtime(data_json_tmp)
os.replace(data_json_tmp, data_json)
manager.trace("Interface for {} has changed".format(id))
# Obtain and set up metadata
st = manager.get_stat(path) # TODO: Handle errors
mtime = st.st_mtime
size = st.st_size
options = manager.options.clone_for_file(path)
meta = {'id': id,
'path': path,
'mtime': mtime,
'size': size,
'data_mtime': data_mtime,
'dependencies': dependencies,
'suppressed': suppressed,
'child_modules': child_modules,
'options': options.select_options_affecting_cache(),
'dep_prios': dep_prios,
'interface_hash': interface_hash,
'version_id': manager.version_id,
}
# Write meta cache file
with open(meta_json_tmp, 'w') as f:
if manager.options.debug_cache:
json.dump(meta, f, indent=2, sort_keys=True)
else:
json.dump(meta, f)
os.replace(meta_json_tmp, meta_json)
return interface_hash
"""Dependency manager.
Design
======
Ideally
-------
A. Collapse cycles (each SCC -- strongly connected component --
becomes one "supernode").
B. Topologically sort nodes based on dependencies.
C. Process from leaves towards roots.
Wrinkles
--------
a. Need to parse source modules to determine dependencies.
b. Processing order for modules within an SCC.
c. Must order mtimes of files to decide whether to re-process; depends
on clock never resetting.
d. from P import M; we must check the filesystem to determine whether
module P.M exists.
e. Race conditions, where somebody modifies a file while we're
processing. I propose not to modify the algorithm to handle this,
but to detect when this could lead to inconsistencies. (For
example, when we decide on the dependencies based on cache
metadata, and then we decide to re-parse a file because of a stale
dependency, if the re-parsing leads to a different list of
dependencies we should warn the user or start over.)
Steps
-----
1. For each explicitly given module find the source file location.
2. For each such module load and check the cache metadata, and decide
whether it's valid.
3. Now recursively (or iteratively) find dependencies and add those to
the graph:
- for cached nodes use the list of dependencies from the cache
metadata (this will be valid even if we later end up re-parsing
the same source);
- for uncached nodes parse the file and process all imports found,
taking care of (a) above.
Step 3 should also address (d) above.
Once step 3 terminates we have the entire dependency graph, and for
each module we've either loaded the cache metadata or parsed the
source code. (However, we may still need to parse those modules for
which we have cache metadata but that depend, directly or indirectly,
on at least one module for which the cache metadata is stale.)
Now we can execute steps A-C from the first section. Finding SCCs for
step A shouldn't be hard; there's a recipe here:
http://code.activestate.com/recipes/578507/. There's also a plethora
of topsort recipes, e.g. http://code.activestate.com/recipes/577413/.
For single nodes, processing is simple. If the node was cached, we
deserialize the cache data and fix up cross-references. Otherwise, we
do semantic analysis followed by type checking. We also handle (c)
above; if a module has valid cache data *but* any of its
dependencies was processed from source, then the module should be
processed from source.
A relatively simple optimization (outside SCCs) we might do in the
future is as follows: if a node's cache data is valid, but one or more
of its dependencies are out of date so we have to re-parse the node
from source, once we have fully type-checked the node, we can decide
whether its symbol table actually changed compared to the cache data
(by reading the cache data and comparing it to the data we would be
writing). If there is no change we can declare the node up to date,
and any node that depends (and for which we have cached data, and
whose other dependencies are up to date) on it won't need to be
re-parsed from source.
Import cycles
-------------
Finally we have to decide how to handle (b), import cycles. Here
we'll need a modified version of the original state machine
(build.py), but we only need to do this per SCC, and we won't have to
deal with changes to the list of nodes while we're processing it.
If all nodes in the SCC have valid cache metadata and all dependencies
outside the SCC are still valid, we can proceed as follows:
1. Load cache data for all nodes in the SCC.
2. Fix up cross-references for all nodes in the SCC.
Otherwise, the simplest (but potentially slow) way to proceed is to
invalidate all cache data in the SCC and re-parse all nodes in the SCC
from source. We can do this as follows:
1. Parse source for all nodes in the SCC.
2. Semantic analysis for all nodes in the SCC.
3. Type check all nodes in the SCC.
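# Illustrative sketch (hedged) of step A above: a recursive Tarjan-style SCC
# finder over a mapping from module id to the ids it imports.  It is not used
# by the build, and the real implementation may differ; edges pointing outside
# the graph are ignored, and deep graphs could hit the recursion limit.
def _example_sccs(deps: Dict[str, List[str]]) -> List[Set[str]]:
    index = {}  # type: Dict[str, int]
    lowlink = {}  # type: Dict[str, int]
    stack = []  # type: List[str]
    on_stack = set()  # type: Set[str]
    sccs = []  # type: List[Set[str]]
    def visit(v: str) -> None:
        index[v] = lowlink[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in deps.get(v, []):
            if w not in deps:
                continue  # ignore edges leaving the graph
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a strongly connected component; pop it off.
            scc = set()  # type: Set[str]
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)
    for v in deps:
        if v not in index:
            visit(v)
    return sccs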