@@ -3670,22 +3670,27 @@ into a .dta file. The format version of this file is always 115 (Stata 12).
36703670 df.to_stata('stata.dta')
36713671
36723672 *Stata* data files have limited data type support; only strings with 244 or
3673- fewer characters, ``int8``, ``int16``, ``int32`` and ``float64`` can be stored
3674- in ``.dta`` files. *Stata* reserves certain values to represent
3675- missing data. Furthermore, when a value is encountered outside of the
3676- permitted range, the data type is upcast to the next larger size. For
3677- example, ``int8`` values are restricted to lie between -127 and 100, and so
3678- variables with values above 100 will trigger a conversion to ``int16``. ``nan``
3679- values in floating points data types are stored as the basic missing data type
3680- (``.`` in *Stata*). It is not possible to indicate missing data values for
3681- integer data types.
3673+ fewer characters, ``int8``, ``int16``, ``int32``, ``float32`` and ``float64``
3674+ can be stored
3675+ in ``.dta`` files. Additionally, *Stata* reserves certain values to represent
3676+ missing data. Exporting a non-missing value that is outside of the
3677+ permitted range in Stata for a particular data type will retype the variable
3678+ to the next larger size. For example, ``int8`` values are restricted to lie
3679+ between -127 and 100 in Stata, and so variables with values above 100 will
3680+ trigger a conversion to ``int16``. ``nan`` values in floating point data
3681+ types are stored as the basic missing data type (``.`` in *Stata*).
3682+
3683+ .. note::
3684+
3685+ It is not possible to export missing data values for integer data types.
3686+
36823687
36833688 The *Stata* writer gracefully handles other data types including ``int64``,
3684- ``bool``, ``uint8``, ``uint16``, ``uint32`` and ``float32`` by upcasting to
3689+ ``bool``, ``uint8``, ``uint16`` and ``uint32`` by casting to
36853690 the smallest supported type that can represent the data. For example, data
36863691 with a type of ``uint8`` will be cast to ``int8`` if all values are at most
36873692 100 (the upper bound for non-missing ``int8`` data in *Stata*), or, if values are
3688- outside of this range, the data is cast to ``int16``.
3693+ outside of this range, the variable is cast to ``int16``.
36893694
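The casting of unsupported types can be checked the same way. This is again a sketch assuming a recent pandas release, with hypothetical column names; the ``int64`` column holds small values, so by the "smallest supported type" rule it is expected to come back as ``int32``:

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "flag": [True, False],                            # bool -> int8
        "u8_small": np.array([1, 99], dtype=np.uint8),    # all <= 100 -> int8
        "u8_large": np.array([1, 200], dtype=np.uint8),   # 200 > 100 -> int16
        "i64": np.array([1, 2], dtype=np.int64),          # small values -> int32
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "casting.dta")
    df.to_stata(path, write_index=False)
    result = pd.read_stata(path)

print(result.dtypes)
```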
36903695
36913696 .. warning::
@@ -3701,50 +3706,41 @@ outside of this range, the data is cast to ``int16``.
37013706 115 dta file format. Attempting to write *Stata* dta files with strings
37023707 longer than 244 characters raises a ``ValueError``.
37033708
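The string-length restriction can be observed directly. In this sketch, which assumes a recent pandas release where ``to_stata`` writes a pre-Stata 13 format unless ``version`` is changed, a single 245-character string is enough to trigger the error:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"text": ["x" * 245]})  # one character over the 244 limit

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "long.dta")
    try:
        df.to_stata(path, write_index=False)  # default (older) dta format
    except ValueError as err:
        error = err
    else:
        error = None

print(type(error).__name__)
```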
3704- .. warning::
3705-
3706- *Stata* data files only support text labels for categorical data. Exporting
3707- data frames containing categorical data will convert non-string categorical values
3708- to strings.
3709-
3710- Writing data to/from Stata format files with a ``category`` dtype was implemented in 0.15.2.
3711-
37123709 .. _io.stata_reader:
37133710
3714- Reading from STATA format
3711+ Reading from Stata format
37153712 ~~~~~~~~~~~~~~~~~~~~~~~~~
37163713
3717- The top-level function ``read_stata`` will read a dta format file
3718- and return a DataFrame:
3719- The class :class:`~pandas.io.stata.StataReader` will read the header of the
3720- given dta file at initialization. Its method
3721- :func:`~pandas.io.stata.StataReader.data` will read the observations,
3722- converting them to a DataFrame which is returned:
3714+ The top-level function ``read_stata`` reads dta files
3715+ and returns a DataFrame. Alternatively, the class :class:`~pandas.io.stata.StataReader`
3716+ can be used if more granular access is required. :class:`~pandas.io.stata.StataReader`
3717+ reads the header of the dta file at initialization. The method
3718+ :func:`~pandas.io.stata.StataReader.data` reads and converts observations to a DataFrame.
37233719
37243720 .. ipython:: python
37253721
37263722 pd.read_stata('stata.dta')
37273723
3728- Currently the ``index`` is retrieved as a column on read back.
3724+ Currently the ``index`` is retrieved as a column.
37293725
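The sketch below writes a frame with its default ``RangeIndex`` and reads it back twice: once as-is, where the index appears as a column named ``index``, and once with ``index_col`` (a ``read_stata`` keyword in recent pandas releases, assumed here) to restore it. The file name is illustrative:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "indexed.dta")
    df.to_stata(path)  # write_index=True by default; stored as column "index"
    as_column = pd.read_stata(path)
    restored = pd.read_stata(path, index_col="index")

print(as_column.columns.tolist())
print(restored.index.tolist())
```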
37303726 The parameter ``convert_categoricals`` indicates whether value labels should be
37313727 read and used to create a ``Categorical`` variable from them. Value labels can
37323728 also be retrieved by the function ``value_labels``, which requires ``data`` to be
3733- called before (see ``pandas.io.stata.StataReader``).
3729+ called first (see ``pandas.io.stata.StataReader``).
37343730
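A sketch of the ``StataReader`` workflow, assuming a recent pandas release in which ``read_stata(..., iterator=True)`` returns a ``StataReader`` usable as a context manager and ``read()`` takes the place of the ``data`` method described here. The labeled column is written from a ``Categorical``, so its value labels map the category codes to their strings; the column name is hypothetical:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"grade": pd.Categorical(["low", "high", "low"], categories=["low", "high"])}
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels.dta")
    df.to_stata(path, write_index=False)
    # iterator=True returns a StataReader instead of a DataFrame
    with pd.read_stata(path, iterator=True) as reader:
        data = reader.read()              # read the observations first
        labels = reader.value_labels()    # {label name: {code: text}}

print(labels)
```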
37353731 The parameter ``convert_missing`` indicates whether missing value
37363732 representations in Stata should be preserved. If ``False`` (the default),
37373733 missing values are represented as ``np.nan``. If ``True``, missing values are
37383734 represented using ``StataMissingValue`` objects, and columns containing missing
3739- values will have ``dtype`` set to ``object``.
3735+ values will have the ``object`` data type.
37403736
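The effect of ``convert_missing`` on a float column with one missing value might look like this. It is a sketch assuming a recent pandas release, with ``StataMissingValue`` imported from ``pandas.io.stata``:

```python
import os
import tempfile

import numpy as np
import pandas as pd
from pandas.io.stata import StataMissingValue

df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "missing.dta")
    df.to_stata(path, write_index=False)
    default = pd.read_stata(path)                         # missing -> np.nan
    preserved = pd.read_stata(path, convert_missing=True)  # missing -> objects

print(default["x"].dtype, preserved["x"].dtype)
print(type(preserved.loc[1, "x"]).__name__)
```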
3741- The StataReader supports .dta Formats 104, 105, 108, 113-115 and 117.
3742- Alternatively, the function :func:`~pandas.io.stata.read_stata` can be used
3737+ :func:`~pandas.read_stata` and :class:`~pandas.io.stata.StataReader` support .dta
3738+ formats 104, 105, 108, 113-115 (Stata 8-12) and 117 (Stata 13).
37433739
37453741 .. note::
37463742
3746- Setting ``preserve_dtypes=False`` will upcast all integer data types to
3747- ``int64`` and all floating point data types to ``float64``. By default,
3742+ Setting ``preserve_dtypes=False`` will upcast to the standard pandas data types:
3743+ ``int64`` for all integer types and ``float64`` for floating point data. By default,
37483744 the Stata data types are preserved when importing.
37493745
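A short comparison of both settings, sketched under the assumption of a recent pandas release and with hypothetical column names:

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "i8": np.array([1, 2], dtype=np.int8),
        "f32": np.array([0.5, 1.5], dtype=np.float32),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dtypes.dta")
    df.to_stata(path, write_index=False)
    kept = pd.read_stata(path)                          # Stata types preserved
    upcast = pd.read_stata(path, preserve_dtypes=False)  # int64 / float64

print(kept.dtypes.tolist(), upcast.dtypes.tolist())
```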
37503746 .. ipython:: python
@@ -3775,14 +3771,13 @@ is lost when exporting.
37753771
37763772 Labeled data can similarly be imported from *Stata* data files as ``Categorical``
37773773 variables using the keyword argument ``convert_categoricals`` (``True`` by default).
3778- By default, imported ``Categorical`` variables are ordered according to the
3779- underlying numerical data. However, setting ``order_categoricals=False`` will
3780- import labeled data as ``Categorical`` variables without an order.
3774+ The keyword argument ``order_categoricals`` (``True`` by default) determines
3775+ whether imported ``Categorical`` variables are ordered.
37813776
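The sketch below, assuming a recent pandas release and an illustrative ``size`` column, reads the same labeled variable three ways: with the default ordering, with ``order_categoricals=False``, and with ``convert_categoricals=False``, which returns the stored integer codes instead of the labels:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"size": pd.Categorical(["small", "large", "small"], categories=["small", "large"])}
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ordered.dta")
    df.to_stata(path, write_index=False)
    ordered = pd.read_stata(path)                              # ordered Categorical
    unordered = pd.read_stata(path, order_categoricals=False)  # no order
    codes = pd.read_stata(path, convert_categoricals=False)    # raw integer codes

print(ordered["size"].cat.ordered, unordered["size"].cat.ordered)
print(codes["size"].tolist())
```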
37823777 .. note::
37833778
37843779 When importing categorical data, the values of the variables in the *Stata*
3785- data file are not generally preserved since ``Categorical`` variables always
3780+ data file are not preserved since ``Categorical`` variables always
37863781 use integer data types between ``-1`` and ``n-1`` where ``n`` is the number
37873782 of categories. If the original values in the *Stata* data file are required,
37883783 these can be imported by setting ``convert_categoricals=False``, which will
@@ -3795,7 +3790,7 @@ import labeled data as ``Categorical`` variables without an order.
37953790
37963791 .. note::
37973792
3798- *Stata* suppots partially labeled series. These series have value labels for
3793+ *Stata* supports partially labeled series. These series have value labels for
37993794 some but not all data values. Importing a partially labeled series will produce
38003795 a ``Categorical`` with string categories for the values that are labeled and
38013796 numeric categories for values with no label.
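
Partially labeled data can be produced from pandas for experimentation. This sketch assumes a pandas release whose ``to_stata`` accepts the ``value_labels`` keyword for non-categorical columns (assumed here to require pandas 1.4 or later); only the value 1 receives a label, and the column name is hypothetical:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"score": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "partial.dta")
    # Label only the value 1; the values 2 and 3 stay unlabeled
    df.to_stata(path, write_index=False, value_labels={"score": {1: "one"}})
    result = pd.read_stata(path)

# Expected mix: a string category for the labeled value, numbers for the rest
print(result["score"].cat.categories.tolist())
```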