The geospatial IT industry has made huge advances since then, yet the Shapefile is still the most common file format for sharing vector data.
Shapefile was a great format in its time, and the fact that it's still used today is proof that its design was forward-thinking.
Yet we believe that the time of the Shapefile has passed and that we, the geospatial IT industry, should replace it with more modern concepts. Shapefile is now an outdated format and should be abandoned in the future for sharing geospatial vector data.
Not all Shapefile features make it a bad (or unmodern) format. There are a couple of reasons why Shapefile still beats most other formats:
Why is Shapefile so bad? Here are several reasons why the Shapefile is a bad format and you should avoid its usage:
The Shapefile format uses at least 3 files, but there are up to 10 other file extensions, which can be distributed along with the Shapefile. Nearly every geospatial software package is adding its own extension to try to overcome some of the limits Shapefile has.
This is highly impractical for distribution of data. Users usually have to zip all the files into one archive, and unzip them on the other end of the distribution chain. Custom additions are not supported by other tools and limit interoperability.
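The multi-file layout can be illustrated with a minimal sketch that checks whether the three mandatory components (.shp, .shx and .dbf) sit next to each other. The function name `missing_components` is hypothetical, and the check assumes lowercase extensions.

```python
from pathlib import Path

# The three files every Shapefile needs: geometry, index and attribute table.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the mandatory extensions that are missing next to shp_path.

    Assumption: all component files share the same base name and use
    lowercase extensions.
    """
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]
```

A dataset that arrives without its .shx or .dbf file is incomplete, which is why the files are normally zipped together before distribution.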
Attribute names are limited to 10 characters max. Longer names are usually automatically shortened. This is limiting to some applications and leads to bad data arrangement.
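A short sketch shows why the 10-character limit leads to bad data arrangement: distinct attribute names can collapse into the same shortened name. Plain truncation is an assumption here; real tools shorten and de-duplicate names in their own ways. The field names are made up for illustration.

```python
from collections import defaultdict

DBF_NAME_LIMIT = 10  # maximum attribute name length


def truncation_collisions(names, limit=DBF_NAME_LIMIT):
    """Group names that become identical once cut to `limit` characters."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:limit]].append(name)
    # Keep only the shortened names shared by more than one original name.
    return {short: full for short, full in groups.items() if len(full) > 1}


# Hypothetical field names, for illustration only.
fields = ["population_2010", "population_2020", "area_km2"]
print(truncation_collisions(fields))
# {'population': ['population_2010', 'population_2020']}
```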
There can be only 255 attribute fields in the database file. For some applications this is limiting, especially in combination with the flat table structure.
Float, integer, date and character string data types are supported. Floating point numbers can be stored as text, but there is no support for big integers (so the format is not usable if you have data with big integer identifiers, such as cadastral maps), and text is limited to only 254 characters.
There is no support for more advanced data fields such as blobs, images or arrays.
There is no way to specify the character set used in the database. Many applications still use the old Windows-* or ISO-* encodings, while UTF-8 is increasingly common nowadays. Still, there is no way to specify this in the file header.
The support for Unicode characters is also very limited.
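The practical effect of an undeclared character set can be sketched as follows: the same bytes read back differently depending on which encoding the reader guesses. The helper `decode_with` and the sample place name are illustrative assumptions, not part of any Shapefile library.

```python
def decode_with(raw, encodings=("utf-8", "cp1250", "latin-1")):
    """Decode the same bytes under several candidate encodings.

    Undecodable bytes are replaced instead of raising an error, so the
    result shows what a user would actually see on screen.
    """
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}


# Bytes written by a UTF-8 aware application ...
raw = "Plzeň".encode("utf-8")
# ... and what readers assuming other encodings would display.
for encoding, text in decode_with(raw).items():
    print(encoding, text)
```

Only the reader that happens to guess the writer's encoding gets the original text back.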
The size of both .shp and .dbf component files cannot exceed 2 GB. GDAL Shapefile driver overcomes this limit, but
The Shapefile format explicitly uses 32bit offsets and so cannot go over 8GB (it actually uses 32bit offsets to 16bit words), but the OGR shapefile implementation has a limitation of 4GB.
For compatibility with other software implementations, it is not recommended to use a file size over 2GB for both .SHP and .DBF files.
So 4GB is all you can have in a single Shapefile. This sounds like enough, but it is not enough for all cases.
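The size figures above follow from simple arithmetic, sketched here. The 8GB ceiling comes from 32-bit offsets that count 16-bit (2-byte) words. Reading the 4GB figure as "only 31 of the 32 bits are usable" is an assumption made for illustration.

```python
WORD_BYTES = 2      # offsets count 16-bit words, i.e. 2 bytes each
GIB = 2 ** 30


def max_addressable_bytes(offset_bits):
    """Largest byte range reachable with offsets of the given bit width."""
    return (2 ** offset_bits) * WORD_BYTES


print(max_addressable_bytes(32) / GIB)  # 8.0 (full 32-bit offsets)
print(max_addressable_bytes(31) / GIB)  # 4.0 (assumed: 31 usable bits)
```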
Shapefile is a simple-feature format. There is no way to store more complex geometry relationships.
Each file can contain only one of the supported geometry types (Point, Line, Polygon and others). Mixed geometry features are not possible.
The data structure is limited to flat tables with no hierarchies, relations or tree structure.
Shapefile can store neither material definitions nor textures (images with texture coordinates). 3D models are stored as a triangle or polygon soup; watertight models and parametric geometries are not supported.
Shapefiles use Esri WKT definitions, which are often incompatible with standard definitions in EPSG or other sources regarding aspects such as axis order or unit definitions. Furthermore, they often miss parameters required for reprojection ("Missing Bursa Wolf Parameters", anyone?)
Do you know about more limits or do you want to extend existing ones? Please do so via pull-request or comment in the repository.
What are the alternatives to the Shapefile format? To be honest, no alternative format has overthrown Shapefile's hegemony yet. Some formats nearly took over (KML, GML, GeoJSON), but their usage was limited to relatively narrow use cases.
Although there are more than 80 vector data formats in use out there, only a few can be considered candidates for Shapefile replacement.
List of some Shapefile alternatives
"GeoJSON isn't a shapefile replacement."
-- Sean Gillies
OGC GeoPackage is one of the most promising formats, designed for today's modern applications. GeoPackage is published as a standard by the Open Geospatial Consortium.
GeoPackage is an open, standards-based, platform-independent, portable, self-describing, compact format for transferring geospatial information.
The GeoPackage Encoding Standard describes a set of conventions for storing the following within an SQLite database:
There are several published extensions for GeoPackage, which make this format even more powerful.
GeoPackage is now (2017) supported in most GIS software packages.
A major downside is that, being based on SQLite, data is in a complex binary format that can't be streamed. It typically must be written to disk before opening.
We believe that GeoPackage is a candidate for Shapefile replacement for editing data locally.
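Because a GeoPackage is an SQLite database, its self-describing nature can be sketched with nothing but Python's standard `sqlite3` module. The standard's `gpkg_contents` table lists the datasets stored in the file. The helper name `list_gpkg_layers` is hypothetical, and the sketch only reads metadata, not geometries.

```python
import sqlite3


def list_gpkg_layers(path):
    """Return (table_name, data_type) rows from a GeoPackage's gpkg_contents."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents "
            "ORDER BY table_name"
        ).fetchall()
    finally:
        # Always release the file handle, even if the query fails.
        conn.close()
    return rows
```

Reading the actual features usually goes through a GIS library, but this shows that a single file carries its own catalogue.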
GeoJSON is a very simple, human-readable, text-based format. Although it is technically possible to use it with other coordinate reference systems, the specification clearly states that WGS84 is the only system that should be used. It can handle complex vector data features and build complex hierarchical data models.
Since GeoJSON is based on JSON, it is very easy to parse. Additionally, it can be streamed (features are dealt with as they come in, without waiting for the whole file to load).
The problem with GeoJSON is that not all geometries can be represented and advanced coordinate reference systems are not well supported.
We recommend GeoJSON as a Shapefile replacement for most data interchange. Datasets that must be transferred in a non-WGS-84 coordinate reference system, or that have geometry not representable in GeoJSON, might use GML. However, for the vast majority of datasets, GML is overkill.
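How little is needed to produce GeoJSON can be sketched with the standard `json` module. The helper `point_feature` and the sample values are illustrative assumptions. Coordinates are given as longitude, latitude in WGS84, and the nested `properties` show the hierarchical attributes that a flat Shapefile table cannot hold.

```python
import json


def point_feature(lon, lat, properties):
    """Build a GeoJSON Feature with a Point geometry (lon/lat order, WGS84)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Illustrative values only; note the long key and the nested object.
feature = point_feature(
    14.42, 50.09,
    {"name": "Prague", "administrative_details": {"capital": True}},
)
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, indent=2))
```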
GML is another OGC Standard.
GML was picked as the main vector data distribution format by the European INSPIRE initiative. It's a very complex format, and its direct usage in GIS software is limited. Its main use is as a data exchange format that needs to be ingested into the user's system (e.g. into a database) to be fully usable.
GML is currently often used for open data datasets, since it's technology neutral and a supported OGC Standard.
A major downside to GML is that it is an insanely complex standard: few pieces of software support all parts of it, and different pieces of software sometimes support different parts.
We believe that GML is a candidate for Shapefile replacement for data interchange in situations where data is too complex to be represented by GeoJSON.
SpatiaLite is a popular file-based database storage format.
SpatiaLite is an open source library intended to extend the SQLite core to support fully fledged Spatial SQL capabilities. SQLite is intrinsically simple and lightweight:
To us, SpatiaLite seems to be the second-best option after the GeoPackage format. Both build on top of the same technology, SQLite.
Compared to GeoPackage, it lacks support for extensions and for raster data. That is certainly not a must-have feature, but it's good to have. Like GeoPackage, it suffers from the issues inherent to SQLite that make it a poor choice for data interchange.
Some people tend to use comma separated files for storing geospatial data.
Among non-geospatial people, CSV is very popular, but for most geospatial applications it's an unusable format.
There are at least two reasons not to use CSV as a Shapefile replacement: it's not standardized (there are many dialects out there), and support for non-point geospatial data is complicated.
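The point-only case is the one CSV handles tolerably, as this minimal sketch suggests. The column names `lon` and `lat` are assumptions; because nothing in CSV standardizes them, every reader has to be told which columns hold the coordinates, and lines or polygons need yet another ad hoc convention.

```python
import csv
import io


def csv_points(text, lon_field="lon", lat_field="lat"):
    """Parse CSV text into (lon, lat, attributes) tuples.

    The coordinate column names are a per-file convention that the
    caller has to know in advance.
    """
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        points.append((lon, lat, dict(row)))
    return points


# Illustrative data only.
sample = "name,lon,lat\nPrague,14.42,50.09\n"
print(csv_points(sample))
```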
Originally provided by Google, OGC KML used to be a very popular vector data format.
Some years back, the KML format was very popular, but the geospatial community hit upon its limits. Since it is XML based, it's not suitable for bigger datasets. It combines cartography along with the data geometry in one file, which does not seem to be a good solution. And it officially supports only the WGS-84 coordinate reference system.
At its most basic level, an ArcGIS geodatabase is a collection of geographic datasets of various types held in a common file system folder, a Microsoft Access database, or a multiuser relational DBMS (such as Oracle, Microsoft SQL Server, PostgreSQL, Informix, or IBM DB2).
GeoDatabase is very often used in the ArcGIS environment as the main data exchange format. Its features are very complex and advanced.
On the other hand, it is a proprietary, closed format that is used exclusively in the environment of Esri products and is not implemented in other software packages. It's a strong candidate for replacing Shapefiles for local data editing in an ArcGIS environment.
Last modification: 2017-10-08
Initially created by: Jachym Cepicky, OpenGeoLabs s.r.o.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
Contribute: On GitHub