"Below is an example of a diff showing changes someone made to a shape file. Notice we can see the commit message, but there is no way to know what point was added, or what date was changed without doing some serious searching in QGIS or ArcGIS.\n",
"Below is an example of a diff showing changes someone made to a shape file, a non-text-based format. Notice we can see the commit message, but there is no way to know what point was added, or what date was changed without doing some serious searching in QGIS or ArcGIS.\n",
" </font>"
]
},
...
...
%% Cell type:markdown id: tags:
<h1>Why we are using git and GitLab</h1>
%% Cell type:markdown id: tags:
<h2 align="left"> We need:</h2>
<ul>
<font size="3">
1. a way to store and share both data and code<br>
2. a collaborative space <br>
3. precise version control <br>
</font>
</ul>
%% Cell type:markdown id: tags:
<font size="3">
Using git and GitLab is one way to do these three things. Together, they can:<br>
<ul>
1. store and share data and code<br>
2. let users see, edit, and comment on each other's work<br>
3. provide distributed version control of files<br>
</ul>
<br>
</font>
<font size="3">
Some quick definitions: <i>Git</i> is the command-line software that keeps track of files and file changes. <i>GitLab</i> is a server with a web interface that lets us easily view and share a repository in the cloud. It also adds many more features, such as password protection and wikis.
</font>
%% Cell type:markdown id: tags:
<h2>Why we are using git</h2>
<font size="3">
We make changes to files all the time. Often this is not an issue, for example when you are writing a letter in Microsoft Word. You can pick up right where you left off when you double-click your document. You can edit the file, save it, and if you catch a mistake you can "undo". You are at least mostly aware of how your Word document has changed over time, since you were the only one editing it.
<br>
<br>
For our workflow this paradigm is turned upside-down. Many different people want to pick up right where you left off. Editing a geospatial dataset overwrites its files with no "undo". Files have long version histories that are nearly impossible for one person to keep track of. This is where git comes in.
<br>
<br>
Git tracks every change people make to files. A recorded set of changes is called a <i>commit</i>, and with every commit git asks the author to write a short note about the change, called a "commit message". Everyone can see every commit and commit message, and can download or revert to old versions. Everyone can also "pull" new changes into their own copy, or jump forward to newer versions. Git can also show exactly what has changed between versions of a file in what it calls a <i>diff</i>.
</font>
<br>
<font size="3">
Below is an example of a diff showing changes someone made to a text file called index.md. This diff shows us the updating_the_geo_nodes.md line was <font color='red'> deleted </font> and the version_specific_updates.md line was <font color='green'> added </font>.
<br>
<br>
Git was originally designed to deal with code for software development, so filetypes associated with most programming languages work great in git. For example, git will track commits and diffs for files like...<br>
<ul>
<li>Matlab files (.m)</li>
<li>R scripts (.r)</li>
<li>Text files (.txt)</li>
<li>Python programs (.py)</li>
<li>HTML files (.html)</li>
<li>and many more... </li>
</ul>
</font>
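
The index.md diff described above can be approximated in a few lines of Python. This is only a sketch using the standard-library `difflib` module, not git itself, and the file contents are hypothetical: only the two file names come from the example above.

```python
import difflib

# Hypothetical "before" and "after" contents of index.md. Only the two
# .md file names mentioned in the text above are taken from the example.
old = [
    "# Documentation index\n",
    "installation.md\n",
    "updating_the_geo_nodes.md\n",
]
new = [
    "# Documentation index\n",
    "installation.md\n",
    "version_specific_updates.md\n",
]

# unified_diff yields lines prefixed with "-" (deleted) or "+" (added),
# the same convention git uses when it displays a diff.
diff = list(
    difflib.unified_diff(old, new, fromfile="a/index.md", tofile="b/index.md")
)
print("".join(diff))
```

Because both versions are plain text, the comparison can point at the exact lines that were removed and added.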
%% Cell type:markdown id: tags:
<font size="3">
<p align="left"> Git works really well with those files because they are <i><b>flat, text-based</b></i> formats that git can easily read when someone makes a commit and compare line by line when building a diff. </p>
However, git <i><b>can't</b></i> meaningfully read the contents of <i><b>non-text-based</b></i> files. It still stores each version, but when someone changes a file that <i>is not</i> text-based, git cannot show what changed inside it. These filetypes include...
<ul>
<li>Rendered Documents (.pdf)</li>
<li>Binary images (.jpg)</li>
<li>Formatted Word docs (.doc)</li>
<li>Tabular data with multiple sheets (.xlsx)</li>
</ul>
<i>If we make a commit with these files, we won't be able to see the diff!</i>
<br>
<br>
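
As a rough illustration of how a tool can tell these two kinds of files apart, the sketch below imitates one common heuristic: look for a NUL byte near the start of the file. The function name `looks_binary` is hypothetical, and the 8000-byte window is an assumption about the kind of check git performs, not a documented guarantee.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Return True if a NUL byte appears in the first sample_size bytes.

    Plain text almost never contains NUL bytes, while most binary
    formats do, so this is a cheap (assumed) stand-in for git's check.
    """
    return b"\x00" in data[:sample_size]


# Hypothetical contents of a small text file.
text_file = b"name,date\nstation_1,2019-04-02\n"

# The first bytes of a JPEG (JFIF) file, which include NUL bytes.
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01"

print(looks_binary(text_file))    # False
print(looks_binary(jpeg_header))  # True
```
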
Geospatial data has text-based formats and non-text-based formats.
Below is an example of a diff showing changes someone made to a shape file, a non-text-based format. Notice we can see the commit message, but there is no way to know what point was added, or what date was changed without doing some serious searching in QGIS or ArcGIS.
The GeoJSON format is a widely used open standard for text-based geospatial data. Because it is open, it works on most operating systems and in most GIS software. Its files are generally small, and because it is built on top of the already well-established JSON format, it is supported by a number of packages in programming languages such as Python and R.
</font>
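
To make the contrast with shape files concrete, here is a minimal sketch of what a diff of a GeoJSON file can reveal. It uses only Python's standard library rather than git, and the dataset (site names, coordinates, dates) is entirely hypothetical.

```python
import copy
import difflib
import json

# Hypothetical dataset: one sampling point with a survey date.
before = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-122.33, 47.61]},
            "properties": {"name": "site_a", "date": "2019-03-01"},
        }
    ],
}

# Someone changes a date and adds a new point.
after = copy.deepcopy(before)
after["features"][0]["properties"]["date"] = "2019-04-15"
after["features"].append(
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-122.30, 47.65]},
        "properties": {"name": "site_b", "date": "2019-04-15"},
    }
)


def as_lines(obj):
    """Serialize with stable key order and indentation, one value per line."""
    return json.dumps(obj, indent=2, sort_keys=True).splitlines()


diff = list(
    difflib.unified_diff(
        as_lines(before),
        as_lines(after),
        fromfile="before.geojson",
        tofile="after.geojson",
        lineterm="",
    )
)
print("\n".join(diff))
```

The output marks the old date with `-`, the new date with `+`, and lists every line of the added point, which is exactly the information the shape file diff could not show.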
%% Cell type:markdown id: tags:
<h1>In conclusion</h1>
<font size="3">
We are using git and GitLab with the text-based geospatial data format GeoJSON.
<br>
<br>
<i>"But I have always used shape files. Is there some sort of disadvantage?"</i><br>
<ul>
No. You should convert to GeoJSON. Besides, you can always convert back if you ever need to!
<br>
Also, shape files depend on companion files (.shx, .dbf, .prj) that are at risk of being misplaced or left untracked, whereas with GeoJSON there is only one file to track (.geojson). GeoJSON files are just text, so we can see any edits made to the files in GitLab as they are made. We can also easily manipulate the files using programming languages like Python or R! These are a few reasons we ask that any geospatial data you have be converted to GeoJSON before pushing it to the repository. Conversion is painless, and there is a Jupyter Notebook in this folder to help.
</ul>
</font>
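
As a small illustration of manipulating GeoJSON from Python, the sketch below parses a GeoJSON string with the standard `json` module and filters its features. The data and property names (`name`, `date`) are hypothetical. This is not the conversion notebook mentioned above, only an example of working with an already converted file.

```python
import json

# Hypothetical GeoJSON text, as it might be read from a .geojson file.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-122.33, 47.61]},
     "properties": {"name": "site_a", "date": "2019-03-01"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-122.30, 47.65]},
     "properties": {"name": "site_b", "date": "2018-11-20"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-122.35, 47.60]},
     "properties": {"name": "site_c", "date": "2019-04-15"}}
  ]
}
"""

data = json.loads(geojson_text)

# Keep only the features whose date falls in 2019.
recent = [f for f in data["features"] if f["properties"]["date"].startswith("2019")]
names = [f["properties"]["name"] for f in recent]
print(names)  # ['site_a', 'site_c']

# The filtered result is still valid GeoJSON and can be written back as text.
filtered_text = json.dumps({"type": "FeatureCollection", "features": recent}, indent=2)
```
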
%% Cell type:markdown id: tags:
<h3>Helpful Links:</h3>
<br>
<h4>Git desktop client</h4>
Free: https://www.sourcetreeapp.com<br>
Free demo/free for academics: https://www.git-tower.com
<br>
<br>
<h4>GeoJSON Web Visualization</h4>
GitHub has native visualization: https://github.com/paultag/dc/blob/master/coffee.geojson