@@ -24,8 +24,13 @@ before, is another website and online platform similar to GitLab.)
Why we are using git
----------------------------------------------
We make changes to files all the time. Most changes we make (even mistakes) are not a big issue when a single person is in control of the file and its management. For example, when you write a report in Microsoft Word, you can pick up where you last saved the document. You can edit the file, save it, and if you notice a mistake, you can “undo”. You are fully aware of how your Word document has changed over time.
Moving to a file-sharing system with multiple users and organizations reveals some key problems with the familiar workflow presented above. In a distributed, simultaneous setting, many different people may want to edit your file. They may not know what edits you made in the past, and they may make significant edits to your file that you do not notice or understand. It is also likely that multiple people will work on the same file and make conflicting changes that do not work well together (e.g., simultaneous edits to the same sentence or paragraph in a report). In worst-case situations, one person may overwrite a file and lose important information without knowing and with no power to “undo”.
This is where git comes in.
In general, git forces users to be deliberate about the file editing process by tracking all changes people make to files. Git tracks files by creating a snapshot of a file (or multiple files), called a commit, every time a user runs the commit command. With every commit, git also requires the user to write a message, called a commit message, describing the change they are making. Together, the commits and their messages explain what happened between each version of the same file. Everyone with access to the file can see every commit and commit message, and can download or revert to old versions of the file at any time. Git also enables users to pull other people’s changes into their files (i.e., update their files with newer versions).
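To make the idea of commits and commit messages concrete, here is a minimal sketch in Python. It is a toy model only and is not how git stores data internally; the ``Commit`` and ``History`` names and the ``report.txt`` file are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    """One snapshot of the tracked files plus the message that explains it."""
    message: str
    snapshot: dict  # file name -> file contents at the time of the commit


@dataclass
class History:
    """Toy stand-in for a commit history (not git's real storage model)."""
    commits: list = field(default_factory=list)

    def commit(self, files: dict, message: str) -> Commit:
        # Refuse to record a snapshot that has no explanation attached.
        if not message.strip():
            raise ValueError("a commit message is required")
        new_commit = Commit(message=message, snapshot=dict(files))
        self.commits.append(new_commit)
        return new_commit

    def log(self) -> list:
        # Every commit message, oldest first.
        return [c.message for c in self.commits]

    def checkout(self, index: int) -> dict:
        # Return a copy of the files exactly as they were at an earlier commit.
        return dict(self.commits[index].snapshot)


history = History()
history.commit({"report.txt": "First draft."}, "Add first draft of report")
history.commit({"report.txt": "First draft, with edits."}, "Revise introduction")

print(history.log())
print(history.checkout(0)["report.txt"])
```

The point of the sketch is that an old version is never lost: each commit keeps its own snapshot, and the messages give a readable account of how the file changed.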
Diffs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
...
@@ -33,8 +38,8 @@ Diffs
Git also automatically produces an output called a diff that shows the line-by-line edits made in each commit. Diffs pinpoint the exact differences between file versions and provide a powerful way to understand version details and commit messages left by users. Below is an example of a diff showing changes someone made to a text file called index.md. This diff shows us the updating_the_geo_nodes.md line was deleted and the version_specific_updates.md line was added. Notice how
this diff identifies only the exact part of the line that changed, as the “ing” stays the same between commits!
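A line-by-line diff can be reproduced outside of git as a rough illustration. The sketch below uses Python's standard ``difflib`` module, which is not git but produces the same unified diff layout. Only the two file names come from the example above; the surrounding "Documentation index" line is a hypothetical placeholder.

```python
import difflib

# Hypothetical before/after contents of index.md. Only the two file names
# are taken from the example in the text; the first line is a placeholder.
old_lines = ["Documentation index\n", "updating_the_geo_nodes.md\n"]
new_lines = ["Documentation index\n", "version_specific_updates.md\n"]

# unified_diff yields header lines, then unchanged lines prefixed with a
# space, removed lines prefixed with "-", and added lines prefixed with "+".
diff_lines = list(
    difflib.unified_diff(old_lines, new_lines, fromfile="a/index.md", tofile="b/index.md")
)

print("".join(diff_lines))
```

Lines starting with ``-`` exist only in the old version and lines starting with ``+`` exist only in the new one, which is the same convention GitLab uses when it colors a diff.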
Diffs are only visible with git for certain filetypes. Git was originally designed to deal with code for software development, so filetypes associated with most programming languages work great in git. For example, git will track commits and diffs for files like…
...
...
@@ -48,7 +53,7 @@ Diffs are only visible with git for certain filetypes. Git was origially designe
- Non-Tabular, text-based data (e.g., .json)
- and many more…
Git works really well with those files because they are flat, text-based formats that git can read when someone makes a commit and compare line by line when generating diffs. However, git can’t interpret the contents of non-text-based (binary) files. It still stores every version of them, but when someone changes such a file, git cannot show a meaningful diff. These filetypes include…
- Rendered Documents (.pdf)
- Binary images (.jpg)
...
...
@@ -62,8 +67,8 @@ If we make a commit with these files, we won’t be able to see the diff! Below
commit history. However, the diff cannot show what changed. The user is stuck with figuring out the differences between the original file and
this new version. The only information provided is the commit message, “Added point and changed date”. For large files, this limited information may not be useful enough to track changes. It is often difficult to manage large, distributed projects without seeing the diffs.
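One way to build intuition for which files will produce a useful diff is to check whether the content looks like text at all. The sketch below is a simple heuristic written for illustration: it treats content as binary if a NUL byte appears near the start. Git applies a similar kind of check, but the function name and the 8000-byte sample size used here should be read as assumptions, not as a description of git's exact behavior.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Rough heuristic: content is treated as binary if its first bytes contain a NUL."""
    return b"\x00" in data[:sample_size]


# A small text-based file: plain characters, no NUL bytes.
text_sample = b'{"type": "FeatureCollection", "features": []}\n'

# The first bytes of a JPEG (JFIF) header, which include NUL bytes.
binary_sample = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"

print(looks_binary(text_sample))    # False
print(looks_binary(binary_sample))  # True
```

Files that fail a check like this can still be committed, but the history will only show that they changed, not what changed.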
@@ -72,13 +77,13 @@ What text-based filetypes should we use?
Storing and Sharing Code
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In general, git will work with any computer code we write in any format. Git was designed for software development, and works great with essentially any program we write from scratch. It doesn’t matter if you write your program in Python, C, C++, C#, Java, JavaScript, R, MATLAB, or any other language, because source code is essentially text. For special files that a program may need in order to run, git has built-in functionality to help manage them.
Storing and Sharing Data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The two biggest constraints when using git to manage data are file size, which must stay limited, and file type, since only text-based filetypes generate diffs.
Data File Size: When storing code and data on GitLab, the files are stored in repositories. Because git and GitLab were designed for managing text-based code (which often has very small file size), git repositories tend to have limitations on their total storage. However, there is generally no limitation on the number of repositories one can have. So, this issue is usually circumvented by keeping each repository as small as possible. When using git, it’s important to know the maximum storage capacity of a repository, and to try to segment projects and data into separate repositories when possible.
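Before pushing data, it can help to check how much space a project folder actually takes up. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the 1 GiB limit is a made-up placeholder, since the real limit depends on how your GitLab instance is configured.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all project files under root, ignoring the .git folder."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own bookkeeping folder so only project files are counted.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_repository(root: str, limit_bytes: int) -> bool:
    """Return True if the project files under root stay within the given limit."""
    return directory_size_bytes(root) <= limit_bytes


# Hypothetical limit for illustration only; check your own GitLab instance.
ASSUMED_LIMIT_BYTES = 1 * 1024**3

if __name__ == "__main__":
    size = directory_size_bytes(".")
    fits = fits_in_repository(".", ASSUMED_LIMIT_BYTES)
    print(f"{size / 1024**2:.1f} MiB of project files; within assumed limit: {fits}")
```

If a folder is close to the limit, that is a signal to split the data into a separate repository rather than keep adding to the same one.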
...
...
@@ -106,4 +111,4 @@ text-based filetypes, we will get useful diffs. We highlight GeoJSON, because th
GeoJSON
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The GeoJSON format is a widely used standard for text-based geospatial data. GeoJSON files are just text, so we can see any edits made to the files in GitLab as they are made. It is an open format, meaning it will work on all operating systems and with most GIS software. It is built on top of the already well-established JSON format, and has a number of helpful supporting packages that can be employed in programming languages such as Python and R. An important distinction between the GeoJSON format and other geospatial data formats is that the coordinate reference system (CRS) of every GeoJSON is WGS 84 (EPSG:4326). This means that when you convert a file to GeoJSON, a standards-compliant tool reprojects its coordinates to EPSG:4326, so every dataset we share uses the same reference system. The above are some of the reasons we ask that any geospatial data you have be converted to GeoJSON before pushing it to GitLab. It is easy to convert geospatial data through either QGIS or the CID utility `GeoConvert <https://gitlab.nps.edu/CID/gis-utilities/geoconvert>`__.
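Because GeoJSON is ordinary JSON, it can be written with nothing more than Python's standard ``json`` module. The sketch below builds a single point feature; the feature name and coordinates are made up for illustration. Writing the file with indentation and sorted keys is a suggestion rather than a requirement, and it keeps each value on its own line so that later edits produce small, readable diffs.

```python
import json

# A hypothetical point feature; the name and coordinates are made up.
# GeoJSON positions are ordered longitude first, then latitude (WGS 84).
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-121.87, 36.60]},
            "properties": {"name": "Example site"},
        }
    ],
}

# indent and sort_keys give a stable layout with one value per line,
# so changing a single property touches only a line or two in the diff.
geojson_text = json.dumps(feature_collection, indent=2, sort_keys=True)

print(geojson_text)
```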