From fc29fb1380d0a2414160984ba5df2c074607aeeb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Anna=20St=C3=B6riko?= <a.storiko@tudelft.nl> Date: Wed, 11 Sep 2024 16:16:59 +0200 Subject: [PATCH] Check and revise PA_1_3 - Fixed a few typos and slightly reformulated sentences I found unclear. - Renamed the variable `boolean` to `temperature_is_nan` to make the name more meaningful. - Ran all code cells (including solutions) to make sure that I do not get an error. - Adjusted whitespace in a few places to align it with Python conventions. --- ...a_Cleaning_and_Boosting_Productivity.ipynb | 80 +++++++++---------- 1 file changed, 40 insertions(+), 40 deletions(-) diff --git a/content/week_1_3/PA_1_3_Data_Cleaning_and_Boosting_Productivity.ipynb b/content/week_1_3/PA_1_3_Data_Cleaning_and_Boosting_Productivity.ipynb index 57fc87c1..35fdbbf8 100644 --- a/content/week_1_3/PA_1_3_Data_Cleaning_and_Boosting_Productivity.ipynb +++ b/content/week_1_3/PA_1_3_Data_Cleaning_and_Boosting_Productivity.ipynb @@ -28,7 +28,7 @@ "source": [ "This PA consists of two parts:\n", " - Data Cleaning, with task 1.1 - 2.6\n", - " - Boosting Productivity, with task ..." + " - Boosting Productivity, with task 3.1 - 3.10" ] }, { @@ -67,7 +67,7 @@ "source": [ "### Task 1: Importing and Cleaning the array\n", "\n", - "In a previous week we looked at how to read in data from a csv, plot a nice graph and even find the $R^2$ of the data. This week an eager botany student, Johnathan, has asked us to help him analyze some data: 1000 measurements have just been completed over the 100m of greenhouse and are ready to use in `data_2.csv`. Johnathan happens to have a lot of free time but not that much experience taking measurements. Thus, there is some noise in the data and some problematic data that are a result of an error in the measurement device. Let's help them out!" + "In a previous week we looked at how to read in data from a csv, plot a nice graph and even find the $R^2$ of the data. 
This week, an eager botany student, Johnathan, has asked us to help him analyze some data: 1000 measurements have just been completed over the 100 m of greenhouse and are ready to use in `data_2.csv`. Johnathan happens to have a lot of free time but not that much experience taking measurements. Thus, there is some noise in the data and some problematic data that are a result of an error in the measurement device. Let's help him out!" ] }, { @@ -124,7 +124,7 @@ "<div style=\"background-color:#AABAB2; color: black; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px\">\n", "<p>\n", "<b>Task 1.3:</b> \n", - " Check by defining a variable <code>boolean</code> using the numpy method <code>isnan</code>, which returns a boolean vector (False if it is not a NaN, and True if it is a NaN). The code block below will also help you inspect the results.\n", + " Check if there are NaN (not a number) values in the temperature array. You can use the numpy method <code>isnan</code>, which returns a boolean vector (False if it is not a NaN, and True if it is a NaN). Save the result in the variable <code>temperature_is_nan</code>. The code block below will also help you inspect the results.\n", "</p>\n", "</div>" ] @@ -136,10 +136,10 @@ "metadata": {}, "outputs": [], "source": [ - "boolean = YOUR_CODE_HERE #np.isnan(temperature)\n", + "temperature_is_nan = YOUR_CODE_HERE #np.isnan(temperature)\n", "\n", - "print(\"The first 10 values are:\", boolean[0:10])\n", - "print(f\"There are {boolean.sum()} NaNs in array temperature\")" + "print(\"The first 10 values are:\", temperature_is_nan[0:10])\n", + "print(f\"There are {temperature_is_nan.sum()} NaNs in array temperature\")" ] }, { @@ -147,7 +147,7 @@ "id": "c9f8994b", "metadata": {}, "source": [ - "Let's slice the array using the `boolean` array we just found to eliminate the NaNs. We can use the symbol `~`, which denotes the opposite: we want to keep those where np.isnan gives False as an answer." 
+ "Let's slice the array using the `temperature_is_nan` array we just found to eliminate the NaNs. We can use the symbol `~`, which denotes the opposite: we want to keep those where np.isnan gives False as an answer." ] }, { @@ -157,7 +157,7 @@ "metadata": {}, "outputs": [], "source": [ - "temperature = temperature[~boolean]" + "temperature = temperature[~temperature_is_nan]" ] }, { @@ -200,7 +200,7 @@ "metadata": {}, "outputs": [], "source": [ - "distance.size==temperature.size" + "distance.size == temperature.size" ] }, { @@ -208,7 +208,7 @@ "id": "8a80b2d6", "metadata": {}, "source": [ - "Also, we don't know what the index of the removed values were, since we over-wrote `temperature`! Luckily we have our `boolean` array, which records the indices with Nans, which we can also use to update our `distance` array." + "Also, we don't know what the indices of the removed values were, since we overwrote `temperature`! Luckily we have our `temperature_is_nan` array, which records the indices with NaNs, which we can also use to update our `distance` array." ] }, { @@ -231,7 +231,7 @@ "metadata": {}, "outputs": [], "source": [ - "distance = YOUR_CODE_HERE #distance[~boolean]\n", + "distance = YOUR_CODE_HERE #distance[~temperature_is_nan]\n", "distance.size==temperature.size" ] }, @@ -252,10 +252,10 @@ "metadata": {}, "outputs": [], "source": [ - "plt.plot(distance, temperature, \"ok\", label = 'Temperature')\n", + "plt.plot(distance, temperature, \"ok\", label=\"Temperature\")\n", "plt.title(\"Super duper greenhouse\")\n", - "plt.xlabel('Distance')\n", - "plt.ylabel('Temperature')\n", + "plt.xlabel(\"Distance\")\n", + "plt.ylabel(\"Temperature\")\n", "plt.show()" ] }, @@ -283,9 +283,9 @@ "id": "9e8d396a", "metadata": {}, "source": [ - "The values are suspcious since they are +/-999...this is a common error code with some sensors, so we can assume that they can be removed from the dataset. 
We can easily remove these erroneous values of temperature, but this time we will use a different method than before. The explamation mark before an equals sig, `!=`, denotes \"not equal to.\" We can use this as a logic operator to directly eliminate the values in one line. For example:\n", + "The values are suspicious since they are +/-999. This is a common error code with some sensors, so we can assume that they can be removed from the dataset. We can easily remove these erroneous values of temperature, but this time we will use a different method than before. The exclamation mark before an equals sign, `!=`, denotes \"not equal to.\" We can use this comparison operator to directly eliminate the values in one line. For example:\n", "```\n", - "array_1 = array_1[array_2!=-999]\n", + "array_1 = array_1[array_2 != -999]\n", "```" ] }, @@ -309,8 +309,8 @@ "metadata": {}, "outputs": [], "source": [ - "YOUR_CODE_HERE #distance = distance[temperature!=-999]\n", - "YOUR_CODE_HERE #temperature = temperature[temperature!=-999]" + "YOUR_CODE_HERE #distance = distance[temperature != -999]\n", + "YOUR_CODE_HERE #temperature = temperature[temperature != -999]" ] }, { @@ -328,7 +328,7 @@ "metadata": {}, "outputs": [], "source": [ - "print(distance.size==temperature.size)\n", + "print(distance.size == temperature.size)\n", "temperature.size" ] }, @@ -368,7 +368,7 @@ "metadata": {}, "outputs": [], "source": [ - "mask = YOUR_CODE_HERE #temperature!=999\n", + "mask = YOUR_CODE_HERE # (temperature != 999)\n", "distance = distance[mask]\n", "temperature = temperature[mask]" ] @@ -378,7 +378,7 @@ "id": "830e00fd", "metadata": {}, "source": [ - "The array is names \"mask\" because this process utilizes **masked arrays**...you can read more about it [here](https://python.plainenglish.io/numpy-masks-in-python-d8c13509fbc8)." 
+ "The array is named \"mask\" because this process utilizes **masked arrays**...you can read more about it [here](https://python.plainenglish.io/numpy-masks-in-python-d8c13509fbc8)." ] }, { @@ -396,10 +396,10 @@ "metadata": {}, "outputs": [], "source": [ - "plt.plot(distance, temperature, \"ok\", label = 'Temperature')\n", + "plt.plot(distance, temperature, \"ok\", label=\"Temperature\")\n", "plt.title(\"Super duper greenhouse\")\n", - "plt.xlabel('Distance')\n", - "plt.ylabel('Temperature')\n", + "plt.xlabel(\"Distance\")\n", + "plt.ylabel(\"Temperature\")\n", "plt.show()" ] }, @@ -408,7 +408,7 @@ "id": "999da984", "metadata": {}, "source": [ - "Looks good! But wait---there also appear to be some values in the array that are not physically possible! We know for sure that there was nothing cold in the greenhouse during the measurements; also it's very likely that a \"0\" value could have come from an error in the sensor.\n", + "Looks good! But wait—there also appear to be some values in the array that are not physically possible! We know for sure that there was nothing cold in the greenhouse during the measurements; also it's very likely that a \"0\" value could have come from an error in the sensor.\n", "\n", "See if you can apply the `numpy` method `nonzero` to remove zeros from the array. Hint: it works in a very similar way to `isnan`, which we used above." 
] @@ -465,8 +465,8 @@ "metadata": {}, "outputs": [], "source": [ - "YOUR_CODE_HERE #distance = distance[temperature<50]\n", - "YOUR_CODE_HERE #temperature = temperature[temperature<50]" + "YOUR_CODE_HERE #distance = distance[temperature < 50]\n", + "YOUR_CODE_HERE #temperature = temperature[temperature < 50]" ] }, { @@ -484,10 +484,10 @@ "metadata": {}, "outputs": [], "source": [ - "plt.plot(distance, temperature, \"ok\", label = 'Temperature')\n", + "plt.plot(distance, temperature, \"ok\", label=\"Temperature\")\n", "plt.title(\"Super duper greenhouse\")\n", - "plt.xlabel('Distance')\n", - "plt.ylabel('Temperature')\n", + "plt.xlabel(\"Distance\")\n", + "plt.ylabel(\"Temperature\")\n", "plt.show()" ] }, @@ -496,7 +496,7 @@ "id": "30bce92b", "metadata": {}, "source": [ - "Let's pretend there is a systematic error in our measurement device because it was not properly calibrated. It causes all observations below 15 degrees need to be corrected dividing the multiplying the measurement by 1.5. Numpy actually makes it very easy to change the contents of an array conditionally by replacement using the `where` method!" + "Let's pretend that there is a systematic error in our measurement device because it was not calibrated properly. As a result, all observations below 15 degrees need to be corrected by multiplying the measurement by 1.5. Numpy actually makes it very easy to replace the contents of an array based on a condition using the `where` method!" 
] }, { @@ -507,7 +507,7 @@ "<div style=\"background-color:#AABAB2; color: black; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px\">\n", "<p>\n", "<b>Task 2.5:</b> \n", - " Play with the cell below to understand what the <code>where</code> method does (i.e., replacement)---it's very useful to know about!\n", + " Play with the cell below to understand what the <code>where</code> method does (i.e., replacement)—it's very useful to know about!\n", "</p>\n", "</div>" ] @@ -519,7 +519,7 @@ "metadata": {}, "outputs": [], "source": [ - "temperature = np.where(temperature>15, temperature, temperature*1.5)" + "temperature = np.where(temperature > 15, temperature, temperature * 1.5)" ] }, { @@ -539,10 +539,10 @@ "metadata": {}, "outputs": [], "source": [ - "plt.plot(distance, temperature, \"ok\", label = 'Temperature')\n", + "plt.plot(distance, temperature, \"ok\", label=\"Temperature\")\n", "plt.title(\"Super duper greenhouse\")\n", - "plt.xlabel('Distance')\n", - "plt.ylabel('Temperature')\n", + "plt.xlabel(\"Distance\")\n", + "plt.ylabel(\"Temperature\")\n", "plt.show()" ] }, @@ -609,7 +609,7 @@ "<div style=\"background-color:#AABAB2; color: black; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px\">\n", "<p>\n", "<b>Task 3.1:</b> \n", - " Download, install and login in the Visual Studio Live Share Extension from the Visual Studio Marketplace as explained in the <a href=\"https://mude.citg.tudelft.nl/2024/book/external/learn-programming/book/install/ide/vsc/vs_live_share.html\">book</a>\n", + " Download and install the Visual Studio Live Share extension from the Visual Studio Marketplace, then log in, as explained in the <a href=\"https://mude.citg.tudelft.nl/2024/book/external/learn-programming/book/install/ide/vsc/vs_live_share.html\">book</a>.\n", "</p>\n", "</div>" ] @@ -619,7 +619,7 @@ "id": "2751b89a", "metadata": {}, "source": [ - "After installing and signing into Visual Studio Live Share, you'll share a project with 
yourself to test the collaboration session" + "After installing and signing in to Visual Studio Live Share, you'll share a project with yourself to test the collaboration session." ] }, { @@ -656,7 +656,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "An invitation link will be automatically copied to your clipboard. You'll use this link to interact with yourself in this assignment. If you want to collaborate with other, you can share this link with other to open up the project in their browser on own VS Code.\n", + "An invitation link will be automatically copied to your clipboard. You'll use this link to interact with yourself in this assignment. If you want to collaborate with others, you can share this link with them to open up the project in their browser or in their own VS Code.\n", "\n", "You'll also see the **Live Share** status bar item change to represent the session state." ] @@ -720,7 +720,7 @@ "<div style=\"background-color:#AABAB2; color: black; vertical-align: middle; padding:15px; margin: 10px; border-radius: 10px\">\n", "<p>\n", "<b>Task 3.7:</b> \n", - " Go back to your browser version of VS code and try running the cell. Note that this requires Requesting access in the browser participant. Request that access and approve it in the desktop participant of VS code. In the desktop participant you now need to select your python environment. Does the cell run? Do you see the output in both participants? Make sure the output doesn't show an error!\n", + " Go back to your browser version of VS Code and try running the cell. Note that this requires the browser participant to request access. Request that access and approve it in the desktop participant of VS Code. In the desktop participant you now need to select your Python environment. Does the cell run? Do you see the output in both participants? Make sure the output doesn't show an error!\n", "</p>\n", "</div>" ] -- GitLab
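
The cleaning steps touched by this patch (NaN removal with `np.isnan` and `~`, filtering the +/-999 error codes with `!=`, and conditional correction with `np.where`) can be sketched end to end on a small array. This is only an illustrative sketch: the sample values below are made up and stand in for the contents of `data_2.csv`, and the two 999 filters are combined into one mask for brevity, whereas the notebook applies them in separate tasks.

```python
import numpy as np

# Hypothetical sample data (assumption); the notebook reads these from data_2.csv.
distance = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
temperature = np.array([20.0, np.nan, -999.0, 999.0, 10.0, 25.0])

# Drop NaNs: apply the same boolean mask to both arrays so they stay aligned.
temperature_is_nan = np.isnan(temperature)
distance = distance[~temperature_is_nan]
temperature = temperature[~temperature_is_nan]

# Drop the +/-999 sensor error codes. The mask is built once, before
# temperature is overwritten, so distance and temperature are filtered identically.
mask = (temperature != -999) & (temperature != 999)
distance = distance[mask]
temperature = temperature[mask]

# Keep values above 15 unchanged; multiply all other values by 1.5.
temperature = np.where(temperature > 15, temperature, temperature * 1.5)

print(distance)
print(temperature)
```

Building the mask before overwriting `temperature` matters: filtering `temperature` first and then indexing `distance` with a mask derived from the already-filtered array would fail or misalign the two arrays.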